Illume keyboard dictionary sorting and normalization
Carsten Haitzler (The Rasterman)
raster at rasterman.com
Tue Jan 6 11:57:48 CET 2009
On Tue, 6 Jan 2009 11:49:55 +0100 "Olof Sjobergh" <olofsj at gmail.com> babbled:
> Hi,
>
> I'm working on a Swedish dictionary and keyboard for Illume, but I'm
> having some trouble with sorting of utf8 chars in the dictionary. I
> can't seem to get the sorting right. Looking at the code, Illume sorts
> the dictionary after first normalizing the strings according to the
> internal normalization table. Is there any way to reproduce this
> sorting with the sort command? I've tried with a few different locales
> (C, en_US.utf8) which all make the unix sort command work differently.
> But no matter what I try words don't show up correctly.
sort -f i think does it... i think...
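Illume's approach as described above (normalize each word through a table, then sort on the result) can be reproduced outside sort(1) with a short script. The sketch below assumes a lower-casing step (the kind of case folding `sort -f` does) followed by a per-character table. `DEFAULT_TABLE`, `normalize` and `sort_dictionary` are hypothetical names, and the table entries are illustrative, not Illume's actual built-in table. How Illume orders words that normalize to the same key is also an assumption; here the original spelling breaks the tie.

```python
# Hypothetical stand-in for Illume's built-in normalization table (the real
# table is not reproduced in this thread): accented letters fold to a base letter.
DEFAULT_TABLE = {
    "å": "a", "ä": "a", "à": "a", "á": "a", "â": "a",
    "ö": "o", "ó": "o", "ô": "o",
    "é": "e", "è": "e", "ê": "e",
    "ü": "u", "ç": "c",
}

def normalize(word, table=DEFAULT_TABLE):
    """Lower-case the word, then map each character through the table."""
    return "".join(table.get(ch, ch) for ch in word.lower())

def sort_dictionary(words, table=DEFAULT_TABLE):
    # Primary key: the normalized form. Secondary key: the original word, so
    # words that normalize identically (har/här/hår) get a deterministic order.
    return sorted(words, key=lambda w: (normalize(w, table), w))
```

Python compares strings by code point regardless of locale, so writing `sort_dictionary(words)` out one word per line gives a file whose order depends only on the table, not on which locale `sort` happens to run under.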
> Another issue I found is that the built in normalization table is not
> very good for typing Swedish text. On a standard Swedish qwerty
> layout, we have three additional letters (å, ä and ö). These are used
> very frequently in Swedish and there are many common words that have
> different meanings if spelled with a, å or ä (for example har, här and
> hår are all very common words). But in Illume these are all normalized
> to a. Writing Swedish with a US qwerty layout and then having to
> select aåä manually after the dictionary lookup is a pain, since many
> common words will have to be selected from the lookup list each time.
>
> Instead, what you want is a Swedish qwerty layout (which is very
> simple to implement as a .kbd file), and not normalize åäö for the
> Swedish dictionary lookup. So the normalization table would really
> need to be configurable, either as a part of the dictionary or the
> .kbd file. I suppose this problem exists for other languages as well.
> If I were to work on such a change, what would be the best approach?
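The configurable table proposed above could look roughly like this: each dictionary carries its own table, and the Swedish one simply leaves out å, ä and ö so they are no longer folded to a and o. The tables, `normalize` and `lookup` are hypothetical, and the lookup is simplified to exact matching on the normalized form rather than whatever matching Illume actually performs.

```python
# Hypothetical default table (Illume's real one is not shown in the thread).
DEFAULT_TABLE = {"å": "a", "ä": "a", "à": "a", "á": "a",
                 "ö": "o", "ó": "o", "é": "e", "ü": "u"}

# Hypothetical per-dictionary table for Swedish: å, ä and ö are kept as
# distinct letters because a Swedish qwerty layout has dedicated keys for them.
SWEDISH_TABLE = {k: v for k, v in DEFAULT_TABLE.items() if k not in "åäö"}

def normalize(word, table):
    """Lower-case the word, then map each character through the given table."""
    return "".join(table.get(ch, ch) for ch in word.lower())

def lookup(typed, dictionary, table):
    """Return dictionary words whose normalized form equals the typed word's."""
    key = normalize(typed, table)
    return [w for w in dictionary if normalize(w, table) == key]

words = ["har", "här", "hår"]
```

With `DEFAULT_TABLE`, typing "har" offers all three words and the user has to pick one; with `SWEDISH_TABLE` each spelling only matches itself.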
hmm interesting i was just going off german/french and portuguese on this where
i thought i could get away with simple normalisation and a basic qwerty layout
- with selecting the matches (Vogel/Vögel for example). making the table part
of the dictionary does make a lot of sense of course. the dict format does need
to change to make it a lot faster and intl-char friendly. i avoided this at the
time as i'd need to efficiently encode a b-tree in the file and be able to
mmap() it efficiently and use it.
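The idea of a dictionary file that is used directly through mmap() can be illustrated with a simpler structure than a b-tree. The sketch below is explicitly a substitute technique: one UTF-8 word per line in byte-sorted order, binary-searched in place in the mapped file, so nothing is parsed into memory up front. `write_sorted_dict`, `contains` and the file name `sv.dic` are hypothetical; a real Illume dictionary would also need to store the normalized keys used for lookup.

```python
import mmap
import os
import tempfile

def write_sorted_dict(path, words):
    # One UTF-8 word per line, sorted by encoded bytes so the file order
    # matches the byte comparisons made during lookup.
    data = sorted(w.encode("utf-8") for w in words)
    with open(path, "wb") as f:
        f.write(b"\n".join(data) + b"\n")

def _line_at(mm, pos):
    """Return (start, end) of the line containing byte offset pos."""
    start = mm.rfind(b"\n", 0, pos) + 1   # just after the previous newline
    end = mm.find(b"\n", pos)             # the newline ending this line
    return start, end

def contains(path, word):
    """Binary search over the sorted lines of the mmap()ed file."""
    target = word.encode("utf-8")
    with open(path, "rb") as f, \
         mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        lo, hi = 0, len(mm)               # both always sit on line boundaries
        while lo < hi:
            start, end = _line_at(mm, (lo + hi) // 2)
            line = mm[start:end]
            if line == target:
                return True
            if line < target:
                lo = end + 1              # continue after this line
            else:
                hi = start                # continue before this line
        return False

# Example: build a small word file in a temporary directory and query it.
path = os.path.join(tempfile.mkdtemp(), "sv.dic")
write_sorted_dict(path, ["hår", "har", "här", "hamn", "hus"])
```

Each probe lands on an arbitrary byte, so `_line_at` scans back to the previous newline to find the enclosing word; a b-tree with fixed-size nodes would avoid that scan and keep neighbouring keys on the same pages.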
--
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler) raster at rasterman.com