Illume keyboard dictionary sorting and normalization

Carsten Haitzler (The Rasterman) raster at
Wed Jan 7 00:18:44 CET 2009

On Tue, 6 Jan 2009 17:28:30 +0100 "Olof Sjobergh" <olofsj at> babbled:

> On Tue, Jan 6, 2009 at 11:57 AM, The Rasterman Carsten Haitzler
> <raster at> wrote:
> > sort -f i think does it... i think...
> Thanks, that seems to work.
> I created a package and uploaded to
> for anyone who is interested. The
> source is hosted at
> > hmm interesting i was just going off german/french and portuguese on this
> > where i thought i could get away with simple normalisation and a basic
> > qwerty layout
> > - with selecting the matches (Vogel/Vögel for example). making the table
> > part of the dictionary does make a lot of sense of course. the dict format
> > does need to change to make it a lot faster and intl-char friendly. i
> > avoided this at the time as i'd need to efficiently encode a b-tree in the
> > file and be able to mmap() it efficiently and use it.
> I understand it would make the dictionary format more complicated.
> Maybe it could be split into 2 files, one with general configuration
> data such as a normalisation table, an icon etc, and then a raw
> dictionary file like there is now.
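A minimal sketch of the "simple normalisation" idea, written in Python (the thread names no language for this). It assumes accents can be folded by Unicode NFD decomposition followed by dropping combining marks, which maps Vögel to vogel. The function names are hypothetical. The sort key folds case in the spirit of `sort -f` but also ignores accents, so its output is not guaranteed to match what `sort -f` produces byte for byte.

```python
import unicodedata


def normalise(word: str) -> str:
    """Fold a word to a lowercase, accent-free match key (hypothetical helper).

    NFD splits e.g. "ö" into "o" + U+0308 COMBINING DIAERESIS; the combining
    marks are then dropped and the result is casefolded.
    """
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def sort_words(words):
    # Order by the normalised key; ties (e.g. Vogel vs Vögel) fall back to the
    # original string so the order is deterministic.
    return sorted(words, key=lambda w: (normalise(w), w))
```

With this key, Vogel and Vögel end up adjacent in the sorted list, which is what a prefix matcher typing "vog" would want.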

it could be - but there needs to be a redo of the dict format. i need to at
least add the following:

1. the actual match string and the display string should be different, e.g.:
(german) vogel -> Vogel,Vögel
(japanese) sakana -> さかな,サカナ,魚,肴,茶菓な

yes japanese is a silly example as there is no way to "type" matches in kanji
(chinese chars) - you NEED an input method (this is where vkbd and xim etc.
need to tie in eventually).
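A sketch of point 1, assuming a hypothetical text format with one entry per line, `match<TAB>display1,display2,...` (this is not the current Illume dictionary format). Each typed match key maps to a list of display strings, so Vogel/Vögel (or the kana/kanji forms of sakana) come straight out of the dictionary with no separate normalisation table. Function names are illustrative only.

```python
def build_dict(lines):
    """Parse hypothetical 'match<TAB>display,display,...' lines into a dict."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        match, _, shown = line.partition("\t")
        displays = [d for d in shown.split(",") if d]
        # With no display strings given, show the match key itself.
        table.setdefault(match, []).extend(displays or [match])
    return table


def candidates(table, typed_prefix):
    # Collect display strings for every match key starting with what was typed,
    # visiting keys in sorted order so results are stable.
    out = []
    for match in sorted(table):
        if match.startswith(typed_prefix):
            out.extend(table[match])
    return out
```

For example, a line `vogel\tVogel,Vögel` lets typing "vog" offer both spellings, while a bare line like `zebra` just displays itself.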

2. something much faster to just mmap() and use dynamically. it is currently
mmapped, with a small lookup table built for faster access - but it still needs
to parse whole lines on the fly to do matching, even though it's mmapped. so
if the format changes, then it doesn't matter much.
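One way to get a file that can be mmap()ed and searched without parsing every line is sketched below. It assumes a hypothetical layout: a u32 record count, a table of u32 record offsets, then `key<TAB>displays\n` records sorted by the UTF-8 bytes of their key. Prefix lookup is a binary search over the offset table. It uses a sorted array rather than the b-tree mentioned earlier in the thread, to keep it short; the layout and names are illustrative, not Illume's.

```python
import mmap
import struct

# Hypothetical layout (not the Illume format):
#   u32 count | count * u32 record offsets | records "key\tdisp,disp\n"
HEADER = struct.Struct("<I")
OFFSET = struct.Struct("<I")


def write_dict(path, entries):
    """entries: dict mapping match key (str) -> list of display strings."""
    records = [
        (key + "\t" + ",".join(entries[key]) + "\n").encode("utf-8")
        for key in sorted(entries, key=lambda k: k.encode("utf-8"))
    ]
    pos = HEADER.size + OFFSET.size * len(records)
    offsets = []
    for rec in records:
        offsets.append(pos)
        pos += len(rec)
    with open(path, "wb") as f:
        f.write(HEADER.pack(len(records)))
        for off in offsets:
            f.write(OFFSET.pack(off))
        f.writelines(records)


class MappedDict:
    def __init__(self, path):
        with open(path, "rb") as f:
            # The mapping stays valid after the file object is closed.
            self._mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        (self._count,) = HEADER.unpack_from(self._mm, 0)

    def _key_span(self, i):
        # Start of record i and position of the tab that ends its key.
        (off,) = OFFSET.unpack_from(self._mm, HEADER.size + OFFSET.size * i)
        return off, self._mm.find(b"\t", off)

    def lookup(self, prefix):
        p = prefix.encode("utf-8")
        lo, hi = 0, self._count
        while lo < hi:  # find the first record whose key is >= prefix
            mid = (lo + hi) // 2
            off, tab = self._key_span(mid)
            if self._mm[off:tab] < p:
                lo = mid + 1
            else:
                hi = mid
        results = []
        for i in range(lo, self._count):
            off, tab = self._key_span(i)
            if not self._mm[off:tab].startswith(p):
                break  # keys sharing a prefix are contiguous in byte order
            nl = self._mm.find(b"\n", tab)
            shown = self._mm[tab + 1:nl].decode("utf-8")
            results.extend(d for d in shown.split(",") if d)
        return results

    def close(self):
        self._mm.close()
```

Only the keys touched by the binary search and the matching run are decoded, so lookup cost grows with log(count) plus the number of matches rather than with the file size.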

so the above ability to map 1 input match to multiple possible outputs (that
could even be radically different) would negate the need for a mapping table :)

------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    raster at

More information about the community mailing list