[SHR] illume predictive keyboard is too slow

Olof Sjobergh olofsj at gmail.com
Wed Jan 28 14:39:42 CET 2009


On Wed, Jan 28, 2009 at 2:05 PM, Helge Hafting <helge.hafting at hist.no> wrote:
> The obvious fix is to store the dictionary in such a format that
> conversions won't be necessary. Not sure why utf16 is being used,
> utf8 is more compact and works so well for everything else in linux.

Yes, the obvious fix is to change the dictionary format. However, it's
not as simple as you might think.

The dictionary today is stored in utf8, not utf16. But the dictionary
lookup also matches words that are not exactly the same as the input
word; for example, e should also match é, è and ë. To do this, every
character in the input string, and every character of each dictionary
word, has to be "normalised" to ascii. Since a single character can
take up multiple bytes in utf8, a word is first converted to utf16,
where all the characters involved are the same size, so that a simple
lookup table can be used for each character. But converting from the
multibyte format every time one string is compared to another adds
overhead.
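
To make the cost concrete, here is a rough sketch of that kind of
lookup. It is in Python rather than the C that illume uses, and the
table contents and the names NORMALISE, normalise and lookup are made
up for illustration; it is not the actual illume code. The point is
only that every stored word gets decoded and normalised again on
every lookup.

```python
# Hypothetical normalisation table: accented letter -> plain ascii letter.
NORMALISE = {
    "\u00e9": "e",  # é
    "\u00e8": "e",  # è
    "\u00eb": "e",  # ë
}

def normalise(word):
    """Map each character through the table; unknown characters pass through."""
    return "".join(NORMALISE.get(ch, ch) for ch in word)

def lookup(raw_words, typed):
    """Return the stored words whose normalised form matches the typed word.

    raw_words holds utf8-encoded bytes, standing in for the on-disk
    dictionary. Each word is decoded and normalised on every call,
    which is the repeated conversion overhead described above.
    """
    key = normalise(typed)
    matches = []
    for raw in raw_words:
        word = raw.decode("utf-8")  # multibyte -> fixed-size characters
        if normalise(word) == key:
            matches.append(word)
    return matches

# A tiny stand-in dictionary, stored as utf8 bytes.
words = [w.encode("utf-8") for w in ["\u00e9l\u00e8ve", "eleve", "hello"]]
```

Calling lookup(words, "eleve") decodes all three stored words just to
find the two that match.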

With a different dictionary format where all words are stored already
normalised, there would be no need for all the conversions. But then
you would also have to store, for each normalised word, every original
spelling it can stand for, so the format would be more complicated.
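
As a rough idea of what such a format could look like, here is a
sketch (again Python, with made-up names build_index and lookup, and
an in-memory dict standing in for whatever the on-disk layout would
be). The words are normalised once when the index is built, and each
normalised key keeps the list of original spellings, which is the
extra data the format would have to carry.

```python
from collections import defaultdict

# Hypothetical normalisation table: accented letter -> plain ascii letter.
NORMALISE = {
    "\u00e9": "e",  # é
    "\u00e8": "e",  # è
    "\u00eb": "e",  # ë
}

def normalise(word):
    """Map each character through the table; unknown characters pass through."""
    return "".join(NORMALISE.get(ch, ch) for ch in word)

def build_index(words):
    """Normalise every word once and group the originals by normalised key."""
    index = defaultdict(list)
    for word in words:
        index[normalise(word)].append(word)
    return dict(index)

def lookup(index, typed):
    """Only the typed word is normalised; stored words need no conversion."""
    return index.get(normalise(typed), [])

index = build_index(["\u00e9l\u00e8ve", "eleve", "hello"])
```

Here lookup(index, "eleve") is a single key lookup, at the price of
storing both the key "eleve" and the spellings that map to it.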

Best regards,

Olof Sjobergh



