[SHR] illume predictive keyboard is too slow

Carsten Haitzler (The Rasterman) raster at rasterman.com
Sat Jan 31 23:31:09 CET 2009


On Fri, 30 Jan 2009 21:16:57 +0100 Olof Sjobergh <olofsj at gmail.com> said:

> On Fri, Jan 30, 2009 at 8:12 PM, The Rasterman Carsten Haitzler
> <raster at rasterman.com> wrote:
> > On Fri, 30 Jan 2009 08:31:43 +0100 Olof Sjobergh <olofsj at gmail.com> said:
> >> But I think a dictionary format in plain utf8 that includes the
> >> normalised words as well as any candidates to display would be the
> >> best way. Then the dictionary itself could choose which characters to
> >> normalise and which to leave as is. So for Swedish, you can leave å, ä
> >> and ö as they are but normalise é, à etc. Searching would be as simple
> >> as in your original implementation (no need to convert from multibyte
> >> format).
> >
> > the problem is - the dict in utf8 means searching is slow as you do it in
> > utf8 space. the dict is mmaped() to save ram - if it wasnt it'd need to be
> > allocated in non-swappable ram (its a phone - it has no swap) and thus a
> > few mb of your ram goes into the kbd dict at all times. by using mmap you
> > leave it to the kernels paging system to figure it out.
> >
> > so as such a dict change will mean a non-ascii format in future for this
> > reason. but there will then need to be a tool to generate such a file.
> 
> Searching in utf8 doesn't mean it has to be slow. Simple strcmp works
> fine on multibyte utf8 strings as well, and should be as fast as the
> dictionary was before adding multibyte to widechars conversions. But
> if you have some other idea in mind, please don't let me disturb. =)

the problem is - it INSt a simple key>value lookup. it's a possible-match tree
build on-the-fly. that means you jump about examining 1 character at a time.
the problem here is that 1 char may or may not be 1 byte or more and that makes
it really nasty. if it were a simple key lookup for a given simple string -
life would be easy. this is possible - but then u'd have to generate ALL
permutations first then look ALL of them up. if you weed out permutations AS
you look them up you can weed out something like 90% of the permutations as you
KNOw there are no words starting with qz... so as you go through qa... qs....
qx... qz... you can easily stop all the combinations with qs, qz ans qx as no
words begin with that (if you have an 8 letter word with 8 possible letters per
character in the word thats 8^6 lookups you avoided (in the case above - ie all
permutations of the other 6 letters). thats 262144 lookups avoided... just
there. for... 1 of the above impossible permutation trees. now add it up over
all of them.


-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    raster at rasterman.com





More information about the community mailing list