[SHR] illume predictive keyboard is too slow

Olof Sjobergh olofsj at gmail.com
Thu Jan 29 08:30:44 CET 2009


On Wed, Jan 28, 2009 at 11:16 PM, The Rasterman Carsten Haitzler
<raster at rasterman.com> wrote:
> On Wed, 28 Jan 2009 18:59:32 +0100 "Marco Trevisan (Treviño)" <mail at 3v1n0.net>
> said:
>
>> Olof Sjobergh wrote:
>> > Unless I missed something big (which I hope I didn't, but I wouldn't
>> > be surprised if I did), this is not fixable with the current
>> > dictionary lookup design. Raster talked about redesigning the
>> > dictionary format, so I guess we have to wait until he gets around to
>> > it (or someone else does it).
>>
>> I think so too. Maybe using something like a "trie" [1] to store the
>> words could help (both for word matching and for compressing the
>> dictionary).
>> Too hard?
>>
>> [1] http://en.wikipedia.org/wiki/Trie
>
> the problem here comes with having multiple displays for a single match. let me
> take japanese as an example (i hope you have the fonts to see this at least -
> though there is no need to understand beyond knowing that there are a lot of
> matches that are visibly different):
>
> sakana ->
>  さかな 茶菓な 肴 魚 サカナ 坂な 差かな 左かな 査かな 鎖かな 鎖かな
>
> unlike simple decimation of é -> e and ë -> e and è -> e etc. you need 1 ascii
> input string matching one of MANY very different matches. the european case of
>
> vogel -> Vogel Vögel
>
> is a simplified version of the above. the reason i wanted "decimation" to match
> a simple roman text (ascii) string is - that this is a pretty universal thing.
> that's how japanese, chinese and even some korean input methods work. it also
> works for european languages too. europeans are NOT used to the idea of a
> dictionary guessing/selecting system when they type - but the asians are. they
> are always typing and selecting. the smarts come with the dictionary system
> selecting the right one more often than not by default or the right selection
> you want being only 1 or 2 keystrokes away.
>
> i was hoping to be able to keep a SIMPLE ascii qwerty keyboard for as much as
> possible - so you can just type and it will work and offer the selections as
> it's trying to guess anyway - it can present the multiple accented versions
> too. this limits the need for special keyboards - doesn't obviate it, but
> allows more functionality out of the box. in the event users explicitly select
> an accented char - ie a non-ascii character, it should not "decimate". it
> should try to match exactly that char.
>
> so if you add those keys and use them or flip to another key layout to select
> them - you get what you expect. but if i am to redo the dict - the api is very
> generic - just the internals and format need changing to be able to do the
> above. the cool bit is.. if i manage the above... it has almost solved asian
> languages too - and input methods... *IF* the vkbd is also able to talk to a
> complex input method (XIM/SCIM/UIM etc.) as keystroke faking won't let you type
> chinese characters... :) but in principle the dictionary and lookup scheme will
> work - it's then just mechanics of sending the data to the app in a way it can
> use it.
>
> so back to the trie... the trie would only be useful for the ascii matching - i
> need something more complex. a trie just combines the data with the match tree
> (letters are inline). i need a match tree + lookup table to other matches to
> display - and possibly several match entries (all the matches to display also
> need to be in the tree pointing to a smaller match list).
>
> --
> ------------- Codito, ergo sum - "I code, therefore I am" --------------
> The Rasterman (Carsten Haitzler)    raster at rasterman.com

I think most problems could be solved by using a dictionary format
similar to what you describe above, i.e. something like:

match : candidate1 candidate2; frequency
for example:
vogel : Vogel Vögel; 123

That way you can search on the normalised word, where a simple strcmp
works fine and will be fast enough. To keep the dictionary from growing
too large, the following shorthand syntax could also be accepted:
eat; 512     // No candidates, just show the match as is
har här hår; 1234    // Also show the match itself as a candidate
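
As a rough sketch of how lines in this format could be parsed, here is a
small Python example. It is only an illustration, not the illume
implementation: the names Entry, parse_line and load are made up here, a
Python dict stands in for the strcmp-based search, and the "//" comment
handling assumes the comments in the examples above could appear in the
file itself.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    match: str             # normalised ASCII key typed by the user
    candidates: List[str]  # words to offer for that key
    frequency: int


def parse_line(line: str) -> Entry:
    """Parse one dictionary line in either the long or the shorthand form."""
    # Drop the explanatory "// ..." comments used in the examples above.
    body = line.split("//", 1)[0].strip()
    words, _, freq = body.rpartition(";")
    if ":" in words:
        # Long form: "match : candidate1 candidate2; frequency"
        match, _, rest = words.partition(":")
        return Entry(match.strip(), rest.split(), int(freq))
    # Shorthand: the first word is the match and is itself a candidate.
    tokens = words.split()
    return Entry(tokens[0], tokens, int(freq))


def load(lines):
    """Build a lookup table keyed on the normalised match string."""
    return {e.match: e for e in map(parse_line, lines)}


sample = [
    "vogel : Vogel Vögel; 123",
    "eat; 512     // No candidates, just show the match as is",
    "har här hår; 1234    // Also show the match itself as a candidate",
]
table = load(sample)
print(table["vogel"].candidates)
print(table["har"].candidates)
```

With this reading, "eat; 512" becomes an entry whose only candidate is
"eat" itself, so the shorthand and the long form end up in the same
structure.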

If you think this would be good enough, I could try to implement it.

Another problem with languages like Swedish, and also Japanese, is the
heavy use of inflection. For example, in Japanese the verbs 食べる and
考える are both conjugated in the same way, like this:
食べる 食べました 食べた 食べている 食べていた 食べています 食べていました
考える 考えました 考えた 考えている 考えていた 考えています 考えていました

Another example, the Swedish nouns:
bil bilen bilar bilarna bilens bilarnas

But including all these forms in the dictionary makes it very large,
which is impractical. So some way to indicate the possible inflections
of a word would be good, but it would make the dictionary format a lot
more complex.
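
One possible way to indicate the forms, sketched in Python purely as an
illustration: store each word once as a stem plus the name of an
inflection class, and generate the full forms from a shared table of
endings. The class names "sv_noun_ar" and "ja_ichidan", the INFLECTIONS
table and the expand function are all hypothetical; nothing like this
exists in the current dictionary format.

```python
# Hypothetical inflection classes: each maps a class name to the endings
# that can be attached to a stem. The names are made up for this sketch.
INFLECTIONS = {
    "sv_noun_ar": ["", "en", "ar", "arna", "ens", "arnas"],
    "ja_ichidan": ["る", "ました", "た", "ている", "ていた", "ています", "ていました"],
}


def expand(stem: str, cls: str) -> list:
    """Return every surface form of a stem for the given inflection class."""
    return [stem + ending for ending in INFLECTIONS[cls]]


print(expand("bil", "sv_noun_ar"))
print(expand("食べ", "ja_ichidan"))
print(expand("考え", "ja_ichidan"))
```

The dictionary would then need one entry per stem instead of one per
form, at the cost of a second table and an extra field per entry, which
is the added complexity mentioned above.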

Best regards,

Olof Sjöbergh



