[SHR] illume predictive keyboard is too slow

Helge Hafting helge.hafting at hist.no
Fri Jan 30 14:43:39 CET 2009


Carsten Haitzler (The Rasterman) wrote:
> On Thu, 29 Jan 2009 14:32:48 +0100 Helge Hafting <helge.hafting at hist.no> said:

>> I hope things like this will be possible, if a new dictionary format is 
>> realized. It is ok if typing "for" suggests "fôr" as an alternative, but 
>> "før" should not come up unless the user types "f" "ø" "r". In which 
>> case "o" must not be suggested...
> 
> ok - how do you romanise norwegian then? example. in german ö -> oe, ü -> ue,
> ß -> ss, etc. - there is a set of romanisation rules that can convert any such
> char to 1 or more roman letters. i was hoping to be even more lenient with ö ->
> o being valid too for the lazy :) japanese has romanisation rules - so does
> chinese... norwegian must (eg æ -> ae for example).
> 
Usually, one doesn't romanize Norwegian. There are some rules: æ->ae, 
ø->oe, å->aa.  They are next to useless, because ae and oe occur 
naturally in many words where æ or ø does not belong, and these double 
vowels are pronounced differently as well. A Norwegian seeing "oe" in a 
word may be able to figure out if this means "ø" or if it really is 
supposed to be "oe", but this may need a context of several words. And 
it looks funny/wrong, much like it would look silly to transcribe "x" as 
"ks" and write "ksylophone".

You might want to transcribe "x" that way in an emergency, if your "x" 
key breaks, until you get a new keyboard. You probably don't want to 
throw away the "x" to save space on a keyboard, though. For the same 
reasons, Norwegian transcriptions aren't used. I have only seen two 
cases of such transcription:
1. Names of Norwegian athletes in international sports events.
    This looks really silly, and it is completely unnecessary. Sports
    computer systems these days handle more than a-z; the names are
    spelled correctly in national events, after all.
    And it is not as if foreigners have big
    problems with an "ø". If they don't know what the slash is for,
    they can read it as "o", and so on. This is similar to how I read
    French: I have no idea what the difference between à and á is.
    Both are "a" to me.
2. Expert computer users sometimes use the transcriptions, because
    they often use the latest equipment before the keyboards get fixed
    and before ASCII-only limitations are sorted out. Some of them
    are tired of fighting and give up. And they have actually heard
    about the concept of "transcription"! But mainstream users get
    equipment with proper keyboards; anything less is an unfinished
    product. You won't find an ASCII keyboard in a Norwegian shop.

> if something can be romanised - it can have a romanised match in a dictionary
> and thus suggest the appropriate matches. of course now the dictionary
> determines these rules implicitly by content, not by code specifically
> enforcing such rules. :)
> 
> but yes - selecting dictionary is needed so selecting a keyboard for that
> language as well as dictionary is useful. it still adds a few keys - thus
> squashing the keyboard some more :( i was hoping to avoid that.

English can work with 10 keys in a row; Norwegian needs 11. :-)
The solution, then, is different keyboards: those who don't need more 
keys should not have to suffer the slightly smaller keys.

> note - the keyboard is by no means limited to ascii at all - it's perfectly
> able to have accented/other keys added to layouts - so i'm considering this
> problem "solved" as its simply a matter of everyone agreeing to make a .kbd for
> their language - should they need one other than the default qwerty (ascii)
> one. so from this point of view - that's solved. what isn't done yet is:

Excellent!
So if I have a wordlist and make a keyboard, then a dictionary can be 
synthesized so there will be no unnecessary confusion between o and ø, 
because both letters exist as keys?

> 1. a kbd being able to hint at wanting a specific dictionary language (or
> vice-versa).
For packaging, put the wordlist and keyboard layout in the same package, 
and switch both when switching keyboards. I guess several languages will 
share the same layout. This can be solved elegantly with hard links, or 
with a mechanism where a keyboard either uses the standard ASCII layout 
or a language-specific one.

> 2. dictionary itself being able to hint to have a specific kbd layout.
> 3. applications not being able to hint for a specific language for input (and
> thus dictionary and/or kbd).
>
I believe we use the same apps regardless of language? So an app should 
simply ask for numeric/alphabetic/terminal, and then the system provides 
the system default alphabetic keyboard. This could be English, Norwegian, 
German, ... depending on a system setting. Multilingual persons can 
have one default keyboard and explicitly select another when needed.

It'd be nice if one could have the option of setting a terminal keyboard 
as the default alphabetic keyboard too: some people don't like 
guesswork because the wordlist is never truly complete, or maybe there 
is no list for their language yet. Of course, they then have to struggle 
with the stylus and small keys instead.
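A minimal Python sketch of the selection logic proposed above, assuming an app only hints at the kind of input it wants and never names a language. The enum, the settings fields and the layout names (`alpha-no`, `terminal`, and so on) are hypothetical, not part of illume.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class InputHint(Enum):
    """What an application asks for; it never names a language."""
    NUMERIC = "numeric"
    ALPHABETIC = "alphabetic"
    TERMINAL = "terminal"


@dataclass
class KeyboardSettings:
    """Hypothetical system-wide keyboard settings."""
    language: str = "en"             # system default language, e.g. "no" or "de"
    alpha_is_terminal: bool = False  # user prefers no word guessing at all
    override: Optional[str] = None   # layout a multilingual user picked explicitly


def choose_layout(hint: InputHint, settings: KeyboardSettings) -> str:
    """Return the name of the layout to show for an application's hint."""
    if hint is InputHint.NUMERIC:
        return "numeric"
    if hint is InputHint.TERMINAL:
        return "terminal"
    # Alphabetic input: an explicit user choice wins, then the
    # "terminal as default alphabetic keyboard" preference, then the
    # keyboard for the system language.
    if settings.override is not None:
        return settings.override
    if settings.alpha_is_terminal:
        return "terminal"
    return f"alpha-{settings.language}"
```

Because applications only pass the hint, the same app works unchanged whatever the system language is.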

> so there needs to be a tie-in between language, dict and kbd - which one drives
> what... is the question. it needs to not BREAK things like terminal kbd etc. -
> ie i can stay with norwegian as my language but if i select the terminal kbd -
> it will stay there and not suddenly flip back to the simple kbd layout.
> number/symbol entry similarly. this bit of things is currently undefined and
> unimplemented.
> 
> the other is improved dictionary format. the problem is - if we go make the
> dict smarter... how on earth do you GENERATE such a dictionary. i sure as hell
> am not hand-writing a whole dictionary... and i doubt anyone here will - it
> could be a large community effort to build a full one for each language - but
> that will take time. you need to enter all words, all matches, conjugations,
> and then frequency info too. the simple dict english can use is much easier -
> it can be auto-generated from input text. just throw a (text version) of a book
> - or newspaper or documentation - it can just index every word it finds and
> even count frequency usage. thats easy to automate the production of such a
> dict (and that is why the dict is as it is now - sheer simplicity).

There are plenty of open-source dictionaries in existence. There are 
Norwegian dictionaries used for word-processor spell-checking, and they 
tend to have all the conjugations already. Frequencies can be found by 
processing some large text, such as the entire body of Wikipedia 
articles in that language. Finally, dictionary creation might need to 
know the transcription rules for the language, so as to avoid run-time 
conversions. For Norwegian, that would be: treat "æøå" as base letters 
just like a-z, and use the standard ASCII mappings for all other 
non-ASCII letters.
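The procedure above can be sketched in Python. This is an illustration under assumptions, not illume's dictionary format: `fold` keeps æ, ø and å as base letters and approximates "the standard ASCII mappings" by stripping accents via Unicode decomposition, while `build_dictionary` counts every word in a plain-text corpus under its folded key and remembers the real spellings so they can be offered as suggestions. Both function names are hypothetical.

```python
import re
import unicodedata
from collections import Counter, defaultdict

# Letters Norwegian treats as base letters, just like a-z.
BASE_LETTERS = set("abcdefghijklmnopqrstuvwxyzæøå")


def fold(word: str) -> str:
    """Lower-case `word`, keep base letters, strip accents from all others.

    Assumption: the "standard ASCII mapping" is approximated by Unicode
    decomposition (é -> e, ü -> u). Letters with no decomposition,
    such as ß, are left unchanged in this sketch.
    """
    out = []
    for ch in unicodedata.normalize("NFC", word.lower()):
        if ch in BASE_LETTERS:
            out.append(ch)
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


def build_dictionary(text: str) -> dict:
    """Map each folded key to the spellings seen in `text` and their counts."""
    entries = defaultdict(Counter)
    # NFC first so accented letters are single code points; the pattern
    # matches runs of Unicode letters (no digits, no underscore).
    for word in re.findall(r"[^\W\d_]+", unicodedata.normalize("NFC", text)):
        entries[fold(word)][word.lower()] += 1
    return dict(entries)
```

With this folding, typing "for" can bring up "fôr" (whose key is also "for") but never "før", because ø is kept as a letter of its own.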

If you describe a (largely automated) procedure, then I guess people 
will jump in and do the job for various languages. Just as they already 
have made some keyboard layouts. :-)

Will the wordlists be adaptive? Simpler phones with T9 have this. There 
is always something not in the list, such as the cat's name. It may 
still get used a lot, so it is nice that the T9 system only trips up 
the first time. From then on, the phone knows the new word and offers it 
as an alternative.
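A minimal sketch of the T9-style learning described above, assuming a plain prefix lookup ranked by frequency (illume's real matching is fuzzier, and the class name is made up). Every word the user actually enters is counted, so an unknown word such as a pet's name is missed once and suggested from then on.

```python
from collections import Counter
from typing import Iterable, List


class AdaptiveWordlist:
    """Hypothetical wordlist that learns from what the user types."""

    def __init__(self, words: Iterable[str]):
        # Initial counts come from the shipped wordlist.
        self.freq = Counter(words)

    def suggest(self, prefix: str, limit: int = 3) -> List[str]:
        """Known words starting with `prefix`, most frequent first."""
        matches = [w for w in self.freq if w.startswith(prefix)]
        matches.sort(key=lambda w: (-self.freq[w], w))
        return matches[:limit]

    def accept(self, word: str) -> None:
        """Record a word the user entered; unknown words are simply added."""
        self.freq[word] += 1
```

A real implementation would presumably also save the learned counts between sessions.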

Helge Hafting



