[SHR] illume predictive keyboard is too slow
helge.hafting at hist.no
Fri Jan 30 14:43:39 CET 2009
Carsten Haitzler (The Rasterman) wrote:
> On Thu, 29 Jan 2009 14:32:48 +0100 Helge Hafting <helge.hafting at hist.no> said:
>> I hope things like this will be possible, if a new dictionary format is
>> realized. It is ok if typing "for" suggests "fôr" as an alternative, but
>> "før" should not come up unless the user types "f" "ø" "r". In which
>> case "o" must not be suggested...
> ok - how do you romanise norwegian then? example. in german ö -> oe, ü -> ue,
> ß -> ss, etc. - there is a set of romanisation rules that can convert any such
> char to 1 or more roman letters. i was hoping to be even more lenient with ö ->
> o being valid too for the lazy :) japanese has romanisation rules - so does
> chinese... norwegian must (eg æ -> ae for example).
Usually, one doesn't romanize Norwegian. There are some rules: æ->ae,
ø->oe, å->aa. They are next to useless, because ae and oe occur
naturally in many words where æ or ø does not belong, and these double
vowels are pronounced differently as well. A Norwegian seeing "oe" in a
word may be able to figure out if this means "ø" or if it really is
supposed to be "oe", but this may need a context of several words. And
it looks funny/wrong - similar to how silly it would look to transcribe
"x" as "ks" and write "ksylophone".
You might want to transcribe "x" that way in an emergency, if your "x"
key breaks, until you get a new keyboard. You probably don't want to
throw away the "x" to save space on a keyboard though. And Norwegian
transcriptions go unused for the same reasons. I have only seen two
cases of such transcription:
1. Names of Norwegian athletes in international sports events.
Which looks really silly, and is completely unnecessary: sports
computer systems these days handle more than a-z, and the names are
spelled correctly in national events after all. And it is not as if
foreigners get big problems with an "ø". If they don't know what
the slash is for, they can read it as "o", and so on. Similar to
how I read French - I have no idea what the difference between
à and á is. Both are "a" to me.
2. Expert computer users sometimes use the transcriptions, because
they often use the latest equipment before keyboards get fixed
and before ascii-only limitations are sorted out. Some of them
are tired of fighting and give up. And they have actually heard
about the concept of "transcription"! But mainstream users get
equipment with proper keyboards; anything less is an unfinished
product. You won't find an ascii keyboard in a Norwegian shop.
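To make the ambiguity of the æ->ae, ø->oe, å->aa rules concrete, here is
a minimal sketch that lists every spelling a romanised word could stand
for. The names RULES and candidates are made up for this illustration,
and the left-to-right scan is a simplification (it handles lowercase
only and ignores overlapping pairs such as "aae").

```python
from itertools import product

# The three romanisation rules, reversed: digraph -> Norwegian letter.
RULES = {"ae": "æ", "oe": "ø", "aa": "å"}


def candidates(romanised):
    """Return every spelling the romanised word might stand for."""
    segments = []
    i = 0
    while i < len(romanised):
        pair = romanised[i:i + 2]
        if pair in RULES:
            # The digraph may be literal ("oe") or a transcription ("ø").
            segments.append((pair, RULES[pair]))
            i += 2
        else:
            segments.append((romanised[i],))
            i += 1
    return ["".join(choice) for choice in product(*segments)]


print(candidates("foer"))  # ['foer', 'før']
print(candidates("poet"))  # ['poet', 'pøt']
```

Code alone cannot tell that "foer" most likely means "før" while "poet"
really is spelled with "oe"; that takes a wordlist, or a reader with
some context.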
> if something can be romanised - it can have a romanised match in a dictionary
> and thus suggest the appropriate matches. of course now the dictionary
> determines these rules implicitly by content, not by code specifically
> enforcing such rules. :)
> but yes - selecting dictionary is needed so selecting a keyboard for that
> language as well as dictionary is useful. it still adds a few keys - thus
> squashing the keyboard some more :( i was hoping to avoid that.
English can work with 10 keys in a row, Norwegian needs 11. :-)
The solution then is different keyboards: those who don't need the
extra keys should not have to suffer the slightly smaller ones.
> note - the keyboard is by no means limited to ascii at all - it's perfectly
> able to have accented/other keys added to layouts - so i'm considering this
> problem "solved" as its simply a matter of everyone agreeing to make a .kbd for
> their language - should they need one other than the default qwerty (ascii)
> one. so from this point of view - that's solved. what isn't done yet is:
So if I have a wordlist and make a keyboard, then a dictionary can be
synthesized so there will be no unnecessary confusion between o and ø,
because both letters exist as keys?
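A minimal sketch of how such a synthesis might work, assuming the
dictionary is just a table from "what the user can type" to the words it
should suggest. This is not how illume's dictionary is built today;
NORWEGIAN_KEYS, match_key and build_index are hypothetical names. A
letter that has its own key is kept, and any other letter has its
accents stripped.

```python
import unicodedata

# Letters assumed to have their own key on a hypothetical Norwegian layout.
NORWEGIAN_KEYS = set("abcdefghijklmnopqrstuvwxyzæøå")


def match_key(word, keys=NORWEGIAN_KEYS):
    """Reduce a word to the key sequence a user would type for it."""
    out = []
    for ch in unicodedata.normalize("NFC", word.lower()):
        if ch in keys:
            out.append(ch)  # has its own key: keep it distinct
            continue
        # No key for this letter: decompose it and drop combining marks,
        # so "ô" folds to "o". Letters without a decomposition stay as is.
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed
                           if not unicodedata.combining(c)))
    return "".join(out)


def build_index(words, keys=NORWEGIAN_KEYS):
    """Map each typed key sequence to the words it should suggest."""
    index = {}
    for word in words:
        index.setdefault(match_key(word, keys), []).append(word)
    return index


index = build_index(["for", "fôr", "før"])
print(index["for"])  # ['for', 'fôr']
print(index["før"])  # ['før']
```

With this layout, typing "for" offers "fôr" as an alternative, while
"før" only comes up when the ø key is actually used.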
> 1. a kbd being able to hint at wanting a specific dictionary language (or
For packaging, put the wordlist and keyboard layout in the same package.
And switch both when switching keyboards. I guess several languages will
have the same layout. This can be solved elegantly with hard links. Or a
mechanism where keyboards either use standard ascii, or a language
> 2. dictionary itself being able to hint to have a specific kbd layout.
> 3. applications not being able to hint for a specific language for input (and
> thus dictionary and/or kbd).
I believe we use the same apps, regardless of language? So an app should
simply ask for numeric/alphabetic/terminal, and then the system provides
the system default alpha keyboard. This could be english, norwegian,
german, ... depending on a system setting. Multilingual persons can
have one default keyboard and explicitly select another when needed.
It'd be nice if one could have the option of setting a terminal keyboard
as the default alphabetic keyboard too - some people don't like
guesswork because the wordlist is never truly complete - or maybe there
is no list for their language yet. Of course they then have to struggle
with stylus and small keys instead.
> so there needs to be a tie-in between language, dict and kbd - which one drives
> what... is the question. it needs to not BREAK things like terminal kbd etc. -
> ie i can stay with norwegian as my language but if i select the terminal kbd -
> it will stay there and not suddenly flip back to the simple kbd layout.
> number/symbol entry similarly. this bit of things is currently undefined and
> the other is improved dictionary format. the problem is - if we go make the
> dict smarter... how on earth do you GENERATE such a dictionary. i sure as hell
> am not hand-writing a whole dictionary... and i doubt anyone here will - it
> could be a large community effort to build a full one for each language - but
> that will take time. you need to enter all words, all matches, conjugations,
> and then frequency info too. the simple dict english can use is much easier -
> it can be auto-generated from input text. just throw a (text version) of a book
> - or newspaper or documentation - it can just index every word it finds and
> even count frequency usage. thats easy to automate the production of such a
> dict (and that is why the dict is as it is now - sheer simplicity).
There are plenty of open-source dictionaries in existence. There are
norwegian dictionaries used for word processor spell-checking, and they
tend to have all the conjugations already. Frequencies can be found by
processing some large text, such as the entire body of wikipedia
articles in that language. Finally, dictionary creation might need to
know transcription rules for the language (so as to avoid run-time
conversions). For Norwegian, it'd be: treat "æøå" as base letters just
like a-z, and use the standard ascii mappings for all other non-ascii
If you describe a (largely automated) procedure, then I guess people
will jump in and do the job for various languages. Just as they already
have made some keyboard layouts. :-)
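As a starting point, the frequency step could be sketched like this,
assuming the wordlist comes from a spell-checker dictionary and the
corpus is any large plain-text dump. count_frequencies is a hypothetical
helper, not part of any existing tool.

```python
import re
from collections import Counter


def count_frequencies(corpus_text, wordlist):
    """Count how often each wordlist entry occurs in the corpus."""
    known = {w.lower() for w in wordlist}
    # Runs of letters only; in Python 3 this also matches æ, ø, å, ô, ...
    tokens = re.findall(r"[^\W\d_]+", corpus_text.lower())
    counts = Counter(t for t in tokens if t in known)
    # Words that never occur in the corpus keep a frequency of 0.
    return {w: counts.get(w, 0) for w in sorted(known)}


freq = count_frequencies("Før jul kom han. Han kom før meg.",
                         ["før", "han", "kom", "fôr"])
print(freq)  # {'fôr': 0, 'før': 2, 'han': 2, 'kom': 2}
```

Restricting the count to known words keeps typos and foreign words in
the corpus out of the dictionary, and every conjugation from the
spell-checker list is still included even if the corpus never uses it.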
Will the wordlists be adaptive? Simpler phones with T9 have this. There
is always something not in the list - such as the cat's name. It may
still get used a lot, so it is nice that the T9 system only trips up
the first time. From then on, the phone knows the new word and offers it
as an alternative.
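A minimal sketch of the adaptive behaviour, assuming the wordlist is
kept in memory as a word -> frequency table. AdaptiveWordlist is a
made-up class for illustration, not how T9 or illume actually store
their data.

```python
class AdaptiveWordlist:
    """Prefix suggestions that learn every word the user commits."""

    def __init__(self, frequencies):
        self.freq = dict(frequencies)

    def suggest(self, prefix, limit=3):
        """Return up to `limit` known words, most frequent first."""
        matches = [w for w in self.freq if w.startswith(prefix)]
        matches.sort(key=lambda w: (-self.freq[w], w))
        return matches[:limit]

    def commit(self, word):
        """Record an accepted word; unknown words are added to the list."""
        self.freq[word] = self.freq.get(word, 0) + 1


words = AdaptiveWordlist({"pus": 5, "pute": 3})
print(words.suggest("pu"))  # ['pus', 'pute']
words.commit("pusur")       # typed letter by letter the first time
print(words.suggest("pu"))  # ['pus', 'pute', 'pusur']
```

Because commit also bumps the count of known words, a name that gets
used a lot gradually moves to the front of the suggestions.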