[SHR] illume predictive keyboard is too slow

Carsten Haitzler (The Rasterman) raster at rasterman.com
Fri Jan 30 20:10:04 CET 2009


On Fri, 30 Jan 2009 14:43:39 +0100 Helge Hafting <helge.hafting at hist.no> said:

> Carsten Haitzler (The Rasterman) wrote:
> > On Thu, 29 Jan 2009 14:32:48 +0100 Helge Hafting <helge.hafting at hist.no>
> > said:
> 
> >> I hope things like this will be possible, if a new dictionary format is 
> >> realized. It is ok if typing "for" suggests "fôr" as an alternative, but 
> >> "før" should not come up unless the user types "f" "ø" "r". In which 
> >> case "o" must not be suggested...
> > 
> > ok - how do you romanise norwegian then? for example, in german ö -> oe,
> > ü -> ue, ß -> ss, etc. - there is a set of romanisation rules that can
> > convert any such char to 1 or more roman letters. i was hoping to be even
> > more lenient, with ö -> o being valid too for the lazy :) japanese has
> > romanisation rules - so does chinese... norwegian must too (eg æ -> ae).
> > 
> Usually, one doesn't romanize Norwegian. There are some rules: æ->ae, 
> ø->oe, å->aa.  They are next to useless, because ae and oe occur 
> naturally in many words where æ or ø does not belong, and these double 
> vowels are pronounced differently as well. A Norwegian seeing "oe" in a 
> word may be able to figure out if this means "ø" or if it really is 
> supposed to be "oe", but this may need a context of several words. And 
> it looks funny/wrong - similar to how it looks silly transcribing "x" as
> "ks" and writing "ksylophone".

oh that's not bad! then it's just like english! (you get used to the vague
insanity of it all sooner or later!) :)
but seriously - if your name is nønæn, and you move to japan, and have to fill
out a form for your bank account name - they will see the ø and æ and go "ummm.
we can't do that - can you please use normal roman text?" because they will
either accept roman (a-z) OR japanese (hiragana/katakana/kanji). strange
accented european chars aren't going to work. :) so i guess i'm asking because
sooner or later when filling out an immigration form or something in another
country - you will need to drop such chars into roman text somehow (that ugly
nasty lowest common denominator thing - i know), and so i was curious how
you solve that - as that then presents a set of solutions/rules that can be
applied. :) again - not saying to get rid of the ø's of this world. they are
already supported. i'm just wondering how we can work when they are not
there/used. :)
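The one-way romanisation asked about here (for bank or immigration forms that only accept a-z) can be sketched as a lookup table built from the rules quoted in this thread (german ö -> oe, ü -> ue, ß -> ss; norwegian æ -> ae, ø -> oe, å -> aa), plus a fallback that strips combining accents (é -> e). This is a minimal sketch, not illume code; `ROMAN` and `romanise` are hypothetical names, and the table only covers the lowercase letters named in the thread.

```python
import unicodedata

# Hypothetical table built only from the rules quoted in this thread.
# These letters need explicit entries: ø, æ and ß have no Unicode
# decomposition, and ö/ü/å would otherwise lose information (ö -> o).
ROMAN = {
    "ö": "oe", "ü": "ue", "ß": "ss",   # german
    "æ": "ae", "ø": "oe", "å": "aa",   # norwegian
}


def romanise(text: str) -> str:
    """Convert lowercase text to plain a-z using ROMAN, then accent stripping."""
    out = []
    for ch in text:
        if ch in ROMAN:
            out.append(ROMAN[ch])
            continue
        # Fallback: decompose (é -> e + U+0301) and drop the combining marks.
        decomposed = unicodedata.normalize("NFKD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


print(romanise("nønæn"))   # noenaen
print(romanise("straße"))  # strasse
print(romanise("café"))    # cafe
```

As Helge points out, "oe" is ambiguous on the way back (it may be a real "oe"), so a table like this only works in one direction: fine for filling in a form, not enough on its own for dictionary matching.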

> You might want to transcribe "x" that way in an emergency, if your "x" 
> key breaks, until you get a new keyboard. You probably don't want to 
> throw away the "x" to save space on a keyboard though. And norwegian 
> transcriptions aren't used for the same reasons. I have only seen two
> cases of such transcription:
> 1. Names of norwegian athletes in international sports events.
>     Which looks real silly. And completely unnecessary. Sport computer
>     systems these days handle more than a-z, the names are spelled
>     correctly in national events after all.
>     And it is not as if foreigners get big
>     problems with an "ø". If they don't know what the slash is for,
>     they can read it as "o", and so on. Similar to how I read
>     french - I have no idea what the difference between à and á is.
>     Both are "a" to me.

just like my example above - but i guess i was being stricter. the stodgy old
banking system isn't going to adapt like modern sports data systems. it's
"go roman - or go home". :)

> 2. Expert computer users sometimes use the transcriptions, because
>     they often use the latest equipment before keyboards get fixed
>     and before ascii-only limitations are sorted out. Some of them
>     are tired of fighting and give up. And they have actually heard
>     about the concept of "transcription"! But mainstream users get
>     equipment with proper keyboards, anything less is an unfinished
>     product. You won't find an ascii keyboard in a norwegian shop.

hmm. how interesting. i have always been baffled why there is a UK qwerty
layout vs US - the UK is the only place that uses it... all other english
speaking countries i know use US qwerty (and if UK qwerty was nicely killed
off.. it wouldn't need to be US qwerty - just qwerty) :)

ok - but there is a way to do this. when stuck on your friend's pc when
visiting them in california, and they don't have compose-modes enabled... how
do you type æ and ø etc.? that was basically the question - there must be some
accepted mechanism for decimation/conversion. seemingly it's the obvious:
æ -> ae, ø -> o etc. :)
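The quoted idea that follows (give each word a romanised match in the dictionary, so a plain a-z typist still gets the accented word suggested) can be sketched as an index from every accepted ASCII spelling to the real words. The lenient rules below (ø -> oe or plain o, ö -> oe or o, and so on) are an assumption for illustration, as are the names `LENIENT`, `romanised_keys` and `build_index`.

```python
from collections import defaultdict
from itertools import product

# Hypothetical lenient rules: each special letter may be typed several ways
# (ø -> oe as in the standard rules, or just o "for the lazy").
LENIENT = {
    "æ": ("ae",),
    "ø": ("oe", "o"),
    "å": ("aa", "a"),
    "ö": ("oe", "o"),
    "ü": ("ue", "u"),
    "ß": ("ss",),
}


def romanised_keys(word: str) -> set[str]:
    """Every plain-ASCII spelling that should match `word`."""
    options = [LENIENT.get(ch, (ch,)) for ch in word]
    return {"".join(parts) for parts in product(*options)}


def build_index(words):
    """Map each romanised spelling to the set of real words it can stand for."""
    index = defaultdict(set)
    for word in words:
        for key in romanised_keys(word):
            index[key].add(word)
    return index


index = build_index(["for", "før", "søt"])
print(sorted(index["for"]))   # ['for', 'før']
print(sorted(index["foer"]))  # ['før']
```

Note the first lookup: once ø -> o is allowed, typing "for" also offers "før", which is exactly the confusion Helge wants to avoid on a keyboard that has an ø key.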

> > if something can be romanised - it can have a romanised match in a
> > dictionary and thus suggest the appropriate matches. of course now the
> > dictionary determines these rules implicitly by content, not by code
> > specifically enforcing such rules. :)
> > 
> > but yes - selecting dictionary is needed so selecting a keyboard for that
> > language as well as dictionary is useful. it still adds a few keys - thus
> > squashing the keyboard some more :( i was hoping to avoid that.
> 
> English can work with 10 keys in a row, norwegian needs 11. :-)
> The solution then is different keyboards, those who don't need more 
> should not need to suffer the slightly smaller keys.

sure - not a problem! :)

> > note - the keyboard is by no means limited to ascii at all - it's perfectly
> > able to have accented/other keys added to layouts - so i'm considering this
> > problem "solved" as it's simply a matter of everyone agreeing to make a .kbd
> > for their language - should they need one other than the default qwerty
> > (ascii) one. so from this point of view - that's solved. what isn't done
> > yet is:
> 
> Excellent!
> So if I have a wordlist and make a keyboard, then a dictionary can be 
> synthesized so there will be no unnecessary confusion between o and ø, 
> because both letters exist as keys?

correct. as long as the dict matching doesn't drop extra info - ie normalize
ø -> o. currently it does. but the rest of the code doesn't. it's just the dict
matching engine - which as we have been discussing... needs work. :)
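One way to stop the matcher from dropping that information, sketched under assumptions: fold a letter to its plain form only when the letter is not on the active keyboard. With a norwegian layout that has æ, ø and å keys, "fôr" folds to "for" (ô has no key) while "før" stays distinct, which is the behaviour Helge asked for. `match_key`, `FALLBACK` and the layout letter sets are hypothetical, not the illume API.

```python
import unicodedata

# Hypothetical fallbacks for letters that have no Unicode decomposition.
FALLBACK = {"ø": "o", "æ": "ae", "ß": "ss"}

ASCII_KBD = set("abcdefghijklmnopqrstuvwxyz")
NORWEGIAN_KBD = ASCII_KBD | set("æøå")


def match_key(word: str, kbd_letters: set) -> str:
    """Normalise `word` for matching, keeping letters the keyboard can type."""
    out = []
    for ch in word.lower():
        if ch in kbd_letters:
            out.append(ch)              # typeable as-is: keep the distinction
        elif ch in FALLBACK:
            out.append(FALLBACK[ch])
        else:
            # Strip accents: ô -> o + U+0302 -> o
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


print(match_key("fôr", NORWEGIAN_KBD))  # for
print(match_key("før", NORWEGIAN_KBD))  # før
print(match_key("før", ASCII_KBD))      # for
```

If the dictionary index is built with the same function and the same layout, "for" typed on the norwegian keyboard matches "for" and "fôr" but never "før", while on a plain ascii layout "før" is still reachable.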

> > 1. a kbd being able to hint at wanting a specific dictionary language (or
> > vice-versa).
> For packaging, put the wordlist and keyboard layout in the same package. 
> And switch both when switching keyboards. I guess several languages will
> have the same layout. This can be solved elegantly with hard links. Or a
> mechanism where keyboards either use standard ascii, or a language
> specific layout.

no need for hardlinks. soft will do just fine :). maybe i need to drive it from
the .kbd. kbd's will be named per language for "text entry" and then others
like Terminal.kbd have "no language" thus being neutral. if chosen they stay
(Numbers.kbd too etc.). as kbd's already have an icon... replacing it with a
little flag would work nicely. so i just need the .kbd to be able to also say
"hey - change dict to X". easy to add that.
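The proposed behaviour (a language .kbd can say "change dict to X", while neutral layouts such as Terminal.kbd and Numbers.kbd leave the dictionary alone and stay selected) can be modelled roughly as below. This is a sketch of the proposal, not of the existing .kbd format: the `dict_lang` field, the `InputState` class and the layout names Norwegian.kbd and German.kbd are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Kbd:
    name: str
    # Hypothetical "change dict to X" hint; None marks a neutral layout.
    dict_lang: Optional[str] = None


@dataclass
class InputState:
    kbd: Kbd
    dict_lang: str

    def select_kbd(self, kbd: Kbd) -> None:
        self.kbd = kbd  # the chosen layout stays until the user picks another
        if kbd.dict_lang is not None:
            self.dict_lang = kbd.dict_lang  # language layouts switch the dict


state = InputState(Kbd("Norwegian.kbd", "no"), "no")
state.select_kbd(Kbd("Terminal.kbd"))   # neutral: dict stays "no"
print(state.kbd.name, state.dict_lang)  # Terminal.kbd no
state.select_kbd(Kbd("German.kbd", "de"))
print(state.kbd.name, state.dict_lang)  # German.kbd de
```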

> > 2. dictionary itself being able to hint to have a specific kbd layout.
> > 3. applications being able to hint for a specific language for input
> > (and thus dictionary and/or kbd).
> >
> I believe we use the same apps, regardless of language? So an app should 
> simply ask for numeric/alphabetic/terminal, and then the system provides 
> the system default alpha keyboard. This could be english, norwegian,
> german, ... depending on a system setting.  Multilingual persons can 
> have one default keyboard and explicitly select another when needed.

aaah here with apps.. i was thinking - one app may be in a different
lang/locale to another (one in german, one in french let's say) and thus the
app will probably say "hey - german please!" - it is technically possible. this
also opens up the way for "fake" languages like "cmdline" that have a
dictionary of... unix commands. or "31337" that has a dictionary of k3wl
w0RdZ0rZ... depending on some specialised use case. if the app does NOT specify
a language - assume the current set lang. :)
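The lookup order proposed here (use the language an application asks for, otherwise the currently set language) fits in a few lines. The `installed` set, the fallback when a requested dictionary is not installed, and the "cmdline" entry are assumptions for illustration.

```python
from typing import Optional


def pick_dict(app_hint: Optional[str], system_lang: str, installed: set) -> str:
    """Choose which dictionary to load for an input field."""
    # Assumption: a missing or uninstalled hint falls back to the current language.
    if app_hint and app_hint in installed:
        return app_hint
    return system_lang


installed = {"en", "de", "fr", "no", "cmdline"}
print(pick_dict("de", "no", installed))       # de
print(pick_dict(None, "no", installed))       # no
print(pick_dict("cmdline", "en", installed))  # cmdline
```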

> It'd be nice if one could have the option of setting a terminal keyboard 
> as the default alphabetic keyboard too - some people don't like 
> guesswork because the wordlist is never truly complete - or maybe there 
> is no list for their language yet. Of course they then have to struggle 
> with stylus and small keys instead.

This is something I haven't done yet - set a "default" kbd. it's simple to do,
but just not done yet :)

> > so there needs to be a tie-in between language, dict and kbd - which one
> > drives what... is the question. it needs to not BREAK things like terminal
> > kbd etc. - ie i can stay with norwegian as my language but if i select the
> > terminal kbd - it will stay there and not suddenly flip back to the simple
> > kbd layout. number/symbol entry similarly. this bit of things is currently
> > undefined and unimplemented.
> > 
> > the other is improved dictionary format. the problem is - if we go make the
> > dict smarter... how on earth do you GENERATE such a dictionary. i sure as
> > hell am not hand-writing a whole dictionary... and i doubt anyone here will
> > - it could be a large community effort to build a full one for each
> > language - but that will take time. you need to enter all words, all
> > matches, conjugations, and then frequency info too. the simple dict english
> > can use is much easier - it can be auto-generated from input text. just
> > throw a (text version of a) book - or newspaper or documentation - at it
> > and it can index every word it finds and even count frequency of use.
> > that makes it easy to automate the production of such a dict (and that is
> > why the dict is as it is now - sheer simplicity).
> 
> There are plenty of open-source dictionaries in existence. There are 
> norwegian dictionaries used for word processor spell-checking, and they 
> tend to have all the conjugations already. Frequencies can be found by 
> processing some large text, such as the entire body of wikipedia 
> articles in that language. Finally, dictionary creation might need to 
> know transcription rules for the language (so as to avoid run-time
> conversions). For norwegian, it'd be: treat "æøå" as base letters just
> like a-z, and use the standard ascii mappings for all other non-ascii
> letters.
> 
> If you describe a (largely automated) procedure, then I guess people 
> will jump in and do the job for various languages. Just as they already 
> have made some keyboard layouts. :-)

defining the transcription rules - sure. i think this is something i need to
have in the dicts now. you're right. define specific transcription rules for a
given dict (can just be empty if you don't want them). thus the dict needs a
header and then content.
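A dictionary with a header of transcription rules followed by word frequencies, generated automatically from plain text as discussed above, could look like the sketch below. The file layout (`rule` lines, a `---` separator, then `word count` lines) is invented for illustration and is not the current illume .dic format; an empty header simply means no rules.

```python
import re
from collections import Counter


def build_dict(text: str, rules) -> str:
    """Count every word in `text` and emit header (rules) + content (frequencies)."""
    counts = Counter(re.findall(r"[^\W\d_]+", text.lower()))  # runs of letters
    lines = [f"rule {src} {dst}" for src, dst in rules]
    lines.append("---")
    lines += [f"{word} {n}" for word, n in counts.most_common()]
    return "\n".join(lines) + "\n"


def parse_dict(data: str):
    """Split the file back into {letter: [romanisations]} and {word: count}."""
    header, _, body = data.partition("---\n")
    rules = {}
    for line in header.splitlines():
        _, src, dst = line.split()
        rules.setdefault(src, []).append(dst)
    words = {}
    for line in body.splitlines():
        word, n = line.split()
        words[word] = int(n)
    return rules, words


norwegian_rules = [("æ", "ae"), ("ø", "oe"), ("å", "aa")]
data = build_dict("Før var det kaldt. Det var kaldt før.", norwegian_rules)
rules, words = parse_dict(data)
print(rules["ø"], words["før"])  # ['oe'] 2
```

Fed a large corpus (such as the wikipedia articles Helge mentions) instead of one sentence, the same counting step gives the frequency column; the rules come from whoever maintains that language's dict.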

> Will the wordlists be adaptive? Simpler phones with T9 have this. There
> is always something not in the list - such as the cat's name. It may
> still get used a lot, and it is then nice that the T9 system only trips up
> the first time. From then on, the phone knows the new word and offers it
> as an alternative.

they already are. everything you type is put into your personal dictionary file
as a line item - along with frequency of use (the first time it goes in it
inherits the system dict frequency count... + 1 for your use). this acts as an
overlay on the system dict, so the words in the personal one either provide a
new frequency count or add new words (if they never were in the system dict) so
yes - it's adaptive already. that's why i keep saying "if it isn't too good for
you - use it for a bit. it will get better fast as it learns your personal
language patterns". it will learn. look in ~/.e/e/dicts-dynamic - there's a
personal.dic that gets written out and maintained :)
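The personal overlay described above can be sketched as two word -> count maps: the first use of a word copies the system count plus one into the personal map, later uses add one, and lookups prefer the personal count. The function names are hypothetical, the per-use increment is an assumption, and the on-disk personal.dic format is not modelled; "mingus" stands in for the cat's name.

```python
def record_use(word: str, personal: dict, system: dict) -> None:
    """Learn from one typed word."""
    if word in personal:
        personal[word] += 1
    else:
        # First use: inherit the system count (0 if unknown) + 1 for this use.
        personal[word] = system.get(word, 0) + 1


def frequency(word: str, personal: dict, system: dict) -> int:
    """Personal counts override system ones; unknown words count as 0."""
    return personal.get(word, system.get(word, 0))


def suggest(prefix: str, personal: dict, system: dict, n: int = 3) -> list:
    """Most frequent words starting with `prefix`, ties broken alphabetically."""
    candidates = {w for w in (*system, *personal) if w.startswith(prefix)}
    return sorted(candidates, key=lambda w: (-frequency(w, personal, system), w))[:n]


system = {"mitten": 5, "milk": 3}
personal = {}
record_use("mingus", personal, system)      # new word: 0 + 1
print(suggest("mi", personal, system))      # ['mitten', 'milk', 'mingus']
for _ in range(5):
    record_use("mingus", personal, system)  # now 6
print(suggest("mi", personal, system))      # ['mingus', 'mitten', 'milk']
```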

-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    raster at rasterman.com