New Rasterman Image...

Steve Mosher steve at openmoko.com
Thu Oct 2 09:59:20 CEST 2008


Hey raster, how's it going?

I promised you some frequency data a while back.
http://ucrel.lancs.ac.uk/bncfreq/flists.html
http://ucrel.lancs.ac.uk/bncfreq/lists/1_2_all_freq.txt

there are others as well

Carsten Haitzler (The Rasterman) wrote:
> On Wed, 1 Oct 2008 21:05:53 -0600 "Ori Pessach" <opessach at gmail.com> babbled:
> 
>> I understand what it's doing. It's not doing it well. I tried it for shell
> 
> i disagree. it works like a charm for me - as per my previous mail - i can use
> it while walking down the street. more than i can say for pretty much any other
> virtual keyboard i have available to me.
> 
>> input, and it was an unusable mess. I tried it for text messaging, and it
> 
> why someone would use a language dictionary-based corrective keyboard for shell
> input beats me! in this case i call "silly user - using a motorcycle to deliver
> elephants" line :) use the terminal keyboard. use a stylus. thats what it was
> meant for. :)
> 
>> was an unusable mess. It has no model of the likelihood of erroneous input
> 
> it does. it absolutely does. maybe your fingers are incredibly off-center? here
> is the algorithm (and if u don't believe me - code is there to be read):
> 
> it stores a press POINT (x,y). it looks for all keys whose center point is
> WITHIN f distance of x,y (f being the fuzz value - the .kbd file for the
> qwerty Default keyboard is 135 units wide, with fuzz radius of 20, so that's
> about 1/3rd of the keyboard that it searches through for a likely match).
> likelihood factors (distance) per key found is allocated based on distance (0
> == most likely, > 0 less likely the greater the value). each press is done this
> way EXCEPT if u hold for 0.25 sec then drag to select a key explicitly in zoom
> mode - then the ONLY key available for that word slot is that letter selected
> given a distance of 0. as you type all permutations of letters are searched and
> put into a list - with each permutation given a distance metric based on the
> letters used (simply addition of the distances). now this is combined with the
> dictionary's frequency metric (multiplied by an inverse) so the more likely the
> word is to be used the lower its distance becomes. words are sorted from most
> to least likely based on this metric then listed with most likely in the middle
> of the list, least likely to the left/right ends - which you may not see. the
> vertical list lists all matches from most to least likely (top to bottom) with 1
> exception - EXACTLY what u typed is at the top. it absolutely has a fairly good
> idea of likelihood of error and likelihood of usage of a word etc. etc.
> 
> eg:
> 
> Press | Guess+dist
> e       e+0 w+1 r+2 d+2 s+1
> r       r+0 t+1 e+2 f+1 g+2 d+3
> k       k+0 l+1 o+2 i+3 j+3
> d       d+0 f+1 s+1 e+1 c+1 r+2 w+2
> 
> so "erkd" has distance 0 - but its not a word in the dictionary at all, so
> thrown out. "rwkd" has distance 1, but not a word, "srkd" same, "etkd", "efkd",
> "erld", etc. etc.
> 
> in the end it produces a list where most likely "world" ends up the word
> with other options too - and this is a much simplified list. mostly the list
> for candidate letters per input letter is about 10-12 letters. so u have
> 12*12*12*12 permutations for a 4 letter word - of which a fraction of that
> space is legitimate words. each permutation has a likelihood value based on
> press distance and on frequency of usage of that word in language in general in
> the dictionary.
> 
> mind you - i AM talking about illume's keyboard, its algorithms as is in the
> image i built. if you use something else i cannot comment as it's something
> else.
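
The scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not illume's actual code: the names `candidates`, `rank_words`, `PRESSES` and `FREQ` are hypothetical, the frequency counts are made up, and the scoring formula `(1 + distance) / (1 + frequency)` is just one guess at what "multiplied by an inverse" might look like. The per-press guesses are the ones from the table above.

```python
from itertools import product
from math import hypot


def candidates(press, key_centers, fuzz):
    """Return {letter: distance} for keys whose center is within fuzz of press."""
    px, py = press
    found = {}
    for letter, (kx, ky) in key_centers.items():
        d = hypot(kx - px, ky - py)
        if d <= fuzz:
            found[letter] = d
    return found


def rank_words(per_press, freq):
    """Rank dictionary words reachable from the per-press candidate letters.

    per_press: one {letter: distance} dict per key press.
    freq: {word: usage count}; words missing from it are thrown out.
    """
    scored = []
    for combo in product(*(c.items() for c in per_press)):
        word = "".join(letter for letter, _ in combo)
        if word not in freq:
            continue  # not a dictionary word
        dist = sum(d for _, d in combo)  # simple addition of the distances
        # Assumed combination: frequent words get a lower (better) score.
        score = (1 + dist) / (1 + freq[word])
        scored.append((score, word))
    scored.sort()
    return [word for _, word in scored]


# Guess+dist table from the mail above (presses e, r, k, d).
PRESSES = [
    {"e": 0, "w": 1, "r": 2, "d": 2, "s": 1},
    {"r": 0, "t": 1, "e": 2, "f": 1, "g": 2, "d": 3},
    {"k": 0, "l": 1, "o": 2, "i": 3, "j": 3},
    {"d": 0, "f": 1, "s": 1, "e": 1, "c": 1, "r": 2, "w": 2},
]

# Made-up usage counts, purely for illustration.
FREQ = {"self": 200, "stir": 40, "weld": 5, "stow": 3, "weir": 2}

print(rank_words(PRESSES, FREQ))
```

With these made-up counts, "self" ranks first even though "weld" has the smaller press distance (4 vs 5), which is the effect of folding word frequency into the metric.
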
> 
>> (relatively low) and instead appears to look for the word with the closest
>> minimum edit distance to the user's input. This is nuts. I have never -
> 
> it's not - as the edit distance is the likelihood of error. you likely press
> the key you want - or near it. thus keys near where you pressed are more likely
> than those further away. to limit search distance only up to a certain distance
> is searched. chances are that you do this:
> 
> fingerprint:
>   ___
>  /~~~\
>  |~~~|
>  |~~~|
>   \x/
>    "
> 
> where "x" is the pressure point reported on the touchscreen. the only info the
> touchscreen reports is the pressure point - nothing else. you think u press
> somewhere else, but don't. you know what u pressed by what key "pops up" that
> lets u know pretty well how good your pressing of the screen is. this is just a
> hardware limit of a resistive touchscreen. the point of greatest pressure is
> used - not the middle point of the area in which skin contacts the screen. get
> the gpe-sketchbook and try press with the flat of your finger and see just how
> far off your press point is. it may surprise you.
> 
> as i said - it does have all the model and code and even data to do proper
> correction based on many factors. i do NOT have a dictionary with frequency
> info for all of english - there is a "small" english dict (5000 words) with
> some frequency info in it i managed to gather, but its very small.
> 
> if you don't believe me - read the code, or do better. patches accepted, but i
> think the problem is just that the dictionary has no frequency info by default
> (a matter of simple lack of data) or how you press the screen. i suggest you
> pay close attention to how you type and see. yes the "black word" (in the black
> box) may not be always the word u want - but its most often that word or a word
> right next to it - as you use it it will learn. if you are using it for
> non-english stuff then you need a different dictionary.
> 
>> literally - gotten the word I typed in. In the common use case, of a user
>> who enters a correct word, it invariably get it wrong.
>>
>> Understanding what it's doing doesn't make it less of a nuisance.
> 
> it does have concept of frequency of words. i just dont have any DATA for
> that. the dict format it handles is:
> word1
> word2
> word3
> 
> OR
> word1 20
> word2 434
> word3 1
> 
> etc.
> look at the personal dict file. ~/.e/e/dicts-dynamic/personal.dic
> 
> it saves usage frequency. this affects lookup likelihood. btw - for me it gets
> the word most of the time or the word is not the most likely but at least
> listed as one of the most likely. use it for a bit and it learns and gets
> better. if you wish to generate a dictionary with frequency info - please do
> so! i made it really easy.
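
For anyone generating such a dictionary, here is a minimal sketch of reading and writing the two line forms shown above ("word" or "word count"). The function names `parse_dict` and `format_dict` are hypothetical, and treating a missing count as 0 and splitting on whitespace are assumptions; the real parser in illume may behave differently.

```python
def parse_dict(text):
    """Parse lines of the form 'word' or 'word count' into {word: count}."""
    freq = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word = parts[0]
        # Assumption: a word listed without a count has frequency 0.
        count = int(parts[1]) if len(parts) > 1 else 0
        freq[word] = count
    return freq


def format_dict(freq):
    """Write {word: count} back out as one 'word count' line per entry."""
    return "".join(f"{word} {count}\n" for word, count in sorted(freq.items()))


sample = "word1 20\nword2 434\nword3 1\n"
print(parse_dict(sample))
```

A frequency list from another source would first have to be reduced to word and count pairs before being written out this way.
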
> 



