New Rasterman Image...

Carsten Haitzler (The Rasterman) raster at rasterman.com
Thu Oct 2 05:46:03 CEST 2008


On Wed, 1 Oct 2008 21:05:53 -0600 "Ori Pessach" <opessach at gmail.com> babbled:

> I understand what it's doing. It's not doing it well. I tried it for shell

i disagree. it works like a charm for me - as per my previous mail, i can use
it while walking down the street, which is more than i can say for pretty much
any other virtual keyboard i have available to me.

> input, and it was an unusable mess. I tried it for text messaging, and it

why someone would use a language dictionary-based corrective keyboard for shell
input beats me! in this case i call the "silly user - using a motorcycle to
deliver elephants" line :) use the terminal keyboard. use a stylus. that's what
it was meant for. :)

> was an unusable mess. It has no model of the likelihood of erroneous input

it does. it absolutely does. maybe your fingers are incredibly off-center? here
is the algorithm (and if u don't believe me - the code is there to be read):

it stores a press POINT (x,y). it looks for all keys whose center point is
WITHIN f distance of x,y (f being the fuzz value - the .kbd file for the
qwerty Default keyboard is 135 units wide, with a fuzz radius of 20, so the
search circle is 40 units across, or about 1/3rd of the keyboard's width, when
looking for a likely match). each key found is given a likelihood factor based
on its distance from the press (0 == most likely, and the greater the value the
less likely). every press is handled this way EXCEPT if u hold for 0.25 sec and
then drag to select a key explicitly in zoom mode - then the ONLY key available
for that slot in the word is the letter selected, given a distance of 0.

as you type, all permutations of the candidate letters are searched and put
into a list, with each permutation given a distance metric based on the letters
used (simply the sum of their distances). this is then combined with the
dictionary's frequency metric (multiplied by an inverse), so the more likely a
word is to be used, the lower its distance becomes. words are sorted from most
to least likely by this metric, then listed with the most likely in the middle
of the list and less likely ones towards the left/right ends - which you may
not see. the vertical list shows all matches from most to least likely (top to
bottom) with 1 exception - EXACTLY what u typed is at the top. it absolutely has
a fairly good idea of the likelihood of error and the likelihood of usage of a
word etc. etc.
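the two candidate lists described above can be sketched like this. it is a
minimal illustration, not illume's code: it assumes the candidates arrive
already sorted from most to least likely, and since the post does not say
which side of the horizontal list gets the runner-up, the sketch simply
alternates right, then left. the word list is hypothetical.

```python
# Sketch of the horizontal and vertical candidate lists (not illume's code).
# Assumption: `ranked` is already sorted from most to least likely.
# Assumption: the runner-up goes right, the next one left, and so on.

def horizontal_layout(ranked):
    """Most likely word in the middle, less likely words toward both ends."""
    left, right = [], []
    for i, word in enumerate(ranked[1:]):
        (right if i % 2 == 0 else left).append(word)
    return left[::-1] + ranked[:1] + right


def vertical_layout(ranked, typed):
    """Exactly what was typed on top, then matches from most to least likely."""
    return [typed] + [w for w in ranked if w != typed]


ranked = ["weld", "self", "sole", "tile", "ride"]  # hypothetical candidates
print(horizontal_layout(ranked))  # ['ride', 'sole', 'weld', 'self', 'tile']
print(vertical_layout(ranked, "erkd"))
```

the best guess always sits in the center slot, so it is the one you see even
when the ends of the list are scrolled off screen.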

e.g.:

  Press | Guess+dist
  ------+----------------------------
  e     | e+0 w+1 r+2 d+2 s+1
  r     | r+0 t+1 e+2 f+1 g+2 d+3
  k     | k+0 l+1 o+2 i+3 j+3
  d     | d+0 f+1 s+1 e+1 c+1 r+2 w+2

so "erkd" has distance 0 - but it's not a word in the dictionary at all, so it's
thrown out. "wrkd" has distance 1, but is not a word; "srkd" is the same, as are
"etkd", "efkd", "erld", etc. etc.
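the permutation search and the "not a word, thrown out" step can be sketched
with the candidate letters from the table. the distances come straight from the
table; the tiny dictionary is a hypothetical stand-in for a real one, and this
is an illustration rather than illume's implementation.

```python
from itertools import product

# Candidate letters and distances per press, copied from the table above.
PRESSES = [
    {"e": 0, "w": 1, "r": 2, "d": 2, "s": 1},
    {"r": 0, "t": 1, "e": 2, "f": 1, "g": 2, "d": 3},
    {"k": 0, "l": 1, "o": 2, "i": 3, "j": 3},
    {"d": 0, "f": 1, "s": 1, "e": 1, "c": 1, "r": 2, "w": 2},
]


def permutations_with_distance(presses):
    """Yield (string, distance) for every combination of candidate letters.
    A string's distance is the sum of its letters' distances."""
    for combo in product(*(p.items() for p in presses)):
        word = "".join(letter for letter, _ in combo)
        yield word, sum(dist for _, dist in combo)


def dictionary_matches(presses, dictionary):
    """Throw out strings that are not words; lowest distance first."""
    hits = [(w, d) for w, d in permutations_with_distance(presses) if w in dictionary]
    return sorted(hits, key=lambda wd: wd[1])


# Hypothetical toy dictionary.
print(dictionary_matches(PRESSES, {"weld", "self", "tree"}))
# [('weld', 4), ('self', 5)]
```

even this cut-down table yields 5*6*5*7 = 1050 strings, and only the ones found
in the dictionary survive.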

in the end it produces a list where "world" most likely ends up as the chosen
word, with other options too - and this is a much simplified list. usually the
list of candidate letters per input letter is about 10-12 letters, so u have
12*12*12*12 (20736) permutations for a 4 letter word, of which only a fraction
are legitimate words. each permutation has a likelihood value based on press
distance and on how frequently that word is used in the language in general,
according to the dictionary.
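combining press distance with word frequency could look like the sketch below.
the exact formula is an assumption: the post only says the distance is combined
with the frequency "multiplied by an inverse", so this uses
score = (distance + 1) / frequency, where the "+ 1" keeps every 0-distance
string from tying at 0 no matter how rare it is. the frequencies are
hypothetical.

```python
# Hypothetical scoring, NOT taken from illume's source:
#   score = (distance + 1) / frequency   (lower score = more likely)

SEARCH_SPACE = 12 ** 4  # 20736 strings for 4 presses with 12 candidates each


def score(distance, frequency):
    """Lower is more likely. A missing or zero frequency counts as 1."""
    return (distance + 1) / max(frequency, 1)


def rank(candidates, frequencies):
    """candidates: (word, distance) pairs; returns words, most likely first."""
    scored = sorted((score(d, frequencies.get(w, 1)), w) for w, d in candidates)
    return [w for _, w in scored]


candidates = [("weld", 4), ("self", 5)]
print(rank(candidates, {}))                         # ['weld', 'self']
print(rank(candidates, {"weld": 20, "self": 434}))  # ['self', 'weld']
```

without frequency data the nearest-press word wins; with it, a much more common
word can overtake one that was pressed slightly more accurately.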

mind you - i AM talking about illume's keyboard and its algorithms as they are
in the image i built. if you use something else i cannot comment, as it's
something else.

> (relatively low) and instead appears to look for the word with the closest
> minimum edit distance to the user's input. This is nuts. I have never -

it's not - as the edit distance is the likelihood of error. you likely press
the key you want - or near it. thus keys near where you pressed are more likely
than those further away. to limit the search, only keys up to a certain
distance are considered. chances are that you do this:

fingerprint:
  ___
 /~~~\
 |~~~|
 |~~~|
  \x/
   "

where "x" is the pressure point reported on the touchscreen. the only info the
touchscreen reports is the pressure point - nothing else. you think u press
somewhere else, but you don't. you know what u pressed by what key "pops up",
and that lets u know pretty well how accurately you are pressing the screen.
this is just a hardware limit of a resistive touchscreen: the point of greatest
pressure is used - not the middle of the area where skin contacts the screen.
get gpe-sketchbook and try pressing with the flat of your finger, and see just
how far off your press point is. it may surprise you.
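going from the single reported press point to a set of candidate keys can be
sketched as below. the 135-unit width, the fuzz radius of 20 and the 0.25 sec
zoom hold come from the description earlier in this mail; the key geometry
(one evenly spaced top row) and the use of plain Euclidean distance are
assumptions for illustration only. the table earlier uses small whole numbers,
so the real code may scale or round distances, which this sketch does not do.

```python
import math

KEYBOARD_WIDTH = 135  # units, Default qwerty .kbd (from the post)
FUZZ = 20             # fuzz radius, same file
ZOOM_HOLD = 0.25      # seconds held before zoom mode

# Hypothetical geometry: only the top row, keys evenly spaced across the
# full width. Real .kbd files define their own key positions.
ROW = "qwertyuiop"
STEP = KEYBOARD_WIDTH / len(ROW)
KEY_CENTERS = {k: (STEP / 2 + i * STEP, 0.0) for i, k in enumerate(ROW)}


def candidate_keys(x, y, hold=0.0, zoom_key=None, fuzz=FUZZ):
    """Return (key, distance) for keys whose center lies within `fuzz` of
    the reported press point, nearest first. A key picked explicitly in
    zoom mode is the only candidate, at distance 0."""
    if hold >= ZOOM_HOLD and zoom_key is not None:
        return [(zoom_key, 0.0)]
    found = []
    for key, (kx, ky) in KEY_CENTERS.items():
        d = math.hypot(x - kx, y - ky)  # assumption: plain Euclidean distance
        if d <= fuzz:
            found.append((key, d))
    return sorted(found, key=lambda kd: kd[1])


print(candidate_keys(35, 0))  # [('e', 1.25), ('r', 12.25), ('w', 14.75)]
print(candidate_keys(35, 0, hold=0.3, zoom_key="t"))  # [('t', 0.0)]
```

a press landing between keys just shifts which key gets distance ~0; the
neighbours stay in the candidate set, which is why an off-center finger still
usually finds the word.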

as i said - it does have the model, the code and even some data to do proper
correction based on many factors. i do NOT have a dictionary with frequency
info for all of english - there is a "small" english dict (5000 words) with
some frequency info in it that i managed to gather, but it's very small.

if you don't believe me - read the code, or do better. patches accepted, but i
think the problem is just that the dictionary has no frequency info by default
(a simple lack of data), or the way you press the screen. i suggest you pay
close attention to how you type and see. yes, the "black word" (in the black
box) may not always be the word u want - but it's most often that word or a
word right next to it, and as you use it, it will learn. if you are using it
for non-english text then you need a different dictionary.

> literally - gotten the word I typed in. In the common use case, of a user
> who enters a correct word, it invariably get it wrong.
> 
> Understanding what it's doing doesn't make it less of a nuisance.

it does have a concept of word frequency. i just don't have any DATA for that.
the dict formats it handles are:

  word1
  word2
  word3

OR

  word1 20
  word2 434
  word3 1

etc.
look at the personal dict file. ~/.e/e/dicts-dynamic/personal.dic
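reading either dictionary format can be sketched as below. it is an
illustration, not illume's parser: it assumes whitespace separates the word
from its count, and that a word listed without a count gets a frequency of 1.

```python
def parse_dict(text):
    """Parse either dictionary format into {word: frequency}.
    Assumption: a line with only a word gets frequency 1."""
    freqs = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word = parts[0]
        freqs[word] = int(parts[1]) if len(parts) > 1 and parts[1].isdigit() else 1
    return freqs


print(parse_dict("word1\nword2\nword3\n"))
# {'word1': 1, 'word2': 1, 'word3': 1}
print(parse_dict("word1 20\nword2 434\nword3 1"))
# {'word1': 20, 'word2': 434, 'word3': 1}
```

treating a bare word as frequency 1 means a plain word list still works; it just
gives the ranking nothing to separate common words from rare ones.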

it saves usage frequency, which affects lookup likelihood. btw - for me it gets
the word right most of the time, and when the word is not the most likely it is
at least listed as one of the most likely. use it for a bit and it learns and
gets better. if you wish to generate a dictionary with frequency info - please
do so! i made it really easy.
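the "learns as you use it" part can be sketched as a usage counter written back
in the "word frequency" format. the file path is the one named above; the
function names and the most-used-first ordering are hypothetical choices of
this sketch, not necessarily what illume writes.

```python
from pathlib import Path

# Path from the post; expanduser() turns "~" into the home directory.
PERSONAL_DIC = Path("~/.e/e/dicts-dynamic/personal.dic").expanduser()


def record_use(freqs, word):
    """Bump a word's count each time the user picks it."""
    freqs[word] = freqs.get(word, 0) + 1


def serialize(freqs):
    """'word frequency' lines, most used first (ordering is an assumption)."""
    items = sorted(freqs.items(), key=lambda wf: (-wf[1], wf[0]))
    return "".join(f"{w} {f}\n" for w, f in items)


def save(freqs, path=PERSONAL_DIC):
    """Write the counts out, creating the directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(serialize(freqs))
```

picking "hello" twice and "world" once would serialize as "hello 2" then
"world 1", and those counts then feed back into the ranking on the next lookup.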

-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    raster at rasterman.com
