[SHR] illume predictive keyboard is too slow

Carsten Haitzler (The Rasterman) raster at rasterman.com
Thu Feb 5 02:10:57 CET 2009


On Wed, 4 Feb 2009 16:37:56 +0100 Laszlo KREKACS
<laszlo.krekacs.list at gmail.com> said:

> Hi!
> 
> > ok - so if a young person typed:
> > Öt szép szűz
> > it'd be:
> > Ot szep szuz
> 
> ((btw, the meaning of "Öt szép szűz lány őrült írót nyúz" is
> "Five virgins tire a crazy writer".
> It is the hungarian equivalent of "The quick brown fox jumps over the lazy dog"))
> 
> 
> Yes, and in that specific case it works.
> (because none of the above words (Ot, szep, szuz) has a meaning in
> the hungarian language, so you can understand that example without
> accents.)
> 
> But there are other cases, where it is not that clear:
> ólt - pound (accusative)
> ölt - he killed ...
> olt - to graft

sure.. maybe being an english speaker.. this doesn't bother me so much, as
english is full of such words... 1 word can have 2 or 3 or even more very
different meanings, written the same way. only context lets you figure it out.
so to me i go "so.. what's the problem?" :)

> So when you see "olt" in the text you can't be sure it is "olt", "ólt"
> or "ölt" without analysing the whole sentence.
> 
> The german example is a two-way conversion: ü - ue, ß - ss. You can
> switch back and forth without losing additional information.

yup. as i speak german i have been using it as an example :)
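
to make the collision concrete, here is a minimal python sketch of what accent stripping does to the three words from the example above. the helper name strip_accents and the tiny word list are just assumptions for illustration, nothing from illume itself:

```python
import unicodedata
from collections import defaultdict


def strip_accents(word: str) -> str:
    """Drop combining marks after canonical decomposition (ö -> o, ű -> u)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Words taken from the examples in this thread.
words = ["olt", "ólt", "ölt", "szép", "szűz"]

# Group the accented forms by their accent-free spelling.
by_plain = defaultdict(list)
for w in words:
    by_plain[strip_accents(w)].append(w)

# Keep only the spellings that more than one original word maps to.
ambiguous = {plain: forms for plain, forms in by_plain.items() if len(forms) > 1}
print(ambiguous)  # {'olt': ['olt', 'ólt', 'ölt']}
```

"szep" and "szuz" map back to exactly one word each in this list, while "olt" maps back to three, which is the case that needs sentence context to resolve.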

> >> A simple word based dictionary is limited anyway for the hungarian
> >> language, where you can create a word as long as this:
> >> "elkelkáposztástalaníthatatlanságoskodásaitokért".
> >
> > ugh. so it's like german. compound words get created a lot by just stringing
> > multiple words together without a space. that's ok - as long as there aren't a
> > massive set of them... :)
> >
> 
> But there are, because this language is "agglutinative".
> Let me explain the difficulty a bit.
> 
> In german you can create the following word:
> wood [en] - Holz [de] - fa [hu]
> house [en] - Haus [de] - ház [hu]
> 
> wood house [en] - Holzhaus [de] - faház [hu]
> 
> So you glued together house and wood in one word.
> (this is your example: stringing together without space)
> 
> In german you can even create words of one verb plus a modifier, like:
> to work [en], arbeiten [de], dolgoz [hu]
> to ply [en], bearbeiten (be+arbeiten) [de], megdolgoz (meg+dolgoz) [hu]
> 
> It is the same process ;) There are many examples of this:
> to link together [en], anschliessen (an+schliessen) [de] - összekapcsol
> (össze+kapcsol) [hu],
> to buy up [en], aufkaufen (auf+kaufen) [de] - felvásárol (fel+vásárol) [hu]
> 
> But in the hungarian language we glue everything together. Some examples:
> in house [en], im Haus [de], házban (ház+ban) [hu]
> car [en], Wagen [de], kocsi [hu]
> our car [en], unseren Wagen (unser+en Wagen) [de], kocsinkat
> (kocsi+(u/ü)nk+(a/á/e/é)t) [hu]
> 
> So the possibilities are nearly infinite.
> Without analysing the sentence and the word, you can't find the root
> word with the correct accent.

oh dear. so you basically take the idea and run with it. nuts! like asian
langs... they don't even know what a space is! :) (by asian i mean korean,
chinese, japanese).

> And finding the root word requires a spell checker (the best available
> is hunspell for the hungarian language)
> 
> Summary:
> - Losing the accents (in hungarian) most of the time results in ambiguity.
> - Need a spell checker to suggest the right accented word.
> (see: http://hunspell.sourceforge.net/)
> 
> So creating an architecture for spell checker is not a bad idea (for
> future extensibility).
> It could be handy for english too. But for other languages (e.g.
> hungarian) it may be essential.

originally i wanted to actually use aspell to do this... for the vkbd... but its
api just didn't cut it. i wanted to re-use as much as possible, but submitting
the totally misspelt word from the kbd just doesn't get you results in a
spellchecker. (i hand-created some and fed them to aspell to see what it did
and it was just useless). spellcheckers are used to 1 or 2 errors of certain
kinds - maybe 3. but when every letter is totally wrong you need an exhaustive
search through permutations. :(

when "kocsinkat" is the word you wanted... but you actually typed
"opdsomlsr" ... try to get a speller to fix that! interestingly enough, at least
for the english equivalent: i wanted "foolhardy" and actually typed
"gioljsefu"... illume can and will correct it to "foolhardy"... probably as the
top or one of the top suggestions... which is far better than anything aspell
can dream of doing. it DOES have a limit: the typed input must have exactly the
same number of chars as the desired word - but for now, let's assume you
hit the kbd the right number of times and it's really just screen/finger
accuracy fixing.
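
to illustrate the kind of matching described above (same number of chars, every typed key close to the wanted key), here is a rough python sketch. this is NOT illume's actual code: the qwerty layout, the row stagger in ROW_OFFSETS, the MAX_KEY_DISTANCE cutoff, the scoring and the names score/suggest are all assumptions made up for the example:

```python
import math

# Assumed layout: plain QWERTY letter rows with a guessed horizontal stagger.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
ROW_OFFSETS = [0.0, 0.25, 0.75]

# Map each letter to an (x, y) position measured in key widths.
KEY_POS = {
    ch: (col + ROW_OFFSETS[row], float(row))
    for row, keys in enumerate(ROWS)
    for col, ch in enumerate(keys)
}

# Assumed cutoff: a typed key may be at most this far from the intended key.
MAX_KEY_DISTANCE = 1.5


def key_distance(a, b):
    """Euclidean distance between two keys on the assumed layout."""
    (ax, ay), (bx, by) = KEY_POS[a], KEY_POS[b]
    return math.hypot(ax - bx, ay - by)


def score(typed, candidate):
    """Sum of per-key distances, or None if the candidate cannot match."""
    if len(typed) != len(candidate):
        return None  # the length restriction mentioned above
    total = 0.0
    for t, c in zip(typed, candidate):
        if t not in KEY_POS or c not in KEY_POS:
            return None
        d = key_distance(t, c)
        if d > MAX_KEY_DISTANCE:
            return None  # this key press is too far from the wanted letter
        total += d
    return total


def suggest(typed, dictionary):
    """Return matching dictionary words, closest first."""
    scored = []
    for word in dictionary:
        s = score(typed, word)
        if s is not None:
            scored.append((s, word))
    return [word for _, word in sorted(scored)]


WORDS = ["foolhardy", "bookmarks", "keyboards", "fool", "hardy"]
print(suggest("gioljsefu", WORDS))  # ['foolhardy']
```

with a dictionary that already contains the full inflected form, the same check also accepts "opdsomlsr" for "kocsinkat"; the hard part for hungarian is getting every such form into the dictionary in the first place.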

i can't begin to imagine the permutation searches needed for hungarian, as
either you put all permutations of all words in the dictionary (for german
it's doable - seemingly not for hungarian), or you need to start trying all
sorts of permutations of multiple words strung together for matches... man
that's going to be nasty. to be honest, i really can't see it being possible
to solve this without a lot of work. i don't have the bandwidth to go solving
every language on the planet's input problems. i'm going to have to leave it up
to others to pick up that baton. i think the best i can do is cover the cases i
can do sanely given what i know of those languages. for now that probably
covers western european languages (and so that means western europe, all of
north/south america, some bits of asia/pacific, and probably most of africa,
and maybe russian too).
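
for whoever picks this up: the second option (trying pieces strung together instead of storing every full form) could look roughly like the python sketch below. the STEMS and SUFFIXES sets are a toy lexicon assumed purely for illustration, built from the examples in this thread; real hungarian morphology (vowel harmony, suffix ordering rules) is far more involved, which is what something like hunspell's affix rules exist for:

```python
# Toy lexicon, assumed for illustration only.
STEMS = {"kocsi", "ház", "fa"}
SUFFIXES = {"nk", "unk", "ünk", "at", "et", "ban", "ben"}


def split_suffixes(rest):
    """Return every way to split `rest` into a chain of known suffixes."""
    if not rest:
        return [[]]  # nothing left: one valid (empty) chain
    chains = []
    for suffix in SUFFIXES:
        if rest.startswith(suffix):
            for tail in split_suffixes(rest[len(suffix):]):
                chains.append([suffix] + tail)
    return chains


def segmentations(word):
    """Return every stem + suffix-chain reading of `word`, sorted."""
    results = []
    for stem in STEMS:
        if word.startswith(stem):
            for chain in split_suffixes(word[len(stem):]):
                results.append([stem] + chain)
    return sorted(results)


print(segmentations("kocsinkat"))  # [['kocsi', 'nk', 'at']]
print(segmentations("házban"))     # [['ház', 'ban']]
```

this only checks an exact spelling. combining it with fuzzy key matching means every typed char can stand for several neighbouring letters at every split point, which is where the search space blows up.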

i'm simply going to have to rely on others to take the baton and run with it
for other languages. i can't see a universal engine covering more than the
languages/countries i described above for now. (and possibly other languages
that behave similarly with spaces between words and reasonable dictionary sizes)

> Sorry for being so tiresome.
> 
> Best regards,
>  Khiraly
> 


-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    raster at rasterman.com




