New Rasterman Image...

Carsten Haitzler (The Rasterman) raster at
Tue Oct 7 01:52:13 CEST 2008

On Thu, 02 Oct 2008 20:52:00 +0200 "Marco Trevisan (Treviño)" <mail at>

> Carsten Haitzler (The Rasterman) wrote:
> > it does have a concept of frequency of words. i just dont have any DATA for
> > that. the dict format it handles is:
> > word1
> > word2
> > word3
> > 
> > OR
> > word1 20
> > word2 434
> > word3 1
> I was thinking about a way to automate this a while ago, but I only wrote
> something just now...
> The basic idea is to use the number of Google results for each word as
> its frequency number (well, I know these numbers are often far too
> large, so I guess they should be re-analyzed and lowered, but I had no
> time to do this now :P).
> So this is a little utility I wrote [1] to check the frequency of each
> word and write back a new dictionary with frequency data.
> To run it you need php-cli (I guess v5 or above); set the given options,
> run "php words-popularity.php" and wait for the work to finish! :P
> It could take a long time, but it should give good results.

yes. it would. who wants to run it? :)
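
for anyone who wants to try, here is a rough sketch (in Python, not the PHP
utility from [1]) of reading the two dict layouts quoted above and squashing
huge raw hit counts into a small range. the function names, the log scale and
the 1..1000 range are all assumptions made for illustration; nothing here is
taken from illume or from words-popularity.php.

```python
import math


def parse_dict(text):
    """Return (word, count) pairs; count is None for bare 'word' lines."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        count = None
        if len(parts) > 1:
            try:
                count = int(parts[1])
            except ValueError:
                count = None  # not a number: treat as a bare word
        entries.append((parts[0], count))
    return entries


def rescale(hits, top=1000):
    """Map raw hit counts to 1..top on a log scale (an assumed scheme)."""
    biggest = max(hits.values(), default=0)
    if biggest <= 0:
        return {word: 1 for word in hits}
    denom = math.log1p(biggest)
    return {
        word: max(1, round(top * math.log1p(max(n, 0)) / denom))
        for word, n in hits.items()
    }


def format_dict(entries):
    """Write entries back out, one 'word' or 'word count' per line."""
    lines = [w if c is None else f"{w} {c}" for w, c in entries]
    return "\n".join(lines) + "\n"
```

a log scale is only one possible way to lower the numbers; it keeps the
ordering of words while stopping very common words from dwarfing the rest.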

nb. i checked illume's kbd code - it does have issues with utf8 keysequences in
sorted dicts. if you have any it'll fail to keep looking for more words so you
need to remove anything utf8 from your dict :( yes - i know. bad. i need to
address this. and the change in dict format i am sure 1. makes this now simple,
2. compresses the dict, 3. speeds it up, 4. solves this problem. :) but i just
need to do it - no time right now :(
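
until that is fixed, the workaround above (removing anything utf8 from the
dict) could be sketched like this in Python. this is only a pre-processing
step for the dict file, assuming one entry per line; the function name is
made up, and it says nothing about how illume's own code works.

```python
def split_ascii(lines):
    """Split dict lines into (kept, dropped): pure-ASCII lines vs. the rest.

    Order is preserved, so an already sorted dict stays sorted.
    """
    kept, dropped = [], []
    for line in lines:
        if line.isascii():  # str.isascii() needs Python 3.7 or newer
            kept.append(line)
        else:
            dropped.append(line)
    return kept, dropped
```

keeping the dropped lines in a separate list makes it easy to add them back
once non-ASCII words are handled.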

> PS: I used php since I run it both on my PC and on a server (dividing
> the work) where I have ssh access, but where I can run only a small
> subset of languages from the command line, and php is one of them.
> [1]
> -- 
> Treviño's World - Life and Linux
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    raster at
