New Rasterman Image...

"Marco Trevisan (Treviño)" mail at 3v1n0.net
Thu Oct 2 20:52:00 CEST 2008


Carsten Haitzler (The Rasterman) wrote:
> it does have concept of frequency of words. i just don't have any DATA for
> that. the dict format handled is:
> word1
> word2
> word3
> 
> OR
> word1 20
> word2 434
> word3 1
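
A minimal Python sketch of reading both dictionary layouts quoted above: one word per line, optionally followed by an integer frequency. The function name `parse_dict` and the default frequency of 1 for bare words are assumptions for illustration, not taken from the actual dictionary code.

```python
def parse_dict(lines, default_freq=1):
    """Parse lines of the form 'word' or 'word freq' into a dict.

    default_freq is a hypothetical value for words listed without a
    frequency; the real dictionary code may treat them differently.
    """
    entries = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word = parts[0]
        # Second column, if present, is the integer frequency.
        freq = int(parts[1]) if len(parts) > 1 else default_freq
        entries[word] = freq
    return entries
```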

I was thinking about a way to automate this a while ago, but I only wrote
something just now...
The basic idea is to use the number of Google results for each word as
its frequency value (well, I know these numbers are often far too large,
so I guess they should be re-analyzed and scaled down, but I had no time
to do this now :P).
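
One possible way to lower the raw hit counts, sketched in Python: map them logarithmically onto a small integer range, so that the most frequent word gets `max_freq` and a word with a single hit gets 1. The log scaling and the `max_freq=1000` ceiling are assumptions of this sketch; the utility itself does not do any rescaling yet.

```python
import math

def scale_counts(counts, max_freq=1000):
    """Map raw hit counts onto integer frequencies in 1..max_freq.

    Log scaling relative to the largest count is an assumed choice;
    the original only says the numbers should be lowered.
    """
    if not counts:
        return {}
    top = max(counts.values())
    scaled = {}
    for word, n in counts.items():
        if n <= 0 or top <= 1:
            # No usable signal (or log(top) would be 0): lowest frequency.
            scaled[word] = 1
            continue
        # For 1 <= n <= top, log(n) / log(top) lies in [0, 1].
        ratio = math.log(n) / math.log(top)
        scaled[word] = max(1, round(ratio * max_freq))
    return scaled
```

Log scaling keeps the relative ordering of words while compressing counts that span several orders of magnitude into values comparable to the hand-written examples in the quoted format.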

So this is a little utility I wrote [1] that checks the frequency of each
word and writes out a new dictionary with the frequency data.
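
The overall loop (read each word, look up a count, write a `word freq` line) could look roughly like this in Python. `get_count` is a hypothetical callable standing in for the web lookup; this sketch does not query Google and is not a port of the PHP script. It also assumes the input dictionary has one word per line.

```python
import io
from typing import Callable, Iterable, TextIO

def write_dict_with_freqs(words: Iterable[str],
                          get_count: Callable[[str], int],
                          out: TextIO) -> None:
    """Write one 'word freq' line per unique, non-blank input word."""
    seen = set()
    for raw in words:
        word = raw.strip()
        if not word or word in seen:
            continue  # skip blank lines and duplicate words
        seen.add(word)
        out.write(f"{word} {get_count(word)}\n")

if __name__ == "__main__":
    # Usage with a fake, in-memory count table instead of a real lookup.
    fake_counts = {"foo": 20, "bar": 3}
    buf = io.StringIO()
    write_dict_with_freqs(["foo\n", "bar\n", "foo\n", "\n"], fake_counts.get, buf)
    print(buf.getvalue(), end="")
```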

To run it you need php-cli (v5 or later, I guess): set the provided
options, run "php words-popularity.php" and wait for the work to finish! :P

It may take a long time, but it should give good results.

PS: I used PHP because I run it both on my PC and on a server (splitting
the work between them) where I have ssh access but can run only a small
subset of languages from the command line, and PHP is one of them.

[1] http://3v1n0.tuxfamily.org/openmoko/words-popularity.phps

-- 
Treviño's World - Life and Linux
http://www.3v1n0.net/
