New Rasterman Image...
"Marco Trevisan (Treviño)"
mail at 3v1n0.net
Thu Oct 2 20:52:00 CEST 2008
Carsten Haitzler (The Rasterman) wrote:
> it does have a concept of frequency of words. i just dont have any DATA for
> that. the dict format it handles is:
> word1 20
> word2 434
> word3 1
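A minimal sketch of reading and writing the "word frequency" format quoted above. It assumes one word per line followed by whitespace and an integer count. Giving lines without a count a default frequency of 1 is an assumption made for illustration, not something the quoted format specifies.

```python
def parse_dict(text: str) -> dict[str, int]:
    """Parse 'word frequency' lines into a {word: frequency} mapping."""
    freqs: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word = parts[0]
        # Assumption: a line with no count column gets frequency 1.
        freq = int(parts[1]) if len(parts) > 1 else 1
        freqs[word] = freq
    return freqs


def format_dict(freqs: dict[str, int]) -> str:
    """Write the mapping back out, one 'word frequency' line per entry."""
    return "".join(f"{word} {freq}\n" for word, freq in freqs.items())


sample = "word1 20\nword2 434\nword3 1\n"
parsed = parse_dict(sample)
print(parsed)  # {'word1': 20, 'word2': 434, 'word3': 1}
```

Because Python dicts keep insertion order, `format_dict(parse_dict(text))` reproduces a well-formed input unchanged.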
I had been thinking about a way to automate this for a while, but I only
wrote something just now...
The basic idea is to use the number of Google results for each word as
its frequency value (well, I know these numbers are often far too large,
so I guess they should be re-analyzed and scaled down, but I had no time
to do this now :P).
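One possible way to scale raw result counts down is a logarithmic mapping into a small fixed range. This is a swapped-in technique, not what the post's script does. The target range `1..max_freq` and the default `max_freq = 1000` are assumptions chosen for illustration. A log scale keeps the ordering of words while compressing counts that can span many orders of magnitude.

```python
import math


def scale_counts(raw: dict[str, int], max_freq: int = 1000) -> dict[str, int]:
    """Map raw hit counts onto 1..max_freq with a log scale.

    Assumption: max_freq is an arbitrary illustrative ceiling; the most
    frequent word gets max_freq and a count of 0 or 1 gets 1.
    """
    if not raw:
        return {}
    top = max(raw.values())
    if top <= 1:
        return {word: 1 for word in raw}  # nothing to spread out
    log_top = math.log(top)
    scaled: dict[str, int] = {}
    for word, count in raw.items():
        c = max(count, 1)  # treat 0 as 1 so the log is defined
        scaled[word] = max(1, round(max_freq * math.log(c) / log_top))
    return scaled


print(scale_counts({"a": 1000, "b": 10, "c": 0}))
```

For this input, "a" (the largest count) maps to 1000, "b" to 333 (since log 10 is a third of log 1000) and "c" to 1.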
So this is a little utility I wrote to check the frequency of each word
and write back a new dictionary with the frequency data.
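A minimal sketch of the overall flow described above: take a word list, look up a count for each word, and emit a new dictionary in the "word frequency" format. It does not show how the original PHP script queries Google. The `lookup` callback is a hypothetical stand-in for whatever source of counts is used, and skipping duplicate words is an assumption.

```python
from typing import Callable, Iterable


def build_frequency_dict(words: Iterable[str],
                         lookup: Callable[[str], int]) -> str:
    """Return dictionary text with one 'word count' line per unique word.

    `lookup` is a hypothetical callback that returns a count for a word
    (e.g. a search-result count); it is supplied by the caller.
    """
    lines: list[str] = []
    seen: set[str] = set()
    for word in words:
        word = word.strip()
        if not word or word in seen:
            continue  # skip blanks and duplicates (assumption)
        seen.add(word)
        lines.append(f"{word} {lookup(word)}")
    return "\n".join(lines) + "\n" if lines else ""


# Fake counts standing in for real lookups, for illustration only.
fake_counts = {"hello": 1200, "world": 900}
print(build_frequency_dict(["hello", "world", "hello", "zzz"],
                           lambda w: fake_counts.get(w, 0)), end="")
```

Passing the lookup in as a function keeps the slow network part separate, so it can be cached, rate-limited, or replaced with an offline corpus without touching the dictionary-writing code.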
To run it you need php-cli (I guess v5 or above): set the given options,
run "php words-popularity.php" and wait for the work to finish! :P
It could take a long time, but it should give good results.
PS: I used PHP because I run it both on my PC and on a server (to split
the work), where I have SSH access but can only run a small subset of
languages from the command line, and PHP is one of them.
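Splitting the work between two machines, as mentioned above, can be done by cutting the word list into contiguous chunks of near-equal size and running one chunk per machine. This is a sketch under that assumption; the post does not say how the work was actually divided. Concatenating the per-chunk outputs in order restores the original word order.

```python
def split_work(words: list[str], n_parts: int) -> list[list[str]]:
    """Split words into n_parts contiguous chunks whose sizes differ by at most 1."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    size, extra = divmod(len(words), n_parts)
    chunks: list[list[str]] = []
    start = 0
    for i in range(n_parts):
        # The first `extra` chunks take one additional word each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(words[start:end])
        start = end
    return chunks


print(split_work(["a", "b", "c", "d", "e"], 2))  # [['a', 'b', 'c'], ['d', 'e']]
```

Contiguous chunks (rather than round-robin) make merging trivial: each machine writes its own partial dictionary and the files are simply appended in chunk order.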
Treviño's World - Life and Linux