Automatically splitting the planet by language
ato at meshy.org
Tue Oct 7 03:38:54 CEST 2008
I've been experimenting with splitting the planet's feed by language.
Here's the result:
As you can see, it's at least doing a pretty good job with English,
French and German. I've got it on an hourly cron job, so we'll see if it
continues to perform so well. The planet_split.tar.gz contains the
source to the script. All I'm doing to detect the language is feeding
the text of each post into a Python port of TextCat.
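
For anyone curious how this kind of detection works, here is a rough
sketch of the n-gram ranking idea that TextCat is based on. It is not the
code from planet_split.tar.gz and not the API of the Python port. The
function names (ngram_profile, out_of_place, detect_language) and the
tiny sample sentences are made up for illustration, and real profiles
would be built from far more text per language.

```python
from collections import Counter

def ngram_profile(text, max_n=3, top=300):
    """Return the most frequent character n-grams of text, best first."""
    counts = Counter()
    for word in text.lower().split():
        padded = "_" + word + "_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; break ties alphabetically so the
    # ranking is deterministic.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))
    return ranked[:top]

def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences between two profiles (lower is closer)."""
    rank = {g: i for i, g in enumerate(lang_profile)}
    max_penalty = len(lang_profile)  # charged for n-grams the language lacks
    return sum(
        abs(i - rank[g]) if g in rank else max_penalty
        for i, g in enumerate(doc_profile)
    )

def detect_language(text, profiles):
    """Return the key of the profile closest to text."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))

# Hypothetical, far-too-small training samples, for illustration only.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat sleeps in the house",
    "fr": "le renard brun saute par dessus le chien paresseux et le chat dort dans la maison",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze schläft im haus",
}
PROFILES = {lang: ngram_profile(text) for lang, text in SAMPLES.items()}
```

Calling detect_language(post_text, PROFILES) returns the language code
whose profile is closest to the post. With samples this small the guess
is unreliable on unseen text, so treat it as a demonstration of the
ranking idea only.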