Automatically splitting the planet by language

Alex Osborne ato at
Tue Oct 7 03:38:54 CEST 2008

Hello everyone,

I've been experimenting with splitting the planet's feed by language. 
Here's the result:

As you can see, it's at least doing a pretty good job with English,
French and German.  I've got it running from an hourly cron job, so
we'll see whether it keeps performing this well.  The
planet_split.tar.gz archive contains the source of the script.  All
I'm doing to detect the language is feeding the text of each post
into a Python port [1] of TextCat [2].
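
For anyone unfamiliar with TextCat: it classifies text by comparing
ranked lists of character n-grams.  The snippet below is only a rough
sketch of that idea, not the code in planet_split.tar.gz and not the
API of the Python port.  The function names, the tiny sample
sentences and the cutoff of 300 n-grams are all assumptions made up
for illustration; a real setup would build its profiles from much
larger training texts.

```python
from collections import Counter


def ngram_profile(text, max_n=3, top=300):
    """Map each of the `top` most frequent character n-grams to its rank."""
    counts = Counter()
    for word in text.lower().split():
        padded = "_" + word + "_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Most frequent first; ties broken alphabetically so ranks are stable.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))[:top]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from the language profile
    get a fixed maximum penalty."""
    max_penalty = len(lang_profile)
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else max_penalty
        for gram, rank in doc_profile.items()
    )


def guess_language(text, lang_profiles):
    """Return the language whose profile is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc, lang_profiles[lang]))


# Hypothetical, far-too-small training samples, for illustration only.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then the dog sleeps",
    "fr": "le renard brun saute par-dessus le chien paresseux et puis le chien dort",
    "de": "der schnelle braune fuchs springt über den faulen hund und dann schläft er",
}
PROFILES = {lang: ngram_profile(text) for lang, text in SAMPLES.items()}

if __name__ == "__main__":
    for lang, text in SAMPLES.items():
        print(lang, "->", guess_language(text, PROFILES))
```

In a script like the one described, each post's text would be passed
to something like guess_language() and the post filed under whichever
language comes back.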



