Automatically splitting the planet by language
Minh Ha Duong
haduong at centre-cired.fr
Tue Oct 7 10:20:50 CEST 2008
On Tuesday 07 October 2008, Alex Osborne wrote:
> Hello everyone,
> I've been experimenting with splitting the planet's feed by language.
> Here's the result:
> As you can see it's at least doing a pretty good job with English,
> French and German. I've got it on an hourly cron so we'll see if it
> continues to perform so well. The planet_split.tar.gz contains the
> source to the script. All I'm doing to detect the language is feeding
> the text of each post into a Python port of TextCat.
Ah, good job. If I may suggest, I think this code should be carefully stored
where admins will find it when they need it. That means: would you care to
open a ticket in https://admin-trac.openmoko.org/trac about "Research and
implement i18n in planet" and attach it there?
The Right Thing to do would be to upgrade or patch the aggregator used by
Openmoko for the planet. I think that they are using planetplanet (or is it
named venus now?) which happens to be in Python too.
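
For readers unfamiliar with TextCat: it classifies text by comparing ranked
character n-gram profiles (the Cavnar and Trenkle method). Below is a minimal
sketch of that idea in Python. It is not the script from planet_split.tar.gz,
and the function names, the tiny sample texts and the profile size are all
assumptions made for illustration; a real setup would train each language
profile on a much larger corpus.

```python
from collections import Counter

PROFILE_SIZE = 300  # illustrative cut-off, not taken from the original script


def ngram_profile(text, max_n=3, top=PROFILE_SIZE):
    """Map each of the most frequent character n-grams to its rank (0 = most frequent)."""
    counts = Counter()
    for word in text.lower().split():
        padded = "_" + word + "_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically so ranks are deterministic.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))
    return {gram: rank for rank, gram in enumerate(ranked[:top])}


def out_of_place(doc_profile, lang_profile, max_penalty=PROFILE_SIZE):
    """Sum of rank differences; n-grams missing from the language profile get a fixed penalty."""
    distance = 0
    for gram, rank in doc_profile.items():
        if gram in lang_profile:
            distance += abs(rank - lang_profile[gram])
        else:
            distance += max_penalty
    return distance


# Hypothetical, far-too-small training samples, for illustration only.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is what "
          "the people were saying with their friends in the street",
    "fr": "le renard brun saute par dessus le chien paresseux et voici ce "
          "que les gens disaient avec leurs amis dans la rue",
    "de": "der schnelle braune fuchs springt über den faulen hund und das "
          "ist was die leute mit ihren freunden auf der straße sagten",
}
PROFILES = {lang: ngram_profile(text) for lang, text in SAMPLES.items()}


def guess_language(text):
    """Return the language whose profile is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(PROFILES, key=lambda lang: out_of_place(doc, PROFILES[lang]))
```

Each post's text would be passed to guess_language() and the post then routed
to the feed for the returned language code. With samples this small the guess
is unreliable on short posts.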