Automatically splitting the planet by language

Minh Ha Duong haduong at centre-cired.fr
Tue Oct 7 10:20:50 CEST 2008


On Tuesday 07 October 2008, Alex Osborne wrote:
> Hello everyone,
>
> I've been experimenting with splitting the planet's feed by language.
> Here's the result:
>
> http://meshy.org/~ato/planet_om/
>
> As you can see it's at least doing a pretty good job with English,
> French and German.  I've got it on an hourly cron so we'll see if it
> continues to perform so well.  The planet_split.tar.gz contains the
> source to the script.  All I'm doing to detect the language is feeding
> the text of each post into a Python port [1] of TextCat [2].

Ah, good job. If I may make a suggestion: this code should be stored where
the admins will find it when they need it. Would you care to open a ticket
at https://admin-trac.openmoko.org/trac titled "Research and implement i18n
in planet" and attach the script there?

The Right Thing to do would be to upgrade or patch the aggregator that
Openmoko uses for the planet. I think they are using planetplanet (or is it
named Venus now?), which happens to be written in Python too.
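
For whoever picks up that ticket, here is a minimal sketch of the
TextCat-style idea mentioned in the quoted message: build a ranked profile
of character n-grams for each language, then assign a post to the language
whose profile is closest. This is not Alex's script and not the TextCat
port; the function names, the tiny sample texts and the profile size are
all assumptions made for illustration, and real profiles would be trained
on far more text.

```python
from collections import Counter, defaultdict


def ngram_profile(text, max_n=3, top=300):
    """Return the text's character n-grams, most frequent first."""
    counts = Counter()
    for word in text.lower().split():
        padded = "_" + word + "_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically so ranks are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from the language
    profile cost a fixed maximum penalty."""
    ranks = {gram: rank for rank, gram in enumerate(lang_profile)}
    penalty = len(lang_profile)
    return sum(
        abs(rank - ranks[gram]) if gram in ranks else penalty
        for rank, gram in enumerate(doc_profile)
    )


def detect_language(text, profiles):
    """Return the key of the profile with the smallest distance."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))


def split_by_language(posts, profiles):
    """Group post texts into one list per detected language."""
    buckets = defaultdict(list)
    for post in posts:
        buckets[detect_language(post, profiles)].append(post)
    return dict(buckets)


# Hypothetical, far-too-small training samples (assumption, demo only).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is "
          "what the people said about the thing",
    "fr": "le renard brun saute par dessus le chien paresseux et voici "
          "ce que les gens ont dit de la chose",
    "de": "der schnelle braune fuchs springt über den faulen hund und "
          "das ist was die leute über die sache gesagt haben",
}
PROFILES = {lang: ngram_profile(text) for lang, text in SAMPLES.items()}
```

Calling split_by_language(list_of_post_texts, PROFILES) returns a dict
mapping each language key to its posts, which could then be written out
as one feed per language. With samples this small the detection will be
unreliable on short posts.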

Yours,
Minh


