[WikiReader] Sharing compiling sources.
David Reyes Samblas Martinez
david at tuxbrain.com
Mon Nov 30 14:49:43 CET 2009
Here you go :)
David Reyes Samblas Martinez
Open ultraportable & embedded solutions
Openmoko, Openpandora, Arduino
Hey, watch out!!! There's a linux in your pocket!!!
2009/11/30 Tilman Baumann <tilman at baumann.name>:
> Can you maybe release this as a patch?
> I would like to integrate this on GitHub, but I fear I might miss something if I
> try to pick out the changes by hand.
> David Reyes Samblas Martinez wrote:
>> Sorry for the wait, Thomas.
>> I was working on the broken pipe issue that stops the parser
>> when it hits an error. I have applied a quick and dirty workaround
>> using try/except, so the process no longer stops: it just
>> skips the faulty article and keeps going :) It logs the faulty ones in
>> a text file (title and position) for later forensics. My first
>> guess is that this is not a UTF-8 encoding issue but rather an
>> unexpected formatting tag that the PHP parser doesn't know how to deal with.
>> I am currently parsing the German Wikipedia, with more than 1.3 million articles:
>> Count: 1043000
>> Failing count: 2
>> and it keeps going. I suppose we can sacrifice two articles to have
>> one million available now :)
>> As you requested, I uploaded my working compiled tools, without
>> any XML sources; the archive is about 113 MB. But if you already have working tools on
>> your system, you just have to replace
>> host-tools/offline-renderer/ArticleParser.py with the one attached to this
>> mail. Then you can stop crying like a child whose ice cream has
>> fallen on the floor when, after more than 24 hours and hundreds of thousands
>> of parsed articles, the process shows you an ugly Python error backtrace
>> (blablabla) instead of your desired file :)
>> By the way, faultyarticles.txt is saved in the same
>> host-tools/offline-renderer directory (I was too lazy to add a
>> parameter to change that, so I hardcoded the file name.
>> Yes... don't waste your typing on correcting that bad habit, I know).
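To illustrate the skip-and-log workaround described above, here is a minimal sketch. It is not the attached ArticleParser.py: `parse_article`, `process_articles` and the tab-separated log layout are hypothetical names and choices made up for this example, and it assumes Python 3. Only the hardcoded file name faultyarticles.txt and the idea of logging title and position come from the message.

```python
# Hardcoded default, as in the message; here it can be overridden per call.
FAULTY_LOG = "faultyarticles.txt"


def parse_article(title, text):
    """Hypothetical stand-in for the real per-article parser."""
    if "{{broken" in text:
        raise ValueError("unexpected formatting tag")
    return text.upper()


def process_articles(articles, parse=parse_article, log_path=FAULTY_LOG):
    """Parse every (title, text) pair, skipping and logging the ones that raise.

    Returns a (parsed_count, failed_count) tuple.
    """
    parsed = 0
    failed = 0
    with open(log_path, "w", encoding="utf-8") as log:
        for position, (title, text) in enumerate(articles, start=1):
            try:
                parse(title, text)
                parsed += 1
            except Exception as exc:
                # Quick and dirty on purpose: any parser error is treated
                # as "skip this article" so a long run is not lost.
                failed += 1
                log.write(f"{position}\t{title}\t{exc}\n")
    return parsed, failed


if __name__ == "__main__":
    sample = [("Good", "plain text"), ("Bad", "{{broken tag")]
    print(process_articles(sample))
```

Catching `Exception` rather than using a bare `except:` keeps Ctrl-C working during a run that may last more than a day. Passing `log_path` as a parameter is the small change the message admits to skipping.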
>> If you are curious which articles of the German wiki are causing the
>> failures in dewiki-latest-pages-articles.xml (dated 2009-11-20):
>> ~Storck Bicycle
>> ~Musculus serratus posterior inferior
>> Regards. I hope to upload the German Wikipedia on Sunday so that it
>> will be available on Monday. Sorry for the wait, but my Asymmetric DSL
>> is very asymmetric, and uploading 1.5-2 GB (the expected file size) will take
>> a bunch of hours.
>> For those who want to compile their own, go for it :) The
>> Quickreference in the doc directory of the source is all you need to
>> start working. Just remember that if you have a 64-bit system you
>> will have to follow the 64-bit method to compile the tools.
>> David Reyes Samblas Martinez
>> 2009/11/27 Thomas HOCEDEZ <thomas.hocedez at free.fr>:
>>> Thomas HOCEDEZ wrote:
>>>> Hi David,
>>>> Can you share your scripts & configs to do the same in French (and
>>>> other languages)?
>>> As the mailing list seems to be broken (or users started hibernating for
>>> winter...) I found out by myself how to compile things step by step.
>>> I'm now rendering the French Wikipedia. As it started a few minutes ago,
>>> the result should be available during the weekend (I hope).
>>> I'll also post how I managed to do it! (I'm at the office for now, and
>>> I'm leaving...)
>>> Regards to you all!
>> Openmoko community mailing list
>> community at lists.openmoko.org
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 2454 bytes
Desc: not available
Url : http://lists.openmoko.org/pipermail/community/attachments/20091130/7ae7ec1a/attachment.bin