[WikiReader] Sharing compiling sources.

Tilman Baumann tilman at baumann.name
Mon Nov 30 12:09:04 CET 2009


Actually the tar file seems to be broken. So much for potentially missing
stuff. ;)

Tilman Baumann wrote:
> Hi,
>
> can you maybe release this as a patch?
> I'd like to integrate this into github. But I fear I might miss something
> if I try to pick out the changes by hand.
>
> Thanks
>
> David Reyes Samblas Martinez wrote:
>> Sorry for the wait, Thomas.
>> I was working on the broken pipe issue that stops the parser when it
>> hits an error. I have applied a quick and dirty workaround using
>> try/catch, so the process no longer stops: it just skips the faulty
>> article and keeps going :) It logs the faulty ones in a text file
>> (title and position) for later forensics. My first guess is that
>> this is not a UTF-8 encoding issue but rather an unexpected
>> formatting tag that the PHP parser doesn't know how to deal with.
>> I'm currently parsing the German Wikipedia, with more than 1.3
>> million articles:
>>
>> Count: 1043000
>> Failing count: 2
>>
>> and it keeps going. I suppose we can sacrifice two articles to have
>> one million available now :)
>>
>> As you requested, I uploaded my working compiled tools [1], but
>> without any XML sources; it's about 113 MB. If you already have
>> working tools on your system, you just have to replace
>> host-tools/offline-renderer/ArticleParser.py with the one attached to
>> this mail. Then you can forget about crying like a child whose ice
>> cream has fallen on the floor when, after more than 24 hours and
>> hundreds of thousands of parsed articles, you see an ugly Python
>> error backtrace, blah blah blah, instead of your desired file :)
>>
>> By the way, faultyarticles.txt is saved in the same
>> host-tools/offline-renderer directory. (I was too lazy to add a
>> parameter to change that, so I hardcoded the file name. Yes... don't
>> waste your typing on correcting that bad habit, I know.)
>>
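
The skip-and-log workaround described above could look roughly like the
sketch below. This is not the attached ArticleParser.py: parse_all,
parse_one and the (title, text) input are hypothetical names, "position"
is assumed to be the article's index in the input, and the log layout
only imitates the "~title / number" pairs quoted in this mail.

```python
def parse_all(articles, parse_one, log_path="faultyarticles.txt"):
    """Parse every article, skipping (and logging) the ones that fail.

    articles  -- iterable of (title, text) pairs (hypothetical input shape)
    parse_one -- callable doing the real work; may raise on bad markup
    log_path  -- where faulty titles and positions are written
    Returns a (parsed_count, failed_count) tuple.
    """
    parsed = 0
    failed = 0
    with open(log_path, "w", encoding="utf-8") as log:
        for position, (title, text) in enumerate(articles, start=1):
            try:
                parse_one(title, text)
                parsed += 1
            except Exception:
                # Quick and dirty on purpose: any parser error only costs
                # this one article instead of aborting a 24h+ run.
                failed += 1
                # Assumed layout: "~title" on one line, position on the next.
                log.write("~%s\n%d\n" % (title, position))
    return parsed, failed
```

Taking log_path as a parameter instead of hardcoding the file name is
only a choice made for this sketch.
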
>> If you are curious about which articles on the German wiki are
>> causing trouble in dewiki-latest-pages-articles.xml (dated
>> 2009-11-20):
>>
>> ~Storck Bicycle
>> 832673
>> ~Musculus serratus posterior inferior
>> 857334
>>
>> Regards. I hope to upload the German Wikipedia on Sunday, so it
>> will be available on Monday. Sorry for the wait, but my Asymmetric
>> DSL is very asymmetric, and uploading 1.5-2 GB (expected file size)
>> will take a good many hours.
>>
>> For those who want to compile their own, go for it :) The
>> Quickreference in the doc directory of the source is all you need to
>> start working. Just remember that if you have a 64-bit system you
>> will have to follow the 64-bit method to compile the tools.
>>
>> Regards
>> [1]http://tuxbrain.org/downloads/wikireader/wikireaderbinaries20091127_dsamblas_modified_trycatch.tar.bz2
>> David Reyes Samblas Martinez
>> http://www.tuxbrain.com
>> Open ultraportable & embedded solutions
>> Openmoko, Openpandora,  Arduino
>> Hey, watch out!!! There's a linux in your pocket!!!
>>
>>
>>
>>
>> 2009/11/27 Thomas HOCEDEZ <thomas.hocedez at free.fr>:
>>> Thomas HOCEDEZ wrote:
>>>>
>>>> Hi David,
>>>>
>>>> Can you share your scripts & configs to do the same in French (and
>>>> other languages)?
>>>> Thanks
>>>>
>>>> Thomas
>>>>
>>>>
>>>
>>> As the mailing list seems to be broken (or users have started
>>> hibernating for winter...), I found out by myself how to compile
>>> things step by step.
>>> I'm now rendering the French Wikipedia. As it started a few minutes
>>> ago, the result will be available during the weekend (I hope).
>>>
>>> I'll also post how I managed to do it! (I'm at the office for now,
>>> and I'm leaving...)
>>>
>>> Regards to you all !
>>>
>>> Thomas
>>>

