[WikiReader] Sharing compiling sources.

Tilman Baumann tilman at baumann.name
Mon Nov 30 18:20:22 CET 2009


I realized that the patch does not apply to the latest version.
So I took the liberty of making a fork on GitHub and merging it there.
http://github.com/tbaumann/wikireader

I think I cracked the nut, but have a look if you would be so kind.
I'm not sure I completely got it.

(Please ignore the first commit. I did not test correctly before checking
in. :-/ )

Regards
 Tilman Baumann


David Reyes Samblas Martinez wrote:
> Here you go :)
>
> David Reyes Samblas Martinez
> http://www.tuxbrain.com
> Open ultraportable & embedded solutions
> Openmoko, Openpandora,  Arduino
> Hey, watch out!!! There's a linux in your pocket!!!
>
>
>
>
> 2009/11/30 Tilman Baumann <tilman at baumann.name>:
>> Hi,
>>
>> can you maybe release this as a patch?
>> I would like to integrate this on GitHub, but I fear I might miss
>> something if I try to pick out the changes by hand.
>>
>> Thanks
>>
>> David Reyes Samblas Martinez wrote:
>>> Sorry for the wait, Thomas.
>>> I was working on the broken pipe issue that stops the parser
>>> when it finds an error. I have applied a quick and dirty workaround
>>> using a try-catch: the process no longer stops, it just skips the
>>> faulty article and keeps going :) It logs the faulty ones (title and
>>> position) in a text file for later forensics. My first guess is that
>>> this is not a UTF-8 encoding issue but rather an unexpected
>>> formatting tag that the PHP parser doesn't know how to deal with.
>>> I am currently parsing the German Wikipedia, with more than
>>> 1.3 million articles:
>>>
>>> Count: 1043000
>>> Failing count: 2
>>>
>>> and it keeps going. I suppose we can sacrifice two articles to have
>>> one million available now :)
>>>
>>> As you requested, I uploaded my working compiled tools [1], but
>>> without any XML sources; the archive is about 113 MB. If you already
>>> have working tools on your system, you just have to replace
>>> host-tools/offline-renderer/ArticleParser.py with the one attached to
>>> this mail. Then you can forget about crying like a child whose ice
>>> cream has fallen on the floor when, after more than 24 hours of
>>> parsing and hundreds of thousands of articles processed, you see an
>>> ugly Python error backtrace (blah blah blah) instead of your desired
>>> file :)
>>>
>>> By the way, faultyarticles.txt is saved in the same
>>> host-tools/offline-renderer directory. (I'm too lazy to add a
>>> parameter to change that, so I hardcoded the file name.
>>> Yes... don't waste keystrokes correcting that bad habit, I know.)
>>>
>>> If you are curious which articles in the German wiki are causing
>>> trouble in dewiki-latest-pages-articles.xml (dated 2009-11-20):
>>>
>>> ~Storck Bicycle
>>> 832673
>>> ~Musculus serratus posterior inferior
>>> 857334
>>>
>>> Regards. I hope to upload the German Wikipedia on Sunday, so it
>>> will be available on Monday. Sorry for the wait, but my Asymmetric
>>> DSL is very asymmetric, and uploading 1.5-2 GB (expected file size)
>>> will take a bunch of hours.
>>>
>>> For those who want to compile their own, go for it :) The
>>> Quickreference in the doc directory of the source is all you need to
>>> start working. Just remember that if you have a 64-bit system you
>>> will have to follow the 64-bit method to compile the tools.
>>>
>>> Regards
>>> [1]http://tuxbrain.org/downloads/wikireader/wikireaderbinaries20091127_dsamblas_modified_trycatch.tar.bz2
>>> David Reyes Samblas Martinez
>>>
>>> 2009/11/27 Thomas HOCEDEZ <thomas.hocedez at free.fr>:
>>>> Thomas HOCEDEZ wrote:
>>>>>
>>>>> Hi David,
>>>>>
>>>>> Can you share your scripts & configs to do the same in French (and
>>>>> other languages)?
>>>>> Thanks
>>>>>
>>>>> Thomas
>>>>>
>>>>>
>>>>
>>>> As the mailing list seems to be broken (or users have started
>>>> hibernating for winter...), I found by myself how to compile things
>>>> step by step. I'm now rendering the French Wikipedia. As it started
>>>> a few minutes ago, the result will be available during the weekend
>>>> (I hope).
>>>>
>>>> I'll also post how I managed to do it! (I'm at the office for now,
>>>> and I'm leaving...)
>>>>
>>>> Regards to you all!
>>>>
>>>> Thomas
>>>>
>>> _______________________________________________
>>> Openmoko community mailing list
>>> community at lists.openmoko.org
>>> http://lists.openmoko.org/mailman/listinfo/community
>>>
>
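
The skip-and-log workaround described in the quoted message above can be
sketched roughly as follows. This is only an illustration, not the code
from the attached ArticleParser.py: the function name render_all, the
(title, position, text) tuples and the render callable are hypothetical,
and the log layout (a "~title" line followed by the position) is merely
modelled on the listing in the mail. It is written for Python 3.

```python
def render_all(articles, render, fault_log_path="faultyarticles.txt"):
    """Render every article, skipping the ones whose renderer raises.

    `articles` yields (title, position, text) tuples and `render` is any
    callable taking the article text. Both are hypothetical stand-ins,
    not the real ArticleParser.py interfaces.
    Returns a (rendered_count, failed_count) tuple.
    """
    rendered = 0
    failed = 0
    # Append mode so an interrupted and restarted run keeps earlier entries.
    with open(fault_log_path, "a", encoding="utf-8") as fault_log:
        for title, position, text in articles:
            try:
                render(text)
            except Exception:
                # Note title and position for later inspection, then
                # carry on with the next article instead of aborting.
                failed += 1
                fault_log.write("~%s\n%d\n" % (title, position))
                fault_log.flush()
                continue
            rendered += 1
    return rendered, failed
```

Catching Exception rather than using a bare except keeps Ctrl-C working
during a run that may last more than a day, and passing the log path as
an argument avoids the hardcoded file name mentioned in the mail.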

