[wikireader] Error parsing the Spanish Wikipedia

David Reyes Samblas Martinez david at tuxbrain.com
Fri Oct 30 19:46:21 CET 2009


Just one thing I realized: in all the faulty articles, the title starts with
the "~" symbol.
Regards
David Reyes Samblas Martinez
http://www.tuxbrain.com
Open ultraportable & embedded solutions
Openmoko, Openpandora,  Arduino
Hey, watch out!!! There's a linux in your pocket!!!
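To check this observation against the whole dump, the matching titles can be listed straight from the XML file. This is a minimal Python 3 sketch, not part of the WikiReader tools. The function name `titles_starting_with` is hypothetical, and the sketch assumes the usual MediaWiki export layout, where each `<page>` element contains a `<title>` element.

```python
import io
import xml.etree.ElementTree as ET


def titles_starting_with(dump, prefix="~"):
    """Yield page titles in a MediaWiki XML dump that start with `prefix`.

    `dump` is a file path or a binary file object. Hypothetical helper,
    not part of the WikiReader host-tools.
    """
    for _event, elem in ET.iterparse(dump, events=("end",)):
        # Dumps declare an XML namespace; strip any "{uri}" prefix from the tag.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "title" and elem.text and elem.text.startswith(prefix):
            yield elem.text
        elif tag == "page":
            elem.clear()  # drop the finished page's children and text


if __name__ == "__main__":
    sample = (b"<mediawiki><page><title>~Foo</title></page>"
              b"<page><title>Bar</title></page></mediawiki>")
    print(list(titles_starting_with(io.BytesIO(sample))))  # ['~Foo']
```

Pointed at eswiki-latest-pages-articles.xml, a list like this could be compared with the articles that fail, to confirm whether the "~" titles are exactly the problem set.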




2009/10/30 David Reyes Samblas Martinez <david at tuxbrain.com>:
> Are you uploading these changes to git? Can I take a look?
>
> 2009/10/30 Sean Moss-Pultz <sean at openmoko.com>:
>> On Fri, Oct 30, 2009 at 4:50 AM, David Reyes Samblas Martinez
>> <david at tuxbrain.com> wrote:
>>> Hi, I'm trying to generate the file for the Spanish Wikipedia on the WR
>>> (WikiReader). After successfully compiling the source from git, I had to
>>> solve some annoying UTF-8 encoding problems in Python; the error was
>>> something like this:
>>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in
>>> position....: ordinal not in range(128)
>>> I solved this by changing the default encoding from "ascii" to "utf8" in
>>> the /usr/lib/python2.6/site.py file.
>>> After this I was able to run the following command without errors:
>>> make DESTDIR=image WORKDIR=work
>>> XML_FILES=xml-file-samples/eswiki-latest-pages-articles.xml index
>>> parse render combine
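Editing site.py changes the default encoding for every Python 2.6 program on that machine. A more contained approach is to decode byte strings explicitly wherever raw dump data is turned into text. Below is a minimal sketch in Python 3 (in Python 2.6 the equivalent call is `.decode('utf-8')` on a `str`). The helper name `to_text` is hypothetical and is not taken from the WikiReader tools.

```python
def to_text(raw: bytes) -> str:
    """Decode dump bytes as UTF-8 explicitly (hypothetical helper)."""
    return raw.decode("utf-8")


raw = "España".encode("utf-8")  # b'Espa\xc3\xb1a'; "ñ" is 0xc3 0xb1 in UTF-8

try:
    raw.decode("ascii")  # what an implicit ASCII decode does
except UnicodeDecodeError as exc:
    # 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
    print(exc)

print(to_text(raw))  # España
```

Byte 0xc3 is the UTF-8 lead byte of the accented letters common in Spanish (á, é, ñ, ó and others), which is why it shows up in the error.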
>>>
>>> Everything seemed fine for several hours (about 6-7 h) while parsing the
>>> 700000 Spanish articles, but then... the horror:
>>> Count: 380000
>>> Traceback (most recent call last):
>>>  File "./ArticleParser.py", line 224, in <module>
>>>    main()
>>>  File "./ArticleParser.py", line 172, in main
>>>    process_article_text(title.encode('utf-8'),  f.read(length), newf)
>>>  File "./ArticleParser.py", line 218, in process_article_text
>>>    newf.write(text + '\n')
>>> IOError: [Errno 32] Broken pipe
>>> make[1]: *** [parse] Error 1
>>> make[1]: Leaving directory
>>> `/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/host-tools/offline-renderer'
>>> make: *** [parse] Error 2
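The final error is `IOError: [Errno 32] Broken pipe`, raised by `newf.write()`. On POSIX systems errno 32 (EPIPE) means the process reading the other end of a pipe has already closed it, so, assuming `newf` is such a pipe, the real failure is likely in that reader rather than in the write itself. Below is a minimal Python 3 sketch of a writer that notices this and reports the reader's exit status (Python 3 raises `BrokenPipeError`, a subclass of `OSError`, where Python 2.6 raised `IOError` with errno 32). The helper `feed_consumer` and the simulated crashing consumer are hypothetical and are not the code Chris checked in.

```python
import subprocess
import sys


def feed_consumer(cmd, records):
    """Write byte records, one per line, to the stdin of a child process.

    Returns (exit_status, broke). A broken pipe (EPIPE, errno 32) only
    means the reader closed its end; the exit status hints at why.
    Hypothetical helper, not the fix that was checked in.
    """
    # bufsize=0 gives an unbuffered pipe, so a failed write leaves no
    # pending data that a later close() would try to flush again.
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, bufsize=0)
    broke = False
    try:
        for rec in records:
            proc.stdin.write(rec + b"\n")
    except BrokenPipeError:
        broke = True  # the consumer is gone; stop writing
    finally:
        proc.stdin.close()  # sends EOF on the normal path
    return proc.wait(), broke


if __name__ == "__main__":
    # Simulated consumer that dies at once with status 3; about 1 MB of
    # output is more than a pipe buffer holds, so the writer hits EPIPE.
    crash = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(feed_consumer(crash, (b"x" * 1024 for _ in range(1000))))
```

If a pipeline fails this way, the error output of the process on the reading side is usually the first place to look.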
>>
>> OK, that's fixed now. Chris already checked in the code. Our build
>> worked fine. We need to do a few more tweaks and then we can post a
>> (super) early test image. Give us until early this coming week.
>>
>>  -Sean
>>
>> _______________________________________________
>> Openmoko community mailing list
>> community at lists.openmoko.org
>> http://lists.openmoko.org/mailman/listinfo/community
>>
>


