[Wikireader] Error on processing the German Wikipedia

David Reyes Samblas Martinez david at tuxbrain.com
Fri Nov 20 10:15:41 CET 2009


Well, the Spanish one gave me the same error before, but now it works. I'm
parsing the German Wikipedia right now (Count: 173000); let's see what
happens :)

Note: I'm parsing the 2009-Nov-11 dump:
http://download.wikipedia.org/dewiki/latest/dewiki-latest-pages-articles.xml.bz2

Regards

David Reyes Samblas Martinez
http://www.tuxbrain.com
Open ultraportable & embedded solutions
Openmoko, Openpandora,  Arduino
Hey, watch out!!! There's a linux in your pocket!!!




2009/11/20 Tilman Baumann <tilman at baumann.name>:
> Can you reproduce this with a neutral locale?
>  export LC_ALL=C
>
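A minimal sketch of the same suggestion from Python, in case the build is scripted. The helper name `run_with_c_locale` is made up for illustration, and it assumes Python 3. It copies the current environment and sets `LC_ALL=C` only for the child process, so messages from make, awk and friends come out untranslated without changing the caller's own locale.

```python
import os
import subprocess


def run_with_c_locale(cmd, **kwargs):
    """Hypothetical helper: run cmd as if `export LC_ALL=C` had been done first.

    Only the child's environment is changed; os.environ is left untouched.
    Extra keyword arguments are passed through to subprocess.run().
    """
    env = dict(os.environ)  # copy, so the caller's environment is unchanged
    env['LC_ALL'] = 'C'     # LC_ALL takes precedence over LANG and the LC_* variables
    return subprocess.run(cmd, env=env, **kwargs)


# Example (not executed here), re-running the index step from this thread:
#   run_with_c_locale(['make', 'DESTDIR=image', 'WORKDIR=work',
#                      'XML_FILES=dewiki-latest-pages-meta-current.xml', 'index'])
```
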
> I'm trying the same at the moment. I had a lot of hiccups, caused by many
> things, among them missing tools and not enough memory.
>
> This is where I'm currently stuck with the German Wikipedia:
>
> Count: 823000
> Count: 824000
> Count: 825000
> Count: 826000
> Count: 827000
> Count: 828000
> Count: 829000
> Count: 830000
> Count: 831000
> Count: 832000
> Count: 833000
> Traceback (most recent call last):
>  File "./ArticleParser.py", line 203, in <module>
>    main()
>  File "./ArticleParser.py", line 168, in main
>    process_article_text(title.encode('utf-8'),  f.read(length), newf)
>  File "./ArticleParser.py", line 197, in process_article_text
>    newf.write(text + '\n')
> IOError: [Errno 32] Broken pipe
> make[1]: *** [parse] Error 1
> make[1]: Leaving directory `/home/tilli/wikireader/host-tools/offline-renderer'
> make: *** [parse] Error 2
>
> I suppose it failed somewhere in PARSER_COMMAND
>
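Errno 32 (EPIPE) on `newf.write()` only means the process on the reading end of the pipe exited first, so the real error is probably in whatever PARSER_COMMAND runs, and the Python traceback is just the symptom. A minimal sketch of a writer that reports this instead of dying, assuming the output goes to a child's stdin through `subprocess` (the helper `feed_lines` is hypothetical and not how ArticleParser.py is actually wired; Python 3, where EPIPE surfaces as `BrokenPipeError` rather than Python 2's `IOError`):

```python
import subprocess


def feed_lines(cmd, lines):
    """Hypothetical helper: write lines to cmd's stdin.

    Returns (lines_handed_to_pipe, exit_code). If the reader exits early,
    writing raises BrokenPipeError (errno 32, EPIPE). That only says the
    reader is gone, so we stop writing and return its exit status instead.
    """
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    written = 0
    try:
        for line in lines:
            proc.stdin.write(line.encode('utf-8') + b'\n')
            written += 1  # accepted into the pipe buffer, not necessarily read
    except BrokenPipeError:
        pass  # the downstream command has already exited
    finally:
        try:
            proc.stdin.close()  # flushes any buffered data, which can also hit EPIPE
        except BrokenPipeError:
            pass
    return written, proc.wait()
```

The returned exit status, together with the reader's own stderr, is where the underlying failure should show up.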
>
> Before that, the following steps went through without failing:
> make
> make DESTDIR=image WORKDIR=work XML_FILES=dewiki-20091028-pages-articles.xml index
>
>
> David Reyes Samblas Martinez wrote:
>> After the "success" with the Spanish Wikipedia (the indexing part is
>> still to be resolved), I started working on the German Wikipedia:
>> http://download.wikipedia.org/dewiki/latest/dewiki-latest-pages-meta-current.xml.bz2
>>
>> but it fails at the first step with the following error:
>>
>> #make DESTDIR=image WORKDIR=work XML_FILES=dewiki-latest-pages-meta-current.xml index parse render combine
>>
>> awk: cmd. line:1: fatal: cannot open file `work/counts.text' for reading (No such file or directory)
>> cd host-tools/offline-renderer && make index \
>>               XML_FILES="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/dewiki-latest-pages-meta-current.xml" RENDER_BLOCK="0" \
>>               WORKDIR="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/work" DESTDIR="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/image"
>> make[1]: Entering directory `/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/host-tools/offline-renderer'
>> ./ArticleIndex.py  \
>>               --article-index="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/work/articles.db" \
>>               --article-offsets="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/work/offsets.db" \
>>               --article-counts="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/work/counts.text" \
>>               --prefix="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/image/pedia" /OE/Proyectos/tuxbrain/productos/wikireader/wikireader/dewiki-latest-pages-meta-current.xml
>> Traceback (most recent call last):
>>   File "./ArticleIndex.py", line 611, in <module>
>>     main()
>>   File "./ArticleIndex.py", line 172, in main
>>     limit = processor.process(f, limit)
>>   File "/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/host-tools/offline-renderer/FileScanner.py", line 141, in process
>>     if '#' == body[0] and 'redirect' == body[1:9].lower():
>> IndexError: string index out of range
>> Flushing databases
>> Writing: files
>> Time: 0s
>> Writing: articles
>> Time: 0s
>> Writing: offsets
>> Time: 0s
>> Loading: articles
>> Time: 0s
>> Loading: offsets and files
>> Time: 0s
>> make[1]: *** [index] Error 1
>> make[1]: Leaving directory `/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/host-tools/offline-renderer'
>> make: *** [index] Error 2
>>
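The IndexError means `body[0]` was evaluated on an empty string, so at least one page in the pages-meta-current dump apparently has an empty body. A minimal sketch of a check that tolerates that (the function name `is_redirect` is hypothetical; it keeps the same test as line 141 of FileScanner.py but swaps indexing for `str.startswith`, which is safe on empty strings):

```python
def is_redirect(body):
    """Hypothetical helper: True if the wiki text starts with '#redirect' (any case).

    Same test as `'#' == body[0] and 'redirect' == body[1:9].lower()`, but
    startswith() and slicing never raise on an empty or very short body,
    so an empty page yields False instead of an IndexError.
    """
    return body.startswith('#') and body[1:9].lower() == 'redirect'
```

With this, an empty page is simply treated as a non-redirect instead of aborting the whole index run.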
>> Regards
>>
>> David Reyes Samblas Martinez
>> http://www.tuxbrain.com
>> Open ultraportable & embedded solutions
>> Openmoko, Openpandora,  Arduino
>> Hey, watch out!!! There's a linux in your pocket!!!
>>
>> _______________________________________________
>> Openmoko community mailing list
>> community at lists.openmoko.org
>> http://lists.openmoko.org/mailman/listinfo/community
>>
>
>


