[Wikireader] Error on processing the German Wikipedia

Tilman Baumann tilman at baumann.name
Fri Nov 20 18:18:36 CET 2009


David Reyes Samblas Martinez wrote:
> Don't hold your breath :( failing at Count: 832000

Same error as mine?

> David Reyes Samblas Martinez
> http://www.tuxbrain.com
> Open ultraportable & embedded solutions
> Openmoko, Openpandora,  Arduino
> Hey, watch out!!! There's a linux in your pocket!!!
>
> 2009/11/20 Tilman Baumann <tilman at baumann.name>:
>>
>> David Reyes Samblas Martinez wrote:
>>> Well, the Spanish one gave me the same error before, but now it works.
>> Any idea what solved it? Or is it just random, and will it go away if I try
>> it again? :)
>>
>>> I'm parsing the German Wikipedia right now (Count: 173000); let's see what
>>> happens :)
>>
>> I would definitely be interested in the results...
>>
>>> Note: parsing the 2009-Nov-11 dump from
>>> http://download.wikipedia.org/dewiki/latest/dewiki-latest-pages-articles.xml.bz2
>>>
>>> Regards
>>>
>>> 2009/11/20 Tilman Baumann <tilman at baumann.name>:
>>>> Can you reproduce this with a neutral locale?
>>>>  export LC_ALL=C
>>>>
>>>> I'm trying the same at the moment. I had a lot of hiccups, caused by many
>>>> things, among them missing tools and not enough memory.
>>>>
>>>> This is currently where I'm stuck with the German Wikipedia:
>>>>
>>>> Count: 823000
>>>> Count: 824000
>>>> Count: 825000
>>>> Count: 826000
>>>> Count: 827000
>>>> Count: 828000
>>>> Count: 829000
>>>> Count: 830000
>>>> Count: 831000
>>>> Count: 832000
>>>> Count: 833000
>>>> Traceback (most recent call last):
>>>>  File "./ArticleParser.py", line 203, in <module>
>>>>    main()
>>>>  File "./ArticleParser.py", line 168, in main
>>>>    process_article_text(title.encode('utf-8'),  f.read(length), newf)
>>>>  File "./ArticleParser.py", line 197, in process_article_text
>>>>    newf.write(text + '\n')
>>>> IOError: [Errno 32] Broken pipe
>>>> make[1]: *** [parse] Error 1
>>>> make[1]: Leaving directory `/home/tilli/wikireader/host-tools/offline-renderer'
>>>> make: *** [parse] Error 2
>>>>
>>>> I suppose it failed somewhere in PARSER_COMMAND
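An IOError with errno 32 (EPIPE) means the process on the other end of the pipe stopped reading, so the ArticleParser.py traceback is a symptom, not the cause. A minimal sketch, assuming the parser's output is piped into a separate consumer process (my guess about PARSER_COMMAND, not something the make output confirms): catch the broken pipe and report the consumer's exit status instead. `feed_command` is a hypothetical helper written for Python 3, where this error surfaces as BrokenPipeError.

```python
import subprocess
import sys


def feed_command(cmd, lines):
    """Pipe text lines into cmd's stdin (hypothetical helper).

    Returns (returncode, pipe_broke). If the consumer exits before
    reading everything, writing fails with EPIPE; instead of crashing,
    we wait for the consumer and report its exit status, since that is
    where the real error lives.
    """
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    pipe_broke = False
    try:
        for line in lines:
            proc.stdin.write(line.encode("utf-8") + b"\n")
    except BrokenPipeError:
        pipe_broke = True
    finally:
        try:
            # close() flushes buffered data, so EPIPE can also show up here
            proc.stdin.close()
        except BrokenPipeError:
            pipe_broke = True
    returncode = proc.wait()
    if pipe_broke:
        print("consumer %r exited early with status %d" % (cmd, returncode),
              file=sys.stderr)
    return returncode, pipe_broke
```

With the consumer's exit status in hand, the next place to look is that program's own error output rather than ArticleParser.py.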
>>>>
>>>>
>>>> Before that, the following steps completed without errors:
>>>> make
>>>> make DESTDIR=image WORKDIR=work XML_FILES=dewiki-20091028-pages-articles.xml index
>>>>
>>>>
>>>> David Reyes Samblas Martinez wrote:
>>>>> After the "success" of the spanish wikipedia pending to resolve the
>>>>> indexing part, I was starting to work on the german wikipedia
>>>>> http://download.wikipedia.org/dewiki/latest/dewiki-latest-pages-meta-current.xml.bz2
>>>>>
>>>>> but it fails at the first step with the following error:
>>>>>
>>>>> #make DESTDIR=image WORKDIR=work XML_FILES=dewiki-latest-pages-meta-current.xml index parse render combine
>>>>>
>>>>> awk: cmd. line:1: fatal: cannot open file `work/counts.text' for reading (No such file or directory)
>>>>> cd host-tools/offline-renderer && make index \
>>>>>   XML_FILES="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/dewiki-latest-pages-meta-current.xml" RENDER_BLOCK="0" \
>>>>>   WORKDIR="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/work" DESTDIR="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/image"
>>>>> make[1]: Entering directory `/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/host-tools/offline-renderer'
>>>>> ./ArticleIndex.py  \
>>>>>   --article-index="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/work/articles.db" \
>>>>>   --article-offsets="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/work/offsets.db" \
>>>>>   --article-counts="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/work/counts.text" \
>>>>>   --prefix="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/image/pedia" /OE/Proyectos/tuxbrain/productos/wikireader/wikireader/dewiki-latest-pages-meta-current.xml
>>>>> Traceback (most recent call last):
>>>>>   File "./ArticleIndex.py", line 611, in <module>
>>>>>     main()
>>>>>   File "./ArticleIndex.py", line 172, in main
>>>>>     limit = processor.process(f, limit)
>>>>>   File "/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/host-tools/offline-renderer/FileScanner.py", line 141, in process
>>>>>     if '#' == body[0] and 'redirect' == body[1:9].lower():
>>>>> IndexError: string index out of range
>>>>> Flushing databases
>>>>> Writing: files
>>>>> Time: 0s
>>>>> Writing: articles
>>>>> Time: 0s
>>>>> Writing: offsets
>>>>> Time: 0s
>>>>> Loading: articles
>>>>> Time: 0s
>>>>> Loading: offsets and files
>>>>> Time: 0s
>>>>> make[1]: *** [index] Error 1
>>>>> make[1]: Leaving directory `/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/host-tools/offline-renderer'
>>>>> make: *** [index] Error 2
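The IndexError comes from `body[0]`, which raises on an empty string, so FileScanner.py presumably hit a page whose text is empty. The pages-meta-current dump also contains talk and user pages, which may be why it shows up there and not with pages-articles. A minimal sketch of the same redirect test that cannot raise, assuming `body` is the page text as a Python string; `is_redirect` is a hypothetical name, not a function from the wikireader sources.

```python
def is_redirect(body):
    """Return True if the wiki text starts with '#redirect' (any case).

    Same test as `'#' == body[0] and 'redirect' == body[1:9].lower()`,
    but slicing never raises, so an empty body is simply treated as a
    non-redirect instead of causing IndexError.
    """
    return body[:9].lower() == "#redirect"
```

In FileScanner.py this would replace the condition on line 141, i.e. `if is_redirect(body):`.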
>>>>>
>>>>> Regards
>>>>>
>>>>> _______________________________________________
>>>>> Openmoko community mailing list
>>>>> community at lists.openmoko.org
>>>>> http://lists.openmoko.org/mailman/listinfo/community
>>>>>

