[Wikireader] Error on processing the German Wikipedia

Tilman Baumann tilman at baumann.name
Fri Nov 20 10:01:32 CET 2009


Can you reproduce this with a neutral locale?
 export LC_ALL=C
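
If the build is driven from a script, the same neutral locale can be set for a single command only. This is just a rough sketch: neutral_env and run_neutral are names I made up for illustration, nothing from the wikireader tree.

```python
import os
import subprocess


def neutral_env(base=None):
    """Return a copy of the environment with LC_ALL forced to C."""
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = "C"
    return env


def run_neutral(cmd):
    """Run cmd (a list of arguments) under the C locale; return its exit status."""
    return subprocess.call(cmd, env=neutral_env())


# Example call (not executed here), using the command from the quoted mail:
# run_neutral(["make", "DESTDIR=image", "WORKDIR=work",
#              "XML_FILES=dewiki-latest-pages-meta-current.xml", "index"])
```

With LC_ALL=C the messages from make and awk come out in English, which makes them easier to compare with other reports on the list.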

I'm trying the same at the moment. I ran into a lot of hiccups with various
causes, among them missing tools and not enough memory.

This is where I'm currently stuck with the German Wikipedia:

Count: 823000
Count: 824000
Count: 825000
Count: 826000
Count: 827000
Count: 828000
Count: 829000
Count: 830000
Count: 831000
Count: 832000
Count: 833000
Traceback (most recent call last):
  File "./ArticleParser.py", line 203, in <module>
    main()
  File "./ArticleParser.py", line 168, in main
    process_article_text(title.encode('utf-8'),  f.read(length), newf)
  File "./ArticleParser.py", line 197, in process_article_text
    newf.write(text + '\n')
IOError: [Errno 32] Broken pipe
make[1]: *** [parse] Error 1
make[1]: Leaving directory `/home/tilli/wikireader/host-tools/offline-renderer'
make: *** [parse] Error 2

I suppose it failed somewhere in PARSER_COMMAND.
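
Errno 32 is EPIPE: the process reading the other end of the pipe has gone away, so the traceback above is probably only a symptom. Assuming newf is a pipe into whatever PARSER_COMMAND runs (that is my guess, I have not checked), the interesting information is the exit status of that reader. A rough sketch of the mechanism, where feed is a made-up helper and not part of ArticleParser.py:

```python
import errno
import subprocess
import sys


def feed(cmd, chunks):
    """Write chunks to cmd's stdin; return (chunks_written, exit_status)."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    written = 0
    try:
        for chunk in chunks:
            proc.stdin.write(chunk)
            proc.stdin.flush()
            written += 1
    except (IOError, OSError) as e:
        # EPIPE means the reader exited; anything else is a real error here.
        if e.errno != errno.EPIPE:
            raise
    finally:
        try:
            proc.stdin.close()
        except (IOError, OSError):
            pass  # closing may hit the same broken pipe while flushing
    return written, proc.wait()


# A reader that dies immediately stands in for a crashing parser command.
dying_reader = [sys.executable, "-c", "import sys; sys.exit(3)"]
written, status = feed(dying_reader, [b"x" * 65536] * 100)
```

Here the writer stops early and status carries the exit code of the reader, which is where I would look for the real cause.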


Before that, the following steps went through without errors:
make
make DESTDIR=image WORKDIR=work XML_FILES=dewiki-20091028-pages-articles.xml index


David Reyes Samblas Martinez wrote:
> After the "success" of the Spanish Wikipedia (the indexing part is still
> to be resolved), I started working on the German Wikipedia
> http://download.wikipedia.org/dewiki/latest/dewiki-latest-pages-meta-current.xml.bz2
>
> but it fails at the first step with the following error:
>
> #make DESTDIR=image WORKDIR=work XML_FILES=dewiki-latest-pages-meta-current.xml index parse render combine
>
> awk: línea ord.:1: fatal: no se puede abrir el fichero
> `work/counts.text' para lectura (No existe el fichero ó directorio)
> cd host-tools/offline-renderer && make index \
> 		XML_FILES="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/dewiki-latest-pages-meta-current.xml"
> RENDER_BLOCK="0" \
> 		WORKDIR="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/work"
> DESTDIR="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/image"
> make[1]: se ingresa al directorio
> `/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/host-tools/offline-renderer'
> ./ArticleIndex.py  \
> 		--article-index="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/work/articles.db"
> \
> 		--article-offsets="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/work/offsets.db"
> \
> 		--article-counts="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/work/counts.text"
> \
> 		--prefix="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/image/pedia"
> /OE/Proyectos/tuxbrain/productos/wikireader/wikireader/dewiki-latest-pages-meta-current.xml
> Traceback (most recent call last):
>   File "./ArticleIndex.py", line 611, in <module>
>     main()
>   File "./ArticleIndex.py", line 172, in main
>     limit = processor.process(f, limit)
>   File
> "/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/host-tools/offline-renderer/FileScanner.py",
> line 141, in process
>     if '#' == body[0] and 'redirect' == body[1:9].lower():
> IndexError: string index out of range
> Flushing databases
> Writing: files
> Time: 0s
> Writing: articles
> Time: 0s
> Writing: offsets
> Time: 0s
> Loading: articles
> Time: 0s
> Loading: offsets and files
> Time: 0s
> make[1]: *** [index] Error 1
> make[1]: se sale del directorio
> `/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/host-tools/offline-renderer'
> make: *** [index] Error 2
>
> Regards
>
> David Reyes Samblas Martinez
> http://www.tuxbrain.com
> Open ultraportable & embedded solutions
> Openmoko, Openpandora,  Arduino
> Hey, watch out!!! There's a linux in your pocket!!!
>
> _______________________________________________
> Openmoko community mailing list
> community at lists.openmoko.org
> http://lists.openmoko.org/mailman/listinfo/community
>
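
Regarding the IndexError in the quoted traceback: on a string, body[0] raises exactly this error when body is empty, while slices never raise. So the dump presumably contains a page with an empty text. A slice-based check avoids the crash. This is only a sketch; is_redirect is a hypothetical helper, and I have not checked how FileScanner.py should treat such a page otherwise.

```python
def is_redirect(body):
    """Return True if body starts with '#redirect' (case-insensitive).

    Same result as the original test for non-empty strings, but an
    empty body yields False instead of raising IndexError.
    """
    return body[:9].lower() == '#redirect'
```

For example, is_redirect('#REDIRECT [[Foo]]') is True, and is_redirect('') is False.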

