[WikiReader] Sharing compiling sources.

Thomas HOCEDEZ thomas.hocedez at free.fr
Mon Nov 30 17:09:25 CET 2009


Tim Besard wrote:
> Hi,
>
> It seems that the Dutch Wikipedia contains some UTF-8-only characters,
> which crash the parser after all, due to the "system echo" in the
> exception handler. Changing the offending line to
>     os.system('echo \"%s\" >> fault_articles.txt' % title.encode("utf8"))
> fixes the issue.
>
> Tim
>   
Well, thanks a lot, Tim; the error also occurred when parsing the French dump.
As I said before, I'm a Python beginner, so the only way I found to
avoid it was to... remove the line and keep the counter running.
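Instead of removing the line, the failing title could be appended to fault_articles.txt directly from Python, which avoids both the encoding crash and the shell quoting around `echo`. This is only a minimal sketch, assuming Python 3. `log_fault` and `parse_all` are hypothetical helper names, not part of the WikiReader tools, and `parse` stands in for whatever the real parser calls per article:

```python
def log_fault(title: str, path: str = "fault_articles.txt") -> None:
    """Append one failing article title to `path` as a UTF-8 line."""
    # Writing the file directly avoids the shell entirely, so quotes, `$`
    # or non-ASCII characters in the title cannot break the command.
    with open(path, "a", encoding="utf-8") as f:
        f.write(title + "\n")


def parse_all(titles, parse, path: str = "fault_articles.txt") -> int:
    """Run `parse` (hypothetical parser callback) on every title.

    Failures are logged and counted instead of stopping the run.
    Returns the number of failed titles.
    """
    failed = 0
    for title in titles:
        try:
            parse(title)
        except Exception:
            log_fault(title, path)
            failed += 1
    return failed
```

With `int` as a stand-in parser, `parse_all(["12", "Épée", "7"], int)` returns 1 and leaves `Épée` on its own line in the log.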

For information, I finished rendering the French Wikipedia dump:
1,140,000 articles
61 failed articles
parsing took 12 hours
rendering took 18 hours
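For comparison with other dumps, the figures above can be turned into throughput and failure rate. A quick sketch, assuming the 1,140,000 articles went through both stages and the hour counts are the rounded values reported:

```python
# Figures as reported for the French dump (rounded).
ARTICLES = 1_140_000
FAILED = 61
PARSE_HOURS = 12
RENDER_HOURS = 18


def per_second(count: int, hours: float) -> float:
    """Average items processed per second over `hours` of wall time."""
    return count / (hours * 3600)


parse_rate = per_second(ARTICLES, PARSE_HOURS)    # about 26.4 articles/s
render_rate = per_second(ARTICLES, RENDER_HOURS)  # about 17.6 articles/s
failure_pct = 100 * FAILED / ARTICLES             # about 0.0054 %

print(f"parsing:   {parse_rate:.1f} articles/s")
print(f"rendering: {render_rate:.1f} articles/s")
print(f"failed:    {failure_pct:.4f} %")
```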

The image weighs 1.6 GB, all in a single file (not sure that's normal?).
All of this was done on a 2.2 GHz quad-core with 2 GB of RAM.
Note that the disk is NTFS; ext4 would perhaps be better
(the mount process was working very hard during those runs).

The image is readable by the emulator, but since it finished while I was at
the office, I could only try it through a remote X display (over SSH)...
I will post again later (from home) once the file is on the reader. Some
friends will host the file, and I'm working on an automated script
(a weekly French image?).

See you tonight

Thomas from "Wikilecteur" Team



