[wikireader] Rudimentary support for several wikis

Tue Jan 19 16:33:58 CET 2010

I have now registered with the list, since my earlier message (sent while
unregistered) didn't seem to come through, and code at thewikireader
doesn't seem to respond. You might receive this message more than once.

-------- Original Message --------
Subject: [wikireader] Rudimentary support for several wikis
Date: Sun, 17 Jan 2010 00:56:53 +0000
From: Tom Bachmann <tb401 at cam.ac.uk>
To: community at lists.openmoko.org

Hello,

First of all, please CC me, since I'm not registered with the list.

Over the last few days I have been hacking together rudimentary support
for displaying several collections of data (e.g. wikis in different
languages) on the wikireader. I don't think this code is ready to be
merged into the main repository yet, and I also don't know whether it
fits your ideas of simplicity.

HOWEVER, I would be very grateful to anyone who can test the code. I
don't have a real wikireader yet (I have been developing this on the
simulator; I will get one once I have sorted out my budget...), and I'm
worried that there might be problems related to, e.g., the scarce
memory on the reader (how much RAM does it have?).

Here is what I did: articles are now identified by their index and by
their "collection id" (the highest four bits of the 32-bit identifier).
The .pfx, .fnd, .hsh and .idx files are replicated per collection. The
.dat files are simply numbered consecutively (and identified in the
usual way). So if you have e.g. two collections, say the English and
French Wikipedia, then your image layout may look like this:

pedia0.idx pedia0.hsh pedia0.pfx pedia0.fnd
pedia1.idx pedia1.hsh pedia1.pfx pedia1.fnd
pedia0.dat pedia1.dat pedia2.dat pedia3.dat pedia4.dat

You cannot tell which articles are in which .dat files (in principle,
articles from several wikis could be mixed in one file), but in practice
we might have pedia0-2.dat holding collection 0 (English wiki) and
pedia{3,4}.dat holding collection 1 (French wiki).
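To make the identifier scheme concrete, here is a minimal sketch in C of packing and unpacking such an identifier. It assumes only what is described above: the top four bits of the 32-bit identifier carry the collection id and the remaining 28 bits carry the article index. The names make_article_id, article_coll, article_index, COLL_SHIFT and INDEX_MASK are hypothetical and not taken from the wiki-app source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: bits 31..28 = collection id, bits 27..0 = article index.
   All names here are illustrative, not the wiki-app's own. */
#define COLL_SHIFT 28u
#define INDEX_MASK 0x0FFFFFFFu

/* Combine a collection id and an article index into one 32-bit id. */
uint32_t make_article_id(uint32_t coll, uint32_t index)
{
    assert(coll < 16u);          /* only four bits are available */
    assert(index <= INDEX_MASK); /* index must fit into 28 bits */
    return (coll << COLL_SHIFT) | index;
}

/* Extract the collection id (highest four bits). */
uint32_t article_coll(uint32_t id)
{
    return id >> COLL_SHIFT;
}

/* Extract the article index (lowest 28 bits). */
uint32_t article_index(uint32_t id)
{
    return id & INDEX_MASK;
}
```

Under this assumed layout there is room for at most 16 collections and 2^28 (about 268 million) articles per collection. Note also that for collection 0 the identifier equals the plain article index, which would be consistent with the primary collection being reusable without a rebuild.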

The search functionality etc. is implemented in the wiki-app, but the
user interface is rather non-existent. As a hack for testing, I'm
statically configuring the system to use two collections (identified
as 0 and 1), and I added an "invisible" button to the upper right corner
of the search menu to switch between the collections (in the simulator
you will see a message). There seem to be some bugs in that button, but
it's really for testing only.
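As a rough illustration of the testing hack, the sketch below shows one way a statically configured set of two collections and a toggle button could fit together in C, with the active collection selecting which per-collection search files (pediaN.idx, .hsh, ...) are used. It is a hypothetical sketch, not the code in the repository: switch_collection, active_file_name, active_coll and NUM_COLLECTIONS are invented names.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical: two statically configured collections, ids 0 and 1. */
#define NUM_COLLECTIONS 2u

static unsigned active_coll = 0;

/* Handler for the hidden button: advance to the next collection,
   wrapping back to 0 after the last one. Returns the new active id. */
unsigned switch_collection(void)
{
    active_coll = (active_coll + 1u) % NUM_COLLECTIONS;
    return active_coll;
}

/* Build the name of a per-collection search file for the active
   collection, e.g. "pedia1.hsh" when collection 1 is active.
   Returns the snprintf result (length of the full name). */
int active_file_name(char *buf, size_t len, const char *ext)
{
    assert(active_coll < NUM_COLLECTIONS);
    return snprintf(buf, len, "pedia%u.%s", active_coll, ext);
}
```

Only the search files are switched this way; the .dat files are not tied to a collection by name, so article lookup would still go through the collection id stored in the article identifier.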

In addition to implementing all that in the wiki-app, I modified the
render, index and combine programs. All take a new --coll-number
argument to identify the collection being worked on, and
ArticleRender.py has a new --dat-number argument to specify the .dat
file (--number only identifies the block for the .idx file).

The good news is that you can simply reuse your primary collection (the
one identified by 0). The bad news is that all extra collections have to
be rebuilt. For a quick test, try

make  DESTDIR=image WORKDIR=work \
       XML_FILES=xml-file-samples/japanese_architects.xml \
       COLL_NUMBER=1 DAT_NUMBER=${first unused index in .dat} iprch

make  DESTDIR=image WORKDIR=work install

and then copy everything to your wikireader (or try sim4).
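The make invocation above leaves DAT_NUMBER as a placeholder for the first unused .dat index. A minimal sketch in C of how that value could be determined, assuming the .dat files in the image directory are numbered consecutively from pedia0.dat with no gaps; first_unused_dat is a hypothetical helper, not part of the build scripts:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: return the first N for which <dir>/pediaN.dat
   cannot be opened, i.e. a candidate value for DAT_NUMBER.
   Assumes consecutive numbering starting at pedia0.dat. */
unsigned first_unused_dat(const char *dir)
{
    char path[4096];
    unsigned n = 0;

    assert(dir != NULL);
    for (;;) {
        snprintf(path, sizeof path, "%s/pedia%u.dat", dir, n);
        FILE *f = fopen(path, "rb");
        if (f == NULL)
            return n; /* missing (or unreadable): treat as unused */
        fclose(f);
        n++;
    }
}
```

With the example layout above (pedia0.dat to pedia4.dat in image/), first_unused_dat("image") would return 5. Note that a file which exists but cannot be opened, e.g. because of permissions, is also reported as unused.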

Again, it would be *greatly* appreciated if someone could build a large
second collection and try two real-life datasets on the wikireader.

All the code is on Gitorious (only because I am already registered there
but not yet on GitHub). To get it, do

git clone git://gitorious.org/wikireader-ness/wikireader-ness.git

Let me know what you think!

Thanks,
Tom