Illume dictionary for Dutch (Nederlands)
pander at users.sourceforge.net
Thu Nov 20 10:40:46 CET 2008
I intend to generate the following:
- a full list utf-8 (for 8 bit SMS and regular use, default)
- b full list utf-8 GSM 03.38 (for 7 bit SMS)
- c truncated list utf-8 (for 8 bit SMS and regular use)
- d truncated list utf-8 GSM 03.38 (for 7 bit SMS, default)
 These utf-8 characters in this list are within the 7-bit range of GSM
03.38, see http://en.wikipedia.org/wiki/Short_message_service#GSM Note
that more characters
a and b will both have 250,000 words
b will be a conversion, remapping and normalisation of a
c and d are truncations and normalisations of a and b, respectively
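
A minimal sketch of how b, c and d could be derived from a. It assumes that a is already ordered from most to least frequent, and that "remapping" means dropping diacritics from letters that are not in the GSM 03.38 basic character set. The names GSM_BASIC, to_gsm and build_lists are made up for this illustration and are not part of Illume.

```python
import unicodedata

# GSM 03.38 basic character set (7-bit default alphabet), without the
# escape code that leads into the extension table.
GSM_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)


def to_gsm(word):
    """Remap a word so that every character is in the GSM basic set.

    Assumption: a letter that is not in the set is replaced by its base
    letter (for example "ë" becomes "e"). Anything that still does not
    fit raises ValueError.
    """
    out = []
    for ch in unicodedata.normalize("NFC", word):
        if ch in GSM_BASIC:
            out.append(ch)
            continue
        # Decompose and keep only the base letter, dropping the accent.
        base = unicodedata.normalize("NFD", ch)[0]
        if base not in GSM_BASIC:
            raise ValueError(f"cannot map {ch!r} in {word!r} to GSM 03.38")
        out.append(base)
    return "".join(out)


def build_lists(words_by_frequency, cutoff):
    """Return the four lists a, b, c and d described above."""
    a = list(words_by_frequency)   # full list, utf-8
    b = [to_gsm(w) for w in a]     # full list, GSM 03.38 only
    c = a[:cutoff]                 # truncated a
    d = b[:cutoff]                 # truncated b
    return a, b, c, d
```

With this remapping, "reëel" would become "reeel" in b and d, while "café" stays as it is because é is part of the basic set. Two different words in a can end up identical in b, so a real normalisation step would probably also have to merge duplicates.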
For utf-16, a simple conversion of the utf-8 files can be used, but I'll
leave this for now. This could result in two extra files.
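
For reference, a sketch of what such a conversion could look like. The function name utf8_to_utf16 and the file paths are placeholders, and whether Illume would accept utf-16 files at all is something I have not checked.

```python
def utf8_to_utf16(src_path, dst_path):
    """Copy a utf-8 word list to a utf-16 file, line by line.

    Python's "utf-16" codec writes a byte order mark at the start of the
    file; newline="" keeps the line endings exactly as they are.
    """
    with open(src_path, encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-16", newline="") as dst:
        for line in src:
            dst.write(line)
```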
Note that neither extended nor non-extended ASCII is available. Is this
desirable? This can result in four extra files.
So, I can come up with 10 different files. Which are according to you the
On Thu, November 20, 2008 08:58, Rui Miguel Silva Seabra wrote:
> On Thu, Nov 20, 2008 at 03:02:41AM +0100, "Marco Trevisan (Treviño)"
>> Pander wrote:
>> > Of course this particular word list is very long and contains about
>> > 250,000 words and has a typical loooong tail. Many words
>> > occur seldom in everyday use.
>> > What would be a good cut-off point in number of words, also in terms
>> > of performance?
>> > The Portuguese list contains 56,609 words. Is this workable? How many
>> > does the English contain?
>> The Italian one can count also 500'000 words (to be short), but I can
>> get a well working dictionary only using a smaller one (with about
>> 150'000 words that I've taken counting its google popularity).
>> Btw I've written more complete posts about this on the list...
> Well, since my basis was based on a million words taken from the most
> printed daily newspaper in Portugal (I didn't count but still I removed
> a lot of non words like numbers, etc...) already with frequency data, my
> job was so much easier... :)
> As for writing SMS/text messages... I haven't found yet a word that
> wasn't there (in fact my problem is that it so often is the first of
> several matches so I have to use the menu on the left) but I must
> confess to not be one of those whose primary use of the phone is
> Today is Prickle-Prickle, the 32nd day of The Aftermath in the YOLD 3174
> + No matter how much you do, you never do enough -- unknown
> + Whatever you do will be insignificant,
> | but it is very important that you do it -- Gandhi
> + So let's do it...?