Illume dictionary for Dutch (Nederlands)

Rui Miguel Silva Seabra rms at 1407.org
Thu Nov 20 10:44:10 CET 2008


I have no idea... I might only make a new version with utf-8 encoded
characters. :)


On Thu, Nov 20, 2008 at 10:40:46AM +0100, Pander wrote:
> Hi all,
> 
> I intend to generate the following:
> - a full list utf-8 (for 8 bit SMS and regular use, default)
> - b full list utf-8 GSM 03.38[1] (for 7 bit SMS)
> - c truncated list utf-8 (for 8 bit SMS and regular use)
> - d truncated list utf-8 GSM 03.38[1] (for 7 bit SMS, default)
> 
> [1] The utf-8 characters in these lists are within the 7-bit range of GSM
> 03.38, see http://en.wikipedia.org/wiki/Short_message_service#GSM
> Note that more characters
> 
> a and b will both have 250,000 words
> b will be a conversion, remapping and normalisation of a
> c and d are truncated and normalised versions of a and b, respectively
> 
> For utf-16, a simple conversion of the utf-8 files can be used, but I'll
> leave this for now. This could result in two extra files.
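The UTF-16 variants indeed need only a re-encoding: the words stay the same, only the bytes change. A minimal Python sketch, where the function name `utf8_to_utf16` and the default BOM handling are assumptions for illustration:

```python
from pathlib import Path


def utf8_to_utf16(src: str, dst: str, encoding: str = "utf-16") -> None:
    """Re-encode a UTF-8 word list as UTF-16 (hypothetical helper).

    Python's "utf-16" codec prepends a byte-order mark and uses the machine's
    native byte order; pass "utf-16-le" or "utf-16-be" for a fixed byte order
    without a BOM. Bytes are read and written directly, so line endings are
    left exactly as they were in the source file.
    """
    text = Path(src).read_bytes().decode("utf-8")
    Path(dst).write_bytes(text.encode(encoding))
```

Which byte order (and whether a BOM is wanted) depends on what the consuming software expects, which the thread does not say.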
> 
> Note that neither an extended nor a plain (non-extended) ASCII version is
> available. Is this desirable? This could result in four extra files.
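If the four ASCII files were added, one way to derive them is sketched below. Two assumptions, neither from the mail: "extended ASCII" is taken to mean ISO-8859-1 (Latin-1), although Windows-1252 or another code page is equally possible, and plain ASCII is produced by stripping diacritics. `to_ascii` and `to_latin1` are hypothetical helpers.

```python
import unicodedata
from typing import Optional


def to_ascii(word: str) -> Optional[str]:
    """Plain 7-bit ASCII: NFKD-decompose, drop combining marks (ë -> e, ĳ -> ij).

    Returns None if anything non-ASCII remains (e.g. the euro sign).
    """
    folded = "".join(c for c in unicodedata.normalize("NFKD", word)
                     if not unicodedata.combining(c))
    return folded if folded.isascii() else None


def to_latin1(word: str) -> Optional[str]:
    """Latin-1 variant; "extended ASCII" is ASSUMED to mean ISO-8859-1.

    NFKC keeps accented letters composed (ë stays one character) and expands
    compatibility ligatures such as ĳ -> ij; the word is kept only if the
    result can be encoded in Latin-1.
    """
    composed = unicodedata.normalize("NFKC", word)
    try:
        composed.encode("latin-1")
    except UnicodeEncodeError:
        return None
    return composed
```

Applied to the full and truncated lists, these two helpers would give the four extra files mentioned above.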
> 
> So, I can come up with 10 different files (4 + 2 + 4). Which, in your
> opinion, are the most useful?
> 
> Regards,
> 
> Pander
> 
> On Thu, November 20, 2008 08:58, Rui Miguel Silva Seabra wrote:
> > On Thu, Nov 20, 2008 at 03:02:41AM +0100, "Marco Trevisan (Treviño)" wrote:
> >> Pander wrote:
> >> > Of course this particular word list is very long and contains about
> >> > 250,000 words and has a typical loooong tail. Many words or
> >> > compositions occur seldom in everyday use.
> >> >
> >> > What would be a good cut off point in number of words, also in terms of
> >> > performance?
> >> >
> >> > The Portuguese list contains 56,609 words. Is this workable? How many
> >> > does the English one contain?
> >>
> >> The Italian one can easily count 500,000 words, but I can get a
> >> well-working dictionary only by using a smaller one (about 150,000
> >> words, which I selected by their Google popularity).
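Keeping the ~150,000 most popular words, as Marco describes, is a frequency cut-off. A minimal Python sketch, assuming the list is available as (word, count) pairs. The coverage-based helper is one possible way to choose the cut-off point Pander asks about; it is not something proposed in the thread, and both function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def top_n(freqs: Iterable[Tuple[str, int]], n: int) -> List[str]:
    """Keep the n most frequent words (ties broken alphabetically)."""
    ranked = sorted(freqs, key=lambda wc: (-wc[1], wc[0]))
    return [word for word, _ in ranked[:n]]


def cutoff_for_coverage(freqs: Iterable[Tuple[str, int]], coverage: float) -> int:
    """Smallest n whose top-n words account for `coverage` of all occurrences.

    Hypothetical criterion: e.g. coverage=0.99 asks how many words are needed
    to cover 99% of the counted word occurrences.
    """
    counts = sorted((count for _, count in freqs), reverse=True)
    total = sum(counts)
    running = 0
    for i, count in enumerate(counts, start=1):
        running += count
        if running >= coverage * total:
            return i
    return len(counts)
```

With a long-tailed distribution, the n returned for a high coverage target can be much smaller than the full list, which is the trade-off between list size and missing words discussed in this thread.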
> >>
> >> Btw I've written more complete posts about this on the list...
> >
> > Well, since my starting point was a million words taken from the most
> > widely printed daily newspaper in Portugal, already with frequency data
> > (I didn't count, but I still removed a lot of non-words like numbers,
> > etc.), my job was so much easier... :)
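Building such a list from running text, while discarding non-words like numbers, can be sketched as a plain frequency count. The tokenisation rule below (runs of letters, optionally joined by an apostrophe or hyphen so that forms like "auto's" survive) and the case folding are assumptions; Rui does not describe how he filtered his newspaper corpus.

```python
import re
from collections import Counter

# Assumed tokenisation: a run of letters, optionally continued by an
# apostrophe or hyphen plus more letters ("auto's", "e-mail").
# [^\W\d_] means "word character that is not a digit or underscore"
# (roughly, a letter), so numbers like "2008" never form a token.
TOKEN = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def word_frequencies(text: str, fold_case: bool = True) -> Counter:
    """Count word tokens in `text`, skipping numbers and punctuation."""
    tokens = (m.group(0) for m in TOKEN.finditer(text))
    if fold_case:
        # Assumption: merge "De" and "de"; proper-noun capitals are lost.
        tokens = (t.lower() for t in tokens)
    return Counter(tokens)
```

`Counter.most_common()` then yields the (word, count) pairs ranked by frequency, ready for a cut-off.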
> >
> > As for writing SMS/text messages... I haven't yet found a word that
> > wasn't there (in fact, my problem is that it is so often one of
> > several matches that I have to use the menu on the left), but I must
> > confess I'm not one of those whose primary use of the phone is
> > SMS/text!
> >
> > Rui
> >
> > --
> > Frink!
> > Today is Prickle-Prickle, the 32nd day of The Aftermath in the YOLD 3174
> > + No matter how much you do, you never do enough -- unknown
> > + Whatever you do will be insignificant,
> > | but it is very important that you do it -- Gandhi
> > + So let's do it...?
> >
> > _______________________________________________
> > Openmoko community mailing list
> > community at lists.openmoko.org
> > http://lists.openmoko.org/mailman/listinfo/community
> >
> 

-- 
You are what you see.



