Illume dictionary for Dutch (Nederlands)
pander at users.sourceforge.net
Thu Nov 20 10:55:02 CET 2008
Small correction to my text:
"Note that more characters" must be "Note that certain special characters
are in GSM 03.38 which are not in extended ASCII"
Nevertheless, one complete utf-8 dictionary could be used by most
applications, including SMS. The conversion I do for GSM 03.38 could
also be done later, just before sending the SMS.
On Thu, November 20, 2008 10:44, Rui Miguel Silva Seabra wrote:
> I have no idea... I might only make a new version with utf-8 encoded
> characters. :)
> On Thu, Nov 20, 2008 at 10:40:46AM +0100, Pander wrote:
>> Hi all,
>> I intend to generate the following:
>> - a full list utf-8 (for 8 bit SMS and regular use, default)
>> - b full list utf-8 GSM 03.38 (for 7 bit SMS)
>> - c truncated list utf-8 (for 8 bit SMS and regular use)
>> - d truncated list utf-8 GSM 03.38 (for 7 bit SMS, default)
>>  These utf-8 characters in this list are within the 7-bit range of
>> 03.38, see http://en.wikipedia.org/wiki/Short_message_service#GSM Note
>> that more characters
>> a and b will both have 250,000 words
>> b will be conversion, remapping and normalisation of a
>> c and d are truncations and normalisation of respectively a and b
>> For utf-16, a simple conversion of the utf-8 files can be used, but I'll
>> leave this for now. This could result in two extra files.
>> Note that neither extended nor non-extended ASCII is available. Is this
>> desirable? This can result in four extra files.
>> So, I can come up with 10 different files. Which do you think are
>> most useful?
>> On Thu, November 20, 2008 08:58, Rui Miguel Silva Seabra wrote:
>> > On Thu, Nov 20, 2008 at 03:02:41AM +0100, Marco Trevisan wrote:
>> >> Pander wrote:
>> >> > Of course this particular word list is very long and contains about
>> >> > 250,000 words and has a typical loooong tail. Many words are
>> >> > compositions or occur seldom in average day use.
>> >> >
>> >> > What would be a good cut off point in number of words, also in
>> >> > terms of performance?
>> >> >
>> >> > The Portuguese list contains 56,609 words. Is this workable? How
>> >> > many does the English contain?
>> >> The Italian one can count also 500'000 words (to be short), but I can
>> >> get a well working dictionary only using a smaller one (with about
>> >> 150'000 words that I've taken counting its google popularity).
>> >> Btw I've written more complete posts about this on the list...
>> > Well, since my basis was based on a million words taken from the most
>> > printed daily newspaper in Portugal (I didn't count but still I
>> > a lot of non words like numbers, etc...) already with frequency data,
>> > my job was so much easier... :)
>> > As for writing SMS/text messages... I haven't found yet a word that
>> > wasn't there (in fact my problem is that it so often is the first of
>> > several matches so I have to use the menu on the left) but I must
>> > confess to not be one of those whose primary use of the phone is
>> > SMS/text!
>> > Rui
>> > --
>> > Frink!
>> > Today is Prickle-Prickle, the 32nd day of The Aftermath in the YOLD 3174
>> > + No matter how much you do, you never do enough -- unknown
>> > + Whatever you do will be insignificant,
>> > | but it is very important that you do it -- Gandhi
>> > + So let's do it...?
>> > _______________________________________________
>> > Openmoko community mailing list
>> > community at lists.openmoko.org
>> > http://lists.openmoko.org/mailman/listinfo/community