Illume dictionary for Dutch (Nederlands)

Carsten Haitzler (The Rasterman) raster at rasterman.com
Thu Nov 20 10:59:53 CET 2008


On Thu, 20 Nov 2008 10:55:02 +0100 (CET) "Pander"
<pander at users.sourceforge.net> babbled:

a dictionary should not care about gsm encodings at all. it should just be a
utf8 dictionary file. it is the job of the sms app to convert normal utf8 unicode
to whatever encoding the network uses, and back. :)
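
To make that split concrete, here is a minimal sketch of the conversion an sms app could do at send time, leaving the dictionary as plain utf8. The function names are made up for illustration and are not part of illume or any real sms stack. The table is the GSM 03.38 basic set plus its extension table; packing the septets into octets for the actual PDU is left out.

```python
# Sketch only: hypothetical helper names, not an existing API.
# GSM 03.38 basic character set, indexed by septet value (0x00..0x7F).
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
ESC = 0x1B
# Extension table: each character is sent as ESC followed by this septet.
GSM7_EXT = {"\f": 0x0A, "^": 0x14, "{": 0x28, "}": 0x29, "\\": 0x2F,
            "[": 0x3C, "~": 0x3D, "]": 0x3E, "|": 0x40, "€": 0x65}

# Lookup from character to septet; ESC itself is not a printable character.
_ENCODE = {ch: i for i, ch in enumerate(GSM7_BASIC) if i != ESC}
_EXT_DECODE = {septet: ch for ch, septet in GSM7_EXT.items()}


def utf8_to_gsm7(data):
    """Convert UTF-8 bytes to a list of GSM 03.38 septet values."""
    septets = []
    for ch in data.decode("utf-8"):
        if ch in _ENCODE:
            septets.append(_ENCODE[ch])
        elif ch in GSM7_EXT:
            septets.extend([ESC, GSM7_EXT[ch]])
        else:
            # The caller has to pick another encoding or remap the character.
            raise ValueError("%r is not in GSM 03.38" % ch)
    return septets


def gsm7_to_utf8(septets):
    """Convert a list of GSM 03.38 septet values back to UTF-8 bytes."""
    chars = []
    it = iter(septets)
    for septet in it:
        if septet == ESC:
            chars.append(_EXT_DECODE[next(it)])
        else:
            chars.append(GSM7_BASIC[septet])
    return "".join(chars).encode("utf-8")
```

With this, a word like "één" converts cleanly, while a word containing "ï" or "ë" raises ValueError, so the app (not the dictionary) decides whether to remap the character or fall back to another encoding.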

> Small correction to my text:
> 
> "Note that more characters" must be "Note that certain special characters
> are in GSM 03.38 which are not in extended ASCII"
> 
> 
> Nevertheless, one complete utf-8 dictionary could be used by most
> applications, including SMS. The conversion I do for GSM 03.38 could also
> be done later, just before sending the SMS.
> 
> On Thu, November 20, 2008 10:44, Rui Miguel Silva Seabra wrote:
> > I have no idea... I might only make a new version with utf-8 encoded
> > characters. :)
> >
> >
> > On Thu, Nov 20, 2008 at 10:40:46AM +0100, Pander wrote:
> >> Hi all,
> >>
> >> I intend to generate the following:
> >> - a full list utf-8 (for 8 bit SMS and regular use, default)
> >> - b full list utf-8 GSM 03.38[1] (for 7 bit SMS)
> >> - c truncated list utf-8 (for 8 bit SMS and regular use)
> >> - d truncated list utf-8 GSM 03.38[1] (for 7 bit SMS, default)
> >>
> >> [1] These utf-8 characters in this list are within the 7-bit range of
> >> GSM 03.38, see http://en.wikipedia.org/wiki/Short_message_service#GSM
> >> Note that more characters
> >>
> >> a and b will both have 250,000 words
> >> b will be a conversion, remapping and normalisation of a
> >> c and d are truncations and normalisations of a and b respectively
> >>
> >> For utf-16, a simple conversion of the utf-8 files can be used, but I'll
> >> leave this for now. This could result in two extra files.
> >>
> >> Note that neither extended nor non-extended ASCII is available. Is this
> >> desirable? This can result in four extra files.
> >>
> >> So, I can come up with 10 different files. Which are, according to you,
> >> the most useful?
> >>
> >> Regards,
> >>
> >> Pander
> >>
> >> On Thu, November 20, 2008 08:58, Rui Miguel Silva Seabra wrote:
> >> > On Thu, Nov 20, 2008 at 03:02:41AM +0100, "Marco Trevisan (Treviño)"
> >> > wrote:
> >> >> Pander wrote:
> >> >> > Of course this particular word list is very long and contains about
> >> >> > 250,000 words and has a typical loooong tail. Many words or
> >> >> > compositions occur seldom in everyday use.
> >> >> >
> >> >> > What would be a good cut-off point in number of words, also in
> >> >> > terms of performance?
> >> >> >
> >> >> > The Portuguese list contains 56,609 words. Is this workable? How
> >> >> > many does the English contain?
> >> >>
> >> >> The Italian one can count as many as 500'000 words (to be short), but
> >> >> I only get a well-working dictionary when using a smaller one (about
> >> >> 150'000 words, which I picked by counting their google popularity).
> >> >>
> >> >> Btw I've written more complete posts about this on the list...
> >> >
> >> > Well, since my list was based on a million words taken from the most
> >> > printed daily newspaper in Portugal (I didn't count but still I removed
> >> > a lot of non words like numbers, etc...) already with frequency data,
> >> > my job was so much easier... :)
> >> >
> >> > As for writing SMS/text messages... I haven't found yet a word that
> >> > wasn't there (in fact my problem is that it so often is the first of
> >> > several matches so I have to use the menu on the left) but I must
> >> > confess to not be one of those whose primary use of the phone is
> >> > SMS/text!
> >> >
> >> > Rui
> >> >
> >> > --
> >> > Frink!
> >> > Today is Prickle-Prickle, the 32nd day of The Aftermath in the YOLD 3174
> >> > + No matter how much you do, you never do enough -- unknown
> >> > + Whatever you do will be insignificant,
> >> > | but it is very important that you do it -- Gandhi
> >> > + So let's do it...?
> >> >
> >> > _______________________________________________
> >> > Openmoko community mailing list
> >> > community at lists.openmoko.org
> >> > http://lists.openmoko.org/mailman/listinfo/community
> >> >
> >>
> >
> > --
> > You are what you see.
> > Today is Prickle-Prickle, the 32nd day of The Aftermath in the YOLD 3174
> > + No matter how much you do, you never do enough -- unknown
> > + Whatever you do will be insignificant,
> > | but it is very important that you do it -- Gandhi
> > + So let's do it...?
> >
> 
> 
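
On the cut-off question raised further up in the quoted thread: if a word list already carries frequency data, truncating it is cheap to try at several sizes. A rough sketch follows, assuming a hypothetical "word count" line format (this is not claimed to be illume's own dictionary format) and made-up function names.

```python
# Sketch only: assumes one "word count" pair per line, which is an
# illustrative format, not necessarily what illume reads.

def parse_lines(lines):
    """Yield (word, count) pairs from lines such as 'fiets 37'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        word, count = line.rsplit(None, 1)
        yield word, int(count)


def truncate_by_frequency(entries, limit):
    """Return the `limit` most frequent entries, most frequent first.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(entries, key=lambda entry: (-entry[1], entry[0]))
    return ranked[:limit]


full_list = ["de 5000", "fiets 37", "het 4200", "zeldzaamheid 1"]
short_list = truncate_by_frequency(parse_lines(full_list), 2)
```

Running the same truncation with a few different limits and trying each result on the device is probably the most direct way to find a size that still performs well.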


-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    raster at rasterman.com




