Illume dictionary for Dutch (Nederlands)

Carsten Haitzler (The Rasterman) raster at rasterman.com
Fri Nov 28 01:02:23 CET 2008


On Fri, 28 Nov 2008 00:20:38 +0100 Pander <pander at users.sourceforge.net>
babbled:

> Is it possible to put comments in the .dic file? If so, in what format?
> E.g. only the first couple of lines which start with a #.

no. it doesn't support comments.
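
Since the .dic format has no comments, one workaround is to keep a commented source file and strip the header before installing it. A minimal sketch, assuming the comments are only a leading run of lines starting with "#" (as proposed in the question); the function name is hypothetical and nothing here is part of Illume itself:

```python
from itertools import dropwhile


def strip_header_comments(lines, marker="#"):
    """Return the lines with the leading run of comment lines removed.

    Only the header is stripped: a later line that happens to start
    with the marker is kept, so real entries are never dropped.
    """
    return list(dropwhile(lambda ln: ln.startswith(marker), lines))
```

Reading the source file with encoding="utf-8", passing its lines through this function and writing the result back out as the .dic file keeps the installed dictionary comment-free.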

> Carsten Haitzler (The Rasterman) wrote:
> > On Thu, 20 Nov 2008 10:55:02 +0100 (CET) "Pander"
> > <pander at users.sourceforge.net> babbled:
> > 
> > any dictionary should not care about gsm encodings. it should be just a utf8
> > dictionary file. it is the job of the sms app to convert normal utf8
> > unicode to whatever encoding used by the network, and back. :)
> > 
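
To illustrate the split described above (dictionary stays plain utf8, the sms app worries about the network encoding), here is a minimal sketch of the kind of check an SMS application could run on a finished message. The character table is transcribed for illustration and should be verified against the GSM 03.38 specification; the function names are hypothetical:

```python
# GSM 03.38 default alphabet (7-bit), in code-point order 0x00..0x7F.
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Printable characters reached through the escape code; each costs two septets.
GSM_EXTENSION = "^{}\\[~]|€"
GSM_ALL = set(GSM_BASIC) | set(GSM_EXTENSION)


def fits_gsm7(text):
    """True if every character can be sent in the 7-bit alphabet."""
    return all(ch in GSM_ALL for ch in text)


def gsm7_septets(text):
    """Number of septets the text needs, counting escapes twice."""
    if not fits_gsm7(text):
        raise ValueError("text needs a non-GSM encoding such as UCS-2")
    return sum(2 if ch in GSM_EXTENSION else 1 for ch in text)
```

With this table "café" fits in 7 bits, while Dutch words with a diaeresis such as "geïnteresseerd" do not, which is why the decision belongs to the sending application rather than to the dictionary.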
> >> Small correction to my text:
> >>
> >> "Note that more characters" must be "Note that certain special characters
> >> are in GSM 03.38 which are not in extended ASCII"
> >>
> >>
> >> Nevertheless, one complete utf-8 dictionary could be used by most
> >> applications, also SMS. The conversion I do for GSM 03.38 could also be
> >> done later just before sending the SMS.
> >>
> >> On Thu, November 20, 2008 10:44, Rui Miguel Silva Seabra wrote:
> >>> I have no idea... I might only make a new version with utf-8 encoded
> >>> characters. :)
> >>>
> >>>
> >>> On Thu, Nov 20, 2008 at 10:40:46AM +0100, Pander wrote:
> >>>> Hi all,
> >>>>
> >>>> I intend to generate the following:
> >>>> - a full list utf-8 (for 8 bit SMS and regular use, default)
> >>>> - b full list utf-8 GSM 03.38[1] (for 7 bit SMS)
> >>>> - c truncated list utf-8 (for 8 bit SMS and regular use)
> >>>> - d truncated list utf-8 GSM 03.38[1] (for 7 bit SMS, default)
> >>>>
> >>>> [1] These utf-8 characters in this list are within the 7-bit range of GSM
> >>>> 03.38, see http://en.wikipedia.org/wiki/Short_message_service#GSM Note
> >>>> that more characters
> >>>>
> >>>> a and b will both have 250,000 words
> >>>> b will be conversion, remapping and normalisation of a
> >>>> c and d are truncations and normalisation of respectively a and b
> >>>>
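
The truncation step for lists c and d could look roughly like the sketch below. It assumes each line has the form "word count" with a usage frequency after the last space; that layout, the function names and the sample words are assumptions for illustration, not a confirmed description of the Illume .dic format:

```python
import heapq


def parse_line(line):
    """Split 'word count' on the last space into (word, int(count))."""
    word, _, count = line.rstrip("\n").rpartition(" ")
    return word, int(count)


def truncate_by_frequency(lines, limit):
    """Keep the `limit` most frequent entries, preserving file order."""
    entries = [parse_line(ln) for ln in lines if ln.strip()]
    # Pick the indices of the highest counts, then emit in original order
    # so that whatever sorting the source file had is left untouched.
    keep = set(
        heapq.nlargest(limit, range(len(entries)), key=lambda i: entries[i][1])
    )
    return [entry for i, entry in enumerate(entries) if i in keep]
```

Choosing `limit` is then an experiment: generate a few sizes and compare lookup speed on the device.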
> >>>> For utf-16, a simple conversion of the utf-8 files can be used, but I'll
> >>>> leave this for now. This could result in two extra files.
> >>>>
> >>>> Note that neither extended nor non-extended ASCII is available. Is this
> >>>> desirable? This can result in four extra files.
> >>>>
> >>>> So, I can come up with 10 different files. Which are, according to you,
> >>>> the most useful?
> >>>>
> >>>> Regards,
> >>>>
> >>>> Pander
> >>>>
> >>>> On Thu, November 20, 2008 08:58, Rui Miguel Silva Seabra wrote:
> >>>>> On Thu, Nov 20, 2008 at 03:02:41AM +0100, "Marco Trevisan
> >>>>> (Treviño)" wrote:
> >>>>>> Pander wrote:
> >>>>>>> Of course this particular word list is very long and contains about
> >>>>>>> 250,000 words and has a typical loooong tail. Many words or
> >>>>>>> compositions occur seldom in everyday use.
> >>>>>>>
> >>>>>>> What would be a good cut off point in number of words, also in
> >>>>>>> terms of performance?
> >>>>>>>
> >>>>>>> The Portuguese list contains 56,609 words. Is this workable? How
> >>>>>>> many does the English contain?
> >>>>>> The Italian one can also reach 500'000 words (to be short), but I only
> >>>>>> get a well working dictionary by using a smaller one (about 150'000
> >>>>>> words, which I've selected by their google popularity).
> >>>>>>
> >>>>>> Btw I've written more complete posts about this on the list...
> >>>>> Well, since my basis was based on a million words taken from the most
> >>>>> printed daily newspaper in Portugal (I didn't count but still I
> >>>>> removed a lot of non words like numbers, etc...) already with
> >>>>> frequency data, my job was so much easier... :)
> >>>>>
> >>>>> As for writing SMS/text messages... I haven't found yet a word that
> >>>>> wasn't there (in fact my problem is that it so often is the first of
> >>>>> several matches so I have to use the menu on the left) but I must
> >>>>> confess to not be one of those whose primary use of the phone is
> >>>>> SMS/text!
> >>>>>
> >>>>> Rui
> >>>>>
> >>>>> --
> >>>>> Frink!
> >>>>> Today is Prickle-Prickle, the 32nd day of The Aftermath in the YOLD 3174
> >>>>> + No matter how much you do, you never do enough -- unknown
> >>>>> + Whatever you do will be insignificant,
> >>>>> | but it is very important that you do it -- Gandhi
> >>>>> + So let's do it...?
> >>>>>
> >>>>> _______________________________________________
> >>>>> Openmoko community mailing list
> >>>>> community at lists.openmoko.org
> >>>>> http://lists.openmoko.org/mailman/listinfo/community
> >>>>>
> >>>>
> >>> --
> >>> You are what you see.
> >>> Today is Prickle-Prickle, the 32nd day of The Aftermath in the YOLD 3174
> >>> + No matter how much you do, you never do enough -- unknown
> >>> + Whatever you do will be insignificant,
> >>> | but it is very important that you do it -- Gandhi
> >>> + So let's do it...?
> >>>
> >>
> > 


-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    raster at rasterman.com




