New Rasterman Image...
steve at openmoko.com
Mon Oct 6 03:13:57 CEST 2008
Spelling issue Noted.
RE: first and last names: I suppose it was worth the try? Was it
primarily the last names (the US is a nation of immigrants) that gave the
most trouble, or first names as well? The last names file is huge in
comparison to the first name files, FWIW. and
Looking at the list of names and graphing frequency versus rank, it
looks like it follows a power law
distribution (http://en.wikipedia.org/wiki/Power_law). To get to 80% of
each gender you need 313 male names and 811 female names. The top 100
male names are 60% of all male names. The top 100 female names are 43% of
all female names. For last names, the top 100 are only 18% of the
population. FWIW, word frequency is known to follow a power law
distribution, so I'm not surprised that proper names appear to follow a
Ok, no more backseat driving.
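For anyone who wants to check the coverage figures against their own
name list, here is a rough sketch. It assumes the name file has already
been read into a plain list of per-name counts. The function names and
the 1/rank sample data are made up for illustration; they are not taken
from the census files or from the keyboard code.

```python
from itertools import accumulate


def names_needed(freqs, target):
    """Return how many top-ranked names cover `target` (0..1) of the total."""
    ranked = sorted(freqs, reverse=True)  # rank 1 = most frequent name
    total = sum(ranked)
    for rank, running in enumerate(accumulate(ranked), start=1):
        if running >= target * total:
            return rank
    return len(ranked)


def top_share(freqs, n):
    """Return the share of the total covered by the n most frequent names."""
    ranked = sorted(freqs, reverse=True)
    return sum(ranked[:n]) / sum(ranked)


if __name__ == "__main__":
    # Hypothetical Zipf-like data: frequency proportional to 1/rank.
    # Replace with real per-name counts to reproduce actual figures.
    zipf = [1.0 / r for r in range(1, 1001)]
    print("names needed for 80%:", names_needed(zipf, 0.80))
    print("share of top 100:", round(top_share(zipf, 100), 2))
```

The same two functions could be used to decide where to cut a name list
before merging it into a dictionary, e.g. keep only the names up to the
80% rank and drop the long tail.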
Hope all is well
Carsten Haitzler (The Rasterman) wrote:
> On Sun, 05 Oct 2008 09:28:19 -0700 Steve Mosher <steve at openmoko.com> babbled:
>> I don't think I was exactly clear on using the email database.
>> What I was thinking is that you could run a simple program on
>> your desktop computer and do a frequency analysis of the mails
>> ( and contacts) that you have on your PC. Same with chat logs.
>> Then merge that (somehow) into the database on the phone,
>> say in the personal.
> i know - i don't think that's a great idea. it'd be full of typos! :) if it's
> going to pollute the dictionary with typos... they should be the user's own! :)
> btw - i added the first/last names in and man did completion quality go down.
> too many names. a lot of them can be strange and look nonsensical when you
> actually use the dict with it all in - it suggests lots of garbage (so now
> suggestion lists are huge and most of them are entire junk as they are some
> bizarre name).
>> Daniel Willmann wrote:
>>> On Fri, 3 Oct 2008 09:34:28 +1000
>>> Carsten Haitzler (The Rasterman) <raster at rasterman.com> wrote:
>>>> On Thu, 02 Oct 2008 09:37:58 -0700 Steve Mosher <steve at openmoko.com>
>>>>> I used to have a bunch of them when I was doing a NLG ( natural
>>>>> Language generation) pet project. I sent you a link to US names as
>>>>> well. from the US census.
>>>>> For personal dictionaries, people could just run a simple word
>>>>> frequency analysis on their archived email,( there are GPL
>>>>> programs that do this I believe, but its dead easy to write
>>>>> yourself) and import their email contacts into the database.
>>>> in fact that is the idea of the "3" dictionaries the keyboard has. it
>>>> has "system" (which is base language - eg english), personal (any
>>>> words they type in at all go in here - they inherit frequency they
>>>> had before but now gain in count as they get used more), and..
>>>> "generated" dictionary
>>>> - ~/.e/e/dicts-dynamic/data.dic - this file is expected to be
>>>> regularly generated from the users sms's, emails, contact list etc.
>>>> containing words from their every-day activity - so that friend with
>>>> a strange name... gets their name into the dictionary pool this
>>>> way. :)
>>> Is there a way to tie into bash_completion when we are on a terminal?
>>> That should be fun :-)
>>> Daniel Willmann
>>> Openmoko community mailing list
>>> community at lists.openmoko.org