New Rasterman Image...

Joel Newkirk freerunner at newkirk.us
Thu Oct 2 19:02:09 CEST 2008


On Thu, 02 Oct 2008 09:37:58 -0700, Steve Mosher <steve at openmoko.com>
wrote:
> I used to have a bunch of them when I was doing an NLG (natural language
> generation) pet project. I also sent you a link to a list of US names from
> the US census.
> For personal dictionaries, people could just run a simple word frequency
> analysis on their archived email (there are GPL programs that do this, I
> believe, but it's dead easy to write yourself) and import their email
> contacts into the database.
> (speeling mistakes might require some work, like the one I just did)
> If you had access to archived chats, chat logs, or SMS logs, you could
> pick up things like LOL, PITA, etc. There are some studies on word
> frequency in SMS, but I haven't found an online resource.
> 
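The "simple word frequency analysis on archived email" idea could look roughly like the following Python sketch. It assumes the mail is stored in an mbox file, counts only text/plain parts, and drops '>'-quoted lines so only your own words are counted. The tokenizer (letters and apostrophes only) and the function names are illustrative choices, not an existing GPL tool.

```python
import mailbox
import re
from collections import Counter

# Illustrative tokenizer: runs of letters and apostrophes count as words;
# digits and punctuation split words.
WORD_RE = re.compile(r"[a-z']+")


def word_frequencies(text):
    """Return a Counter of lower-cased words in `text`."""
    counts = Counter()
    for word in WORD_RE.findall(text.lower()):
        word = word.strip("'")  # drop quote marks around a word, keep "it's"
        if word:
            counts[word] += 1
    return counts


def own_lines(text):
    """Drop '>'-quoted lines so only the sender's own words are counted."""
    return "\n".join(line for line in text.splitlines() if not line.startswith(">"))


def mbox_word_frequencies(path):
    """Sum word frequencies over the text/plain parts of every message in an mbox."""
    counts = Counter()
    # create=False: raise an error instead of creating an empty mbox at `path`.
    for msg in mailbox.mbox(path, create=False):
        for part in msg.walk():
            if part.get_content_type() != "text/plain":
                continue
            payload = part.get_payload(decode=True)  # bytes, or None for containers
            if payload is None:
                continue
            charset = part.get_content_charset() or "utf-8"
            try:
                text = payload.decode(charset, errors="replace")
            except LookupError:  # charset name Python does not recognise
                text = payload.decode("utf-8", errors="replace")
            counts.update(word_frequencies(own_lines(text)))
    return counts
```

Something like `mbox_word_frequencies("sent.mbox").most_common(500)` (the file name is hypothetical) would then give the most frequent words for a personal dictionary. Misspellings get counted like any other word, so the list would still need a manual look before importing.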

I was thinking of a quickie program to scan the ~/.bash_history on my
desktop and generate frequency data for command names... ;)  Take
everything up to the first space, strip off any path, count, and insert into
the dict.  (Unfortunately the command history on the FR is VERY VERY short by
default; I've not investigated how to extend it.)
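The bash_history scan described above (first token, path stripped, counted) could be sketched in Python like this. The "name count" output format is an assumption; the format the on-screen keyboard dictionary actually expects may differ and would need checking before inserting anything.

```python
import os
import sys
from collections import Counter


def command_frequencies(lines):
    """Count command names: the text before the first whitespace, with any path stripped."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            # skip blank lines and the '#<epoch>' timestamps bash writes when HISTTIMEFORMAT is set
            continue
        first = line.split(None, 1)[0]
        name = os.path.basename(first)  # "/usr/bin/vim" -> "vim"
        if name:
            counts[name] += 1
    return counts


def main(path):
    with open(os.path.expanduser(path), encoding="utf-8", errors="replace") as f:
        counts = command_frequencies(f)
    # Assumed output format: one "name count" pair per line, most frequent first.
    for name, n in counts.most_common():
        print(name, n)


if __name__ == "__main__":
    history = sys.argv[1] if len(sys.argv) > 1 else "~/.bash_history"
    try:
        main(history)
    except OSError as err:
        print(f"cannot read {history}: {err}", file=sys.stderr)
```

Taking only the first token keeps it simple but means `sudo apt-get ...` counts as `sudo`, and only the first command of a pipeline is seen; splitting on `|` and skipping a leading `sudo` would be easy refinements if that matters.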

More useful would be if someone could scare up a thorough list of common SMS
shorthand, like 'cul8r' and whatnot.  (I don't know how that would work out,
though, with numeric characters embedded.)

j






More information about the community mailing list