New Rasterman Image...

Steve Mosher steve at openmoko.com
Fri Oct 3 00:54:12 CEST 2008


  I suppose I could compile a list of the academic books done on SMS,
contact the authors, and see if they are willing to cough up the data.
For online dictionaries with no frequency data there are several to
choose from, like

http://www.netlingo.com/emailsh.cfm

I have not found any downloadable lists with frequency data.
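
For the do-it-yourself route mentioned in the quoted message below, here
is a rough sketch of a word frequency count in Python. The tokenizer rule
is an assumption for illustration, not taken from any existing tool: a
token is a run of letters, digits and apostrophes, so SMS shorthand with
embedded digits such as 'cul8r' stays in one piece, and pure numbers are
dropped. The names TOKEN and word_frequencies are made up.

```python
import re
from collections import Counter

# Assumption: a token is a run of letters, digits and apostrophes, so that
# shorthand with embedded digits ("cul8r", "l8") is kept as a single word.
TOKEN = re.compile(r"[a-z0-9']+")


def word_frequencies(text):
    """Return a Counter mapping each lowercased token to its count."""
    counts = Counter()
    for tok in TOKEN.findall(text.lower()):
        tok = tok.strip("'")            # drop stray leading/trailing quotes
        if tok and not tok.isdigit():   # skip pure numbers such as "2008"
            counts[tok] += 1
    return counts


sample = "LOL cul8r, don't be l8. lol 2008"
for word, n in word_frequencies(sample).most_common():
    print(n, word)
```

Feeding it the text of an email archive or chat log would give the raw
counts. How those counts get imported into the dictionary is left open
here.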

Joel Newkirk wrote:
> On Thu, 02 Oct 2008 09:37:58 -0700, Steve Mosher <steve at openmoko.com>
> wrote:
>> I used to have a bunch of them when I was doing an NLG (natural language
>> generation) pet project. I sent you a link to US names as well, from the
>> US census.
>> For personal dictionaries, people could just run a simple word frequency
>> analysis on their archived email (there are GPL programs that do this,
>> I believe, but it's dead easy to write yourself) and import their email
>> contacts into the database.
>> (speeling mistakes might require some work, like the one I just did)
>> If you had access to archived chats or chat logs you could pick up
>> things like LOL, PITA, etc, or logs of SMS. There are some studies on
>> word frequency in SMS but I haven't found an online resource.
>>
> 
> I was thinking of a quickie program to scan the ~/.bash_history on my
> desktop and generate frequency data for command names... ;)  Take
> everything up to the first space, strip off any path, count and insert in
> dict.  (unfortunately the command history on the FR is by default VERY VERY
> short, I've not investigated how to extend it)
> 
> More useful would be if someone can scare up a thorough list of common SMS
> shorthand, like 'cul8r' and what-not.  (don't know how that'd work out
> though, with numeric characters embedded)
> 
> j
> 
> 
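
The ~/.bash_history scan described in the quoted text above could look
something like the following in Python. This is only a sketch of the
steps as stated there (take everything up to the first space, strip off
any path, count). The function name command_frequencies is hypothetical,
and skipping lines that start with '#' is an added assumption, there to
ignore the timestamp lines bash writes when HISTTIMEFORMAT is set.

```python
import os
from collections import Counter


def command_frequencies(lines):
    """Count command names in an iterable of shell history lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Assumption: skip blanks and '#...' lines (history timestamps).
        if not line or line.startswith("#"):
            continue
        first = line.split(None, 1)[0]    # everything up to the first space
        name = os.path.basename(first)    # strip off any path
        if name:
            counts[name] += 1
    return counts


history = ["ls -l", "/bin/ls", "vim notes.txt", "./configure --prefix=/usr"]
for name, n in command_frequencies(history).most_common():
    print(n, name)
```

To run it on a real history, pass an open file instead of the list, for
example open(os.path.expanduser("~/.bash_history"), errors="replace").
Pipelines and commands run through sudo are counted by their first word
only, which matches the simple rule above but undercounts whatever
follows.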



