speech -> text on FR?

Mon Jun 16 01:07:21 CEST 2008

Hello,

I know nothing about speech recognition, so if the following won't work, 
please let me know (gently :) ).

I understand that there is a project called Sphinx at CMU that attempts
speech recognition.  It seems pretty complex; I couldn't get it to work
on my Linux desktop.  I'm not sure it would work on an FR anyway, since
it may need a lot of CPU horsepower and memory.

I see a speech project on the OM projects page.  To me, it seems like
the project is attempting command recognition, e.g., voice dialing.
However, it would be great if the FR could function as a rudimentary
dictation machine, i.e., let the user speak and have the speech
converted to text.

Perhaps the following may work.

1. Ask the user to speak some standard words.  Record the speech and
    establish the mapping from the words to the corresponding speech.
    It may even be good to maintain separate databases for different
    purposes, e.g., one for UNIX command lines, one for emails, and a
    third for technical documents.

2. The speech recognizer then functions like a keyboard: it converts
    speech to text, which it then enters into the application that has
    focus.

3. The user must speak word by word.  The speech recognizer finds the
    closest match for the speech by checking it against the recordings
    made in step 1 (and step 4).  The user may need to select the
    database from which the match must be made.

4. If there is no close match, or if the user is unhappy with the
    selection made in step 3, the user can type in the correct word.  A
    new record can be added to the appropriate database.
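For what it's worth, the closest-match idea in steps 1, 3 and 4 resembles classic isolated-word template matching, and a standard way to compare two recordings of different lengths is dynamic time warping (DTW).  That is a technique I'm swapping in, not something taken from Sphinx or the OM project.  The Python sketch below assumes each recording has already been turned into a sequence of per-frame feature vectors (feature extraction is not shown); the class name, the per-database layout and the rejection threshold are all hypothetical.  It does not cover step 2 (feeding text to the focused application).

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Frame = Sequence[float]      # one feature vector (assumed precomputed)
Utterance = Sequence[Frame]  # one recorded word as a sequence of frames


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a: Utterance, b: Utterance) -> float:
    """DTW cost: aligns two utterances that may be spoken at different speeds."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(a[i - 1], b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


class TemplateRecognizer:
    """Hypothetical store of (word, recording) templates, one list per database."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold  # hypothetical cut-off for "no close match"
        self.databases: Dict[str, List[Tuple[str, Utterance]]] = {}

    def train(self, database: str, word: str, utterance: Utterance) -> None:
        # Steps 1 and 4: store a recording under the selected database.
        self.databases.setdefault(database, []).append((word, utterance))

    def recognize(self, database: str, utterance: Utterance) -> Optional[str]:
        # Step 3: closest template wins; None means "ask the user to type it".
        best_word, best_cost = None, float("inf")
        for word, template in self.databases.get(database, []):
            c = dtw_distance(utterance, template)
            if c < best_cost:
                best_word, best_cost = word, c
        return best_word if best_cost <= self.threshold else None


# Toy demo with made-up 1-D "features".
rec = TemplateRecognizer(threshold=2.0)
rec.train("unix", "ls", [[0.0], [1.0], [2.0]])
rec.train("unix", "cd", [[5.0], [5.0], [0.0]])
first = rec.recognize("unix", [[0.0], [1.0], [1.0], [2.0]])  # a slower "ls"
print(first)

unknown = [[9.0], [9.0]]
before = rec.recognize("unix", unknown)
if before is None:
    # Step 4: no close match, so the user types the word and it is stored.
    rec.train("unix", "make", unknown)
after = rec.recognize("unix", unknown)
print(after)
```

A real recognizer would keep several recordings per word and tune the threshold on actual speech.  Note that each DTW comparison costs time proportional to the product of the two recordings' lengths, and step 3 runs it against every template in the database, which matters on a phone's CPU.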

The process may be frustrating for the user at first, but over time, the 
speech recognition should become better and better.

The separate databases may be needed, for example, because the spoken
word `period' should usually translate to the symbol `.', except when
writing about time periods, when it should remain the word `period'.
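One way to handle the period example is to give each database its own output table: the recognizer decides which word was spoken, and the active database decides what text that word produces.  A minimal Python sketch, assuming hypothetical database names and a hypothetical rule that punctuation attaches to the preceding word without a space:

```python
from typing import Dict, List

# Hypothetical per-database output tables; words not listed are typed as-is.
OUTPUT_TABLES: Dict[str, Dict[str, str]] = {
    "email": {"period": ".", "comma": ","},
    "history": {},  # writing about time periods: "period" stays a word
}
PUNCTUATION = {".", ","}


def to_text(database: str, spoken_word: str) -> str:
    """Return the text to enter for a recognized word in the given database."""
    return OUTPUT_TABLES.get(database, {}).get(spoken_word, spoken_word)


def dictate(database: str, words: List[str]) -> str:
    """Join recognized words, attaching punctuation to the previous word."""
    out = ""
    for w in words:
        t = to_text(database, w)
        if out and t not in PUNCTUATION:
            out += " "
        out += t
    return out


print(dictate("email", ["see", "you", "soon", "period"]))
print(dictate("history", ["the", "jurassic", "period"]))
```

With these tables the first call yields "see you soon." and the second "the jurassic period".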

I do not know what the storage requirements would be to maintain this
database.  I do not know if the closest-match algorithm in step 3 is
even possible.  But if we could get a good dictation engine, that would
be a killer app, in my opinion.  No more typing!  No more carpal tunnel
injuries.  No more having to worry about small on-screen keyboards that
challenge finger typing.
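A rough back-of-the-envelope estimate is possible, though every number in it is an assumption rather than a measurement: 8 kHz 16-bit mono audio, about half a second per spoken word, one recording per word, and a 5000-word vocabulary.  Storing compact feature vectors (here, hypothetically, 13 coefficients every 10 ms as 32-bit floats) instead of raw audio shrinks the total.  The sketch just does the multiplication in Python:

```python
# Hypothetical assumptions, not measurements:
SAMPLE_RATE_HZ = 8000     # telephone-quality audio
BYTES_PER_SAMPLE = 2      # 16-bit mono PCM
FRAMES_PER_SECOND = 100   # one feature frame every 10 ms
COEFFS_PER_FRAME = 13     # coefficients per feature frame
BYTES_PER_COEFF = 4       # 32-bit floats

RAW_BYTES_PER_SECOND = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE
FEATURE_BYTES_PER_SECOND = FRAMES_PER_SECOND * COEFFS_PER_FRAME * BYTES_PER_COEFF


def storage_bytes(words: int, recordings_per_word: int,
                  seconds_per_word: float, bytes_per_second: int) -> float:
    """Total bytes needed to keep every recording in the database."""
    return words * recordings_per_word * seconds_per_word * bytes_per_second


for label, rate in [("raw audio", RAW_BYTES_PER_SECOND),
                    ("features", FEATURE_BYTES_PER_SECOND)]:
    mb = storage_bytes(5000, 1, 0.5, rate) / 1e6
    print(f"{label}: {mb:.1f} MB")
```

Under these assumptions it prints 40.0 MB for raw audio and 13.0 MB for feature vectors; both totals grow linearly with vocabulary size, recordings per word, and the number of separate databases.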

Thanks.

Ajit