speech -> text on FR?
saurabhgupta1403 at gmail.com
Mon Jun 16 20:06:05 CEST 2008
On Mon, Jun 16, 2008 at 4:37 AM, Ajit Natarajan <ajitk at email.com> wrote:
> I know nothing about speech recognition, so if the following won't work,
> please let me know (gently :) ).
> I understand that there is a project called Sphinx in CMU which attempts
> speech recognition. It seems pretty complex. I couldn't get it to work
> on my Linux desktop. I'm not sure if it would work on an FR since it
> may need a lot of CPU horsepower and memory.
Indeed, the Sphinx packages are very well written, but they were compiled with desktop processors in mind. They involve a lot of data management and large storage, and implement many floating-point calculations and algorithms. For FR-like devices, the code has to be properly adapted and modified. In fact, this is the very aim of the GSoC Speech Recognition Project: to prepare a speech recognition engine that can run on a 256-400 MHz processor without floating-point hardware.
> I see a speech project on the OM projects page. To me, it seems like
> the project is attempting command recognition, e.g., voice dialing.
> However, it would be great if the FR can function as a rudimentary
> dictation machine, i.e., allow the user to speak and convert to text.
Yes, once the speech recognition engine is ready, many applications can be built on it. The basic aim of speech recognition will be to identify the spoken word by comparing it against the HMM models of the words in the stored dictionary and selecting the one with maximum probability. Once a word has been detected, any API corresponding to that word can be called.
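To make the matching step concrete: each dictionary word gets its own HMM, the incoming utterance is scored against every model, and the highest-likelihood word wins. Here is a rough sketch using the forward algorithm on toy discrete-emission HMMs; the model shapes and the `recognize` helper are illustrative assumptions, not the GSoC project's actual code:

```python
import math

def log_likelihood(obs, pi, A, B):
    """Forward algorithm: log P(obs | model) for a discrete-emission HMM.

    pi: initial state probabilities, A: state transition matrix,
    B: per-state emission probabilities over discrete symbols."""
    n = len(pi)
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[p] * A[p][s] for p in range(n)) * B[s][o]
                 for s in range(n)]
    return math.log(sum(alpha))

def recognize(obs, word_models):
    """Pick the dictionary word whose HMM assigns obs the highest likelihood."""
    return max(word_models,
               key=lambda w: log_likelihood(obs, *word_models[w]))
```

In a real engine the observations would be acoustic feature vectors (e.g. MFCCs) with continuous emission densities, and on the FR the arithmetic would have to be done in fixed point, but the argmax-over-models structure is the same.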
> Perhaps the following may work.
> 1. Ask the user to speak some standard words. Record the speech and
> establish the mapping from the words to the corresponding speech.
> It may even be good to maintain separate databases for different
> purposes, e.g., one for UNIX command lines, one for emails, and a
> third for technical documents.
> 2. The speech recognizer then functions similar to a keyboard in that it
> converts speech to text which it then enters into the application
> that has focus.
> 3. The user must speak word by word. The speech recognizer finds the
> closest match for the speech my checking against the recordings made
> in step 1 (and step 4). The user may need to set the database from
> which the match must be made.
> 4. If there is no close match, or if the user is unhappy with the
> selection made in step 3, the user can type in the correct word. A
> new record can be added to the appropriate database.
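The four steps quoted above amount to nearest-neighbour template matching: record per-word templates, find the closest one, and fall back to typing (and enrolling a new template) when nothing is close enough. A common distance for variable-length speech features is dynamic time warping (DTW). A minimal sketch, where the `Dictation` class, the scalar features, and the threshold value are all hypothetical:

```python
def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])  # scalar features for simplicity
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

class Dictation:
    def __init__(self, threshold=5.0):
        self.templates = []       # (word, feature sequence) pairs
        self.threshold = threshold

    def enroll(self, word, features):
        """Steps 1 and 4: record a template for a word."""
        self.templates.append((word, features))

    def match(self, features):
        """Step 3: closest template, or None if no match is close enough."""
        if not self.templates:
            return None
        word, dist = min(
            ((w, dtw_distance(features, t)) for w, t in self.templates),
            key=lambda pair: pair[1])
        return word if dist <= self.threshold else None
```

Returning `None` is the step-4 trigger: the application would then ask the user to type the word and call `enroll` with the just-captured features, so the vocabulary grows with use.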
> The process may be frustrating for the user at first, but over time, the
> speech recognition should become better and better.
> The separate databases may be needed, for example, because the word
> period should usually translate to the symbol `.' except when writing
> about time periods when it should translate to the word `period'.
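The per-purpose databases proposed above could, at the text-output level, be as simple as per-context mappings from spoken words to emitted text; the database names and entries below are made up for illustration:

```python
# Hypothetical per-purpose databases: the same spoken word can emit
# different text depending on which database is active.
DATABASES = {
    "email":     {"period": ".", "comma": ",", "new line": "\n"},
    "technical": {"period": "period"},  # e.g. writing about time periods
}

def emit(word, db="email"):
    """Translate a recognized word using the active database,
    falling back to the word itself when there is no entry."""
    return DATABASES.get(db, {}).get(word, word)
```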
> I do not know what the storage requirements would be to maintain this
> database. I do not know if the closest match algorithm in step 3 is
> even possible. But if we could get a good dictation engine, that would
> be a killer app, in my opinion. No more typing! No more carpal tunnel
> injuries. No more having to worry about small on screen keyboards that
> challenge finger typing.
It would certainly be a great application. But at the moment I am not very sure about the capability of the FreeRunner and the applications it can handle. Maybe in the future more and more improvements can be introduced.
> Openmoko community mailing list
> community at lists.openmoko.org
Electronics and Communication Engg.