speech -> text on FR?

Mon Jun 16 20:32:38 CEST 2008

On Mon, Jun 16, 2008 at 6:04 AM, Dan Staley <dlstal2 at uky.edu> wrote:

> I actually just interfaced with the Sphinx project at one of the
> research positions I hold.  It is actually a very well written interface
> (for the most part...there were a few things poorly documented and/or
> implemented) But anyway, I found the java version of the project (Sphinx
> 4 http://cmusphinx.sourceforge.net/sphinx4/ ) to be pretty easy to
> build/interface with.

It's great, Dan, that you got the Sphinx packages working. I tried them but
got some errors. Lately, however, I have been concentrating on understanding
some of their libraries and trying to write my own optimized code. I will
definitely ping you if I need any help.

>
>
> The benefit of using the HMMs and models and methods that Sphinx
> implements is that anyone in their programs should be able to specify a
> grammar (similar to a simplified regex) that they want to be recognized
> and then the interpreter should be able to be user independent...meaning
> anyone can speak the phrase into the phone and get the desired output.
> Speech training wouldn't be required.  I found that once you set it up
> correctly, the Sphinx engine is very powerful, and usually identifies
> the spoken words no matter who says them (we found it even seemed to
> work decently well with a variety of different accents).

This is good, and in fact I will also try to implement this in the model. I
will get the word HMMs by training them on recordings from different
speakers. I have covered this in my Design Document.
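
To illustrate the kind of grammar Dan describes above, here is a minimal
sketch in Python. It is not Sphinx's own grammar format or API; the GRAMMAR
table and the phrases/accepts helpers are hypothetical names made up for
this example. Each slot lists the words allowed at that position, and a
recognized phrase is accepted only if it matches one word per slot:

```python
from itertools import product

# Hypothetical command grammar: one list of alternatives per word slot.
GRAMMAR = [
    ["call", "dial"],
    ["home", "office", "voicemail"],
]

def phrases(grammar):
    """Expand the grammar into the set of every phrase it allows."""
    return {" ".join(words) for words in product(*grammar)}

def accepts(grammar, spoken):
    """Return True if the recognized text is one of the allowed phrases."""
    normalized = " ".join(spoken.lower().split())
    return normalized in phrases(grammar)
```

Restricting the recognizer to a small set of phrases like this is what lets
it work without per-user training: it only has to tell a handful of
candidates apart.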

Thanks in advance...

>
> -Dan Staley
>
> On Sun, 2008-06-15 at 19:07 -0400, Ajit Natarajan wrote:
> > Hello,
> >
> > I know nothing about speech recognition, so if the following won't work,
> > please let me know (gently :) ).
> >
> > I understand that there is a project called Sphinx in CMU which attempts
> > speech recognition.  It seems pretty complex.  I couldn't get it to work
> > on my Linux desktop.  I'm not sure if it would work on an FR since it
> > may need a lot of CPU horsepower and memory.
> >
> > I see a speech project on the OM projects page.  To me, it seems like
> > the project is attempting command recognition, e.g., voice dialing.
> > However, it would be great if the FR can function as a rudimentary
> > dictation machine, i.e., allow the user to speak and convert to text.
> >
> > Perhaps the following may work.
> >
> > 1. Ask the user to speak some standard words.  Record the speech and
> >     establish the mapping from the words to the corresponding speech.
> >     It may even be good to maintain separate databases for different
> >     purposes, e.g., one for UNIX command lines, one for emails, and a
> >     third for technical documents.
> >
> > 2. The speech recognizer then functions similar to a keyboard in that it
> >     converts speech to text which it then enters into the application
> >     that has focus.
> >
> > 3. The user must speak word by word.  The speech recognizer finds the
> >     closest match for the speech by checking against the recordings made
> >     in step 1 (and step 4).  The user may need to set the database from
> >     which the match must be made.
> >
> > 4. If there is no close match, or if the user is unhappy with the
> >     selection made in step 3, the user can type in the correct word.  A
> >     new record can be added to the appropriate database.
> >
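
A minimal sketch of the closest-match idea in steps 3 and 4, assuming each
recording has already been converted into a sequence of numeric feature
vectors (the feature extraction is not shown). It uses dynamic time warping,
a classic template-matching technique, which is a different approach from
the HMMs that Sphinx uses. The names dtw_distance, closest_word and
max_distance are illustrative only and do not come from Sphinx or any
Openmoko code:

```python
from math import inf

def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of vectors."""
    n, m = len(a), len(b)
    # cost[i][j] holds the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames being aligned.
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def closest_word(utterance, templates, max_distance=None):
    """Return (word, distance) for the nearest stored recording.

    templates maps each word to a list of recorded examples. If
    max_distance is given and nothing is that close, word is None.
    """
    best_word, best = None, inf
    for word, examples in templates.items():
        for example in examples:
            d = dtw_distance(utterance, example)
            if d < best:
                best_word, best = word, d
    if max_distance is not None and best > max_distance:
        return None, best
    return best_word, best
```

A result of None corresponds to step 4: the user types the correct word, and
the new recording is appended to templates[word] so the next match has one
more example to compare against.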
> > The process may be frustrating for the user at first, but over time, the
> > speech recognition should become better and better.
> >
> > The separate databases may be needed, for example, because the word
> > period should usually translate to the symbol `.' except when writing
> > about time periods when it should translate to the word `period'.
> >
> > I do not know what the storage requirements would be to maintain this
> > database.  I do not know if the closest match algorithm in step 3 is
> > even possible.  But if we could get a good dictation engine, that would
> > be a killer app, in my opinion.  No more typing!  No more carpal tunnel
> > injuries.  No more having to worry about small on screen keyboards that
> > challenge finger typing.
> >
> > Thanks.
> >
> > Ajit
> >
> >

-- 
Saurabh Gupta
Electronics and Communication Engg.
NSIT, New Delhi