GSoC Project Status Update 04: Speech Recognition in Openmoko

Asheesh Laroia openmoko at asheesh.org
Sun Jun 29 23:44:48 CEST 2008


On Mon, 30 Jun 2008, saurabh gupta wrote:

> You have identified a real and valid problem in training. I thought I would
> handle it this way. Whenever a user runs this application, the GUI for
> speech recognition will ask the user to choose training or recognition mode.
> In training mode, after the user utters a word, the GUI will ask the user to
> utter the same word again, and so on. The user will have to utter the
> training word three times (I have assumed that constant to be three) to
> fully create a word in the vocabulary. If the user terminates the
> application or mishandles it before completing three utterances, the
> application will not save the word.

What do you mean by "mishandles"?

> However, there is no easy way to detect the mishandling: if the user
> neither terminates the application nor speaks the training word again,
> the application can pick up a louder noise, mistake it for the training
> word, and produce a wrong result. This is always a big problem in
> speech-related applications, since handling environmental noise and
> detecting end points are quite difficult in real-world scenarios.

You are speaking of the "training mode", which I agree is important.

I am instead talking about making the normal use mode a training mode, in 
a way, to non-intrusively improve accuracy.

At least, that's my guess - I think it would be worthwhile to run some 
experiments to see if it's really true!  But if you can explain to me why 
this idea is invalid from the start, then maybe we can skip the 
experiments. (-;
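
To make the idea a bit more concrete, here is a minimal Python sketch of 
what "normal use as training" could look like. Everything in it is an 
assumption for illustration: the Vocabulary class, the fixed-length feature 
vectors, the nearest-template matching, and the accept_distance threshold 
are all hypothetical and are not taken from the project's actual recognizer. 
The only point is that a confidently recognized utterance gets folded back 
into the stored model for that word.

```python
import math


class Vocabulary:
    """Toy nearest-template recognizer (hypothetical, for illustration only)."""

    def __init__(self, accept_distance):
        # Assumed confidence rule: adapt only when the match is this close.
        self.accept_distance = accept_distance
        self.templates = {}  # word -> (mean feature vector, sample count)

    def train(self, word, utterances):
        """Explicit training mode: store the mean of the given utterances."""
        n = len(utterances)
        dim = len(utterances[0])
        mean = [sum(u[i] for u in utterances) / n for i in range(dim)]
        self.templates[word] = (mean, n)

    def recognize(self, features):
        """Return (closest word, distance), or (None, inf) if vocabulary is empty."""
        best_word, best_dist = None, math.inf
        for word, (mean, _count) in self.templates.items():
            dist = math.dist(mean, features)
            if dist < best_dist:
                best_word, best_dist = word, dist
        return best_word, best_dist

    def recognize_and_adapt(self, features):
        """Normal use mode: recognize, then quietly refine the matched template."""
        word, dist = self.recognize(features)
        if word is not None and dist <= self.accept_distance:
            mean, count = self.templates[word]
            count += 1
            # Running-mean update: pull the template slightly toward the new sample.
            mean = [m + (x - m) / count for m, x in zip(mean, features)]
            self.templates[word] = (mean, count)
        return word


vocab = Vocabulary(accept_distance=1.0)
vocab.train("call", [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]])  # three training utterances
vocab.train("stop", [[10.0, 10.0], [10.0, 10.0], [10.0, 10.0]])
result = vocab.recognize_and_adapt([0.4, 0.0])  # close match, so "call" is adapted
```

The distance threshold is what keeps this from learning from noise: a poor 
match is still reported, but it does not change the stored template. Whether 
that is a good enough guard in a noisy environment is exactly what the 
experiments would have to show.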

-- Asheesh.

-- 
Clear the laundromat!!  This whirl-o-matic just had a nuclear meltdown!!



