GSoC Project Status Update 04: Speech Recognition in Openmoko

saurabh gupta saurabhgupta1403 at
Mon Jun 30 00:05:39 CEST 2008

On Mon, Jun 30, 2008 at 3:14 AM, Asheesh Laroia <openmoko at> wrote:

> On Mon, 30 Jun 2008, saurabh gupta wrote:
> > You have identified a real problem with training. I thought of handling
> > it this way. Whenever a user runs this application, the speech
> > recognition GUI will ask whether to enter training or recognition mode.
> > In training mode, after the user utters a word, the GUI will ask the
> > user to utter the same word again, and so on. The user will have to
> > repeat the training word three times (I have assumed that constant to be
> > three) to fully create a word in the vocabulary. If the user terminates
> > the application or mishandles it before all three repetitions, the
> > application will not save the word.
> What do you mean by "mishandles"?

By "mishandling" I meant that the user does not train the word fully, i.e.
stops before giving all three repetitions during training.
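The three-repetition rule above could be sketched as follows. This is a minimal illustration, not the application's code: the names `TrainingSession` and `REQUIRED_UTTERANCES` are hypothetical, and an "utterance" is assumed to be any list of feature vectors. Nothing is written to the vocabulary until all three repetitions are present, so an aborted or incomplete session simply leaves no trace.

```python
from dataclasses import dataclass, field

REQUIRED_UTTERANCES = 3  # the constant assumed in the message above


@dataclass
class TrainingSession:
    """Collects repetitions of one word (hypothetical sketch).

    Each utterance is assumed to be a list of feature vectors.
    """
    word: str
    utterances: list = field(default_factory=list)

    def add_utterance(self, features):
        self.utterances.append(features)

    def is_complete(self):
        return len(self.utterances) >= REQUIRED_UTTERANCES

    def commit(self, vocabulary):
        """Save the word only if all required repetitions were recorded."""
        if not self.is_complete():
            return False  # incomplete ("mishandled") training: the word is not saved
        vocabulary[self.word] = list(self.utterances[:REQUIRED_UTTERANCES])
        return True


vocab = {}
session = TrainingSession("hello")
session.add_utterance([[0.1, 0.2]])
session.add_utterance([[0.1, 0.3]])
print(session.commit(vocab), "hello" in vocab)  # False False
session.add_utterance([[0.2, 0.2]])
print(session.commit(vocab), "hello" in vocab)  # True True
```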

> > However, there is no easy way to detect such mishandling: if the user
> > neither terminates the application nor speaks the training word again,
> > the application may pick up a loud noise, take it for the training word,
> > and produce a wrong result. This is a bigger problem in speech
> > applications generally, since handling environmental noise and detecting
> > word end points are both quite difficult in real-world conditions.
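A common, simple approach to end point detection is a short-time energy threshold relative to the background noise level. The sketch below is a hypothetical illustration: the function names and parameter values are assumptions, and it assumes the first few frames contain only background noise. It also shows the limitation described in the quoted text: a short click is rejected by the minimum-length check, but a loud noise that lasts long enough would still pass as a "word".

```python
import math


def frame_energies(samples, frame_len=256):
    """Mean-square energy of consecutive, non-overlapping frames."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_endpoints(samples, frame_len=256, noise_frames=5, factor=4.0, min_frames=3):
    """Return (start, end) sample indices of the first sustained loud segment, or None.

    Assumption: the first `noise_frames` frames contain only background noise.
    """
    energies = frame_energies(samples, frame_len)
    if len(energies) <= noise_frames:
        return None  # not enough audio to estimate the noise floor and find speech
    noise_floor = sum(energies[:noise_frames]) / noise_frames
    threshold = factor * noise_floor + 1e-12  # tiny offset so digital silence is not "speech"
    start = None
    for i in range(noise_frames, len(energies)):
        loud = energies[i] > threshold
        if loud and start is None:
            start = i
        elif not loud and start is not None:
            if i - start >= min_frames:
                return start * frame_len, i * frame_len
            start = None  # too short (e.g. a click): keep searching
    if start is not None and len(energies) - start >= min_frames:
        return start * frame_len, len(energies) * frame_len  # speech runs to the end
    return None


# Synthetic example at 8 kHz: 5 frames of faint noise, 10 frames of a 440 Hz tone, 5 noise frames.
noise = [0.01 * (-1) ** n for n in range(5 * 256)]
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(10 * 256)]
print(detect_endpoints(noise + tone + noise))  # (1280, 3840)
```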
> You are speaking of the "training mode", which I agree is important.
> I am instead talking about making the normal use mode a training mode, in
> a way, to non-intrusively improve accuracy.
> At least, that's my guess - I think it would be worthwhile to run some
> experiments to see if it's really true!  But if you can explain to me why
> this idea is invalid from the start, then maybe we can skip the
> experiments. (-;

Correct me if I have misunderstood what you meant.

If the normal mode is also used as a training mode, as you suggest, I see a
problem. Suppose a user trains a word, e.g. "hello", insufficiently. Because
the resulting HMM is poor, the application may recognize a wrong or
mispronounced word as "hello". If it then uses that utterance to update the
previously trained model for "hello", the model will end up trained on the
wrong word, since the utterance it learned from was not actually "hello".
This could be solved by making the update a manual step: whenever the
application recognizes a word, it asks the user whether the result was
correct, and only if it was does it use the utterance to improve the
not-yet-fully-trained model. But again, this requires a lot of memory to
store the utterances, and considerable processing.
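The confirmation step described above could look roughly like this. It is a hypothetical sketch: `ConfirmedAdaptation`, the `confirm` callback and the per-word cap are assumptions, not part of the application. Only utterances the user confirms are stored, and a bounded `deque` per word limits the memory cost mentioned above; actually retraining the model from the stored utterances is left out.

```python
from collections import defaultdict, deque

MAX_STORED = 10  # hypothetical cap on stored utterances per word, to bound memory


class ConfirmedAdaptation:
    """Keeps user-confirmed utterances per word for later retraining (sketch)."""

    def __init__(self, max_stored=MAX_STORED):
        # Oldest utterances are dropped automatically once the cap is reached.
        self.pool = defaultdict(lambda: deque(maxlen=max_stored))

    def on_recognized(self, word, features, confirm):
        """Store `features` for `word` only if confirm(word) returns True."""
        if not confirm(word):
            return False  # rejected result: never fed back into the model
        self.pool[word].append(features)
        return True

    def ready_for_retraining(self, word, min_count):
        """True once enough confirmed utterances exist to rebuild the word's model."""
        return len(self.pool.get(word, ())) >= min_count


adapt = ConfirmedAdaptation(max_stored=2)
adapt.on_recognized("hello", [[1.0]], confirm=lambda w: True)
adapt.on_recognized("hello", [[9.9]], confirm=lambda w: False)  # user says "wrong"
adapt.on_recognized("hello", [[1.1]], confirm=lambda w: True)
adapt.on_recognized("hello", [[1.2]], confirm=lambda w: True)
print(list(adapt.pool["hello"]))  # [[[1.1]], [[1.2]]]
```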
Also, since this application uses vector quantization, a codebook has to be
prepared for each word during training. The best way to build a proper
codebook is to have enough training vectors and to use them together to
create it.
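The message does not say which algorithm builds the codebook. As an illustration only, the sketch below uses the standard LBG (Linde-Buzo-Gray) splitting method with k-means refinement, which needs all of a word's training vectors at once, the point made above. The minimum-distortion `recognize` function is a simplified stand-in; in the actual application the quantized vectors presumably feed the HMMs. All names are hypothetical.

```python
def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def train_codebook(vectors, size, eps=0.01, iters=20):
    """LBG: start from the global centroid, split every codeword, refine with k-means.

    `size` is assumed to be a power of two; otherwise the next power of two is returned.
    """
    codebook = [centroid(vectors)]
    while len(codebook) < size:
        # Classic multiplicative split: each codeword y becomes y(1+eps) and y(1-eps).
        codebook = [[c * (1 + s) for c in cw] for cw in codebook for s in (eps, -eps)]
        for _ in range(iters):
            clusters = [[] for _ in codebook]
            for v in vectors:
                nearest = min(range(len(codebook)), key=lambda k: dist2(v, codebook[k]))
                clusters[nearest].append(v)
            # Keep the old codeword if its cluster ends up empty.
            codebook = [centroid(cl) if cl else cw for cl, cw in zip(clusters, codebook)]
    return codebook


def avg_distortion(vectors, codebook):
    return sum(min(dist2(v, cw) for cw in codebook) for v in vectors) / len(vectors)


def recognize(vectors, codebooks):
    """Pick the word whose codebook quantizes the utterance with least distortion."""
    return min(codebooks, key=lambda w: avg_distortion(vectors, codebooks[w]))


hello = [[1.0, 1.0], [1.2, 0.9], [5.0, 5.1], [4.8, 5.0]]
yes = [[-3.0, 2.0], [-2.8, 2.2], [-6.0, -1.0], [-6.2, -0.8]]
books = {"hello": train_codebook(hello, 2), "yes": train_codebook(yes, 2)}
print(recognize([[1.1, 1.0], [4.9, 5.0]], books))  # hello
```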

> -- Asheesh.

Saurabh Gupta
Electronics and Communication Engg.
NSIT, New Delhi, India
I blog here: