GSoC Project Status Update 04: Speech Recognition in Openmoko

Asheesh Laroia openmoko at asheesh.org
Sun Jun 29 23:11:02 CEST 2008


On Sun, 29 Jun 2008, saurabh gupta wrote:

>    Besides this, some modification is being done in the noise rejection
> part since it can degrade the performance badly. I will use zero-crossing
> rate and short-term energy algorithms for endpoint detection. My model will
> also use a left-to-right HMM. But to train HMM models properly one needs
> more than one training sequence. It means that in speaker-dependent
> recognition, to train any word, one needs to utter the same word two or
> three times so that proper modeling of the HMM parameters takes place. When
> more than one training sequence is used for training, the Baum-Welch or
> segmental K-means method gives better modeling of the HMM parameters.
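
For anyone following along who has not seen the zero-crossing rate and
short-term energy approach mentioned above, here is a minimal sketch of
the general idea. It is not Saurabh's code: the function names, the frame
length, and all three thresholds are made up for illustration, and a real
detector would calibrate them against the background noise.

```python
def frame_features(samples, frame_len):
    """Return one (energy, zcr) pair per non-overlapping frame."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Short-term energy: mean squared amplitude of the frame.
        energy = sum(x * x for x in frame) / frame_len
        # Zero-crossing rate: fraction of adjacent pairs that change sign.
        crossings = sum(1 for a, b in zip(frame, frame[1:])
                        if (a >= 0) != (b >= 0))
        features.append((energy, crossings / (frame_len - 1)))
    return features


def detect_endpoints(samples, frame_len=80, energy_thresh=0.01,
                     weak_energy_thresh=0.001, zcr_thresh=0.3):
    """Return (start, end) sample indices of the utterance, or None.

    Assumed rule (hypothetical thresholds): a frame counts as speech if
    its energy is high, or if its energy is moderate and its
    zero-crossing rate is high (e.g. a weak fricative).
    """
    active = [
        i for i, (energy, zcr) in enumerate(frame_features(samples, frame_len))
        if energy > energy_thresh
        or (energy > weak_energy_thresh and zcr > zcr_thresh)
    ]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len
```

Calling detect_endpoints() on a buffer of silence, then a tone, then
silence should return the sample range covering the tone, and None for a
buffer that is silent throughout.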

The training problem is interesting.  Here is my idea; please let me know 
if it's bogus:

The user utters a phrase and the HMM classifies it as meaning something. 
We can wait a short while to see if the user does something to indicate 
that this classification is incorrect.  If there is no such action, and if 
the HMM had low confidence in its classification, train it on the 
utterance just issued so that next time it will be more confident (and 
presumably catch further variants).

Obviously, there is the danger of over-training.  It seems we can mitigate 
that through (1) our detection that the utterance was correctly classified 
by the HMM, given that the user didn't do anything to correct it, and (2) 
perhaps limiting the system to only do this re-training if the counter of 
how many training examples have been used for this particular 
classification is below some constant.  That constant could decay over 
time, for example, to allow us to gently migrate to varying patterns (and 
so that if a phone changes owners it would gracefully switch to the new 
patterns).
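
To make the idea concrete, here is a rough sketch of the decision rule
only, with no HMM attached. Everything in it is hypothetical: the class
name, the threshold values, and the decay factor are placeholders. Note
one assumption in particular: instead of changing the constant over time,
this sketch ages the per-word counters, which is just one way to let
re-training open up again later.

```python
from dataclasses import dataclass, field


@dataclass
class RetrainPolicy:
    """Decide whether to re-train on an utterance (illustrative only)."""
    confidence_threshold: float = 0.6  # below this counts as "low confidence"
    max_examples: float = 5.0          # cap on training examples per word
    decay: float = 0.9                 # factor applied by decay_counts()
    counts: dict = field(default_factory=dict)

    def should_retrain(self, label, confidence, user_corrected):
        # (1) The user objected, so the classification was probably wrong.
        if user_corrected:
            return False
        # The HMM was already confident; nothing to gain from re-training.
        if confidence >= self.confidence_threshold:
            return False
        # (2) Only re-train while this word's counter is under the cap.
        return self.counts.get(label, 0.0) < self.max_examples

    def record_training(self, label):
        # Call after the utterance has actually been used for training.
        self.counts[label] = self.counts.get(label, 0.0) + 1.0

    def decay_counts(self):
        # Assumption: age the counters (e.g. once a day) so that old
        # training slowly stops blocking adaptation to new patterns.
        for label in self.counts:
            self.counts[label] *= self.decay
```

The recognizer would call should_retrain() once the "did the user
correct it?" wait has expired, then record_training() if it went ahead,
with decay_counts() running on some slow timer.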

Thoughts?

> Next To Do:
> 1) Porting the whole code to the Openmoko platform
> 2) Testing with the real ADC channel of the Freerunner
> 3) Proper testing of noise handling and recognition on the Freerunner

Your Next To Do list looks pretty great and full enough even without my 
suggestion, but I'm still curious what you and others think. (-:

-- Asheesh.

-- 
The chief cause of problems is solutions.
 		-- Eric Sevareid



