GSoC Project Status Update 03: Speech Recognition Facility in Openmoko
prishelec at gmail.com
Sun Jun 22 21:35:54 CEST 2008
Will it be possible to use it for voice dialing?
You said the vocabulary is 5-10 words. Will that be enough?
It would be cool to be able to say "Message to Jane" to open the SMS
dialog or "Call to Jane" to place a call, presuming Jane is
a hot chick ;-)
Is that possible?
On 6/22/08, saurabh gupta <saurabhgupta1403 at gmail.com> wrote:
> Hello everyone,
> This is the status update for the GSoC project, Speech Recognition Facility
> in Openmoko. This week, most of the time was devoted to writing new code and
> optimizing the existing code. I have written many subroutines, such as the
> forward-backward procedure, LPC and cepstral analysis of speech signals in
> frames, the Viterbi algorithm, and a training algorithm using the segmental
> K-means method. All the source code compiles successfully with the GNU C
> compiler.
> Various optimizations have been made to make the code suitable for the
> ARM 16/32-bit processor running at 266 or 400 MHz maximum. The whole code
> is written using fixed-point arithmetic. I used some external libraries
> for some subroutines and converted them to fixed-point arithmetic. Another
> optimization was choosing the segmental K-means procedure for training the
> HMM models rather than the Baum-Welch algorithm, which requires more
> processing since it accounts for all possible hidden state sequences for a
> given observation sequence. The segmental K-means method, on the other
> hand, uses the Viterbi algorithm to find the single best state sequence
> and then iterates re-estimation to train the HMM model. The segmental
> K-means method has been shown to give good results with faster processing
> than Baum-Welch. A further optimization concerns the probability density
> function. As this project aims for a small vocabulary (around 5 to 10
> words) for recognition, vector quantization will be used instead of
> continuous observation densities. The vector quantization procedure is
> faster and yields good results for applications in small embedded devices.
> The vector quantization source code is nearly finished. Soon after that,
> the actual testing of the speech recognition code will be done on the
> speech samples.
> I have uploaded all documents (Design Document version 0.2) and the
> source code to the svn repository of Openmoko (
> https://svn.projects.openmoko.org/svnroot/speech/). Any comments and
> suggestions will be highly appreciated.
> Saurabh Gupta
> Electronics and Communication Engg.
> NSIT, New Delhi
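
For readers unfamiliar with the vector quantization step mentioned in the quoted update, here is a minimal sketch of a nearest-codeword search done entirely in integer arithmetic. It is not taken from the project's svn repository. The names `VQ_DIM`, `vq_dist2` and `vq_quantize`, the Q15-style 16-bit feature format, and the feature dimension of 4 are all assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature dimension (e.g. a few cepstral coefficients). */
#define VQ_DIM 4

/* Squared Euclidean distance between two 16-bit fixed-point vectors.
 * Each difference fits in 32 bits; the product and the running sum are
 * kept in 64 bits so they cannot overflow. No floating point is used. */
int64_t vq_dist2(const int16_t *a, const int16_t *b)
{
    int64_t acc = 0;
    for (size_t i = 0; i < VQ_DIM; i++) {
        int32_t d = (int32_t)a[i] - (int32_t)b[i];
        acc += (int64_t)d * d;
    }
    return acc;
}

/* Return the index of the codeword closest to `frame`.
 * That index is the discrete observation symbol a discrete HMM would see. */
size_t vq_quantize(const int16_t *frame,
                   const int16_t codebook[][VQ_DIM],
                   size_t n_codewords)
{
    assert(n_codewords > 0);
    size_t best = 0;
    int64_t best_d = vq_dist2(frame, codebook[0]);
    for (size_t k = 1; k < n_codewords; k++) {
        int64_t d = vq_dist2(frame, codebook[k]);
        if (d < best_d) {   /* ties keep the lowest index */
            best_d = d;
            best = k;
        }
    }
    return best;
}
```

Each speech frame is reduced to one small integer this way, so the HMM only needs a table lookup per state instead of evaluating a continuous density. That is presumably why the approach suits a 266 or 400 MHz ARM without an FPU.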