GSoC Project Status Update 03: Speech Recognition Facility in Openmoko
saurabhgupta1403 at gmail.com
Sun Jun 22 22:42:09 CEST 2008
On Mon, Jun 23, 2008 at 1:05 AM, <prishelec at gmail.com> wrote:
> Will it be possible to use it for voice dialing?
> You said the vocabulary is 5-10 words. Will that be enough?
> It would be cool if I could say "Message to
> Jane" to open the SMS dialog or "Call to Jane" to call, presuming Jane is
> a hot chick ;-)
> Is it possible?
Yes, it is certainly possible.
But it requires speech recognition for connected words, which needs
level-building algorithms and proper noise handling, along with learning a
grammar for the machine. This project has great scope and can be extended
almost without limit. However, in the short duration of a GSoC project, I
don't think it will be possible to incorporate these advanced features. The
initial aim is to provide an API in which the user can store his/her own
words individually and connect a particular activity with each word. Upon
detection of a word, the API corresponding to the activity for that word
will be called. I have included these points in my Design Document, along
with the scope for advanced models using speech recognition. I think once
the individual word recognition application is built, the advanced features
can be added on top of this application or a newer one.
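
To make the idea of connecting a stored word to an activity concrete, here is a minimal sketch. It is written in Python for brevity (the project code itself is C), and every name in it (WordActionRegistry, register, on_word_detected, the example words and actions) is hypothetical and not taken from the Design Document.

```python
from typing import Callable, Dict, Optional


class WordActionRegistry:
    """Maps user-trained words to callbacks (all names here are hypothetical)."""

    def __init__(self) -> None:
        self._actions: Dict[str, Callable[[], str]] = {}

    def register(self, word: str, action: Callable[[], str]) -> None:
        # Store the activity to run when this word is recognized.
        self._actions[word.lower()] = action

    def on_word_detected(self, word: str) -> Optional[str]:
        # Called by the recognizer; runs the action bound to the word, if any.
        action = self._actions.get(word.lower())
        return action() if action is not None else None


registry = WordActionRegistry()
registry.register("call", lambda: "open dialer")
registry.register("message", lambda: "open sms dialog")

print(registry.on_word_detected("Call"))
print(registry.on_word_detected("weather"))
```

A word with no registered activity simply returns None here; a real application might prompt the user again instead.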
> On 6/22/08, saurabh gupta <saurabhgupta1403 at gmail.com> wrote:
> > Hello everyone,
> > This is the status update of the GSoC project, Speech Recognition
> > in Openmoko. This week, most of the time was devoted to writing code and
> > optimizing the existing code. I have written many subroutines, such as
> > the forward-backward procedure, LPC and cepstral analysis of speech
> > signals, the Viterbi algorithm, and a training algorithm using the
> > segmental K-means method. The source code has been compiled successfully
> > with the GNU C compiler.
> >
> > Various optimizations were made in the code to make it suitable for
> > the ARM 16/32-bit processor running at 266 or 400 MHz maximum. The whole
> > code is written using fixed-point arithmetic. I used some external
> > libraries for some subroutines and converted them to fixed-point
> > arithmetic. Another optimization was choosing the segmental K-means
> > procedure for training the HMM models rather than the Baum-Welch
> > algorithm, which requires more processing since it accounts for all
> > possible hidden state sequences for a given observation sequence. The
> > segmental K-means method, on the other hand, uses the Viterbi algorithm
> > to find the best state sequence, then iterates re-estimation to train the
> > HMM model. It has been shown to give good results with faster processing
> > than Baum-Welch.
> >
> > A further optimization concerns the probability function. As this
> > project aims at small-vocabulary recognition (around 5 or 10 words),
> > vector quantization will be used instead of a continuous observation
> > sequence. Vector quantization is faster and yields good results for
> > applications on small embedded devices. The vector quantization source
> > code is nearly finished. Soon after that, the actual testing of the
> > speech recognition code will be done on the collected speech samples.
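
As an illustration of the vector quantization and Viterbi steps described above, here is a minimal sketch in Python (the project code itself is C). It is not the project's implementation: the function names, the toy codebook, the two word models and the scaling constant are all assumptions made up for this example. Log-probabilities are stored as scaled integers to mimic the fixed-point approach, and each word is scored by its best state path, which is also the segmentation step that segmental K-means training relies on.

```python
import math
from typing import Dict, List, Sequence, Tuple

SCALE = 1 << 10        # assumed scale for integer log-probabilities
NEG_INF = -(1 << 30)   # stands in for log(0)

Model = Tuple[List[int], List[List[int]], List[List[int]]]  # (log_pi, log_a, log_b)


def to_fixed_log(p: float) -> int:
    """Convert a probability to a scaled integer log-probability."""
    return NEG_INF if p <= 0.0 else int(round(math.log(p) * SCALE))


def quantize(frame: Sequence[int], codebook: Sequence[Sequence[int]]) -> int:
    """Vector quantization: index of the nearest codeword (squared distance)."""
    def dist(c: Sequence[int]) -> int:
        return sum((a - b) ** 2 for a, b in zip(frame, c))
    return min(range(len(codebook)), key=lambda k: dist(codebook[k]))


def viterbi(obs: Sequence[int], model: Model) -> Tuple[int, List[int]]:
    """Best state path and its integer log-score for a discrete HMM."""
    log_pi, log_a, log_b = model
    n = len(log_pi)
    score = [log_pi[s] + log_b[s][obs[0]] for s in range(n)]
    back: List[List[int]] = []
    for o in obs[1:]:
        prev = score
        # For each state, remember the best predecessor state.
        ptr = [max(range(n), key=lambda i: prev[i] + log_a[i][s]) for s in range(n)]
        score = [prev[ptr[s]] + log_a[ptr[s]][s] + log_b[s][o] for s in range(n)]
        back.append(ptr)
    last = max(range(n), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return score[last], path


def make_model(pi, a, b) -> Model:
    """Build a model from ordinary probabilities (toy values below)."""
    return ([to_fixed_log(p) for p in pi],
            [[to_fixed_log(p) for p in row] for row in a],
            [[to_fixed_log(p) for p in row] for row in b])


def recognize(obs: Sequence[int], models: Dict[str, Model]) -> str:
    """Pick the word whose model gives the highest Viterbi score."""
    return max(models, key=lambda w: viterbi(obs, models[w])[0])


# Toy data: a 2-entry codebook and two 2-state left-to-right word models.
codebook = [[0, 0], [10, 10]]
frames = [[1, 0], [0, 2], [9, 11], [10, 9]]
obs = [quantize(f, codebook) for f in frames]

left_to_right = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "call": make_model([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.1, 0.9]]),
    "message": make_model([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.9, 0.1]]),
}

score, path = viterbi(obs, models["call"])
print(obs, path, recognize(obs, models))
```

Because each frame is reduced to a codebook index, the observation probabilities become a simple table lookup, and all scoring uses integer additions only.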
> > I have uploaded all documents (Design Document version 0.2) and
> > source code to the Openmoko svn repository (
> > https://svn.projects.openmoko.org/svnroot/speech/). Any comments and
> > suggestions will be highly appreciated.
> > http://saurabh1403.wordpress.com/
> > Regards....
> > --
> > Saurabh Gupta
> > Electronics and Communication Engg.
> > NSIT, New Delhi