<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.6000.16674" name=GENERATOR></HEAD>
<BODY>
<DIV dir=ltr align=left><SPAN class=484014020-29062008><FONT face=Arial
color=#0000ff size=2>Very cool. As voice recognition is one of my long-time
passions, I'm glad to see someone take this up.</FONT></SPAN></DIV><BR>
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> community-bounces@lists.openmoko.org
[mailto:community-bounces@lists.openmoko.org] <B>On Behalf Of </B>saurabh
gupta<BR><B>Sent:</B> Sunday, June 29, 2008 9:38 AM<BR><B>To:</B> List for
Openmoko community discussion<BR><B>Cc:</B>
community-repository@lists.openmoko.org.<BR><B>Subject:</B> GSoC Project Status
Update 04: Speech Recognition in Openmoko<BR></FONT><BR></DIV>
<DIV></DIV>Hello everyone,<BR><BR>I finally got my Neo Freerunner on Friday as
well, and I spent some time playing with it :). Here is this week's status
update. I have completed the codebook-design code, which uses vector
quantization, and the code is now in its testing phase. I recorded several
samples of words such as "hello" as .wav files and then used Scilab to convert
them into text files containing arrays of numbers. The most challenging part of
testing is getting the scaling right and testing each subroutine separately in
fixed-point notation. I also made an important change to the fixed-point
format: as Erwin Lewin suggested, I now use 8:8 notation instead of 16:16. This
meant keeping track of the ranges of the data types used in the various
subroutines, which was also interesting. Checking each subroutine separately, I
found that most of them give correct results, but some still need fixes for
underflow and overflow problems.<BR><BR>
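Since the move to 8:8 changed the ranges of every subroutine, here is a minimal sketch of what 8:8 fixed-point helpers can look like in C, assuming values are kept in an int16_t and additions saturate rather than wrap; the names (fix8_8, fix_mul, and so on) are invented for this example and are not the project's actual code:

```c
#include <stdint.h>

/* 8:8 fixed point: 8 integer bits, 8 fractional bits, in an int16_t.
 * Illustrative helpers only; the names are invented for this sketch. */
typedef int16_t fix8_8;
#define FIX_ONE 256                      /* 1.0 in 8:8 */

static fix8_8 fix_from_int(int x) { return (fix8_8)(x * FIX_ONE); }

/* saturate a wide intermediate result back into the 16-bit range */
static fix8_8 fix_sat(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (fix8_8)v;
}

static fix8_8 fix_add(fix8_8 a, fix8_8 b)
{
    return fix_sat((int32_t)a + (int32_t)b);
}

static fix8_8 fix_mul(fix8_8 a, fix8_8 b)
{
    /* widen to 32 bits so the product cannot overflow, then shift
     * back down by the 8 fractional bits */
    return fix_sat(((int32_t)a * (int32_t)b) >> 8);
}
```

With 8:8 the representable range is only about [-128, 128), so every subroutine's inputs and intermediates have to be scaled to stay inside it, which is exactly why tracking the ranges of all the data types matters.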
Modifications are also under way in the noise-rejection part, since noise can
badly degrade recognition performance. For endpoint detection I will use the
zero-crossing rate and short-term energy algorithm. My model will also use a
left-to-right HMM.
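As a rough illustration of the endpoint-detection features mentioned above (not the project's actual code; the frame size and thresholds are placeholders), computing short-term energy and zero-crossing rate per frame might look like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Per-frame features for endpoint detection (illustrative sketch). */

/* short-term energy: sum of squared samples in the frame */
static int64_t frame_energy(const int16_t *frame, size_t n)
{
    int64_t e = 0;
    for (size_t i = 0; i < n; i++)
        e += (int64_t)frame[i] * frame[i];
    return e;
}

/* zero-crossing rate: number of sign changes within the frame */
static size_t frame_zcr(const int16_t *frame, size_t n)
{
    size_t z = 0;
    for (size_t i = 1; i < n; i++)
        if ((frame[i - 1] >= 0) != (frame[i] >= 0))
            z++;
    return z;
}

/* a frame counts as speech if either feature exceeds its threshold;
 * the thresholds here are placeholders, normally estimated from the
 * first few frames, which are assumed to be silence */
static int frame_is_speech(const int16_t *frame, size_t n,
                           int64_t e_thresh, size_t z_thresh)
{
    return frame_energy(frame, n) > e_thresh ||
           frame_zcr(frame, n) > z_thresh;
}
```

Voiced speech shows high energy while unvoiced sounds show a high zero-crossing rate, so accepting either cue helps keep low-energy fricatives at word boundaries from being cut off.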
Proper HMM training, however, needs more than one training sequence: in
speaker-dependent recognition this means uttering each word to be trained two
or three times, so that the HMM parameters can be modelled properly. When more
than one training sequence is available, the Baum-Welch or K-means segmental
method gives better estimates of the HMM parameters.<BR><BR>Next To
Do:<BR>1) Porting the whole code to the Openmoko
platform<BR>2) Testing with the real ADC channel of the Freerunner<BR>3) Proper
testing of noise handling and recognition on the Freerunner<BR><BR clear=all><BR>--
<BR>Saurabh Gupta<BR>Electronics and Communication Engg.<BR>NSIT, New Delhi,
India<BR>I blog here: <A
href="http://saurabh1403.wordpress.com">http://saurabh1403.wordpress.com</A><BR><BR></BODY></HTML>