QtMoko audio state work

Neil Jerram neil at ossau.homelinux.net
Thu Jan 17 01:52:57 CET 2013


I wanted to write a bit of an update on my recent GTA04/QtMoko work, and
would appreciate people's thoughts.  This is still focussed on audio
handling, because with my A3 I still haven't reached a point of having
fully reliable phone calls.

(As usual, for the sake of avoiding any possible negative marketing, I
should point out that the main problems here are A3-specific.  I hope,
and believe to be the case, that phone calls are already reliable on
A4.)

With Neil Brown's helpful input, I've reviewed and improved my
understanding of all the ALSA state files and controls.  This was
prompted specifically by the PhoneHeadset state not working (because it
was completely wrong) but more generally it bothers me that we use such
an over-general system as ALSA state files when there are really only a
handful (in fact, 7 switches and 1 volume control) of independent things
that we ever need to change when moving between audio states.  Also it
has bothered me that we don't have a uniform and persistent way of
changing overall volume.

I've now reimplemented the GTA04 neoaudioplugin.cpp code so that it
moves between audio states by changing those 7 switches, instead of
using ALSA state files, and that seems to work well (including for
PhoneHeadset, subject to A3 software routing trouble).  One detail,
which is nice but slightly worrying, is that I always used to get a
very audible click about 1s before, e.g., hearing the new message
arrival sound; and with the reimplementation I no longer get that click.
I _think_ the explanation for that was using pasuspender, and that I no
longer get it because I no longer need to use pasuspender - but it's
slightly worrying in case that's wrong, and in particular in case it's
because I'm leaving some circuit on more than before, and hence drawing
more current.  (I haven't seen any evidence for drawing extra current.)
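
To illustrate the switch-based approach described above, here is a
minimal sketch. It is not the actual neoaudioplugin.cpp code. The switch
names, the set of states other than PhoneHeadset, and the on/off values
in the table are all placeholders invented for illustration. It only
shows the idea of describing each audio state as 7 booleans and working
out which switches differ between two states.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical audio states; only PhoneHeadset is named in the post.
enum class AudioState { MediaSpeaker, PhoneEarpiece, PhoneHeadset };

// Placeholder names: the real ALSA control names are not listed in the post.
constexpr std::size_t kSwitchCount = 7;
const std::array<const char*, kSwitchCount> kSwitchNames = {{
    "switch0", "switch1", "switch2", "switch3",
    "switch4", "switch5", "switch6"}};

using SwitchSet = std::array<bool, kSwitchCount>;

// Illustrative values only; they do not describe the real GTA04 routing.
SwitchSet switchesFor(AudioState state) {
    switch (state) {
    case AudioState::MediaSpeaker:
        return SwitchSet{{true, false, false, false, false, true, false}};
    case AudioState::PhoneEarpiece:
        return SwitchSet{{false, true, false, true, true, false, false}};
    case AudioState::PhoneHeadset:
        return SwitchSet{{false, false, true, true, false, false, true}};
    }
    return SwitchSet{};
}

struct SwitchChange {
    std::string name;  // which switch to set
    bool on;           // the value it should take in the new state
};

// Return only the switches whose value differs between the two states,
// so that a state change touches nothing else (e.g. no gain controls).
std::vector<SwitchChange> changesBetween(AudioState from, AudioState to) {
    const SwitchSet before = switchesFor(from);
    const SwitchSet after = switchesFor(to);
    std::vector<SwitchChange> changes;
    for (std::size_t i = 0; i < kSwitchCount; ++i) {
        if (before[i] != after[i]) {
            changes.push_back({kSwitchNames[i], after[i]});
        }
    }
    return changes;
}
```

A caller would apply each returned change through whatever mixer
interface is in use. Because only differing switches are returned,
moving to the state already in effect produces no changes at all.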

Overall phone volume is controlled by the sum of the three DAC2
Gain controls, and the new state change implementation never touches
those, which means that it's now possible for a volume control widget to
change those controls independently, and for that setting to persist.  I
haven't implemented that yet though.
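
As a rough sketch of how such a volume widget might work (again, not
implemented code): if the overall level really is the sum of three gain
settings, one overall value can be split across them. The ranges below
are made up, and the sketch assumes that all three controls use the same
step size, which may well not be true of the real hardware.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Hypothetical range of one gain control, in arbitrary equal-sized steps.
struct GainRange {
    int min;
    int max;
};

// Assumed ranges for the three DAC2 gain controls; not taken from the codec.
const std::array<GainRange, 3> kDac2Ranges = {{{0, 63}, {0, 3}, {0, 18}}};

// Split one overall gain value across the three controls so that their
// settings add up to it. The request is clamped to the achievable range,
// and the controls are filled in order, first to last.
std::array<int, 3> splitOverallGain(int total) {
    int lowest = 0;
    int highest = 0;
    for (const GainRange& r : kDac2Ranges) {
        lowest += r.min;
        highest += r.max;
    }
    int remaining = std::min(std::max(total, lowest), highest) - lowest;

    std::array<int, 3> settings{};
    for (std::size_t i = 0; i < settings.size(); ++i) {
        const int span = kDac2Ranges[i].max - kDac2Ranges[i].min;
        const int take = std::min(remaining, span);
        settings[i] = kDac2Ranges[i].min + take;
        remaining -= take;
    }
    return settings;
}
```

Filling the controls in order is just the simplest choice; a real
widget might prefer to adjust one particular stage first for noise
reasons.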

Next up is the A3 software audio routing.  I announced a while ago that
I had that working with pulseaudio's module-loopback instead of Radek's
gta04-gsm-voice-routing program, and I thought that pulseaudio would be
the way to go.  Since then I've tried to add in echo cancellation, and
tried running on a wheezy/armhf system instead of squeeze/armel (which
is believed to be advantageous because of better floating point
performance), but I've hit various problems.

- Just plain loopback, with module-loopback, appears to use a lot (~50%)
  of CPU, even when running at just 8000Hz, without any resampling, and
  without asking for particularly low latency.  I don't recall how much
  CPU gta04-gsm-voice-routing takes, but I don't think it was that much.

- Pulseaudio needs to be run without RT scheduling, in order to avoid
  being killed (because of tight-looping) during the initial window
  between when a call is started and when the GSM capture device becomes
  readable.  But running without RT scheduling reduces the quality of
  media playback.

- Pulseaudio loopback exhibits some odd artefacts.  During that initial
  window (of maybe a second or two) it cyclically replays whatever was
  last in the device's playback buffer.  It audibly does this for what
  is played through the earpiece/speaker/headset; I wonder if it might
  do it a bit in the other direction too, i.e. to the other end of the
  call?  It also seems to cause occasional short sound repeats _during_
  a call.  I think one possible cause of this is divergence between the
  two sound cards' clocks, hence the buffers being used up at different
  rates, and at some point Pulseaudio has to choose some strategy for
  filling in some missing data (to avoid an underrun).  I've also
  frequently noticed that a DTMF tone pressed by me seems to have an
  effect on the other end as though I had pressed the key twice instead
  of once, and I wonder if that might be related to repeating or echoing
  in the audio stream going to the other end.

- Despite the WebRTC echo cancellation's apparent good reputation, I
  haven't been able to get either it or Speex to work effectively.  Also
  CPU usage when trying to do loopback with echo cancellation is 80-90%,
  even on armhf.

  (In general, for some reason, it appears that the floating point code
  in Pulseaudio and its dependencies doesn't run any better on armhf
  than it does on armel.)
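
To put rough numbers on the clock divergence idea in the loopback item
above, here is a small worked calculation. The 8000Hz rate is the one
mentioned above; the buffer size and the drift figure in the usage note
are assumptions chosen only to show the arithmetic, not measurements
from a GTA04.

```cpp
#include <cmath>
#include <limits>

// Seconds until a buffer that currently holds `bufferedFrames` frames
// runs dry, when the consuming card's clock runs `driftPpm` parts per
// million faster than the producing card's clock at nominal `rateHz`.
double secondsUntilUnderrun(double bufferedFrames, double rateHz,
                            double driftPpm) {
    // Net frames drained from the buffer each second because of drift.
    const double framesLostPerSecond = rateHz * driftPpm / 1e6;
    if (framesLostPerSecond <= 0.0) {
        // No drift in this direction: the buffer never empties.
        return std::numeric_limits<double>::infinity();
    }
    return bufferedFrames / framesLostPerSecond;
}
```

For example, with an assumed 800 frames buffered (100ms at 8000Hz) and
an assumed 100ppm difference, the buffer loses 0.8 frames per second
and would run dry after about 1000 seconds, at which point something
has to fill in or drop data. Drift of that size would explain a glitch
every quarter of an hour or so, not several per minute.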

The upshot of all that is that I'm now inclined to look again at
the other possible solutions, i.e. gta04-gsm-voice-routing (by Radek)
and alsaloop (as used in SHR).  The simplicity of
gta04-gsm-voice-routing is appealing, but I know from previous
experience that it sometimes fails completely.  alsaloop in comparison
has a drastically different and more complex design.  I'm wondering if
gta04-gsm-voice-routing is unstable _because_ its design is overly
simple, and if something more like alsaloop is fundamentally needed -
but I haven't yet worked out even how to start analysing that; any ideas
would be most welcome.  Also, if we did go with alsaloop, I've no idea
yet how we might try to add in echo cancellation.

That's it for now.  If you've read as far as here, thanks, and all
thoughts would be most appreciated.  I haven't yet pushed my
work-in-progress to anywhere public, but can easily do that if people
are interested.

Regards,
        Neil


