WLAN: known issues and how to help

Werner Almesberger werner at openmoko.org
Wed Nov 26 18:30:53 CET 2008


In https://docs.openmoko.org/trac/ticket/1794, the question was
recently raised of how one could help get WLAN working better.
The answer needs a bit of context ...


Let's first look at my current list of bugs, classified by their
probable cause or by specific circumstances:

- Low-level communication instability #1:

  #1387 WLAN is generally unstable (report very vague)
  #1285 WLAN connectivity gradually decays over many hours (maybe
        linked to WPA)
  #1860 WMI Control EP full (= #1285 ?)
  #1929 WLAN gets stuck after a while with ar6000_ioctl_giwscan()
        complaining (= #1285 ?)

- Low-level communication instability #2:

  #1597 event/0 goes crazy (believed to be solved by SDIO stack switch)

- WPA protocol:

  (list) De-associates after a while, then fails to re-associate when
         using WPA

- Interoperability:

  #1250 Protocol incompatibility with AP running Tomato (Tomato broken ?)

- Reconfiguration lingers or causes problems, probably fixed:

  #1392 Switching from ad-hoc to infrastructure needs
        iwconfig essid off
  (list) Wrong key needs quiet period to expire

- Status reporting to user space:

  #1367 WLAN does not report statistics on /proc/net/wireless
  #1742 iwconfig reports signal even if not associated

- Unreliable driver bringup:

  #2133 Insanely large allocation of AR_SOFTC_T (fixed ?)

- Strange crash:

  #1939 Setting key and ESSID hangs system

Did I forget/overlook any ?


I think #1392 and #2133 may be solved by recent patches, and since I've
seen #1597 lots of times with the Atheros SDIO stack and never with the
Linux SDIO stack, that may be solved as well.

Not sure what to do about #1250. Seems that nobody's interested in it
anymore. (We need a new resolution option - "died of boredom and old
age" ;-) #1939 is also a bit mysterious. Setting WEP key and ESSID
(with iwconfig) certainly works for me.

Then we have two larger groups of remaining bugs:

1) There appears to be a general reliability problem in low-level
   communication that eventually leads to a failure, often involving
   complaints about the "WMI Control EP" or ar6000_ioctl_giwscan.

   I don't know if they are all different manifestations of the same
   problem or if we really have a number of separate issues there.

   I've never seen any of these myself, so one way to help would be
   to generate easily reproducible procedures that exhibit those
   bugs. More about this below.

2) Incomplete implementation of APIs to user space. Those may be
   genuine bugs or corner cases of the wireless API - I haven't
   checked yet.

   Unless they are blockers for current user space (GUI) work, I'd
   consider them low priority. So if anyone wants to fix them,
   here's an opportunity :-)
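
For anyone who wants to look into the statistics reporting (#1367), a
first step is to check what /proc/net/wireless contains for the WLAN
interface. The following is a minimal Python sketch of such a check,
not part of any existing tool. The function name, the interface name
"eth0" and the sample values are assumptions made for illustration;
the sample is not output captured from a real device.

```python
def parse_proc_net_wireless(text):
    """Parse the contents of /proc/net/wireless.

    Returns a dict mapping interface name to its status word and the
    link/level/noise quality values. An interface that is missing from
    the result is not reporting statistics at all.
    """
    stats = {}
    # The first two lines are column headers; data lines follow.
    for line in text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        fields = rest.split()
        if len(fields) < 4:
            continue
        stats[name.strip()] = {
            "status": fields[0],
            # Quality values may carry a trailing "." in this file.
            "link": int(fields[1].rstrip(".")),
            "level": int(fields[2].rstrip(".")),
            "noise": int(fields[3].rstrip(".")),
        }
    return stats


# Illustrative sample only (assumed interface name and values).
SAMPLE = """\
Inter-| sta-|   Quality        |   Discarded packets               | Missed | WE
 face | tus | link level noise |  nwid  crypt   frag  retry   misc | beacon | 22
  eth0: 0000   54.  -56.  -256        0      0      0      0      0        0
"""

if __name__ == "__main__":
    for iface, values in parse_proc_net_wireless(SAMPLE).items():
        print(iface, values)
```

On a real system, one would pass the result of
open("/proc/net/wireless").read() instead of SAMPLE. An empty result
while the interface is up would match what #1367 describes.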


Now, this brings us to the really interesting group of bugs, the
mystery instabilities. As befits a mystery, it's a bit unclear under
what circumstances they occur.

First of all, the GUIs seem to be in a pretty bad state, so anything
that was controlled from a GUI may just have failed because of that.
There's also the possibility that weird operations from the GUI upset
kernel or firmware.

Second, we don't know for which range of encryption options these
problems appear.

Third, suspend/resume can do many wondrous things to a system, so it
would be good to separate runs where a suspend/resume occurred from
runs where it didn't.

Finally, there may be an external trigger, such as an access point
sending something that causes indigestion in firmware or kernel.

All this means that the problem is still too vague to say what
specifically could be done to solve it. (Once you understand a
problem, it's usually as good as solved anyway :-)


So it would be good if we could try to reproduce those hangs under
controlled conditions, and then do a triage. We did one a while ago
inside Openmoko, when WLAN was considered completely unusable. That
triage was a bit of a victim of its own success, as it turned out
that the WLAN worked beautifully, just the GUI didn't.

My idea is as follows:

- in order to avoid interference from user space, I'd make kernel/
  rootfs/SD card images containing some shell-accessible networking
  tools. That way, everybody can use exactly the same system
  environment.

- that system won't suspend unless explicitly told to, so we'll
  have that parameter under control as well.

- I'll specify more detailed reporting criteria. E.g., the existing
  bug reports don't reveal much about the network environment.
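
Until the reporting criteria exist, the unknowns listed earlier (GUI
involvement, encryption, suspend/resume, network environment) at least
suggest what a test run should record. The following Python sketch
shows one way to capture them in a uniform record. It is only an
illustration: the class name, all field names and the JSON output are
hypothetical and are not the criteria themselves.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class WlanTestRun:
    """One run of a WLAN stability test (hypothetical record layout)."""

    image: str                    # which shared kernel/rootfs image was booted
    encryption: str               # e.g. "none", "WEP" or "WPA"
    access_point: str             # what is known about the AP, e.g. its firmware
    used_gui: bool                # whether WLAN was controlled from a GUI
    suspend_resume_cycles: int    # 0 if the system never suspended
    hours_until_failure: Optional[float] = None  # None if no failure was seen
    kernel_messages: List[str] = field(default_factory=list)

    def to_report(self) -> str:
        """Serialize the record so that reports can be compared easily."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


if __name__ == "__main__":
    run = WlanTestRun(
        image="example-image",    # placeholder name
        encryption="WPA",
        access_point="unknown",
        used_gui=False,
        suspend_resume_cycles=0,
    )
    print(run.to_report())
```

Keeping suspend/resume as an explicit count makes it easy to separate
runs where a suspend occurred from runs where it did not.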

I want to make suspend-without-reset work first, and then I'll start
preparing that trial. Hopefully, this will provide the clues needed
to slay that dragon as well.

Regarding areas that need help, let's not forget that for real-life
use on the go, the command line isn't much fun for configuring your
WLAN. So anything that brings Connman or Network Manager into better
shape will also be very welcome.

Thanks,
- Werner