WLAN: known issues and how to help

Philip Rhoades phil at pricom.com.au
Tue Dec 2 04:18:30 CET 2008


Werner,

I have been testing nearly all the distributions and have never been 
able to keep a WLAN connection going for more than 5.5 hours (just a 
simple script pinging every two minutes).  I had not looked at SHR 
before because I thought it might not suit me but I decided to try it 
out a few days ago.  The first attempt only lasted a few hours but the 
most recent one lasted 16.5 hours before I had to stop it!  Is SHR an 
older OS that has had more work done on it, i.e. one with fewer bugs?
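
For anyone who wants to repeat this kind of test, here is a minimal sketch of a ping loop in Python. It is not the script used for the runs above. It assumes the device has a ping command that accepts -c (count) and -W (reply timeout in seconds); the function names and the rule of three consecutive failures are illustrative choices.

```python
import subprocess
import time


def ping_once(host, timeout_s=5):
    """Send one echo request; True if the ping command exits with status 0."""
    # Assumption: the local ping accepts -c (count) and -W (timeout, seconds).
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def run_until_failure(host, interval_s=120, max_failures=3,
                      probe=ping_once, sleep=time.sleep,
                      clock=time.monotonic):
    """Probe host every interval_s seconds until the link looks dead.

    Returns the number of seconds between the start of the run and the
    last successful probe, once max_failures probes in a row have failed.
    The probe, sleep and clock arguments exist so the loop can be tested
    without a network.
    """
    start = clock()
    last_ok = start
    failures = 0
    while failures < max_failures:
        if probe(host):
            last_ok = clock()
            failures = 0
        else:
            failures += 1
        sleep(interval_s)
    return last_ok - start
```

Calling run_until_failure("192.168.0.1"), with the address replaced by that of the access point or gateway, blocks until the connection is lost and returns the survival time in seconds. Requiring several failures in a row keeps a single dropped packet from ending the run.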

Regards,

Phil.


Werner Almesberger wrote:
> In https://docs.openmoko.org/trac/ticket/1794, the question was
> recently raised how one could help get WLAN to work better.
> The answer needs a bit of context ...
> 
> 
> Let's first look at my current lists of bugs, classified by their
> probable cause or some specific circumstances:
> 
> - Low-level communication instability #1:
> 
>   #1387 WLAN is generally unstable (report very vague)
>   #1285 WLAN connectivity gradually decays over many hours (maybe
>         linked to WPA)
>   #1860 WMI Control EP full (= #1285 ?)
>   #1929 WLAN gets stuck after a while with ar6000_ioctl_giwscan()
>         complaining (= #1285 ?)
> 
> - Low-level communication instability #2:
> 
>   #1597 event/0 goes crazy (believed to be solved by SDIO stack switch)
> 
> - WPA protocol:
> 
>   (list) De-associates after a while, then fails to re-associate when
>          using WPA
> 
> - Interoperability:
> 
>   #1250 Protocol incompatibility with AP running Tomato (Tomato broken ?)
> 
> - Reconfiguration lingers or causes problems, probably fixed:
> 
>   #1392 Switching from ad-hoc to infrastructure needs
>         iwconfig essid off
>   (list) Wrong key needs quiet period to expire
> 
> - Status reporting to user space:
> 
>   #1367 WLAN does not report statistics on /proc/net/wireless
>   #1742 iwconfig reports signal even if not associated
> 
> - Unreliable driver bringup:
> 
>   #2133 Insanely large allocation of AR_SOFTC_T (fixed ?)
> 
> - Strange crash:
> 
>   #1939 Setting key and ESSID hangs system
> 
> Did I forget/overlook any ?
> 
> 
> I think #1392 and #2133 may be solved by recent patches, and since I've
> seen #1597 lots of times with the Atheros SDIO stack and never with the
> Linux SDIO stack, that may be solved as well.
> 
> Not sure what to do about #1250. Seems that nobody's interested in it
> anymore. (We need a new resolution option - "died of boredom and old
> age" ;-) #1939 is also a bit mysterious. Setting WEP key and ESSID
> (with iwconfig) certainly works for me.
> 
> Then we have two larger groups of remaining bugs:
> 
> 1) There appears to be a general reliability problem in low-level
>    communication that eventually leads to a failure, often involving
>    complaints about the "WMI Control EP" or ar6000_ioctl_giwscan.
> 
>    I don't know if they are all different manifestations of the same
>    problem or if we really have a number of separate issues there.
> 
>    I've never seen any of these myself, so one way to help would be
>    to generate easily reproducible procedures that exhibit those
>    bugs. More about this below.
> 
> 2) Incomplete implementation of APIs to user space. Those may be
>    genuine bugs or corner cases of the wireless API - I haven't
>    checked yet.
> 
>    Unless they are blockers for current user space (GUI) work, I'd
>    consider them low priority. So if anyone wants to fix them,
>    here's an opportunity :-)
> 
> 
> Now, this brings us to the really interesting group of bugs, the
> mystery instabilities. As befits a mystery, it's a bit unclear under
> what circumstances they occur.
> 
> First of all, the GUIs seem to be in a pretty bad state, so anything
> that was controlled from a GUI may just have failed because of that.
> There's also the possibility that weird operations from the GUI upset
> kernel or firmware.
> 
> Second, we don't know for which range of encryption options these
> problems appear.
> 
> Third, suspend/resume can do many wondrous things to a system, so it
> would be good to separate runs where a suspend/resume occurred from
> runs where it didn't.
> 
> Finally, there may be an external trigger, such as an access point
> sending something that causes indigestion in firmware or kernel.
> 
> All this means that the problem is still too vague to say what
> specifically could be done to solve it. (Once you understand a
> problem, it's usually as good as solved anyway :-)
> 
> 
> So it would be good if we could try to reproduce those hangs under
> controlled conditions, and then do a triage. We did one a while ago
> inside Openmoko, when WLAN was considered completely unusable. That
> triage was a bit of a victim of its own success, as it turned out
> that the WLAN worked beautifully, just the GUI didn't.
> 
> My idea is as follows:
> 
> - in order to avoid interference from user space, I'd make kernel/
>   rootfs/SD card images containing some shell-accessible networking
>   tools. That way, everybody can use exactly the same system
>   environment.
> 
> - that system won't suspend unless explicitly told to, so we'll
>   have that parameter under control as well.
> 
> - I'll specify more detailed reporting criteria. E.g., the existing
>   bug reports don't reveal much about the network environment.
> 
> I want to make suspend-without-reset work first, and then I'll start
> preparing that trial. Hopefully, this will provide the clues needed
> to slay that dragon as well.
> 
> Regarding areas that need help, let's not forget that for real-life
> use on the go, the command line isn't much fun for configuring your
> WLAN. So anything that brings Connman or Network Manager into a better
> shape will also be very welcome.
> 
> Thanks,
> - Werner
> 
> _______________________________________________
> devel mailing list
> devel at lists.openmoko.org
> https://lists.openmoko.org/mailman/listinfo/devel
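
The unknowns Werner lists (encryption, suspend/resume, GUI versus shell, the access point) could be recorded per test run so that reports are comparable. The sketch below is only one possible shape for such a record, written in Python. Every field name is hypothetical and no reporting format has been agreed on.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunReport:
    """One test run. Field names are illustrative, not an agreed format."""
    encryption: str          # e.g. "none", "WEP", "WPA"
    configured_via: str      # "shell" or "gui"
    suspended: bool          # did a suspend/resume occur during the run?
    access_point: str        # free text: AP model and firmware
    hours_to_failure: Optional[float]  # None if the run ended without failure


def group_failures(reports):
    """Return {(encryption, suspended): (failed_runs, total_runs)}."""
    counts = defaultdict(lambda: [0, 0])  # key -> [failed, total]
    for report in reports:
        key = (report.encryption, report.suspended)
        counts[key][1] += 1
        if report.hours_to_failure is not None:
            counts[key][0] += 1
    return {key: tuple(value) for key, value in counts.items()}
```

Grouping by encryption and suspend state is an arbitrary starting point. The same loop works for any other pair of fields once enough runs have been collected.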

-- 
Philip Rhoades

GPO Box 3411
Sydney NSW	2001
Australia
E-mail:  phil at pricom.com.au


