WLAN: known issues and how to help
Philip Rhoades
phil at pricom.com.au
Tue Dec 2 04:18:30 CET 2008
Werner,
I have been testing nearly all the distributions and have never been
able to keep a WLAN connection going for more than 5.5 hours (using
just a simple script that pings every two minutes). I had not looked
at SHR before because I thought it might not suit me, but I decided to
try it out a few days ago. The first attempt lasted only a few hours,
but the most recent one lasted 16.5 hours before I had to stop it! Is
SHR an older OS that has had more work done on it, i.e. fewer bugs?
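
A test of this kind can be sketched roughly as follows. This is a
minimal Python illustration, not the script actually used: the
function names, the "three missed pings in a row" threshold and the
`ping -c 1 -W` options (iputils syntax; other ping implementations may
differ) are all assumptions.

```python
import subprocess
import time


def ping_once(host, timeout_s=5):
    """Return True if a single ICMP echo to host succeeds.

    Assumes a system `ping` that accepts -c (count) and -W (timeout in
    seconds), as iputils does.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def time_until_failure(probe, interval_s=120, max_failures=3,
                       sleep=time.sleep, clock=time.monotonic):
    """Call probe() every interval_s seconds until it fails
    max_failures times in a row, then return how many seconds elapsed
    between the start of the run and the last successful probe.

    sleep and clock are injectable so the loop can be checked without
    waiting for hours.
    """
    start = clock()
    last_ok = start
    failures = 0
    while failures < max_failures:
        if probe():
            failures = 0
            last_ok = clock()
        else:
            failures += 1
        if failures < max_failures:
            sleep(interval_s)
    return last_ok - start
```

To run it against a real link one would call something like
`time_until_failure(lambda: ping_once("192.168.0.1"))`, where the
address is a placeholder for the access point or gateway, and divide
the result by 3600 to get hours.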
Regards,
Phil.
Werner Almesberger wrote:
> In https://docs.openmoko.org/trac/ticket/1794, the question was
> recently raised how one could help get WLAN to work better.
> The answer needs a bit of context ...
>
>
> Let's first look at my current list of bugs, classified by their
> probable cause or some specific circumstances:
>
> - Low-level communication instability #1:
>
> #1387 WLAN is generally unstable (report very vague)
> #1285 WLAN connectivity gradually decays over many hours (maybe
> linked to WPA)
> #1860 WMI Control EP full (= #1285 ?)
> #1929 WLAN gets stuck after a while with ar6000_ioctl_giwscan()
> complaining (= #1285 ?)
>
> - Low-level communication instability #2:
>
> #1597 event/0 goes crazy (believed to be solved by SDIO stack switch)
>
> - WPA protocol:
>
> (list) De-associates after a while, then fails to re-associate when
> using WPA
>
> - Interoperability:
>
> #1250 Protocol incompatibility with AP running Tomato (Tomato broken ?)
>
> - Reconfiguration lingers or causes problems, probably fixed:
>
> #1392 Switching from ad-hoc to infrastructure needs
> iwconfig essid off
> (list) Wrong key needs quiet period to expire
>
> - Status reporting to user space:
>
> #1367 WLAN does not report statistics on /proc/net/wireless
> #1742 iwconfig reports signal even if not associated
>
> - Unreliable driver bringup:
>
> #2133 Insanely large allocation of AR_SOFTC_T (fixed ?)
>
> - Strange crash:
>
> #1939 Setting key and ESSID hangs system
>
> Did I forget/overlook any ?
>
>
> I think #1392 and #2133 may be solved by recent patches, and since I've
> seen #1597 lots of times with the Atheros SDIO stack and never with the
> Linux SDIO stack, that may be solved as well.
>
> Not sure what to do about #1250. Seems that nobody's interested in it
> anymore. (We need a new resolution option - "died of boredom and old
> age" ;-) #1939 is also a bit mysterious. Setting WEP key and ESSID
> (with iwconfig) certainly works for me.
>
> Then we have two larger groups of remaining bugs:
>
> 1) There appears to be a general reliability problem in low-level
> communication that eventually leads to a failure, often involving
> complaints about the "WMI Control EP" or ar6000_ioctl_giwscan.
>
> I don't know if they are all different manifestations of the same
> problem or if we really have a number of separate issues there.
>
> I've never seen any of these myself, so one way to help would be
> to generate easily reproducible procedures that exhibit those
> bugs. More about this below.
>
> 2) Incomplete implementation of APIs to user space. Those may be
> genuine bugs or corner cases of the wireless API - I haven't
> checked yet.
>
> Unless they are blockers for current user space (GUI) work, I'd
> consider them low priority. So if anyone wants to fix them,
> here's an opportunity :-)
>
>
> Now, this brings us to the really interesting group of bugs, the
> mystery instabilities. As befits a mystery, it's a bit unclear under
> what circumstances they occur.
>
> First of all, the GUIs seem to be in a pretty bad state, so anything
> that was controlled from a GUI may just have failed because of that.
> There's also the possibility that weird operations from the GUI upset
> kernel or firmware.
>
> Second, we don't know for which range of encryption options these
> problems appear.
>
> Third, suspend/resume can do many wondrous things to a system, so it
> would be good to separate runs where a suspend/resume occurred from
> runs where it didn't.
>
> Finally, there may be an external trigger, such as an access point
> sending something that causes indigestion in firmware or kernel.
>
> All this means that the problem is still too vague to say what
> specifically could be done to solve it. (Once you understand a
> problem, it's usually as good as solved anyway :-)
>
>
> So it would be good if we could try to reproduce those hangs under
> controlled conditions, and then do a triage. We did one a while ago
> inside Openmoko, when WLAN was considered completely unusable. That
> triage was a bit of a victim of its own success, as it turned out
> that the WLAN worked beautifully, just the GUI didn't.
>
> My idea is as follows:
>
> - in order to avoid interference from user space, I'd make kernel/
> rootfs/SD card images containing some shell-accessible networking
> tools. That way, everybody can use exactly the same system
> environment.
>
> - that system won't suspend unless explicitly told to, so we'll
> have that parameter under control as well.
>
> - I'll specify more detailed reporting criteria. E.g., the existing
> bug reports don't reveal much about the network environment.
>
> I want to make suspend-without-reset work first, and then I'll start
> preparing that trial. Hopefully, this will provide the clues needed
> to slay that dragon as well.
>
> Regarding areas that need help, let's not forget that for real-life
> use on the go, the command line isn't much fun for configuring your
> WLAN. So anything that brings Connman or Network Manager into a better
> shape will also be very welcome.
>
> Thanks,
> - Werner
>
> _______________________________________________
> devel mailing list
> devel at lists.openmoko.org
> https://lists.openmoko.org/mailman/listinfo/devel
--
Philip Rhoades
GPO Box 3411
Sydney NSW 2001
Australia
E-mail: phil at pricom.com.au