fascinating bug to do with apm and child processes...

Carsten Haitzler (The Rasterman) raster at openmoko.org
Wed May 14 16:15:50 CEST 2008


On Wed, 14 May 2008 13:30:25 +0100 Graeme Gregory <graeme at openmoko.org> babbled:

> On Wed, May 14, 2008 at 10:12:00PM +1000, Carsten Haitzler wrote:
> > On Wed, 14 May 2008 13:06:01 +0100 Andy Green <andy at openmoko.com> babbled:
> > 
> > > -----BEGIN PGP SIGNED MESSAGE-----
> > > Hash: SHA1
> > > 
> > > Somebody in the thread at some point said:
> > > 
> > > |        WNOHANG     return immediately if no child has exited.
> > > |
> > > | so under no circumstances should this ever hang... but oooh. it does.
> > > | now interestingly i attached to apm to see what it was doing.. and
> > > | lo-and-behold, it woke up and continued to execute then exited with
> > > | sh reaping the child then e reaping the sh and e waking up again:
> > > 
> > > There's some process freezing step as part of entering suspend, I guess
> > > it is to do with that.  FWIW with echo mem > /sys/power/state the echo
> > > also never returns until the system comes back in resume.
> > 
> > sure - but the system never suspended - it stayed alive. that's why i could
> > debug :) the problem is a sigchld has been issued for a process that hasn't
> > fully exited and waitpid() is blocking even with WNOHANG. it should never
> > block - ever. doesn't matter what the child is doing. :) never hang. ever!
> > :) the problem is the freeze of the apm process propagates to all its
> > parents - when i do know that ecore (the lib for e handling this) is
> > carefully written to avoid such hangs. :)
> 
> This sounds like the bug I found years ago: when more than one program
> has /dev/apm_bios open and apm -s is called, they all lock up until you
> kill them off and only apmd is left. Then suddenly stuff starts
> working again.
> 
> This is the reason I originally wrote the
> 
> org.openmoko.dev/packages/xorg-xserver/xserver-kdrive/disable-apm.patch
> 
> To stop xserver causing this fault.

shouldn't we look at fixing apm itself instead of working around it? :) also
this is a pretty major issue, as it means suspend/resume is very liable to hit
this bug and thus have problems. if the apm process gets hung in such a way,
all its parent processes also get hung in waitpid(), even if they call it with
WNOHANG... this is bad... :(
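
for reference, here is a minimal sketch of what a parent should be able to rely
on when it reaps children with WNOHANG. this is not ecore's actual code and it
does not reproduce the apm freeze; the function names reap_exited_children()
and demo_nonblocking_reap() are made up for illustration. it only shows the
documented behaviour: with a child that is still alive, waitpid() returns 0
straight away instead of blocking.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: reap every child that has already exited, without
 * ever blocking. Returns the number reaped, or -1 on an unexpected error. */
int reap_exited_children(void)
{
    int reaped = 0;

    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, WNOHANG);

        if (pid > 0) {          /* one exited child collected, look for more */
            reaped++;
            continue;
        }
        if (pid == 0)           /* children exist, but none has exited yet */
            break;
        if (errno == EINTR)     /* interrupted by a signal, just retry */
            continue;
        if (errno == ECHILD)    /* no children left at all */
            break;
        return -1;
    }
    return reaped;
}

/* Hypothetical demo: a live child must not make the parent block. */
int demo_nonblocking_reap(void)
{
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0) {           /* child: sleep until killed */
        pause();
        _exit(0);
    }

    /* Child is alive, so this should return at once with nothing reaped. */
    int before = reap_exited_children();
    assert(before == 0);

    kill(child, SIGKILL);

    /* Poll (never block) until the exit shows up. */
    struct timespec tick = { 0, 1000000L };   /* 1 ms */
    int after = 0;
    while (after == 0) {
        nanosleep(&tick, NULL);
        after = reap_exited_children();
    }
    assert(after == 1);
    return 0;
}
```

the behaviour described above amounts to the first call in
demo_nonblocking_reap() never returning while the child is stuck, which is
exactly what WNOHANG is documented not to do.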

-- 
Carsten Haitzler (The Rasterman) <raster at openmoko.org>




More information about the openmoko-kernel mailing list