fascinating bug to do with apm and child processes...
Carsten Haitzler (The Rasterman)
raster at openmoko.org
Wed May 14 13:58:18 CEST 2008
i found a fascinating (what appears to be) kernel bug...
ok.
1. e (enlightenment) executes the "apm -s" command to suspend when it decides
u've been idle long enough. so funnily enough from ps:
root 1402 1375 0 May12 ? 00:01:22 enlightenment -profile illume
...
root 1463 1402 0 May12 ? 00:00:00 sh -c apm -s
ok - pid 1463 is a shell running apm -s - and the parent process is...
enlightenment. ok - so what is this shell process doing:
root@om-gta02:~# strace -p 1463
Process 1463 attached - interrupt to quit
wait4(-1,
look at that. it's hung in an eternal wait for its child proc (apm), which is
over here:
root 1464 1463 0 May12 ? 00:00:00 apm -s
interestingly enough e is also hung on a wait:
root@om-gta02:~# strace -p 1402
Process 1402 attached - interrupt to quit
wait4(-1, <unfinished ...>
which i KNOW is the following line of code:
while ((pid = waitpid(-1, &status, WNOHANG)) > 0)
which... should NEVER HANG. EVER. if no child exited - return immediately - as
per the manual page:
WNOHANG return immediately if no child has exited.
so under no circumstances should this ever hang... but oooh. it does. now
interestingly i attached to apm to see what it was doing.. and lo-and-behold,
it woke up, continued to execute, then exited, with sh reaping the child, then
e reaping the sh and waking up again:
root@om-gta02:~# strace -p 1464
Process 1464 attached - interrupt to quit
dup(2) = 5
fcntl64(5, F_GETFL) = 0x20001 (flags O_WRONLY|O_LARGEFILE)
close(5) = 0
write(2, "apm: Interrupted system call\n", 29) = 29
close(4) = 0
io_submit(0, 0, 0xfbad2088Process 1464 detached
root@om-gta02:~#
so somewhere along the way apm was stuck in a syscall that hung forever - i
don't know which one... but the chain of SIGCHLD pain back from this is just
wrong. thar be dragons!
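
for reference, a minimal c sketch of the pattern involved: a parent starts
"sh -c <cmd>" as a child and later reaps whatever has exited with a
non-blocking waitpid() loop. only the while line matches the e code quoted
above. the helper names spawn_shell and reap_exited_children are made up for
illustration, and this is not how e itself is structured.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* hypothetical helper: start "sh -c <cmd>" as a child process,
 * the same shape as the "sh -c apm -s" entry in the ps output */
pid_t spawn_shell(const char *cmd)
{
    assert(cmd != NULL);

    pid_t pid = fork();
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127); /* only reached if exec failed */
    }
    return pid; /* child pid in the parent, -1 if fork failed */
}

/* hypothetical helper: reap every child that has already exited.
 * with WNOHANG, waitpid() is documented to return immediately:
 *   > 0  a child was reaped
 *   0    children exist but none has exited yet
 *   -1   error, e.g. ECHILD when there are no children at all */
int reap_exited_children(void)
{
    int status;
    int count = 0;
    pid_t pid;

    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        if (WIFEXITED(status))
            printf("child %d exited with code %d\n",
                   (int)pid, WEXITSTATUS(status));
        else if (WIFSIGNALED(status))
            printf("child %d killed by signal %d\n",
                   (int)pid, WTERMSIG(status));
        count++;
    }
    return count;
}
```

typically reap_exited_children() would be called after a SIGCHLD arrives (from
the main loop, not from inside the signal handler, since printf is not
async-signal-safe). the point is that the loop should fall straight through
when nothing has exited, which is why a wait4(-1, ...) that never returns looks
so wrong.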
--
Carsten Haitzler (The Rasterman) <raster at openmoko.org>