new kernel ... / meaning of stable-tracking

Mike (mwester) mwester at dls.net
Mon Sep 29 08:49:42 CEST 2008


Werner Almesberger wrote:
> Mike (mwester) wrote:
>> None of the distros, and certainly not the kernel, are stable enough to
>> ensure that any such code would actually remain running -- for every
>> "solution" you come up with to insulate such a daemon from a failure,
>> somebody somewhere will come up with another way to make it fail.
> 
> Well, if the kernel is dead, the current hack wouldn't help you either.

"Dead" kernels are relatively easy to fix compared to the problems we
deal with now.  There are numerous ways the current kernel "injures"
applications in ways it shouldn't, and the apps just aren't resilient
enough yet.  Not to mention the many ways in which one application can
injure others, from "SIGKILL" to just running the system out of memory
or some other resource.

> Besides, that platform-specific code would need to be separated from
> the actual driver anyway, so the choice is not between leaving it
> alone and replacing it, but between refactoring that mess and getting
> rid of it for good.
> 
>> Of course Openmoko can just unilaterally do what you state, and force
>> developers for every distro to invest huge amounts of effort in
>> developing fail-safe watchdog daemons for their distros.
> 
> Yeah, that would indeed be unreasonable. So I sat down and wrote it
> myself. Took about one and a half hours, most of which was spent on
> figuring out how to interface with events and i2c-dev, which I've
> never done before.
> 
> http://svn.openmoko.org/developers/werner/neodog/

Well, I'm not biting on that bait.  I said in my original email that we
could all waste time on exactly this cycle: someone proposes something,
someone else finds a use case where the daemon would fail, someone else
does something to address that, and so on.

I'm sure it's good code, and I'm sure you've taken care that it will
continue to run when there's no memory left, or when other common
resources are exhausted or access to them is locked up, and I'm sure
it's statically linked to protect against package corruption.  But it's
still user space, and I'm quite certain there are still failure modes
where it will die along with all of user space while a simple bit of
kernel code can still run quite capably.
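
For reference, the kind of user-space hardening alluded to above might
look roughly like the sketch below.  It is not taken from neodog; the
function names are made up for illustration, the MCL_* values are the
Linux ones for x86 and ARM (other architectures may differ), and
/proc/self/oom_score_adj only exists on kernels from 2.6.36 onwards,
which is newer than anything discussed in this thread.

```python
import ctypes

# Assumed Linux constants (x86/ARM); check <sys/mman.h> on other targets.
MCL_CURRENT = 1
MCL_FUTURE = 2


def lock_memory():
    """Try to pin all current and future pages in RAM.

    Returns True on success, False if the call is unavailable or refused
    (for example because of RLIMIT_MEMLOCK or missing privileges).
    """
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        return libc.mlockall(MCL_CURRENT | MCL_FUTURE) == 0
    except (OSError, AttributeError):
        return False


def set_oom_score_adj(value=-1000, path="/proc/self/oom_score_adj"):
    """Ask the OOM killer to leave this process alone.

    Lowering the score normally needs privileges, so a failure is
    reported as False rather than raised.
    """
    try:
        with open(path, "w") as f:
            f.write(str(value))
        return True
    except OSError:
        return False
```

A daemon would call lock_memory() and set_oom_score_adj() once at
startup and log the results.  Neither call protects against SIGKILL, a
wedged scheduler, or anything else that takes all of user space down,
which is the point being argued here.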

> I agree with you that the kernel still needs lots of work. But
> wasting time on wrapping some more rolls of band-aid around things
> we can fix properly doesn't help ...

Sometimes the band-aid needs to be applied directly to the wound to be
effective.


Regards,
Mike


