kernel defconfig, debugging, preemption, and very noticeable speedups/ debugging

Werner Almesberger werner at openmoko.org
Sun Jan 10 15:45:48 CET 2010


Gennady Kupava wrote:
> I am really surprised that i have to spend so much time talking about
> this issue, which seem very trivial for me.

Thanks for having patience with those of us who are a little slow :-)

> may be something is not so visible on desktops, but slower embedded
> devices do not forgive such debugging on the side of user.

Yes, this makes perfect sense. What I'm looking for is a more detailed
quantification of the impact. What I'd expect is that many debug
options have very little impact while others make a huge difference.
We have to be careful with the latter, because they can also
invalidate experiments (see below).

I understand that things like cache thrashing may never show up unless
you change a large number of options at the same time. However, even
this is useful information, much like the mass defect in nuclear
experiments tells you something about particles you may not be able to
observe.
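As a starting point for that kind of quantification, one could list exactly which options differ between two kernel configurations before measuring anything. The following is only a sketch: the helper names `parse_config` and `changed_options` are made up for illustration, and the only thing assumed about the kernel is the usual .config syntax (`CONFIG_FOO=y` and `# CONFIG_FOO is not set`).

```python
import re


def parse_config(text):
    """Map option name -> value for kernel .config text.

    Options written as '# CONFIG_FOO is not set' are recorded as 'n'.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if m:
            options[m.group(1)] = "n"
    return options


def changed_options(old_text, new_text):
    """Return {option: (old_value, new_value)} for every option that differs.

    An option missing from one file is treated as 'n' (an assumption made
    for this sketch; it ignores options that vanish due to dependencies).
    """
    old, new = parse_config(old_text), parse_config(new_text)
    return {
        name: (old.get(name, "n"), new.get(name, "n"))
        for name in sorted(set(old) | set(new))
        if old.get(name, "n") != new.get(name, "n")
    }
```

With the list of changed options in hand, they can be toggled one at a time or in groups, so that a measured difference can be attributed to something specific.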

> no. anyone seem experience this, see reports in community ml. have you
> tried to use the kernel without debugging?

I only changed config options when I caught them red-handed doing
something bad.

> it's hard to find 10% performance impact with testing individual
> options, as they have statistical error, but few such 10% may provide
> major slowdown and eventually run us out of cache.

Yes, death by a million gnat bites is a possibility. But have you
proven that this is the case or is it just a hypothesis? (Like
the one that debug options have no perceptible cost, which you've
proven incorrect now.)
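To make both sides of this concrete, here is a rough sketch of the arithmetic: how several small slowdowns would compound if they simply multiplied, and how large the measurement error on a single comparison is. The function names are hypothetical, the multiplicative model is an assumption (real options may interact through the cache in ways it does not capture), and the error estimate ignores the uncertainty in the baseline mean used as the denominator.

```python
import math
import statistics


def compounded_slowdown(per_option_slowdowns):
    """Overall slowdown factor if each option multiplies run time by (1 + s)."""
    factor = 1.0
    for s in per_option_slowdowns:
        factor *= 1.0 + s
    return factor


def difference_with_error(baseline, variant):
    """Return (relative difference of mean run times, approximate standard error).

    Both arguments are lists of repeated timings of the same benchmark.
    """
    mean_b = statistics.mean(baseline)
    mean_v = statistics.mean(variant)
    # Standard error of each mean, from the sample standard deviation.
    se_b = statistics.stdev(baseline) / math.sqrt(len(baseline))
    se_v = statistics.stdev(variant) / math.sqrt(len(variant))
    rel = (mean_v - mean_b) / mean_b
    rel_se = math.sqrt(se_b ** 2 + se_v ** 2) / mean_b
    return rel, rel_se
```

Under this model, five options costing 10% each give compounded_slowdown([0.10] * 5), about 1.61, i.e. a 61% slowdown. A single 10% difference only counts as measured when the relative difference is clearly larger than its standard error, which usually means repeating the benchmark more often.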

> this is normal practice in whole world - to install debugging version if
> you want to help with chasing bug. 

What I'm afraid of is that distributions will ship radically trimmed
down kernels that don't provide any useful diagnostics. So any
suspected kernel bug would require the user to install a debug kernel.
There are a few problems with this:

- users may be reluctant to change kernels and the kernel change may
  also cause other problems (e.g., mistakes made when performing an
  unfamiliar operation) which contribute to making it harder to
  debug the problem

- today's debug kernel may not match yesterday's production kernel, so
  the symptoms need a systematic re-evaluation. This is generally very
  hard to do, particularly if it's a bug that's not trivial to
  reproduce. (Since the ones that are easy to reproduce get eliminated
  quickly, any strategy that assumes easy reproducibility will work
  great in the beginning but cause grief later.)

- if the production kernel's performance differs wildly from that of
  the debug kernel, even if using matching versions, the changed
  experience may confuse the user and result in false symptoms getting
  reported.

  Worse yet, if the problem is a race condition or an access to
  improperly initialized memory, the symptoms may simply vanish in the
  debug kernel.

That's why I'm unhappy with the recommendation to just throw out all
debugging in production kernels. I'd rather have a production kernel
that does as much debugging as possible while still delivering good
performance.

> no way to understand why cpu usage of running top decreased 2x - no
> kernel messages are produced in process.

Yes, explaining the exact underlying reasons for a quantitative change
is another can of worms :-)

> urghh. fixed. hope that this is only reason :)

Thanks a lot!

- Werner


