Wacky Racers on Glamo

Tue Jul 29 09:04:19 CEST 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi folks -

As part of my wider deathmatch with the Glamo, I have made some progress
reproducing, and maybe even understanding, the WSOD (White Screen of
Death) it has always been partial to.

It seems that our choice of a resistor divider network (R1813 / R1814)
to level-translate the 3.3V CPU GPIO to the 1.8V Glamo RST# might not
have been the brightest thing we ever did.  I can reset the Glamo
(provoking a sticky WSOD) just by touching Glamo RST# with my scope
probe.  The Glamo docs say you need to assert RST# for 1us to get a
reset, but in practice it resets on any old glitch; the 1us figure
seems to be only a recommendation to be sure everything got reset.

With my scope probe applied I cannot see any glitch, but the Glamo
certainly knows the probe is on, since it resets immediately, which is
highly illegal behaviour for us.  So there is a glitch, and with our
driving network the energy needed for the Glamo to accept it is so low
that the charge my scope probe's capacitance dumps onto the pin is
enough.  That is highly abnormal.
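
A rough first-order model of the probe effect, as a sketch only.  It
assumes the divider is R_a from the CPU GPIO to RST# and R_b from RST#
to ground (which of R1813 / R1814 is which is not stated here), that
the pin has its own capacitance C_n, and that the probe tip adds C_p,
initially at 0V.  The capacitance values are illustrative guesses, not
measurements; only the roughly 80K source impedance comes from this
post.

```latex
R_{th} = \frac{R_a R_b}{R_a + R_b}, \qquad
V_{RST} = 3.3\,\mathrm{V}\cdot\frac{R_b}{R_a + R_b}

% probe tip (C_p, at 0 V) touches the pin (own capacitance C_n): charge sharing
V(0^+) = V_{RST}\,\frac{C_n}{C_n + C_p}

% the divider then recharges both capacitances through R_th
V(t) = V_{RST} - \bigl(V_{RST} - V(0^+)\bigr)\,e^{-t/\tau}, \qquad
\tau = R_{th}\,(C_n + C_p)

% illustrative: R_th = 80 k\Omega, C_n = 5 pF (assumed), C_p = 10 pF (assumed), V_RST = 1.8 V
V(0^+) = 1.8\,\mathrm{V}\cdot\frac{5}{15} = 0.6\,\mathrm{V}, \qquad
\tau = 80\,\mathrm{k\Omega}\cdot 15\,\mathrm{pF} = 1.2\,\mu\mathrm{s}
```

For comparison, a stiff driver of, say, 100 ohms into the same 15pF
gives a 1.5ns time constant, so under these assumptions the divider
stretches any disturbance on RST# by nearly three orders of magnitude.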

I was finally put onto this when I found that, during some suspends
here, the Glamo registers had been forced back to their reset values by
the time we resumed.  (I noticed because we hold a lot of unused Glamo
engines in reset to avoid trouble, but the reset value of those engines
is "out of reset", so sometimes on resume they were all out of reset.)
Since we currently rely on the bulk of the Glamo registers keeping their
values across suspend, this causes death.
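
The "engines held in reset" observation doubles as a cheap canary for an
unplanned chip reset.  Below is a minimal sketch of that check against a
simulated register array; the register offset, bit mask and all helper
names are hypothetical placeholders, not real Glamo definitions, and the
all-zero reset value is an assumption made for the simulation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated 16-bit register window standing in for the Glamo MMIO space.
 * Offset and mask are hypothetical placeholders, not real Glamo values. */
#define FAKE_NUM_REGS        64
#define HYP_REG_ENGINE_RESET 0x10   /* hypothetical engine reset register */
#define HYP_UNUSED_ENGINES   0x00f0 /* hypothetical unused-engine bits    */

static uint16_t fake_regs[FAKE_NUM_REGS];

static uint16_t reg_read(unsigned off)
{
    assert(off < FAKE_NUM_REGS);
    return fake_regs[off];
}

static void reg_write(unsigned off, uint16_t v)
{
    assert(off < FAKE_NUM_REGS);
    fake_regs[off] = v;
}

/* Before suspend: park the unused engines in reset. */
static void glamo_park_unused_engines(void)
{
    reg_write(HYP_REG_ENGINE_RESET,
              (uint16_t)(reg_read(HYP_REG_ENGINE_RESET) | HYP_UNUSED_ENGINES));
}

/* After resume: the chip's reset value for these bits is "out of reset",
 * so if any of them is clear, the Glamo was reset behind our back. */
static bool glamo_was_reset(void)
{
    return (reg_read(HYP_REG_ENGINE_RESET) & HYP_UNUSED_ENGINES)
           != HYP_UNUSED_ENGINES;
}

/* Simulation helper: model a RST# glitch returning every register to an
 * (assumed) all-zero reset value. */
static void simulate_rst_glitch(void)
{
    for (unsigned i = 0; i < FAKE_NUM_REGS; i++)
        fake_regs[i] = 0;
}
```

If the check fires on resume, a driver could treat the pre-suspend
register state as lost and reprogram the chip instead of trusting it.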

The Glamo docs also say that we must not touch the registers for 4ms
after reset, and that matches a lot of the race outcomes I see: PLLs
sometimes not running again even though we ran the correct code to
start them, or things passing the PLL lock tests and then suffering
brain damage later when we update the cursor in the framebuffer or
bring up the SD card again.  The brain damage is very ugly to debug,
because one race outcome is that the Glamo simply holds nWAIT low
forever if it is not in a state to service your read [1].
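
One way to respect the 4ms no-touch window is to timestamp every reset
and gate register access on it.  A minimal userspace sketch, assuming
POSIX clock_gettime() and nanosleep(); the helper names and the
last_reset_ns state are hypothetical, not existing driver code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* 4 ms no-touch window after reset, per the Glamo docs. */
#define GLAMO_SETTLE_NS (4ULL * 1000ULL * 1000ULL)

/* Monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Hypothetical driver state: time of the last reset, planned or detected.
 * 0 means "long ago", so accesses are allowed at startup. */
static uint64_t last_reset_ns;

static void glamo_note_reset(void)
{
    last_reset_ns = now_ns();
}

static bool glamo_regs_accessible(void)
{
    return now_ns() - last_reset_ns >= GLAMO_SETTLE_NS;
}

/* Sleep until the settle window has passed.  A kernel driver would use
 * msleep() or usleep_range() here rather than nanosleep(). */
static void glamo_wait_settled(void)
{
    for (;;) {
        uint64_t elapsed = now_ns() - last_reset_ns;
        if (elapsed >= GLAMO_SETTLE_NS)
            return;
        uint64_t left = GLAMO_SETTLE_NS - elapsed;
        struct timespec ts = {
            .tv_sec = (time_t)(left / 1000000000ULL),
            .tv_nsec = (long)(left % 1000000000ULL),
        };
        nanosleep(&ts, NULL);
    }
}
```

This only helps for resets the driver knows about; a glitch reset is
noticed after the fact, so any accesses made in between can still race.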

Further, since RST# is glitchy (no doubt due to the 80K source impedance
of this divider), I think we are racing the glitch, and this is part of
the general resume instability.  But I do not know what provokes the
glitch, since it does not happen randomly in normal operation.  Maybe it
happens when we transition from MEMLDO to the high-current 1.8V
regulator on resume.

Last night I started on workaround code to dump the registers we are
using on suspend and reapply them on resume, but it was still crashy
and unstable when I went to bed.
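
The shape of that save/reapply workaround, as a minimal sketch against
a simulated register array.  The offset list, helper names and 16-bit
register width are assumptions for illustration, not the actual
workaround code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated register window (hypothetical; a driver would use MMIO). */
#define FAKE_NUM_REGS 64
static uint16_t fake_regs[FAKE_NUM_REGS];

static uint16_t reg_read(unsigned off)
{
    assert(off < FAKE_NUM_REGS);
    return fake_regs[off];
}

static void reg_write(unsigned off, uint16_t v)
{
    assert(off < FAKE_NUM_REGS);
    fake_regs[off] = v;
}

/* Hypothetical list of registers the driver relies on, in the order they
 * should be rewritten (e.g. clocks and PLLs before the engines using them). */
static const unsigned saved_offsets[] = { 0x01, 0x02, 0x10, 0x20, 0x21 };
#define NUM_SAVED (sizeof(saved_offsets) / sizeof(saved_offsets[0]))

static uint16_t shadow[NUM_SAVED];

/* On suspend: copy the registers we use into the shadow array. */
static void glamo_save_regs(void)
{
    for (size_t i = 0; i < NUM_SAVED; i++)
        shadow[i] = reg_read(saved_offsets[i]);
}

/* On resume: write them back.  The caller is assumed to have waited out
 * the 4 ms post-reset window first. */
static void glamo_restore_regs(void)
{
    for (size_t i = 0; i < NUM_SAVED; i++)
        reg_write(saved_offsets[i], shadow[i]);
}
```

Blind rewriting is the easy part; on real hardware the ordering, PLL
lock waits and any registers with side effects on write are where this
can still go wrong.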

Still, this is progress!

- -Andy

[1] But I have the medicine for that in most cases: briefly short nWAIT
to 1.8V while holding down AUX.  I have sneaked a deliberate NULL
pointer OOPS into the AUX keyboard ISR, which then triggers the
emergency panic dump code I added some weeks ago, so I can see where we
got locked up even though it is a true hard freeze.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iEYEARECAAYFAkiOwOsACgkQOjLpvpq7dMrmDgCeKlxtbrOlCNWwmJX5w4L7YkG2
n5sAnjvuzch/sGL2NSU1bcixnRdUuyIj
=2Rh/
-----END PGP SIGNATURE-----