Openmoko Bug #1841: white screen of death (WSOD) after resume

Openmoko Public Trac bugs at docs.openmoko.org
Tue Dec 9 14:50:28 CET 2008


#1841: white screen of death (WSOD) after resume
-----------------------+----------------------------------------------------
 Reporter:  Rorschach  |          Owner:  openmoko-devel
     Type:  defect     |         Status:  new           
 Priority:  highest    |      Milestone:                
Component:  unknown    |        Version:  GTA02v5       
 Severity:  critical   |       Keywords:  wsod,resume   
 Haspatch:  0          |      Blockedby:                
Estimated:             |    Patchreview:                
 Blocking:             |   Reproducible:  always        
-----------------------+----------------------------------------------------

Comment(by joerg):

 Results of our tests so far:
 First, we found two devices that show WSOD relatively frequently and
 reproducibly:
 #51 from https://docs.openmoko.org/trac/ticket/1621
 and an A7 PP model.

 We verified the temperature dependency by warming up the whole device
 (-> no WSOD), then cooling down the LCM while keeping the rest of the
 device warm -> WSOD on first try.

 We applied the no_deep_suspend patch to the recent stable branch 2.6.24
 and found (on #51) that it reduces the probability of WSOD but does not
 fix it. There are other reports
 [http://docs.openmoko.org/trac/ticket/2115] of WSOD not depending on
 entering deep_suspend mode at all (so this patch should not be able to
 help there).
 It seems deep_suspend can trigger WSOD very easily, but the mechanism
 behind WSOD is something other than a failure during deep_suspend itself
 or during resume from it.

 WSOD depends on how long the device is suspended, i.e. it sometimes seems
 to take quite a few minutes of suspend before WSOD is triggered. This
 seems somewhat paradoxical given the paragraph above.

 We patched the JBT6K74.c driver, increasing the existing mdelay() calls
 and inserting new ones at every reasonable point of the communication
 flow, and even lowered the GLAMO-SPI clock frequency, so that the LCM
 should be comfortable with every aspect of control-communication timing.
 Result: none. The randomness of WSOD seems unchanged.

 We added printk() calls and logged a resume that went OK and a resume
 with WSOD following immediately after it. Comparing both sequences, we
 did not notice any significant difference, neither in the sequence of
 function calls nor in timing.
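
 The comparison described above could be automated roughly as follows.
 This is only a sketch of the idea, not the tooling used for this ticket:
 the struct layout, the helper names first_call_mismatch() and
 max_timing_delta(), and the microsecond timestamps are all assumptions
 made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One logged line: which function ran and when (hypothetical layout). */
struct trace_entry {
    const char *func;   /* function name taken from the printk() output */
    long usec;          /* time since start of resume, in microseconds */
};

/*
 * Compare the call order of two traces of equal length n.
 * Returns the index of the first entry whose function name differs,
 * or -1 if both traces call the same functions in the same order.
 */
long first_call_mismatch(const struct trace_entry *a,
                         const struct trace_entry *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(a[i].func, b[i].func) != 0)
            return (long)i;
    }
    return -1;
}

/*
 * Largest absolute timestamp difference between matching entries
 * of two traces of equal length n.
 */
long max_timing_delta(const struct trace_entry *a,
                      const struct trace_entry *b, size_t n)
{
    long worst = 0;

    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        long d = labs(a[i].usec - b[i].usec);
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

 A result of -1 from first_call_mismatch() together with a small
 max_timing_delta() would correspond to the "no significant difference"
 finding reported here.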

 Two or three times #51 completely refused to produce WSOD. After taking
 out the battery for 10min it was back to normal (meaning 95% immediate
 WSOD after 20sec suspend).

 We swapped the LCM of #51 with one from a known-good device. Result: 40
 suspend/resume cycles, as well as placing #51 with the new LCM in the
 fridge for 30min and then resuming, didn't show any WSOD.
 We attached the #51 LCM to a known-good device, and it didn't show WSOD
 in 6 cycles. So obviously the issue isn't located entirely on the LCM.

 We have never seen a WSOD recover on subsequent suspend/resume cycles. It
 always needed a reboot to recover. *)

 So far we haven't seen a single WSOD on boot.

 So we are wondering what the difference is between
 a) switching LCM power down via LDO6 while keeping *all* lines to the LCM
 low (to stop reverse powering by sneak currents, and to avoid violating
 the JBT6K74 electrical specs), then powering up and resetting
 and
 b) a usual boot bringing up the LCM in a sane state.
 Maybe it's pure coincidence that we have never seen a WSOD on boot so far?

 *) Further results:
 We attached a debug board and reset the device to reboot it without
 powering down: WSOD recovered.

 We probed the signals on the LCM FPC using a GTA03 debug board (task not
 completed yet): with an old image and kernel (2008.08) there was 3.2V on
 the power supply and on some of the data lines. We didn't find
 differences in the probed signals between WSOD and a clear display.
 We didn't see an LCM-RESET on resume, though.
 While probing the signals we got a recovery from WSOD once, but it wasn't
 reproducible and only *might* be connected with shorting reset to GND.
 Removing a WSODed LCM from the device during suspend, then reconnecting
 it, then resuming: WSOD recovered at least on the second resume after
 that (the first one was probably confused because reconnecting the FPC
 is not a clean switch and causes bounces on the lines and a wrong
 power-up sequence).
 On the first resume the LCM usually faded from white to black.

 Conclusion: the root cause of WSOD is some 'analog' effect depending on
 both LCM and device. We cannot provide a good clue to the nature of the
 issue.
 By first(! Vio <= VDD) switching all glamo->lcm IOs to 0V/high-Z, then
 disabling LDO6 for suspend, and on resume first powering up the LCM via
 LDO6 and then initializing it (incl. activating the glamo interface), we
 should achieve zero power consumption for the LCM during suspend, and be
 able to recover from/avoid WSOD.
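
 To make the proposed ordering explicit, here is a minimal sketch that
 only records the steps in sequence; it does not touch any hardware and is
 not taken from the kernel tree. All identifiers (enum lcm_step,
 lcm_suspend_sequence(), lcm_resume_sequence(), lcm_io_low_before_off())
 are hypothetical. Placing the glamo interface activation before the
 JBT6K74 init commands is an assumption, since the text only says that
 initialization includes it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical step identifiers for the sequence proposed above. */
enum lcm_step {
    LCM_IO_LOW,     /* switch all glamo->lcm IOs to 0V/high-Z */
    LCM_LDO6_OFF,   /* disable LDO6, cutting LCM power for suspend */
    LCM_LDO6_ON,    /* power the LCM up again via LDO6 */
    LCM_IO_ACTIVE,  /* re-activate the glamo interface (assumed order) */
    LCM_INIT        /* send the LCM initialization sequence */
};

#define LCM_MAX_STEPS 8

/* Ordered record of the steps that were "executed". */
struct lcm_trace {
    enum lcm_step steps[LCM_MAX_STEPS];
    size_t count;
};

static void lcm_record(struct lcm_trace *t, enum lcm_step s)
{
    assert(t->count < LCM_MAX_STEPS);
    t->steps[t->count++] = s;
}

/* Suspend: IO lines go low FIRST (Vio <= VDD), only then drop power. */
void lcm_suspend_sequence(struct lcm_trace *t)
{
    lcm_record(t, LCM_IO_LOW);
    lcm_record(t, LCM_LDO6_OFF);
}

/* Resume: restore power first, then bring up interface and init. */
void lcm_resume_sequence(struct lcm_trace *t)
{
    lcm_record(t, LCM_LDO6_ON);
    lcm_record(t, LCM_IO_ACTIVE);
    lcm_record(t, LCM_INIT);
}

/*
 * Returns 1 if every LCM_LDO6_OFF in the trace happens while the IO
 * lines are low (i.e. after LCM_IO_LOW with no later LCM_IO_ACTIVE),
 * otherwise 0.
 */
int lcm_io_low_before_off(const struct lcm_trace *t)
{
    int io_low = 0;

    for (size_t i = 0; i < t->count; i++) {
        if (t->steps[i] == LCM_IO_LOW)
            io_low = 1;
        else if (t->steps[i] == LCM_IO_ACTIVE)
            io_low = 0;
        else if (t->steps[i] == LCM_LDO6_OFF && !io_low)
            return 0;
    }
    return 1;
}
```

 In a real driver each recorded step would be replaced by the matching
 GPIO, regulator, or SPI operation; the check function only illustrates
 the Vio <= VDD constraint that motivates the ordering.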

 As Andy is much more savvy at working in kernel space, and he has
 announced the LDO6 switch-off anyway, we didn't try to implement this,
 plus the needed glamo-lines pulldown, here in TPE.

 jOERG

-- 
Ticket URL: <https://docs.openmoko.org/trac/ticket/1841#comment:119>
docs.openmoko.org <http://docs.openmoko.org/trac/>
openmoko trac

