kernel panic on the second boot from the sd

Dennis Ferron dennis.ferron at gmail.com
Thu Sep 18 20:46:59 CEST 2008


I've been having the identical problem with an 8 GB SanDisk SDHC card.  Is the
original poster seeing this with the 512MB card that comes with the phone,
or a different one?  (I've not seen this behavior with the original card.)
I don't have a fix yet but I think I can explain what's going on here.

The problem is that sometimes when Linux tries to access the sd card, the
card isn't ready yet and the first read doesn't get any data. After the
first successful read in a session everything's gravy and you can read/write
all you want, but in some circumstances software gives up at the first error
so you never get a chance to try again.  For instance if that first access
is attempting to get the partition table and fails, then the kernel is going
to think no partitions exist.  In the case with the kernel panic on boot,
the "unknown-block (179,2)" is exactly the same numbers I get.  Those are
the device major and minor numbers: 179 is the MMC/SD block device major,
and minor 2 is the second partition of the first card, i.e. mmcblk0p2, the
root partition.  In any case what is going on here is that uboot can read the card
correctly and it loaded the kernel from the small 8 MB space in partition 1,
but when the kernel turns around to load the root file system from partition
2, the reads don't work and the kernel thinks there isn't anything there.
 The fact that the kernel got loaded at all doesn't prove the kernel can
access the card, because uboot did that work.  The kernel doesn't try to
access the card until it looks for the root file system, at which point it
panics when the read fails.
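
A note on the numbers in that panic message: Linux reports a root device it
cannot identify as unknown-block(major,minor).  Below is a minimal sketch of
decoding the pair.  It assumes major 179 is the MMC/SD block driver (as in the
kernel's device number list) and that the driver reserves 8 minors per card.
decode_unknown_block and mmc_node_name are made-up helper names for
illustration, not anything from the kernel or this thread.

```python
import re

MMC_BLOCK_MAJOR = 179   # MMC/SD block major in the kernel's device list
MINORS_PER_CARD = 8     # assumption: whole-disk node plus up to 7 partitions


def decode_unknown_block(message):
    """Extract (major, minor) from text like 'unknown-block(179,2)'."""
    match = re.search(r"unknown-block\s*\((\d+),\s*(\d+)\)", message)
    if match is None:
        raise ValueError("no unknown-block(major,minor) pair found")
    return int(match.group(1)), int(match.group(2))


def mmc_node_name(major, minor):
    """Guess the /dev node name for an MMC block device number."""
    if major != MMC_BLOCK_MAJOR:
        raise ValueError("not an MMC block device major")
    card, part = divmod(minor, MINORS_PER_CARD)
    return f"mmcblk{card}" if part == 0 else f"mmcblk{card}p{part}"


major, minor = decode_unknown_block("unknown-block (179,2)")
print(major, minor, mmc_node_name(major, minor))  # prints: 179 2 mmcblk0p2
```

Under those assumptions the panic is pointing at the second partition of the
card, which is where the root file system lives in this setup.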

You phrased the subject very specifically that it happens on second boot
from sd.  You could mean you can boot once and it is ok, but not a second
time.  However, the other way to read that phrase, and the way I'm
interpreting your meaning because it describes the problem I'm having and I
think we're having the same problem, is that you always have to try twice to
boot, once getting nowhere and the second time getting to kernel panic.  I
don't know if you managed to catch what is on the screen the first time you
try to boot because it flashes very quickly before disappearing, but I
rebooted over and over to read it:  what happens (at least on my phone) is
the first time you try to boot *uboot itself* fails to read the sd and can't
find the boot partition.  It drops me back to the NAND uboot menu
automatically.  On the *second* try after that, it loads the kernel
correctly.  This makes me think that the first attempt from uboot somehow
"warmed up" the card so that it was ready the second time uboot accessed it.

This is not really a boot problem but a problem with reading the partition table.  When I
first tried my card, I was able to partition it and put Debian on it.  The
next time I booted my phone (from flash) I couldn't access the card at all.
 Sometimes I had mmcblk0 only, other times I had mmcblk0p1 and p2.
 Sometimes I could mount them and sometimes not.  But I found in this thread
on KernelTrap that someone with a similar problem had a "voodoo" workaround:

http://kerneltrap.org/mailarchive/openmoko-community/2008/7/23/2653864/thread#mid-2653864

Most of what he was doing for the voodoo was irrelevant, but I noticed that
he did "fdisk -l /dev/mmcblk0" several times (with different, illogical
results each time).  He was doing that to see when the card showed up; I
suspected that rather than just telling you the card was showing up, it
actually was the part of the voodoo that caused the card to work!  Basically
I think what he was doing was triggering a read from block 0 of the card
over and over, until finally it gets a successful read.  As I said, once you
get 1 successful read, you can read/write all you want after that.  Let me
show you a log of what it looked like on my phone when I did this.  First, I
list /dev; notice how only mmcblk0 shows up, not p1 or p2:

login as: root
root@192.168.0.202's password:
root@om-gta02:~# ls /dev

mmcblk0             ram14               usbmon0
mtd0                ram15               usbmon1

I removed lines from the ls listing for brevity; note, though, that mtd0
follows mmcblk0 directly.  Where are p1 and p2?  Now I run fdisk over and
over; watch as the output changes:

First run, nothing:
root@om-gta02:~# fdisk -l /dev/mmcblk0


Second run, finds it but finds no partitions:
root@om-gta02:~# fdisk -l /dev/mmcblk0

Disk /dev/mmcblk0: 7948 MB, 7948206080 bytes
4 heads, 16 sectors/track, 242560 cylinders
Units = cylinders of 64 * 512 = 32768 bytes

Disk /dev/mmcblk0 doesn't contain a valid partition table


Third run, jackpot!
root@om-gta02:~# fdisk -l /dev/mmcblk0

Disk /dev/mmcblk0: 7948 MB, 7948206080 bytes
4 heads, 16 sectors/track, 242560 cylinders
Units = cylinders of 64 * 512 = 32768 bytes

        Device Boot      Start         End      Blocks  Id System
/dev/mmcblk0p1               1         245        7832  83 Linux
/dev/mmcblk0p2             246      242560     7754080  83 Linux

But /dev/mmcblk0p1 and p2 still don't exist, despite what fdisk says!
 However, if I run fdisk and manually write the partition table with 'w' -
even though I've made no changes - then fdisk's ioctl call makes the kernel
re-read the partition table, the /dev nodes appear, and I can mount them.
After that, I ran fsck -vf (verbose and force check) 3 times in a row on 300
megs of data on the card (a Debian root file system there) and never got a
single error once I got the card to show up.
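
For anyone who wants to poke at this without fdisk, here is a rough sketch of
the two steps involved: checking sector 0 for a classic MBR partition table,
and asking the kernel to rescan partitions.  It assumes a DOS/MBR layout (four
16-byte entries at offset 446, signature 0x55 0xAA at offset 510) and uses the
BLKRRPART ioctl, which is what I believe fdisk issues after 'w'.  parse_mbr and
reread_partition_table are names I made up; this is not fdisk's actual code.

```python
import fcntl
import os
import struct

SECTOR_SIZE = 512
BLKRRPART = 0x125F  # _IO(0x12, 95) in <linux/fs.h>: re-read partition table


def parse_mbr(sector):
    """Return [(type_id, start_lba, sector_count), ...] for used primary slots.

    Raises ValueError if the sector is short or lacks the 0x55 0xAA signature.
    """
    if len(sector) < SECTOR_SIZE or sector[510:512] != b"\x55\xaa":
        raise ValueError("no valid partition table")
    partitions = []
    for slot in range(4):
        entry = sector[446 + 16 * slot: 446 + 16 * (slot + 1)]
        # Byte 4 is the type id; bytes 8..11 the start LBA; 12..15 the length.
        type_id = entry[4]
        start_lba, count = struct.unpack_from("<II", entry, 8)
        if type_id != 0:
            partitions.append((type_id, start_lba, count))
    return partitions


def reread_partition_table(device):
    """Ask the kernel to rescan partitions on `device`.

    Needs root, and the kernel refuses while a partition is in use.
    """
    fd = os.open(device, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, BLKRRPART)
    finally:
        os.close(fd)
```

On the phone this would be used roughly as
parse_mbr(open("/dev/mmcblk0", "rb").read(512)) followed by
reread_partition_table("/dev/mmcblk0") once the table parses cleanly.  I have
not run this on the phone itself.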

This problem is not related to the suspend/resume bug, at least not directly.  The
suspend/resume problem is when Linux writes garbage to the partition table
and wipes it out.  Although the partition table is "disappearing" in this
case as well, the partition table is still there, you just can't see it.
 But if you take the card out of the phone and put it into a reader on a PC,
it's fine.  In short, the suspend/resume bug is a writing problem but this
is a reading problem.  Well maybe a read/write problem:  format had trouble
writing superblocks, and dd couldn't zero the mbr; but the point is in the
suspend/resume bug it was doing an unexpected write to the partition table,
and that write in particular is not happening here.

I think it's a timeout issue on reading the first block -- I read a bug
report about the Glamo timeout register not being big enough to hold a
timeout long enough for slow cards, and that the workaround was to lower
the SD clock.  I think it might be timing out the first time, but not later.
 Or it could be something else...  If I get time I plan to make a
modification to the mmc driver kernel module so that it will never fail, but
just hang the system trying over and over until it gets a proper read from
the SD card.  That obviously wouldn't be something we'd want as a permanent
solution but it might force things to work, and if it does work it will get
us closer to knowing what is wrong.
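
Until the driver can be patched, the same "keep trying" idea can be
approximated from user space.  This is only a sketch under assumptions: it
treats a read of sector 0 as good once the MBR signature 0x55 0xAA shows up,
it retries a bounded number of times instead of hanging forever, and
read_sector0_with_retry is a name I invented.  It also does nothing about the
kernel possibly serving a cached copy of the sector.

```python
import time

SECTOR_SIZE = 512


def read_sector0_with_retry(path, attempts=10, delay=0.5):
    """Read sector 0 of `path`, retrying until it carries the MBR signature.

    Returns (sector_bytes, attempts_used); raises OSError if every try fails.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # buffering=0 so each attempt issues its own read() system call
            with open(path, "rb", buffering=0) as dev:
                sector = dev.read(SECTOR_SIZE)
            if len(sector) == SECTOR_SIZE and sector[510:512] == b"\x55\xaa":
                return sector, attempt
            last_error = OSError("sector 0 read back without an MBR signature")
        except OSError as exc:
            last_error = exc
        if attempt < attempts:
            time.sleep(delay)  # give the card a moment before trying again
    raise last_error
```

Calling read_sector0_with_retry("/dev/mmcblk0") and logging the attempt count
would at least show how many tries the card needs, which is the data point the
driver experiment is after.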