strange problem with Intenso 4GB SDHC card
Joerg Reisenweber
joerg at openmoko.org
Sat Jul 26 04:52:33 CEST 2008
On Thu, 24 July 2008, Andy Green wrote:
> Somebody in the thread at some point said:
> | Hi all,
> |
> | I can now reliably reproduce the issue, as dd'ing the mbr back to the
> | card so far restores sane behaviour :
> |
> | If sd_drive is set to "0", then after a resume from "sync && apm -s" the
> | MBR of my 4GB SanDisk is wiped - so far I haven't noticed any other
> | errors, but have not looked very closely.
> ...
>
> | PS: Can somebody please tell me how to re-initialize the card without
> | going through another suspend/resume cycle ?
>
> sd_drive setting isn't actually used until next time we access the card,
> so provoking an access will do it, eg, touch /something ; sync.
>
> But the two explanations for what goes on seem mixed still here, we
> affect sd_drive and we do a suspend. My guess / hope is that this
> problem is coming from the suspend action alone and the change of
> sd_drive is bogus here. Maybe you can bang on it a little more trying
> to disprove that hypothesis?
>
> -Andy
I think this is quite a good clue to what is really happening here, so here is
a quote from OLPC ticket #6532 (HTH):
>>>>>
cc dilinger added
I've spent some time digging deep into the bowels of the VFS and block layer
and gathering some debug output and have an explanation for the partition
table corruption:
Upon coming out of resume, the SD code, with CONFIG_MMC_UNSAFE_SUSPEND
enabled, checks to see if there is a card plugged into the system and whether
that card is the same as the one that was plugged into the system at suspend
time. This is accomplished by reading the card ID of the device and for some
reason, very possibly #1339, we fail this detection. In this case, the kernel
removes the old device from the system and in this execution path, the
partition information for this device is zeroed.
Even though the device is removed, the device is still mounted and upon
unmount, ext2 syncs the superblock, even if the file system is sync'd
beforehand. The superblock is block 0 of the partition and the block layer
adds to this the partition start offset before submitting the write to the
lower layers. As the partition information has already been zeroed out, we
end up writing to block 0 of the disk itself, overwriting the partition table
and the geometry information. I've verified this by both gathering debug
output and 'dd' + 'hexdump' of corrupted and uncorrupted media.
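
To make the mechanism above easier to follow, here is a toy model of it. It is
only a sketch and not kernel code: ToyDisk, ToyPartition, simulate and the start
sector 63 are made-up names and values for illustration, and it follows the
ticket's description that the superblock write goes to block 0 of the partition.

```python
SECTOR_SIZE = 512


class ToyDisk:
    """In-memory stand-in for the SD card: a flat array of sectors."""

    def __init__(self, sectors):
        self.data = bytearray(sectors * SECTOR_SIZE)

    def write(self, sector, payload):
        start = sector * SECTOR_SIZE
        self.data[start:start + len(payload)] = payload

    def read(self, sector):
        start = sector * SECTOR_SIZE
        return bytes(self.data[start:start + SECTOR_SIZE])


class ToyPartition:
    """Hypothetical partition object that only remembers its start offset."""

    def __init__(self, disk, start_sector):
        self.disk = disk
        self.start_sector = start_sector

    def write(self, rel_sector, payload):
        # Models the block layer adding the partition start offset
        # before handing the write to the lower layers.
        self.disk.write(self.start_sector + rel_sector, payload)


def simulate(card_id_matches):
    """Return True if the MBR in sector 0 survives the unmount write."""
    disk = ToyDisk(sectors=1024)
    # Fake MBR: some content plus the 0x55 0xAA boot signature at the end.
    mbr = b"MBR".ljust(SECTOR_SIZE - 2, b"\0") + b"\x55\xaa"
    disk.write(0, mbr)

    part = ToyPartition(disk, start_sector=63)  # assumed start sector

    if not card_id_matches:
        # Resume-time card check failed: the old device is removed and
        # its partition information is zeroed, but it is still mounted.
        part.start_sector = 0

    # Unmount syncs the superblock to block 0 of the partition.
    part.write(0, b"SUPERBLOCK")
    return disk.read(0) == mbr


if __name__ == "__main__":
    print("card recognised, MBR intact:", simulate(card_id_matches=True))
    print("card not recognised, MBR intact:", simulate(card_id_matches=False))
```

In this model the MBR only survives when the partition start offset is still in
place at the time of the superblock write, which matches the corruption pattern
described above.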
Some interesting points:
We are able to delete a block device even though it is still mounted.
Even though the device has been deleted, the write submitted to it does not
fail.
Note that this is still not 100% reproducible and in certain cases the
superblock write during unmount does fail with block I/O errors, meaning that
the queue is properly deleted. As per dilinger's comments on IRC, the VFS has
lots of refcounts and there is a timing issue/race condition that we're
hitting. As per #1339, we may be able to add an OLPC-specific hack to wait
500ms or so upon resume to get around this. I will try this but I don't think
this is acceptable given our suspend/resume requirements.
Something I don't quite understand at the moment is how/when our userland env
(journal specifically I think?) unmounts the device as I've been testing via
command line suspend, mount, and unmount while running in console mode.
Next steps:
Get an understanding of what is happening with our userland and brainstorm
with cjb about the possibility of simply unmounting the SD device upon
suspend. There are issues around this as we may have files open and that will
keep us from suspending.
Test adding a timeout to the resume path to see if it solves our problem to
validate that it is indeed something related to our HW.
Dig into the unmount/write to non-existing bdev some more and discuss this
upstream if needed.
(Adding dilinger to cc:)
<<<<<<<<<<<<<<<<<<<<<<<<
/j