[PATCH 0/2] Improve GTA02 NAND read performance by 41%
laforge at openmoko.org
Tue Oct 21 22:51:37 CEST 2008
On Tue, Oct 21, 2008 at 08:04:00PM +0200, Holger Freyther wrote:
> On Tuesday 21 October 2008 11:22:31 Harald Welte wrote:
> > I'm really lost here, don't know what else to do. I'll get some profiles
> > on a soft-ECC and on a non-irq-based-NAND kernel to compare the results and
> > see if they also show this 'artefact'. Maybe 'top' is actually wrong? Any
> > ideas?
> @oprofile: You could catch Richard on IRC. IIRC he did work on oprofile for
> ARM and XScale and might know if some things are not accounted for because
> of missing hooks.
I'll catch up with him once I get back to this topic; right now I'm afraid I'll
have to work on other stuff for some time.
I just discovered that for some reason part of the profile is accounted to
'vma 00000000' instead of the vmlinux vma. But if I resolve the symbol
addresses manually against 'objdump -d' output, it makes a lot of sense.
Still, it looks like more than 50% of the CPU time is burnt in the actual PIO
read from the controller, and some 27% is in default_idle.
Comparing interrupt-based and busy-wait-based profiles shows almost zero
difference. Busy-waiting in nand_wait_ready() is about 0.238% (!), and
s3c2410_nand_devready() accounts for only 0.0069%.
I believe the profiles insofar as they show little difference between
interrupt-based and busy-wait-based NAND reads.
What I still don't get is:
1) why is there so much default_idle in the profiles when top says ~100%
2) why is there so much time in default_idle rather than s3c24xx_default_idle
3) why is the CPU idle that much time, given that the NAND chip supposedly is
> @nand read:
> - If we cannot read faster, can we read less often? The untested ideas are
> attached. On the S3C2442 there is a second ECC hardware register bank. By
> setting ecc.size higher we might eliminate half of your read calls (if we
> can actually read 512 bytes at a time...), and the second ECC register
> would actually be doing something.
I don't think it changes much. As indicated before, the NAND flash caches an
entire page + OOB (2048 + 64 = 2112 bytes) internally, so you don't pay the
penalty of fetching the data twice from the actual NAND cells.
Also, the way I understand the current code, it actually executes one
READ0 command and then just calls read_buf four times (256 bytes each) plus
once for the 64 bytes of ECC.
So this should never actually read any data multiple times. It just fetches
different parts of the cached 2112-byte page into different buffers.
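To illustrate the point, here is a minimal sketch in C of the model described
above. It is not the actual driver code: struct nand_model and the model_*
functions are hypothetical names made up for this illustration, and the page
register is simply an array. It counts how often the cells are fetched versus
how often read_buf is called for a given ECC chunk size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE_BYTES 2048
#define OOB_SIZE_BYTES  64
#define PAGE_REG_BYTES  (PAGE_SIZE_BYTES + OOB_SIZE_BYTES)  /* 2112 */

/* Hypothetical model of a NAND chip; not the real mtd or s3c2410 API. */
struct nand_model {
    unsigned char page_reg[PAGE_REG_BYTES]; /* on-chip page register */
    size_t pos;              /* next byte a read_buf call will return */
    unsigned cell_fetches;   /* times the cells were loaded into page_reg */
    unsigned read_buf_calls; /* number of PIO reads from the page register */
};

/* Models READ0: one transfer from the NAND cells into the page register. */
static void model_read0(struct nand_model *n, const unsigned char *cells)
{
    memcpy(n->page_reg, cells, PAGE_REG_BYTES);
    n->pos = 0;
    n->cell_fetches++;
}

/* Models read_buf: copies the next len bytes out of the page register. */
static void model_read_buf(struct nand_model *n, unsigned char *buf, size_t len)
{
    assert(n->pos + len <= PAGE_REG_BYTES);
    memcpy(buf, n->page_reg + n->pos, len);
    n->pos += len;
    n->read_buf_calls++;
}

/* One READ0, then the data area in ecc_size chunks, then the OOB area. */
static void model_read_page(struct nand_model *n, const unsigned char *cells,
                            size_t ecc_size, unsigned char *data,
                            unsigned char *oob)
{
    size_t off;

    assert(ecc_size > 0 && PAGE_SIZE_BYTES % ecc_size == 0);
    model_read0(n, cells);
    for (off = 0; off < PAGE_SIZE_BYTES; off += ecc_size)
        model_read_buf(n, data + off, ecc_size);
    model_read_buf(n, oob, OOB_SIZE_BYTES);
}
```

In this model a larger ecc_size only reduces the number of read_buf calls
(2048 / ecc_size data reads plus one OOB read); cell_fetches stays at 1 per
page either way, and the total number of bytes moved by PIO is always 2112.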
> maybe this sounds stupid or you already have tested these...
I haven't tested it,
> nice that you work on this
As indicated, it's unlikely I'll get back to this during the rest of this week.
If anyone is interested in my interrupt patches, I'll attach them to this mail.
They're for testing only, not for merging.
- Harald Welte <laforge at openmoko.org> http://openmoko.org/
Software for the world's first truly open Free Software mobile phone
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 0 bytes
Desc: not available
Url : http://lists.openmoko.org/pipermail/openmoko-kernel/attachments/20081021/65c41002/attachment.patch