[PATCH 0/2] Improve GTA02 NAND read performance by 41%

Harald Welte laforge at openmoko.org
Mon Oct 20 20:46:45 CEST 2008


Hi!

As part of Swisscom's efforts to speed up the boot process, I discovered
that the NAND timings that GTA02 uses are far from optimal.  The actual
access cycle for one byte is 45ns, but current u-boot and kernel code
use access cycles of up to 190ns per byte.

The NAND timing calculations are as follows (values given for timings
after my patches are applied):

tR = 25us
tACLS = 0ns
tWRPH0 = 30ns
tWRPH1 = 20ns

cmd_addr_cycle = tACLS + tWRPH0 + tWRPH1 = 50ns
data_cycle = tWRPH0 + tWRPH1 = 50ns
read = 0x00 + 5*cmd_addr_cycle + 0x30 + tR + (2048+64)*data_cycle
read = 7 * cmd_addr_cycle + 25us + 2112 * data_cycle
read = 7 * 50ns + 25us + 2112 * 50ns 
read = 130.95us (15.639 MByte/sec)

The latter is the theoretical maximum read performance of the NAND
flash that GTA02 uses when it is driven with these timings.

If we use the timings of the various existing bits of code (given as
tACLS/tWRPH0/tWRPH1 in ns), then we get the following results:

read_old_uboot  (30/80/80)    364.25us     5.623 MByte/sec
read_old_kernel (30/70/30)    237.11us     8.637 MByte/sec
read_new_kernel (0/30/20)     130.95us    15.639 MByte/sec
theoretical     (0/25/15)     120.36us    17.016 MByte/sec

Therefore, by using the correct timings, I expect an 81% improvement
in theoretical read performance over the old kernel timings.
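
The table and the 81% figure can be reproduced from the formula above with a
few lines of Python. This is only a sketch of the timing model as stated in
this mail, not code from u-boot or the kernel; all names are made up for
illustration, and MByte is taken as 10^6 bytes, which is what the quoted rates
imply. One assumption to flag: the "theoretical" row's 120.36us matches the
45ns per-byte cycle mentioned at the top of this mail rather than the 40ns sum
of the listed phases, so that row is computed from a 45ns cycle.

```python
T_R_NS = 25_000                     # tR = 25us
PAGE_BYTES, OOB_BYTES = 2048, 64    # page payload and spare area
CMD_ADDR_CYCLES = 7                 # 0x00 + 5 address bytes + 0x30


def read_time_ns(cmd_addr_cycle_ns, data_cycle_ns):
    """Time in ns to read one page (data + OOB) for the given cycle times."""
    return (CMD_ADDR_CYCLES * cmd_addr_cycle_ns
            + T_R_NS
            + (PAGE_BYTES + OOB_BYTES) * data_cycle_ns)


def from_phases(tacls_ns, twrph0_ns, twrph1_ns):
    """Page read time derived from the three controller timing phases."""
    return read_time_ns(tacls_ns + twrph0_ns + twrph1_ns,
                        twrph0_ns + twrph1_ns)


rows = {
    "read_old_uboot": from_phases(30, 80, 80),
    "read_old_kernel": from_phases(30, 70, 30),
    "read_new_kernel": from_phases(0, 30, 20),
    # Assumption: a flat 45ns cycle, see the note above.
    "theoretical": read_time_ns(45, 45),
}

# bytes per ns * 1000 = MByte/sec (1 MByte = 10**6 bytes)
mbyte_per_s = {name: PAGE_BYTES / ns * 1000 for name, ns in rows.items()}

for name, ns in rows.items():
    print(f"{name:16s} {ns / 1000:8.2f}us {mbyte_per_s[name]:7.3f} MByte/sec")

gain = mbyte_per_s["read_new_kernel"] / mbyte_per_s["read_old_kernel"] - 1
print(f"expected improvement over old kernel timings: {gain:.0%}")
```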

But let's look at some measurements:

old 2.6.24 kernel:
========================
s3c2440-nand s3c2440-nand: Tacls=3, 30ns Twrph0=7 70ns, Twrph1=3 30ns
root at om-gta02:/dev# time dd if=/dev/mtd6 of=/dev/null
505088+0 records in
505088+0 records out
real    0m 54.16s
user    0m 0.48s
sys     0m 53.25s
root at om-gta02:/dev# time dd bs=2048 if=/dev/mtd6 of=/dev/null
126272+0 records in
126272+0 records out
real    0m 51.05s
user    0m 0.08s
sys     0m 50.37s
========================
result (based on 512-byte dd): 505088 * 512 bytes = 252544 KByte = 252.544 MByte => 4.646 MByte/sec

Thus, the actual performance is somewhere around 53% of the theoretical
speed, given the timings of the old (current) GTA02 kernel source.  This
is disappointing, and requires further investigation.
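
For anyone who wants to redo this arithmetic from their own dd output, here is
a small sketch. The helper name is hypothetical, and the unit convention is an
assumption read off the "252.544 MByte" figure above (bytes / 1024 / 1000).
Computed this way, the rate comes out within about half a percent of the
figure quoted above.

```python
def dd_throughput(records, block_bytes, real_seconds):
    """Return (total MByte, MByte/sec) for one dd run.

    Assumed convention: 1 MByte = 1000 KByte and 1 KByte = 1024 bytes,
    which is what "505088*512 = 252.544 MByte" implies.
    """
    kbytes = records * block_bytes / 1024
    mbytes = kbytes / 1000
    return mbytes, mbytes / real_seconds


# Old kernel, 512-byte dd run: 505088 records in 54.16s wall-clock time.
total_mbyte, rate = dd_throughput(505088, 512, 54.16)

# Calculated rate for the old kernel timings (30/70/30), from the table.
MODEL_OLD_KERNEL = 8.637

print(f"{total_mbyte:.3f} MByte read at {rate:.3f} MByte/sec, "
      f"{rate / MODEL_OLD_KERNEL:.0%} of the calculated rate")
```

Note that the calculated rates use 10^6 bytes per MByte while this convention
uses 1024 * 1000, so the percentage is only good to a couple of points.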

new 2.6.24 kernel (using my timing related patches):
========================
s3c2440-nand s3c2440-nand: Tacls=1, 10ns Twrph0=3 30ns, Twrph1=2 20ns
root at om-gta02:~# time dd if=/dev/mtd6 of=/dev/null 
505088+0 records in
505088+0 records out
real    0m 38.31s
user    0m 0.36s
sys     0m 37.93s
root at om-gta02:~# time dd bs=2048 if=/dev/mtd6 of=/dev/null 
126272+0 records in
126272+0 records out
real    0m 35.18s
user    0m 0.12s
sys     0m 35.03s
========================
result (based on 512-byte dd): 6.592 MByte/sec (41.9% speed-up)

So instead of the calculated 81% improvement, we only get 41.9%.

Still quite significant.

Comparing the actual throughput with the theoretical throughput for the
new timings, we only get to 42% of what should be possible.

In order to see how much effect hardware ECC has, I tried a kernel with
'hardware_ecc=1' in the bootargs:

new 2.6.24 kernel with hwecc:
========================
root at om-gta02:~# time dd if=/dev/mtd6 of=/dev/null
505088+0 records in
505088+0 records out
real    0m 27.46s
user    0m 0.37s
sys     0m 27.09s
========================
result (based on 512-byte dd): 9.197 MByte/sec (98% speed-up)

So with hardware-ECC and the new timings we get a 98% speed-up compared
to the original kernel.

And comparing soft-ECC with hard-ECC under the new timings, we see a 39% improvement.

Finally, using HWECC we get to 58% of the theoretical throughput.  This is
already good, but there's still a software bottleneck somewhere.
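
Since all three 512-byte dd runs read the same amount of data, the speed-ups
can be cross-checked directly from the `real` times. A minimal sketch with
illustrative names; computed this way, the results land within about one
percentage point of the rounded figures given in the text.

```python
# Wall-clock ("real") times of the 512-byte dd runs, in seconds.
REAL_SECONDS = {
    "old timings, soft ECC": 54.16,
    "new timings, soft ECC": 38.31,
    "new timings, hw ECC": 27.46,
}


def speedup(before_s, after_s):
    """Relative throughput gain when the same data is read in less time."""
    return before_s / after_s - 1


old = REAL_SECONDS["old timings, soft ECC"]
new_soft = REAL_SECONDS["new timings, soft ECC"]
new_hw = REAL_SECONDS["new timings, hw ECC"]

print(f"new timings vs old:          {speedup(old, new_soft):.1%}")
print(f"new timings + hw ECC vs old: {speedup(old, new_hw):.1%}")
print(f"hw ECC vs soft ECC (new):    {speedup(new_soft, new_hw):.1%}")
```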

Furthermore, the CPU load during NAND read is still close to 100%.  To some
extent, this is expected.  The S3C24xx NAND controller cannot do DMA and
thus we need to read each word from the controller (PIO).

However, I don't think that all of the time is spent copying data, but rather
polling until the data is ready.  The S3C244x (unlike the 2410) supports an RnB
interrupt, which should solve this issue.
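
To illustrate the difference between polling and waiting for a "ready"
notification, here is a user-space analogy in Python. It is not kernel code
and says nothing about the S3C244x itself: `threading.Timer` merely stands in
for the device signalling ready, and all names and the 50ms busy time are
arbitrary assumptions.

```python
import threading
import time


def busy_poll(ready):
    """Spin on the flag, the way a driver polling a busy bit would."""
    polls = 0
    while not ready.is_set():
        polls += 1
    return polls


def blocking_wait(ready):
    """Sleep until signalled, the way an interrupt-driven driver could."""
    ready.wait(timeout=1.0)
    return 0


def measure(wait_fn, busy_seconds=0.05):
    """Return (CPU seconds used, poll count) while the fake device is busy."""
    ready = threading.Event()
    # The timer thread plays the device: it sets "ready" after busy_seconds.
    device = threading.Timer(busy_seconds, ready.set)
    cpu_start = time.process_time()
    device.start()
    polls = wait_fn(ready)
    cpu_used = time.process_time() - cpu_start
    device.join()
    return cpu_used, polls


for fn in (busy_poll, blocking_wait):
    cpu, polls = measure(fn)
    print(f"{fn.__name__:14s} cpu={cpu * 1000:7.2f}ms polls={polls}")
```

The polling variant burns CPU time for as long as the device is busy, while
the blocking variant leaves the CPU free for other work until it is woken up.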

The mainline kernel NAND code doesn't have infrastructure for this yet,
but I'm working on this right now.

In any case, I'd recommend testing and applying my patches.  A 41.9% increase
in NAND read performance is probably of good use to every GTA02 user :)

Cheers,
-- 
- Harald Welte <laforge at openmoko.org>          	        http://openmoko.org/
============================================================================
Software for the world's first truly open Free Software mobile phone