[PATCH 0/2] Improve GTA02 NAND read performance by 41%

Mon Oct 20 22:29:24 CEST 2008

Wow, this is a great discovery, even if there is still room for improvement!

Does this only concern GTA02 or can a similar thing be applied to GTA01 as well?

/Micael

On Mon, Oct 20, 2008 at 8:46 PM, Harald Welte <laforge at openmoko.org> wrote:
> Hi!
>
> As part of Swisscom's efforts to speed up the boot process, I discovered
> that the NAND timings that GTA02 uses are far from optimal.  The actual
> access cycle for one byte is 45ns, but the current u-boot and kernel code
> use access cycles of up to 190ns per byte.
>
> The NAND timing calculations are as follows (values given for timings
> after my patches are applied):
>
> tR = 25us
> tACLS = 0ns
> tWRPH0 = 30ns
> tWRPH1 = 20ns
>
> cmd_addr_cycle = tACLS + tWRPH0 + tWRPH1 = 50ns
> data_cycle = tWRPH0 + tWRPH1 = 50ns
> read = 0x00 + 5*cmd_addr_cycle + 0x30 + tR + (2048+64)*data_cycle
> read = 7 * cmd_addr_cycle + 25us + 2112 * data_cycle
> read = 7 * 50ns + 25us + 2112 * 50ns
> read = 130.95us (15.639 MByte/sec)
>
> The latter is the theoretical maximum read performance of the NAND
> flash that GTA02 uses.
>
> If we use the timings of the various existing bits of code, then we get the
> following results:
>
> read_old_uboot (30/80/80)       364.25us         5.623 MByte/sec
> read_old_kernel (30/70/30)      237.11us         8.637 MByte/sec
> read_new_kernel (0/30/20)       130.95us        15.639 MByte/sec
> theoretical (0/25/15)           120.36us        17.016 MByte/sec
>
> Therefore, by using the correct timings, I expect an 81% improvement
> in the theoretical read performance (new versus old kernel timings).
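
The timing formula and the table above can be reproduced with a short
Python sketch. The function and variable names are made up for this
illustration. One assumption to note: the "theoretical" row is computed
with the 45ns byte cycle quoted at the top of the mail, since that is the
cycle time that yields 120.36us.

```python
def page_read_us(cmd_addr_ns, data_ns, t_r_us=25.0, page_bytes=2048, oob_bytes=64):
    """Time to read one page: 7 command/address cycles, tR, then data + OOB bytes."""
    cycles_ns = 7 * cmd_addr_ns + (page_bytes + oob_bytes) * data_ns
    return t_r_us + cycles_ns / 1000.0


def throughput_mbyte_s(read_us, page_bytes=2048):
    # Bytes per microsecond is numerically the same as 10^6 bytes per second.
    return page_bytes / read_us


# (tACLS, tWRPH0, tWRPH1) in nanoseconds, as listed in the table.
timings = {
    "old_uboot": (30, 80, 80),
    "old_kernel": (30, 70, 30),
    "new_kernel": (0, 30, 20),
}

results = {}
for name, (tacls, twrph0, twrph1) in timings.items():
    t = page_read_us(tacls + twrph0 + twrph1, twrph0 + twrph1)
    results[name] = (t, throughput_mbyte_s(t))

# Assumption: the theoretical row uses a flat 45ns cycle for every byte.
t = page_read_us(45, 45)
results["theoretical"] = (t, throughput_mbyte_s(t))

for name, (t, mb) in results.items():
    print(f"{name:12s} {t:8.2f}us {mb:8.3f} MByte/sec")

gain = results["new_kernel"][1] / results["old_kernel"][1] - 1.0
print(f"expected improvement, new vs. old kernel timings: {gain:.0%}")
```

The last line compares the new kernel timings with the old kernel timings,
which is where the 81% figure comes from.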
>
> But let's look at some measurements:
>
> old 2.6.24 kernel:
> ========================
> s3c2440-nand s3c2440-nand: Tacls=3, 30ns Twrph0=7 70ns, Twrph1=3 30ns
> root@om-gta02:/dev# time dd if=/dev/mtd6 of=/dev/null
> 505088+0 records in
> 505088+0 records out
> real    0m 54.16s
> user    0m 0.48s
> sys     0m 53.25s
> root@om-gta02:/dev# time dd bs=2048 if=/dev/mtd6 of=/dev/null
> 126272+0 records in
> 126272+0 records out
> real    0m 51.05s
> user    0m 0.08s
> sys     0m 50.37s
> ========================
> result (based on 512byte dd): 505088*512 = 252.544 MByte => 4.646MByte/sec
>
> Thus, the actual performance is somewhere around 53% of the theoretical
> speed, given the timings of the old (current) GTA02 kernel source.  This
> is disappointing, and requires further investigation.
>
> new 2.6.24 kernel (using my timing related patches):
> ========================
> s3c2440-nand s3c2440-nand: Tacls=1, 10ns Twrph0=3 30ns, Twrph1=2 20ns
> root@om-gta02:~# time dd if=/dev/mtd6 of=/dev/null
> 505088+0 records in
> 505088+0 records out
> real    0m 38.31s
> user    0m 0.36s
> sys     0m 37.93s
> root@om-gta02:~# time dd bs=2048 if=/dev/mtd6 of=/dev/null
> 126272+0 records in
> 126272+0 records out
> real    0m 35.18s
> user    0m 0.12s
> sys     0m 35.03s
> ========================
> result (based on 512byte dd): 6.592MByte/sec    (41.9% speed-up)
>
> So instead of the calculated 81% improvement, we only get 41.9%.
>
> Still quite significant.
>
> Comparing the theoretical throughput for the new timings with the actual
> throughput of the new timings, we only get to 42% of what should be possible.
>
> In order to see how much effect hardware ECC has, I tried a kernel with
> 'hardware_ecc=1' in the bootargs:
>
> new 2.6.24 kernel with hwecc:
> ========================
> root@om-gta02:~# time dd if=/dev/mtd6 of=/dev/null
> 505088+0 records in
> 505088+0 records out
> real    0m 27.46s
> user    0m 0.37s
> sys     0m 27.09s
> ========================
> result (based on 512byte dd): 9.197MByte/sec    (98% speed-up)
>
> So with hardware-ECC and the new timings we get a 98% speed-up compared
> to the original kernel.
>
> And comparing soft-ECC against hard-ECC, both with the new timings, we see
> a 39% improvement.
>
> Finally, using HWECC we get to 58% of the theoretical throughput.  This is
> already good, but there's still a software bottleneck somewhere.
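
The comparisons above can be re-derived from the dd wall-clock times with
a few lines of Python. This is only a sketch: the names are illustrative,
and the 252.544 MByte total is taken as stated in the mail. Recomputing
this way gives figures within about one percentage point of those quoted
(for example, roughly 41% for the soft-ECC speed-up, as in the subject
line).

```python
TOTAL_MBYTE = 252.544  # size of /dev/mtd6 as stated in the mail

# Wall-clock ("real") seconds from the 512-byte dd runs quoted above.
real_s = {"old": 54.16, "new": 38.31, "new_hwecc": 27.46}

# Theoretical MByte/sec for the timings each kernel uses (from the table).
theory = {"old": 8.637, "new": 15.639, "new_hwecc": 15.639}


def rate(name):
    """Measured throughput in MByte/sec for one dd run."""
    return TOTAL_MBYTE / real_s[name]


def speedup(slow, fast):
    """Relative speed-up of `fast` over `slow`, e.g. 0.41 for 41% faster."""
    return real_s[slow] / real_s[fast] - 1.0


for name in real_s:
    share = rate(name) / theory[name]
    print(f"{name:10s} {rate(name):6.3f} MByte/sec, {share:.0%} of theoretical")

print(f"new vs. old:         {speedup('old', 'new'):.1%}")
print(f"new+hwecc vs. old:   {speedup('old', 'new_hwecc'):.1%}")
print(f"hwecc vs. soft-ECC:  {speedup('new', 'new_hwecc'):.1%}")
```

Because the speed-ups are ratios of run times over the same amount of data,
they do not depend on how the total size is expressed.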
>
> Furthermore, the CPU load during NAND read is still close to 100%.  To some
> extent, this is expected.  The S3C24xx NAND controller cannot do DMA and
> thus we need to read each word from the controller (PIO).
>
> However, I don't think that all of the time is spent copying data; I suspect
> much of it goes to polling until the data is ready.  The S3C244x (but not the
> S3C2410) supports an RnB interrupt, which should solve this issue.
>
> The mainline kernel NAND code doesn't have infrastructure for this yet,
> but I'm working on this right now.
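
To illustrate why polling keeps the CPU busy while an interrupt-driven wait
does not, here is a user-space analogy in Python. It is not kernel code and
has nothing to do with the actual S3C24xx driver: a timer thread stands in
for the chip signalling ready, and all names are hypothetical.

```python
import threading
import time


def cpu_time_while_waiting(wait, delay_s=0.05):
    """Return CPU seconds consumed while `wait(ready)` waits for a simulated device."""
    ready = threading.Event()
    # The timer stands in for the NAND chip signalling "ready" after tR.
    timer = threading.Timer(delay_s, ready.set)
    start = time.process_time()
    timer.start()
    wait(ready)
    used = time.process_time() - start
    timer.join()
    return used


def busy_poll(ready):
    # Spin until the flag is set, like repeatedly reading a status register.
    while not ready.is_set():
        pass


def block_until_signalled(ready):
    # Sleep until woken, like waiting for an interrupt.
    ready.wait()


busy_cpu = cpu_time_while_waiting(busy_poll)
blocked_cpu = cpu_time_while_waiting(block_until_signalled)
print(f"busy polling:  {busy_cpu * 1000:.1f} ms of CPU time")
print(f"blocking wait: {blocked_cpu * 1000:.1f} ms of CPU time")
```

Both variants wait for the same wall-clock time; only the busy loop spends
that time on the CPU.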
>
> In any case, I'd recommend testing and applying my patches.  A 41.9%
> increase in NAND performance is probably of good use to every GTA02 user :)
>
> Cheers,
> --
> - Harald Welte <laforge at openmoko.org>                   http://openmoko.org/
> ============================================================================
> Software for the world's first truly open Free Software mobile phone