[PATCH] fix root cause of NAND trouble
Ben Dooks
ben-linux at fluff.org
Sun Nov 2 16:54:42 CET 2008
On Sun, Nov 02, 2008 at 11:25:05AM -0200, Werner Almesberger wrote:
> Ben Dooks wrote:
> > actually, thinking about it, we can probably get better code by doing:
> >
> > 	/* mop up any non-word aligned length reads. */
> > 	for (i = (len & ~3); i != len; i++)
> > 		ptr[i] = readb(info->regs + S3C2440_NFDATA);
>
> It looks nicer, but, surprisingly, it's one instruction longer and
> three instructions slower (in the normal case, i.e., with word
> alignment), see below (with the "buf" fix).
Hmm. For one, we should do:
	void __iomem *data = info->regs + S3C2440_NFDATA;
and replace all uses of it in the function, since the compiler is
needlessly reloading constant data.
As well as that, it looks like the /4 in the readsl() call is trying to
round up (the cmp/add/movge ahead of the asr #2), so changing it to
	readsl(data, buf, len >> 2);
is probably a much better idea.
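Putting both suggestions together, here is a rough userspace sketch of the shape the function might take. It is not the driver code: struct fake_info, mock_readb(), mock_readsl(), fake_flash and FAKE_NFDATA are hypothetical stand-ins so the logic can be compiled and run outside the kernel, and the mock_readsl() argument order (address, buffer, count) is an assumption modelled on the register use in the listings (address in r0, buffer in r1).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FAKE_NFDATA 0x10	/* assumption: matches the "#16" offset in the listings */

/* Hypothetical stand-in for the driver's info structure. */
struct fake_info {
	uint8_t *regs;
};

/* Fake flash contents; every mock register read pops the next byte(s). */
static const uint8_t fake_flash[16] = {
	1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
};
static size_t fake_pos;
static uint8_t fake_regs[32];

/* Mock of a byte read from the data register (addr is ignored). */
static uint8_t mock_readb(const void *addr)
{
	(void)addr;
	return fake_flash[fake_pos++];
}

/* Mock of a string read of 'words' 32-bit words from the data register. */
static void mock_readsl(const void *addr, void *dst, int words)
{
	(void)addr;
	memcpy(dst, fake_flash + fake_pos, (size_t)words * 4);
	fake_pos += (size_t)words * 4;
}

static void read_buf_sketch(struct fake_info *info, uint8_t *buf, int len)
{
	/* Compute the register address once instead of per access. */
	const void *data = info->regs + FAKE_NFDATA;
	int i;

	/* Whole words first; the shift avoids signed-divide fix-up code. */
	mock_readsl(data, buf, len >> 2);

	/* Mop up any non-word-aligned tail. */
	for (i = len & ~3; i != len; i++)
		buf[i] = mock_readb(data);
}
```

With fake_pos reset to 0 and info.regs pointing at fake_regs, read_buf_sketch(&info, out, 7) fills out with the first seven bytes of fake_flash: one word through mock_readsl() and three bytes through the tail loop. The sketch assumes len is non-negative and no larger than the fake flash array.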
> Notation:
> ; <instruction count> +<instructions executed after readsl>
>
> - Werner
>
> ----- for (i = 0; i != (len & 3); i++) ------------------------------------
>
> s3c2440_nand_read_buf:
> @ args = 0, pretend = 0, frame = 0
> @ frame_needed = 1, uses_anonymous_args = 0
> mov ip, sp ; 1
> stmfd sp!, {r4, r5, r6, fp, ip, lr, pc} ; 2
> sub fp, ip, #4 ; 3
> sub sp, sp, #4 ; 4
> ldr r6, [r0, #536] ; 5
> mov r4, r2 ; 6
> ldr r0, [r6, #96] ; 7
> cmp r4, #0 ; 8
> add r2, r2, #3 ; 9
> movge r2, r4 ; 10
> mov r2, r2, asr #2 ; 11
> bic r3, r4, #3 ; 12
> add r0, r0, #16 ; 13
> add r5, r1, r3 ; 14
> bl __raw_readsl ; 15
> mov r2, #0 ; 16 +1
> b .L212 ; 17 +2
> .L213:
> ldr r3, [r6, #96] ; 18
> ldrb r3, [r3, #16] @ zero_extendqisi2 ; 19
> strb r3, [r5, r2] ; 20
> add r2, r2, #1 ; 21
> .L212:
> and r3, r4, #3 ; 22 +3
> cmp r2, r3 ; 23 +4
> bne .L213 ; 24 +5
> ldmfd sp, {r3, r4, r5, r6, fp, sp, pc} ; 25 +6
>
> ----- for (i = (len & ~3); i != len; i++) ---------------------------------
>
> s3c2440_nand_read_buf:
> @ args = 0, pretend = 0, frame = 0
> @ frame_needed = 1, uses_anonymous_args = 0
> mov ip, sp ; 1
> stmfd sp!, {r4, r5, r6, fp, ip, lr, pc} ; 2
> sub fp, ip, #4 ; 3
> sub sp, sp, #4 ; 4
> ldr r6, [r0, #536] ; 5
> mov r5, r2 ; 6
> ldr r0, [r6, #96] ; 7
> cmp r5, #0 ; 8
> add r2, r2, #3 ; 9
> movge r2, r5 ; 10
> mov r2, r2, asr #2 ; 11
> add r0, r0, #16 ; 12
> mov r4, r1 ; 13
> bl __raw_readsl ; 14
> bic r1, r5, #3 ; 15 +1
> add r4, r4, r1 ; 16 +2
> mov r2, #0 ; 17 +3
> b .L212 ; 18 +4
> .L213:
> ldr r3, [r6, #96] ; 19
> ldrb r3, [r3, #16] @ zero_extendqisi2 ; 20
> strb r3, [r4], #1 ; 21
> .L212:
> rsb r3, r1, r5 ; 22 +5
> cmp r2, r3 ; 23 +6
> add r2, r2, #1 ; 24 +7
> bne .L213 ; 25 +8
> ldmfd sp, {r3, r4, r5, r6, fp, sp, pc} ; 26 +9
The compiler 'optimisation' here is appalling. Which gcc version are you using?
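For reference, the cmp / add #3 / movge / asr #2 sequence that appears before the __raw_readsl call in both listings looks like the usual expansion of a signed divide by 4. A small sketch of what it appears to compute, with asr2() and div4_as_compiled() as hypothetical helper names and the mapping to the instructions being my reading of the listing rather than anything confirmed:

```c
/* Arithmetic shift right by 2, written so it stays well defined
 * in C even when x is negative (mirrors "asr #2"). */
static int asr2(int x)
{
	return x >= 0 ? x >> 2 : ~(~x >> 2);
}

/* Mirrors: cmp len, #0 / add tmp, len, #3 / movge tmp, len / asr #2.
 * Negative values get +3 first so the result rounds toward zero. */
static int div4_as_compiled(int len)
{
	int biased = len < 0 ? len + 3 : len;

	return asr2(biased);
}
```

For every int value that does not overflow the +3, div4_as_compiled(len) matches len / 4, while a plain arithmetic shift differs only for negative values that are not multiples of 4. Since a read length should never be negative, len >> 2 gives the same word count without the three fix-up instructions.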
--
Ben
Q: What's a light-year?
A: One-third less calories than a regular year.