nWAITing for Glamo

Carsten Haitzler (The Rasterman) raster at openmoko.org
Tue Jul 15 19:01:27 CEST 2008


On Tue, 15 Jul 2008 10:39:01 +0100 Andy Green <andy at openmoko.com> babbled:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Somebody in the thread at some point said:
> 
> | | actually worse as you also have dma setup overhead etc. at best it
> | seemed to
> | | pull about half the throughput of copying with the cpu, and gained zero
> |
> | Something funny there then, it shouldn't've been any worse.  If most of
> | the trouble is coming from Glamo just covering its ears it might not
> | have made much odds but should only have been the same or faster.
> 
> Carsten are you sure you tested the same thing each time?  It sounds
> like the first time you spammed Glamo memory with a constant, so the
> memory bus was dedicated to Glamo access actions and the CPU ran from
> cache (and had its constant from CPU register).

both times we were trying to
1. test throughput - write to the glamo as fast as possible. in this case the
cpu was stuck in wait states waiting for the glamo to accept all our writes.
this is one of the big performance issues - when redrawing a big region of the
screen an upload may be 200kb up to 600kb - for 1 frame. rinse and repeat. we
just get stuck waiting for the glamo.
2. test throughput with DMA to see if it is at least the same as without DMA.
the result was that in the BEST case (largest transfer sizes) it had about 1/2
the throughput of writing with the CPU, and it degraded further as the dma
block sizes went down. so even in the best case it was a performance loss. at
the SAME time we wanted to see if the memory bus was locked for the SoC while
doing this - it seemed to be, as any other memcpy()'s or memory accesses from
the SoC were blocked until the DMA was done. so really no gain at all in any
way :(

> But when you did DMA, by definition it is a bitblit action where you
> both read the bitmap from external RAM and then copy / write it to
> Glamo.  Then it makes sense you only find half the throughput since the
> external bus experiences twice the traffic, half getting the data from
> our DRAM and half writing it to Glamo RAM and it can only do one at a time.

well, as such the data sizes should never have fitted in cache anyway - so no
matter how you look at it, fetching with the cpu or with dma from "external
ram" and then writing to the glamo are fundamentally the same.

> - -Andy
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (GNU/Linux)
> Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org
> 
> iEYEARECAAYFAkh8cDAACgkQOjLpvpq7dMoqLQCfelVRHKCHGY1A714JMpr2kzQ0
> qAwAn1q7FbI2kwEPOP2hO651zmGCP3fJ
> =F3jA
> -----END PGP SIGNATURE-----


-- 
Carsten Haitzler (The Rasterman) <raster at openmoko.org>



