investigating DMA based pixmap migration in Xglamo

Carsten Haitzler (The Rasterman) raster at
Thu Feb 21 02:20:05 CET 2008

On Mon, 18 Feb 2008 13:11:27 +0100 Dodji Seketeli <dodji at>

> Hello,
> I have been investigating the use of DMA to transfer pixmap from system
> ram to the video memory of the glamo chip, on gta02.
> Problem statement
> =================
> The bus that interconnects the glamo GPU to the CPU of the s3c2442 chip
> is very slow. Some measurements of the throughput when memcpy-ing data
> from system ram to the video ram showed that it goes at 30Mbytes/sec,
> whereas doing copies within the system ram goes at around
> 130Mbytes/sec, if not higher.

i measured 160mb/sec for system -> system memcpy()s, so 30 vs 160... sucketh
muchly :)

> The KAA acceleration architecture we are using allows us to use
> hardware capabilities to accelerate the transfer of pixmaps from system
> ram to video ram. That transfer is called pixmap migration in X lingo.
> On the other hand, the s3c2442 SoC has a DMA module that can perform
> data copies without using the processor. In our case, using that DMA
> module would not help us go faster than 30MB/sec because that speed is
> a bus limitation. It will however (hopefully) free the processor to do
> something else during the transfer.

indeedily :) the 30m/sec means that the cpu is locked into wasting cycles doing
the copy and really slows down uploads of data - and since xrender and such are
not accelerated yet (and at best can only be partly accelerated), anything
"fancy" will require lots of uploads. *IF* the x client and xserver are
implemented correctly, we get the uploads "for free", as long as the client has
something it can do while the upload happens (for example - re-calculate the
next frame of animation). i do know evas does this in its rendering model, so
if my benchmarking numbers are right, this might get about a 40-50% framerate
increase. not shabby. if you block and wait for the transfer before continuing,
we do free up the cpu to go idle or run some other process outside the x client
doing the drawing/upload and xserver, but you won't see framerate increases.

> What is needed
> ==============
> So we need the framebuffer driver to expose an entry point to perform
> pixmap copies using the DMA module of the s3c2442 chip. That entry
> point would then be called from within the Xglamo server to perform
> the pixmap migrations.
> What has been done
> ===================
> I spent the last week understanding the s3c2442 DMA module and kernel
> api. I started putting together an implementation of a DMA based pixmap
> copy. You can find the patch I have written attached to bug
> .
> There are a few gotchas in that patch, so here is what it does:
> 1/ it creates a new blocking ioctl (called
> FBIO_GLAMO_UPLOAD_PIXMAP_TO_VRAM) to the framebuffer device driver.
> That ioctl is meant to copy a pixmap that resides in system ram to a
> destination in video ram.
> 2/ the pixmap is first copied into an in-kernel buffer that is DMA
> friendly. The DMA transfer then copies the pixmap from this
> in-kernel buffer to the destination, in vram. The ultimate
> implementation should get rid of this copy into the in-kernel DMA
> friendly buffer. I needed to do this for now, for the sake of
> simplicity, to get things going and have a chance to get the whole
> chain working first.
> 3/ the pixmap is then transferred to vram, _line by line_, using DMA.
> The line-by-line part is important to notice here because it is bad
> performance-wise. But as a pixmap copy must inherently be done line by
> line, there is no easy way around that for now. Ultimately though, I
> should be using a bounce buffer in offscreen vram. A bounce buffer is
> a buffer allocated in offscreen (non-visible) vram. The pixmap data
> would be bulk-copied there first. Then, when that is done, the driver
> will use the glamo blitter to do the proper copy of the pixmap (line
> by line) to the actual final destination in vram. That way, the DMA
> won't be done line by line, and the processor won't be used to do the
> copy either.
> 4/ I was obliged to hack the s3c24xx DMA api a bit to make it support
> the type of DMA transfer I needed. Actually there are two types of DMA
> transfer supported by the s3c2442 module: software mode, and hardware
> mode. In the software mode, the software basically triggers the
> transfer, whereas in the hardware mode, it is the device that the data
> is transferred to (or from) that triggers the transfer. That requires
> special wiring between the device and the s3c2442 chip.
> In the case of the glamo chip though, from the s3c2442 DMA module's
> perspective, accessing vram is like accessing normal memory, so there
> is no special wiring in place to do hardware mode DMA. We must then do
> software mode DMA. Unfortunately, software mode DMA was not really
> supported by the s3c24xx DMA api that is in the kernel right now. So I
> hacked it a bit to support it. That is in the patch attached to the bug
> I referred to earlier; look in the file
> linux-2.6.24/arch/arm/plat-s3c24xx/dma.c.
> 5/ I wrote a test application named test-glamo-dma to test/debug the
> whole thing outside of X. Its OE package source is also attached to the
> bug.
> What needs to be done
> ======================
> Well, continue hacking on this and make it actually usable from an
> Xglamo perspective. When that is done, make Xglamo use it, and see if
> it is fast enough.
> That's all folks.
> Thanks for reading so far :-)
> Dodji.

sounds great. :) all the bits are there - they just need to work, then be streamlined :)

Carsten Haitzler (The Rasterman) <raster at>

More information about the openmoko-devel mailing list