I have been investigating the use of DMA to transfer pixmaps from system
ram to the video memory of the glamo chip, on gta02.

Problem statement

The bus that interconnects the glamo GPU to the CPU of the s3c2442 chip
is very slow. Some measurements of the throughput when memcpy-ing data
from system ram to the video ram showed that it goes at about
30Mbytes/sec, whereas copies within system ram go at around
130Mbytes/sec, if not higher.
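
For reference, a measurement of this kind can be sketched as below. This
is only an illustration, not the code that produced the numbers above:
the function name measure_copy_mb_s is made up for this example, and it
uses clock(), which reports CPU time and is only a rough stand-in for
wall-clock time in a busy copy loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: copy `len` bytes from src to dst `repeats` times
 * and return the observed throughput in MB/s (1 MB = 10^6 bytes).
 * Returns 0.0 if the run was too short for clock() to resolve.
 */
double measure_copy_mb_s(void *dst, const void *src, size_t len, int repeats)
{
    assert(len > 0 && repeats > 0);

    clock_t start = clock();
    for (int i = 0; i < repeats; i++)
        memcpy(dst, src, len);
    clock_t end = clock();

    double secs = (double)(end - start) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        return 0.0; /* elapsed time below clock() resolution */

    return ((double)len * (double)repeats / 1e6) / secs;
}
```

To compare the two cases, dst would point either at a buffer in system
ram or at video ram mapped into the process (for instance through an
mmap of the framebuffer device, which is an assumption here).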

The Kaa acceleration architecture we are using allows us to use hardware
capabilities to accelerate the transfer of pixmaps from system ram to
video ram. That transfer is called pixmap migration in X lingo.

On the other hand, the s3c2442 SoC has a DMA module that can perform
data copies without using the processor. In our case, using that DMA
module would not help us go faster than 30MB/sec because that speed is
a bus limitation. It will however (hopefully) free the processor to do
something else during the transfer.
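
To give an idea of what is at stake, the time a single migration keeps
the bus (and today the processor) busy can be estimated with a few lines
of arithmetic. The helper names below are made up for this example, and
the 480x640 pixmap at 2 bytes per pixel is only an assumed, full screen
sized case.

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of a tightly packed pixmap. */
size_t pixmap_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Time in milliseconds to move `bytes` at `mb_s` MB/s (1 MB = 10^6 bytes). */
double transfer_time_ms(size_t bytes, double mb_s)
{
    assert(mb_s > 0.0);
    return (double)bytes / (mb_s * 1e6) * 1000.0;
}
```

With these assumptions, pixmap_bytes(480, 640, 2) is 614400 bytes, which
takes roughly 20.5 ms at 30MB/sec against roughly 4.7 ms at 130MB/sec.
DMA would not shorten those 20 ms, but the processor could spend them on
something else.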

What is needed

So we need the framebuffer driver to expose an entry point to perform
pixmap copies using the DMA module of the s3c2442 chip. That entry
point would then be called from within the Xglamo server to perform
the pixmap migrations.

What has been done

I spent the last week understanding the s3c2442 DMA module and kernel
api. I started putting together an implementation of a DMA based pixmap
copy. You can find the patch I have written attached to bug .

There are a few gotchas in that patch, so here is what it does:

1/ it creates a new blocking ioctl (called
FBIO_GLAMO_UPLOAD_PIXMAP_TO_VRAM) to the framebuffer device driver.
That ioctl is meant to copy a pixmap that resides in system ram to a
destination in video ram.

2/ the pixmap is first copied into an in-kernel buffer that is DMA
friendly. The DMA transfer then copies the pixmap from this in-kernel
buffer to the destination in vram. The ultimate implementation should
get rid of this intermediate copy to the in-kernel DMA friendly buffer.
I needed to do this for now, for the sake of simplicity, to get things
going and have a chance to get the whole chain working first.

3/ the pixmap is then transferred to vram, _line by line_, using DMA.
The line by line part is important to notice here because it is bad
performance wise. But as a pixmap copy must inherently be done line by
line, there is no easy way to avoid that for now. Ultimately though, I
should be using a bounce buffer in offscreen vram. A bounce buffer is a
buffer allocated in offscreen (non visible) vram. The pixmap data would
be bulk copied in there first. Then, when that is done, the driver
would use the glamo blitter to do the proper copy of the pixmap (line
by line) to the actual final destination in vram. That way, the DMA
won't be done line by line, and the processor won't be used to do the
copy.

4/ I was obliged to hack the s3c24xx DMA api a bit to make it support
the type of DMA transfer I needed. Actually there are two types of DMA
transfer supported by the s3c2442 module: software mode, and hardware
mode. In the software mode, the software basically triggers the
transfer, whereas in the hardware mode, it is the device where the data
is transferred to (or from) that triggers the transfer. That requires a
special wiring between the device and the s3c2442 chip.
In the case of the glamo chip though, from an s3c2442 DMA module
perspective, accessing vram is like accessing normal memory, so there
is no special wiring in place to do hardware mode DMA. We must then do
software mode DMA. Unfortunately, software mode DMA was not really
supported by the s3c24xx DMA api that is in the kernel right now. So I
hacked it a bit to support it. That is in the patch attached to the bug
I referred to earlier; look in the file

5/ I wrote a test application named test-glamo-dma to test/debug the
whole thing outside of X. Its OE package source is also attached to the

What needs to be done

Well, continue hacking on this and make it actually usable from an
Xglamo perspective. When that is done, make Xglamo use it, and see if
it is fast enough.

