Some questions I always wanted to ask (Glamo)

Carsten Haitzler (The Rasterman) raster at rasterman.com
Wed Apr 22 05:26:22 CEST 2009


On Wed, 22 Apr 2009 02:29:48 +0200 Rask Ingemann Lambertsen <rask at sygehus.dk>
said:

> On Mon, Apr 20, 2009 at 10:58:30PM +1000, Carsten Haitzler wrote:
> 
> > the 2d unit also lacked large blobs of a 2d pipeline to
> > accelerate beyond the most basic x ops (blit, fill).
> 
>    Does it not even do line drawing for the width=0 case?

i don't even bother with line drawing. do you know how LITTLE it is actually
used these days? i didn't even pay attention to it. but it does do copies (with
limits), fills (with limits), masked fills (with limits), alpha blends (with
lots of limits) and image scaling (with limits). no arbitrary rotation, no
ARGB dest support (only 16-bit dest), no separate alpha masks for r, g and b (so
anti-aliased text with sub-pixel rendering is not possible), only a 4-bit mask
(you need 8-bit for xrender and xft acceleration), etc. etc.
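To make the "16-bit dest only" and "4-bit mask" limits concrete, here is a small C sketch of what those formats throw away. It uses generic RGB565 packing and a generic 8-bit to 4-bit rounding; it does not model Glamo's registers or its exact rounding, and the function names are hypothetical. A single 4-bit mask per pixel also means one coverage value per pixel, whereas sub-pixel text needs a separate coverage value for each of r, g and b.

```c
#include <assert.h>
#include <stdint.h>

/* Pack 8-bit-per-channel colour into a 16-bit RGB565 pixel (hypothetical
 * helper): the low 3 bits of red/blue and low 2 bits of green are dropped. */
static uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((unsigned)(r >> 3) << 11) |
                      ((unsigned)(g >> 2) << 5) |
                      (unsigned)(b >> 3));
}

/* Reduce an 8-bit alpha/coverage value (0..255) to a 4-bit mask value
 * (0..15), rounded: 256 levels collapse into 16, so e.g. alpha 10 and
 * alpha 20 become the same mask value. */
static uint8_t alpha8_to_mask4(uint8_t a)
{
    unsigned m = (a * 15u + 127u) / 255u;
    assert(m <= 15u);
    return (uint8_t)m;
}
```

An xrender/xft glyph mask with 256 coverage levels would have to be squeezed through `alpha8_to_mask4` before such hardware could use it, which is where the visible quality loss comes from.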

> [Glamo bandwidth]
> > no idea. i had no idea from the paper docs either. until much later when i
> > started going "why is this so slow?" and actually did some benchmarks...
> 
>    How much is known about the bandwidth limitation? Did you benchmark more
> than just plain memcpy()? For example, could you do e.g. a YUV->RGB
> conversion at the same speed as plain memcpy() by spreading out the memory
> accesses? Is the slowness caused by lack of memory bandwidth inside the
> Glamo (e.g. because not much is left after display refresh) or is it the
> external bus interface which is the bottleneck? Are any logic analyzer
> traces available?

don't know why it's slow. frankly, smedia were saying you should get 2-3m/sec
access and we were getting 7m/sec. i specifically asked them what kind of
read/write bandwidth we should expect, and i remember them saying "2-3m/s".
when you are already tripling the vendor's performance expectations... you
accept it as a limit and move along. :)
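For scale, a rough worked figure, assuming "m/sec" means 10^6 bytes per second and a full-screen upload of a 480x640 panel at 16 bits per pixel (the GTA02 panel size; the post itself doesn't state it):

```latex
\begin{aligned}
S &= 480 \times 640 \times 2\ \text{bytes} = 614\,400\ \text{bytes} \\
t_{7} &= \frac{614\,400\ \text{B}}{7 \times 10^{6}\ \text{B/s}} \approx 0.088\ \text{s}
      \quad (\approx 11\ \text{full-screen uploads per second}) \\
t_{3} &= \frac{614\,400\ \text{B}}{3 \times 10^{6}\ \text{B/s}} \approx 0.205\ \text{s}
      \quad (\approx 4.9\ \text{full-screen uploads per second})
\end{aligned}
```

So even at the measured rate, full-screen redraws pushed over the bus top out at roughly 11 per second before any other overhead.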

but no, i haven't tried different memcpy patterns. the situation is the same for
reads and writes to/from video memory. in the end, x uploads (and downloads)
data as a memcpy: the app prepares the data, then x uploads it from a shm
segment. doing a yuv->rgb conversion while you copy is not useful, as client
processes can't do that: they have no direct access. glamo already has yuv->rgb
conversion and scaling hardware that is exposed via XV, so there is little
point in doing it.
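The kind of benchmark being asked about can be sketched as a timed copy loop. This is a minimal sketch assuming POSIX `clock_gettime`; `memcpy_throughput` is a hypothetical helper, and obtaining a pointer into Glamo video memory (e.g. by mmap()ing the framebuffer device) is left out and assumed.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Copy `len` bytes from src to dst `reps` times and return the observed
 * throughput in bytes per second (0.0 if the clock did not advance).
 * With dst pointing into mapped video memory this times writes to it;
 * with src pointing there it times reads. */
static double memcpy_throughput(void *dst, const void *src, size_t len, int reps)
{
    struct timespec t0, t1;
    assert(reps > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < reps; i++)
        memcpy(dst, src, len);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return secs > 0.0 ? (double)len * (double)reps / secs : 0.0;
}
```

Trying "different memcpy patterns" would mean swapping the `memcpy` call for other loops (for example 16-bit versus 32-bit stores, or interleaving computation between stores) and comparing the rates. Between two ordinary RAM buffers a compiler may optimise repeated identical copies, so numbers from plain RAM are only a baseline.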

as for other cases: evas has an rgb32->16+dither+copy routine, so it can
convert, dither and copy all in one pass. i saw no significant improvement over
the multi-stage approach (convert+dither into a memory buffer, then copy that
buffer to the fb). evas's routine only works when drawing to the fb directly;
when in x it becomes a 2-stage copy. the speedup i do see is what i'd expect
anywhere, which is simply better cache behaviour from not needing an
intermediate buffer. so to me it looks like writes to video memory (in this
case) simply stall the cpu in wait states until the glamo OKs the writes.
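The two strategies compared above can be sketched side by side. This is not evas's actual routine: it is a hypothetical illustration using a standard 4x4 ordered (Bayer) dither and 0x00RRGGBB source pixels, with `pitch` (destination row stride in pixels) as an assumed parameter. Both paths produce identical pixels; the only difference is whether an intermediate buffer sits between the conversion and the final store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 4x4 ordered-dither (Bayer) threshold matrix, values 0..15. */
static const uint8_t bayer4[4][4] = {
    {  0,  8,  2, 10 },
    { 12,  4, 14,  6 },
    {  3, 11,  1,  9 },
    { 15,  7, 13,  5 },
};

/* Add a position-dependent offset (smaller than one output step) to an
 * 8-bit channel, clamp to 255, then keep the top `bits` bits. */
static unsigned dither_chan(unsigned v, unsigned d, unsigned bits)
{
    unsigned step = 1u << (8 - bits);   /* 8 for 5-bit channels, 4 for 6-bit */
    unsigned x = v + d * step / 16;     /* offset is 0..step-1 */
    if (x > 255) x = 255;
    return x >> (8 - bits);
}

/* Convert one 0x00RRGGBB pixel at (x, y) to dithered RGB565. */
static uint16_t xrgb_to_565_dithered(uint32_t p, unsigned x, unsigned y)
{
    unsigned d = bayer4[y & 3][x & 3];
    unsigned r = dither_chan((p >> 16) & 0xffu, d, 5);
    unsigned g = dither_chan((p >> 8) & 0xffu, d, 6);
    unsigned b = dither_chan(p & 0xffu, d, 5);
    return (uint16_t)((r << 11) | (g << 5) | b);
}

/* One stage: convert + dither + store straight into the destination. */
static void blit_one_stage(uint16_t *dst, size_t pitch,
                           const uint32_t *src, size_t w, size_t h)
{
    for (size_t y = 0; y < h; y++)
        for (size_t x = 0; x < w; x++)
            dst[y * pitch + x] =
                xrgb_to_565_dithered(src[y * w + x], (unsigned)x, (unsigned)y);
}

/* Multi-stage: convert + dither everything into a w*h temporary buffer,
 * then copy that buffer to the destination row by row. */
static void blit_two_stage(uint16_t *dst, size_t pitch,
                           const uint32_t *src, size_t w, size_t h,
                           uint16_t *tmp)
{
    assert(tmp != NULL);
    for (size_t i = 0; i < w * h; i++)
        tmp[i] = xrgb_to_565_dithered(src[i], (unsigned)(i % w), (unsigned)(i / w));
    for (size_t y = 0; y < h; y++)
        memcpy(dst + y * pitch, tmp + y * w, w * sizeof *tmp);
}
```

If the destination is slow video memory and the cpu stalls on each write, both versions spend most of their time in those stores, which is consistent with seeing only the small cache-related gain from dropping `tmp`.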

> > glamo has no planned
> > successor. and that made me doubt there was much value in doing full
> > drivers as they have no lifespan beyond the gta02. they will never be used
> > again anywhere - they will likely curl up and die in a corner of the
> > internet :(
> 
>    But now that SMedia Tech was bought by ITE Tech, has there been any
> contact with ITE Tech wrt. releasing the Glamo documentation?

no idea


-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    raster at rasterman.com




More information about the Gta03 mailing list