Centralization of graphical awesomeness

Gennady Kupava gb at bsdmn.com
Wed Oct 28 22:19:26 CET 2009

On p. 1.

Why not make some kind of 'viewport' instead of moving objects? This way, it
is possible to render the whole background, then the whole moved 'contents',
and finally, when scrolling, we'll have only one blit operation per redraw. To
take care of animation, it's possible to store a list of 'animated' areas
where rendering must be redone each time, and this will slow things down
only when animation is really needed.

So, by 'pre-render' I mean rendering 'contents' in advance.
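
To make the idea concrete, here is a minimal sketch in Python. It is not how
evas or qt work; the names (Viewport, add_animated_area, render_fn) are made
up for illustration, and it assumes the 'contents' are already composited into
one opaque buffer of pixel rows, so a redraw is a single copy of the visible
rows.

```python
class Viewport:
    """Scrolls over a content buffer that was rendered once, in advance."""

    def __init__(self, content, height):
        self.content = content   # pre-rendered rows, one list of pixels per row
        self.height = height     # number of rows visible at once
        self.offset = 0          # index of the first visible row
        self.animated = []       # (first_row, last_row, render_fn) areas

    def add_animated_area(self, first_row, last_row, render_fn):
        # render_fn(row, frame_no) returns the new pixels for that row.
        self.animated.append((first_row, last_row, render_fn))

    def scroll_to(self, offset):
        # Clamp so the viewport never leaves the content buffer.
        max_offset = max(0, len(self.content) - self.height)
        self.offset = min(max(offset, 0), max_offset)

    def redraw(self, frame_no):
        top, bottom = self.offset, self.offset + self.height
        # Re-render only the animated areas that are currently visible.
        for first, last, render_fn in self.animated:
            if first < bottom and last >= top:
                for row in range(first, last + 1):
                    self.content[row] = render_fn(row, frame_no)
        # The single "blit": copy the visible rows out of the buffer.
        return [row[:] for row in self.content[top:bottom]]
```

Static rows cost nothing per frame beyond the copy; only areas registered as
animated and currently on screen are rendered again.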

I looked at qt and see that it is almost as complicated as E:
a. the background is not moving
b. the 'selected' item in the list is highlighted with a transparent gradient
rectangle which fades from black to green on selection.
c. if I launch something I get a transparent 'clock' with animation,
with the menu still moving behind it.
d. It slows down a lot when moving the selection.

On p. 2. 

I wish someone who knows the hardware would answer. Why does memory copy look
so slow? Will the situation be the same on gta02-core? I've run the test on a
nokia 710 and got 400Mb/s...
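
For reference, the throughput column in the tables quoted below seems to be
just buffer size times copies per second, in units of 2^20 bytes (that reading
of 'Mb/s' is my assumption, but it reproduces the numbers). A rough Python
sketch of the same measurement follows; the original test was a C program
using alarm(), and the function names here are made up. Interpreter overhead
will dominate for small buffers, so only the large sizes mean anything.

```python
import time


def throughput_mib_per_s(buffer_size, copies_per_second):
    """Convert a copy count into throughput, in units of 2**20 bytes/s."""
    return buffer_size * copies_per_second / (1024 * 1024)


def count_copies(buffer_size, duration=1.0):
    """Copy one buffer into another until the time is up; return copies/s."""
    src = bytearray(buffer_size)
    dst = bytearray(buffer_size)
    copies = 0
    deadline = time.perf_counter() + duration
    while time.perf_counter() < deadline:
        dst[:] = src          # copies buffer_size bytes from src into dst
        copies += 1
    return copies / duration


if __name__ == "__main__":
    for size in (4096, 65536, 524288):
        rate = count_copies(size)
        print(size, int(rate), round(throughput_mib_per_s(size, rate), 1))
```

With the om figures, throughput_mib_per_s(524288, 69) gives 34.5, matching
the last row of the table.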

On p. 3.:

Really, such behaviour of the sliding top item (having nothing below it while
it disappears) is a 100% visual bug. Will you handle it? How, within the
current concept of 'clearing cache'?

Sure, I know where everything comes from on any computer :) But do you
want to say that re-decoding and re-reading a png from a slow device is worth
it? What memory footprint will E have if we completely disable
removing things from that cache until they are destroyed by their
owners? And btw, how do I change the size of the cache?

On p. 5:

Really, we have a kernel which operates at 200Hz, so per slice we can work
with, according to my computation, 34Mb/200 = 170Kb of memory, or
400Mhz/200 = 2 million operations. This is enough to make context-switching
cache cleanup unimportant. People report it as 0.25% of runtime.
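
The arithmetic, spelled out as a small Python sketch. The constant and
function names are mine; decimal units are assumed, and 'operations' is taken
to mean clock cycles. The last figure is only what the reported 0.25% would
amount to per slice if there is one context switch per slice, which is an
assumption, not a measurement.

```python
TICK_HZ = 200                    # scheduler ticks per second
MEMCPY_BYTES_PER_S = 34_000_000  # measured large-buffer memcpy speed
CPU_HZ = 400_000_000             # cpu clock


def per_slice(rate_per_second, tick_hz=TICK_HZ):
    """How much of a per-second rate fits into one scheduler slice."""
    return rate_per_second // tick_hz


def overhead_cycles(cycles_in_slice, percent_times_100=25):
    """Cycles corresponding to an overhead given in hundredths of a percent."""
    return cycles_in_slice * percent_times_100 // 10_000


bytes_in_slice = per_slice(MEMCPY_BYTES_PER_S)   # 170000 bytes, about 170Kb
cycles_in_slice = per_slice(CPU_HZ)              # 2000000 cycles
switch_cost = overhead_cycles(cycles_in_slice)   # 5000 cycles at 0.25%
```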

On p. 7:

'Work on' and 'have target' are different things in many cases. :)

On p. 8:

All this is about making money, not good things. Crap :)

On p. 9:

As I already wrote, it's never 'doesn't matter', at least for me. I am
still using wmaker on a 2Ghz Xeon at home. Why? Because it has a latency of
X ms, while gnome has 10*X ms. And, unfortunately for me, I can notice
this. :)


On Wed, 28/10/2009 at 01:58 +1100, Carsten Haitzler wrote:
> On Tue, 27 Oct 2009 17:11:08 +0300 Gennady Kupava <gb at bsdmn.com> said:
> > I am sorry, but my letter is not about trolling and blaming but about
> > optimization, qt and e, speed is interesting for me, not blaming. Calm
> > down guys! I've numbered separate points, otherwise my letter will look
> > endless :)
> > 
> > 1. First, a bit about qt scrolling - it's not as simple as Carlsen wants it
> > to be. I see a background image, rendered text and 1-2 relatively small
> > images on each line. The "Applications" menu has ~40 entries. All scrolling
> > is very smooth, and there are no rectangles. Carlsen, have you run qtmoko?
> > Buttons change only when you press them. What prevents E from prerendering
> > the contents of a scrollable area if it is not changing on the fly? The lack
> > of this optimization makes menus and scrollable areas much slower.
> scrolling isnt any special operation in efl. it's moving some objects around.
> that's all it is. a scroller just moves its child around. moving an object
> queues redraws for previous and current positions. evas merges all redraws at
> render time and just does them. it will avoid drawing things that will be
> later overwritten by solid pixels, as long as it knows that they are solid (eg
> solid rect, image without an alpha channel etc.). scrolling is done very
> differently. you can't "pre-render" as they get rendered on the fly. everything
> does. evas has caches to save copies of scaled images (if smooth scaled) to
> save computation making them smooth when scaling on every redraw. but it's still
> a draw. this is done this way because it is incredibly flexible. you get the
> ability to have translucent items and all sorts of goodies. a draw in the end
> is a copy from some source and a write to a dest in evas. the more reads you do
> and writes - the worse it gets. worse is alpha blend as its read source, dest
> then write to dest (after some calculations).
> now... if your list in elementary had NO background except the selected item ALL
> it would do is draw the changes in text items - ie fill in the background
> (solid color would be writes only, image would be read then write) and then
> draw text on top (an alpha blend op with source data being only 8bit alpha).
> and this only for where the text is.
> for qte/qtopia/qtwhatever it is called now, if you have a background that moves
> with the text that scrolls, then it is a simple copy (copy current area up N
> pixels or down) thus a read and write, then draw new area. if the display is
> with a static bg and scrolling text - it's the same as evas. evas's scrolling is
> ONLY this method. if you configured the theme to be the same as qt (from memory
> it was solid black bg's etc.) you'd end up with approximately the same speed.
> evas would do a bit more work as it'd alpha blend the text, but it would avoid
> copying areas of the list that didnt change (eg strings dont fill the entire
> line and only part of it).
> it's Carsten btw. "t" not "l" :) and i have run qtmoko... before the freerunner
> was even out. i remember it being orange for selected list items (rectangle),
> empty if not (just text) and a greyish "qt" logo background with some visible
> dither patterns on scale up.
> > 2. The second point of my letter was that Glamo, it seems, should not be
> > blamed for everything. I wrote a simple program to measure plain memcpy speed
> > on om... This program just allocates 2 buffers of a defined size and outputs
> > the count of memcpys of that size it did in 1 second (interrupted via
> > alarm()). Initially I wanted to see how arm cache cleanup and task switching
> > influence parallel memory access tasks. The results were surprising for me:
> glamo is one of the big problems. a write to video memory - eg a new screen
> frame is.. based on your numbers below, about 1/5th the speed. it is as IF you
> copied 5x the data from memory to memory. thats a heavy cost.
> > OM:
> > buffer_size average_number computed_throughput
> > 128 1260880 153Mb/s
> > 256  540900 132Mb/s
> > 512  252399 123Mb/s
> > 1024 121988 119Mb/s
> > 2048  58827 114Mb/s
> > 4096  29000 113Mb/s
> > 8192  14000 109Mb/s
> > 16384  3660 57Mb/s
> > 32768  1105 36Mb/s
> > 65536   553 34.5Mb/s
> > 131072  274 34.2Mb/s
> > 262144  135 33.7Mb/s
> > 524288   69 34.5Mb/s
> only the last really counts. the first are just caching effects.
> > I did the same test on my very-old-Celeron 600 router:
> > 256     2522958         615Mb/s
> > 512     2088723         1019Mb/s
> > 1024    1554162         1571Mb/s
> > 2048    1019996         1992Mb/s
> > 3072    762667          2234Mb/s
> > 4096    109489          427Mb/s
> > 16384   27389           427Mb/s
> > 262144  318             79Mb/s
> > 524288  151             75Mb/s
> > 1048576 76              76Mb/s
> yes. better. the 2442 in the gta02 is an ooold arm cpu. it's not too modern.
> given when the gta02 was released... it'd be like making a pentium4 laptop and
> releasing it and selling it as new in todays shops.
> > On desktop (xeon 2GHz):
> > x64 binary:
> > 26214400 74              1850 Mb/s
> > 262144   31971           7992 Mb/s
> > 256      53994512        13232 Mb/s
> >                      
> > x32 binary: 
> > 26214400        59                 1475 Mb/s
> > 262144          29068              7267 Mb/s
> > 256             20810406           5080 Mb/s
> > 
> > Old arm-based device at my work (at91rm9200, 180Mhz)
> > 2560000 16	39Mb/s
> > 256000	167	40Mb/s
> > 256 	418044	102Mb/s
> yup. even that is better. :)
> > So, we can see that we have a speed of 34Mb/s (it's only 5 times the
> > declared 7Mb/s for Glamo!). Can someone comment? Why is memcpy so slow -
> well it possibly is partly memcpy itself. i'd have to check but you may be able
> to improve it with some asm. i know on newer arms you really want to use vfp or
> neon especially for memcpy's - u can get something like 2x the speed.
> > it is 2 times slower than an ancient celeron, and on par with a very old
> > arm-based machine; it is not related to glamo in any way! We can even skip
> > the results with cache, where om is 10 times slower than the old machine.
> correct. never claimed the cpu was fantastic :)
> > 3. ... e - every N seconds (see config dialogs for what it is set
> > to there, but let's say 60 seconds) will flush caches. ... and things are
> > having to be repopulated from disk ...
> > 
> > From disk?! Is this the cost of having a small memory footprint? This looks
> > very wrong.
> where do u think images come from ? the arrows on buttons? the buttons
> themselves? icons? all that data comes from a disk - from a file on disk. if it
> isnt needed anymore (it's invisible) and it cycles into speculative cache, it
> can be flushed. eg.
> a png icon on disk might be 24kb - in ram it becomes 64kb. those 64kb's add up.
> this is a tunable that can be disabled - if you want. but this keeps memory
> usage low. the default cache for e is 4mb of ram. it will keep the most recent
> decoded images in there. (For images - more caches for other things). it's
> loaded then kept - dereferenced when not visible. if you want to be sure,
> strace and see when its actually opening files.
> > 3. ... Actually, yes the GTA01 is very noticeably faster in
> > graphics. ...
> > Can you give a bit more detail: how much faster is it: x2 times, x3,
> > x1.5, x1.2?
> ask kenyoung. he said it:
> "Actually, yes the GTA01 is very noticeably faster in graphics.
> I've got both, and I've run 'em side-by-side.   The glamo actually
> is a graphics DEcellerator.   That's why GTA02-core is kicking it out."
> quoted from his mail to this list yesterday. i dont have a gta01 anymore. it
> went missing at some point, but i'd believe it.
> > 4. ... for every second spent uploading contents to glamo, you CANNOT
> > spend it calculating a new frame. ...
> > Yes, this is bad... But qt works :)
> efl works too. see above. not comparing apples vs apples.
> > 5. ... that's because you have 2 processes competing for the cpu to
> > render. ...
> > My measurements of parallel memcpy showed that this is negligible.
> that will be totally wrong. 2 processes doing 2 copies from 2 different source
> and destination addresses WILL have performance suffer - likely less than half
> performance as you'd have cache flushing between context switches etc. i'd say
> the benchmarking method is not right? you can't have 2 processes compete for
> memory and both get all they want when it's a limited resource and 1 cpu only
> to share around. and my statement was not just cpu, but x and glamo. x will be
> sharing requests from 2 processes at once trying to draw to it. and... i stated
> IO. :)
> > 6. ... you wont find routines for rendering faster in most of the
> > world. ...
> > I will. I can recall your previous posts on the topic.
> go for it. they exist. but not in most of the world. in some corners. find a
> faster alpha blender or a faster super-sampling/sub-sampling scaler or... :) (i
> am excluding neon asm here as its off topic and not on gta02, but if you had
> something with neon you wouldnt have this conversation as performance would be
> fine)
> > 7. but.. if i were smart.. i'd not develop apps for the freerunner. it's
> > a "dead product". it is no longer being produced. it has no evolution
> > path. there won't be a gta03, 04, 05 etc. everyone quit or was fired/let
> > go from om that worked on phones.. or worked on pretty much anything.
> > your future is other devices.. and these don't suck with EFL. i'd not
> > compromise the future if i were smart.
> > 
> > Frankly speaking, you never developed for GTA02, yes? Your aim seems to have
> > always been in the future, and this is ok. I am sure that, for example,
> > scrolling area pre-rendering is good for the future.
> wait? never? maybe you forget. i WORKED at openmoko before gta02 was released,
> during and after. i very much did develop on and for it. i was kicking glamo
> around long before it was sold. there was a REASON why i punted questions to
> this list like "what would you guys say if we dropped to qvga?" as there was a
> replacement lcd with the same dimensions but qvga. i stopped fiddling with my
> gta02 some time late last year/early this year as i got hardware that is much
> better in my hands.
> > 8. ... most games i know of are written to work on the highest end
> > graphics cards at the time. why? ...
> > The best games are written with other objectives in mind; these games are
> > really interesting for anyone from time to time and for sure will live
> > for ages (chess, nethack and so on). Our grandchildren will play nethack, be
> > sure. Is it better to make perfect things?
> > And optimization is always good - you can feel that 10ms latency and
> > 100ms latency are different even if both are more than enough for UI, but
> > you feel that 10ms latency is much better.
> ok. talking different worlds of games here. i'm talking the ones that come out
> for ps3, xbox, and the pc games you buy on a shelf - not "chess" or
> "solitaire". regardless.. there's a multi-billion dollar industry for the
> quakes of this world. not so much for chess or solitaire :) so maybe i didnt
> explain that well - i apologize. i was thinking THESE as games, not chess
> etc. :)
> > 9. ... BUSINESS CHOICE ...
> > Everyone here follows their own goals. Carlsen makes E. Others want to do
> > hardware. Others want to use free hardware. Others want to improve their
> > development skills and hack that HW. Others just have fun reading this
> > book. Others have this as a job. Someone is even making money from OM. ;) All
> > this is ok, and I see nothing bad in getting a great E developer to
> > think a bit about optimizations - nobody loses from optimizing E and
> > writing a bit of technical description :)
> trust me. optimisation is what i do. i have an xrender engine for evas. it's
> complete. it does everything. why isn't it used? because my own software
> rendering code has outperformed xrender year on year. i am still waiting for
> xrender with its partial hardware or claimed "full hardware acceleration" to
> beat the software i wrote. i have been waiting for years. i have an OpenGL and
> GLES engine. i have benchmark suites that compare engines.. apples vs apples.
> they do the exact same operations. the same drawing (within the limits of their
> system). and yes - OpenGL on my desktop (Nvidia 8600GTS vs core2-duo 3ghz).
> opengl... is 2x the speed of software. but considering thats software... thats
> not too bad. a modern high end dedicated gpu is only doing 2x the software
> speed. i know something of optimising. i know something of playing tricks to
> avoid work. in fact evas is avoiding work all over the place. but none of the
> themes are apples vs apples. i know just where evas has performance problems,
> and some of them i just chalk up to "well.. it is simply not worth my time and
> effort to try as frankly.. the problem is already solved - newer systems are
> fast enough where it "doesn't matter"". some others its more a matter of just
> not pushing efl so far. if you have to sit and compare. make sure your
> comparison is fair. apples vs apples. 
