Centralization of graphical awesomeness
Carsten Haitzler (The Rasterman)
raster at rasterman.com
Thu Oct 29 02:03:48 CET 2009
On Thu, 29 Oct 2009 00:19:26 +0300 Gennady Kupava <gb at bsdmn.com> said:
> On p. 1.
> Why not make some 'viewport' instead of moving objects? This way, it
> is possible to render the whole background, then the whole moved 'contents', and
> finally when scrolling we'll have only one blit operation per redraw. To take
> care of animation, it's possible to store a list of 'animated' areas
> where rendering must be redone each time, and this will slow things down
> only when that animation is really needed.
i could add such objects, but i'm not going to. why? because they don't work
well with opengl for starters, and because it will be a LOT of work changing
everything to use them AND they will consume more memory. and finally... the
time for needing to do so for me has long passed. i am busy enough with evas
and efl and just don't have this time. i have to prioritize what i do and this
is low on a priority list.
> So, by 'pre-render' I mean rendering 'contents' in advance.
> I looked at qt and see that it is almost as complicated as E:
> a. background is not moving
> b. the 'selected' item in list is highlighted with transparent gradient
> rectangle which is fading from black to green on selection.
> c. if i launch something I would get transparent 'clock' with animation,
> with menu still moving behind it.
> d. It slows down a lot when moving the selection.
so it has its slownesses too (move the selection). :) but all the items from
memory are BLANK except the selected one. not so in the themes for SHR and
default. make themes equal and then we can compare.
> On p. 2.
> I wish someone who knows the hardware would answer. Why does memory copy look so
> slow? Will this situation be the same on gta02-core? I've run the test on a
> nokia 710 and got 400Mb/s...
can't comment on the 710 - but 400m/s seems pretty high to me. for that level
of device. something doesn't seem right in the benchmark / timing etc.? (even
the s3c56410 i have clocks in at 150 or 170m/s memcpy() speeds (according to
lmbench - you should try lmbench. at least its a good standard that runs pretty
> On p. 3.:
> Really, such behaviour of the sliding top item (having nothing below it while
> it disappears) is 100% a visual bug. Will you handle it? How, within
> the current concept of 'clearing cache'?
as i said before already... it's a CONFIG VALUE - it's configuration. it's not
hard coded! you can determine how long between flushes and how big caches are!
little sliders to do it all for you under settings->advanced->performance
(advanced mode). i won't handle anything as it's already a user tunable.
if the delays are not the re-loading from disk (io) then they may be in-memory
cache population (eg of scaled versions of images) to speed up future draws -
though these also get flushed in the above flushing... but.. if it's not.. then
it's either the xserver pausing/halting internally for some reason or the kernel
hiccuping handling some interrupts etc. i don't know which it is and i don't
intend to sit down and find out. i'm pointing at likely candidates. i don't
have the time here to sit and trace at the code level kernel, xserver, and
app that's drawing when i am pretty certain such pauses are "not my bug". if it's
cache flushing - it's tunable. if it's something else...
> Sure, I know where everything comes from on any computer :) But you
> want to tell me that re-decoding and re-reading a png from a slow device is worth
> it? What memory footprint will E have if we completely disable
> removing things from that cache until they are destroyed by their
> owners? And btw, how do I change the size of the cache?
ugh.. see above... it's a config value. it's tunable. you can ask for 50mb of
cache and set flushes to every few hours instead of 60 seconds! memory
footprint will expand by the size of the cache set. if 50m - then e will grow
by that amount (well, actually more, as the scale cache also shares size with the
normal cache - so possibly 100m). it depends on the theme, and how much data
is actually loaded and used.
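
to make the two tunables concrete, here is a rough sketch of a size-bounded cache with a periodic flush. it is NOT evas's actual code - the class name, method names and parameters are all made up for illustration - it only models the behaviour described above: keep the most recently used decoded data up to a size limit, and drop everything every N seconds.

```python
import time
from collections import OrderedDict


class SpeculativeCache:
    """Hypothetical size-bounded cache of decoded data with a periodic full flush."""

    def __init__(self, max_bytes, flush_interval_s, clock=time.monotonic):
        self.max_bytes = max_bytes                # tunable: how big the cache may grow
        self.flush_interval_s = flush_interval_s  # tunable: seconds between flushes
        self.clock = clock                        # injectable so the flush can be tested
        self.items = OrderedDict()                # key -> bytes, least recently used first
        self.used = 0
        self.last_flush = clock()

    def _maybe_flush(self):
        # Drop every entry once the flush interval has elapsed.
        now = self.clock()
        if now - self.last_flush >= self.flush_interval_s:
            self.items.clear()
            self.used = 0
            self.last_flush = now

    def get(self, key, load):
        """Return cached data; on a miss call load() (e.g. decode a file from disk)."""
        self._maybe_flush()
        if key in self.items:
            self.items.move_to_end(key)           # mark as most recently used
            return self.items[key]
        data = load()
        self.items[key] = data
        self.used += len(data)
        # Evict least recently used entries until the size limit is respected.
        while self.used > self.max_bytes and len(self.items) > 1:
            _, old = self.items.popitem(last=False)
            self.used -= len(old)
        return data
```

with a short flush interval the same key gets re-loaded after every flush, which is the "repopulated from disk" cost. with a large max_bytes and a long interval the process simply holds on to more memory, which is the tradeoff described above.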
> On p. 5:
> Really, we have a kernel which operates at 200Hz, so per slice we can work
> with, according to my computation, 34Mb/200 = 170Kb of memory, or
> 400Mhz/200 = 2 million operations. This enough to make context switching
1hz != 1 operation. :) eg a divide may be 50 or 100x slower than an add. a
bitshift often comes for free with your operation (on arm). it depends.
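
a small sketch of that arithmetic, using only the figures from this thread (200Hz tick, 34Mb/s memcpy, 400MHz clock). the per-operation cycle costs are placeholders picked to match the "50x slower than an add" example, not measured timings for this cpu.

```python
# Per-timeslice budget implied by the figures quoted in the thread.
# Assumption: 1 Mb = 1024 Kb. The cycle costs below are illustrative only.

TICK_HZ = 200               # kernel tick rate
MEMCPY_MB_PER_S = 34        # large-buffer memcpy throughput
CPU_HZ = 400_000_000        # cpu clock


def per_slice(value_per_second, tick_hz=TICK_HZ):
    """Amount of a per-second resource available in one timeslice."""
    return value_per_second / tick_hz


kb_per_slice = per_slice(MEMCPY_MB_PER_S * 1024)  # Kb that can be copied per slice
cycles_per_slice = per_slice(CPU_HZ)              # clock cycles per slice


def ops_per_slice(cycles_per_op):
    """Operations per slice if every operation costs cycles_per_op cycles."""
    return cycles_per_slice / cycles_per_op


# Hypothetical costs: an add at 1 cycle, a divide at 50 cycles.
HYPOTHETICAL_COSTS = {"add": 1, "divide": 50}
budget = {name: ops_per_slice(cost) for name, cost in HYPOTHETICAL_COSTS.items()}

if __name__ == "__main__":
    print(f"{kb_per_slice:.0f} Kb and {cycles_per_slice:.0f} cycles per slice")
    for name, ops in budget.items():
        print(f"{name}: {ops:.0f} per slice")
```

so 2 million is a count of cycles, not operations. under these made-up costs a slice holds 2 million adds but only 40 thousand divides.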
> cache cleanup important. People report as 0.25% runtime.
> On p. 7:
> 'Work on' and 'have target' are different things in many cases. :)
i have gta02's. i WORKED ON THEM too. for quite a while. a year or so. i don't
anymore. i have better hardware.
> On p. 8:
> All this is about making money, not good things. Crap :)
eh? making money puts food in your mouth and a roof over your head. you might
want to stop working or running your own business for a while - give all your
money and possessions away and see how great life is then. money is important.
> On p. 9:
> As I already wrote, it's never 'doesn't matter', at least for me. I am
> still using wmaker on a 2GHz Xeon at home. Why? Because it has a latency of
> X ms, while gnome has 10*X ms. And, unfortunately for me, I can notice
> this. :)
and e also has very good response on such slow machines. ask all the people
using it on them who go to use e after trying so many desktops and wm's. for
many it provides the best balance of power, looks and speed. i happen to mostly
have much faster hardware (except my nice old sony vaio x505. gorgeous machine.
an old 1ghz pentium-m with 512m ram. but still gorgeous),
> On Wed, 28/10/2009 at 01:58 +1100, Carsten Haitzler wrote:
> > On Tue, 27 Oct 2009 17:11:08 +0300 Gennady Kupava <gb at bsdmn.com> said:
> > > I am sorry, but my letter is not about trolling and blaming but about
> > > optimization; the speed of qt and e is what interests me, not blaming. Calm
> > > down guys! I've numbered separate points, otherwise my letter will look
> > > endless :)
> > >
> > > 1. First, a bit about qt scrolling - It's not as simple as Carlsen wants it to
> > > be. I see a background image, rendered text and 1-2 relatively small images
> > > on each line. The "Applications" menu has ~40 entries. All scrolling is very
> > > smooth, and no rectangles where. Carlsen, have you run qtmoko? Buttons
> > > change only when you press them. What prevents E from prerendering
> > > the contents of a scrollable area if it is not changing on the fly? Lack of this
> > > optimization makes menus and scrollable areas much slower.
> > scrolling isn't any special operation in efl. it's moving some objects
> > around. that's all it is. a scroller just moves its child around. moving an
> > object queues redraws for previous and current positions. evas merges all
> > redraws at render time and just does them. it will avoid drawing things
> > that will be later overwritten by solid pixels, as long as it knows that
> > they are solid (eg solid rect, image without an alpha channel etc.)
> > scrolling is done very differently. you can't "pre-render" as they get
> > rendered on the fly. everything does. evas has caches to save copies of
> > scaled images (if smooth scaled) to save the computation of smooth
> > scaling on every redraw. but it's still a draw. this is done this way
> > because it is incredibly flexible. you get the ability to have translucent
> > items and all sorts of goodies. a draw in the end is a copy from some
> > source and a write to a dest in evas. the more reads you do and writes -
> > the worse it gets. worse is alpha blend as it's read source, dest then write
> > to dest (after some calculations).
> > now... if your list in elementary had NO background except the selected item
> > ALL it would do is draw the changes in text items - ie fill in the
> > background (solid color would be writes only, image would be read then
> > write) and then draw text on top (an alpha blend op with source data being
> > only 8bit alpha). and this only for where the text is.
> > for qte/qtopia/qtwhatever it is called now, if you have a background that
> > moves with the text that scrolls, then it is a simple copy (copy current
> > area up N pixels or down) thus a read and write, then draw new area. if the
> > display is with a static bg and scrolling text - it's the same as evas.
> > evas's scrolling is ONLY this method. if you configured the theme to be the
> > same as qt (from memory it was solid black bg's etc.) you'd end up with
> > approximately the same speed. evas would do a bit more work as it'd alpha
> > blend the text, but it would avoid copying areas of the list that didn't
> > change (eg strings don't fill the entire line and only part of it).
> > it's Carsten btw. "t" not "l" :) and i have run qtmoko... before the
> > freerunner was even out. i remember it being orange for selected list items
> > (rectangle), empty if not (just text) and a greyish "qt" logo background
> > with some visible dither patterns on scale up.
> > > 2. The second point of my letter was that it seems Glamo should not be blamed
> > > for everything. I wrote a simple program to measure simple memcpy speed on
> > > om... This program just allocates 2 buffers of a defined size and outputs
> > > the count of memcpys of the defined size it did in 1 second (interrupted via
> > > alarm()). Initially I wanted to see how arm cache cleanup and task switching
> > > influence parallel memory access tasks. The results were surprising for me:
> > glamo is one of the big problems. a write to video memory - eg a new screen
> > frame - is.. based on your numbers below, about 1/5th the speed. it is as IF you
> > copied 5x the data from memory to memory. that's a heavy cost.
> > > OM:
> > > buffer_size average_number computed_throughput
> > > 128 1260880 153Mb/s
> > > 256 540900 132Mb/s
> > > 512 252399 123Mb/s
> > > 1024 121988 119Mb/s
> > > 2048 58827 114Mb/s
> > > 4096 29000 113Mb/s
> > > 8192 14000 109Mb/s
> > > 16384 3660 57Mb/s
> > > 32768 1105 36Mb/s
> > > 65536 553 34.5Mb/s
> > > 131072 274 34.2Mb/s
> > > 262144 135 33.7Mb/s
> > > 524288 69 34.5Mb/s
> > only the last really counts. the first are just caching effects.
> > > I did same test on my very-old-Celeron 600 router:
> > > 256 2522958 615Mb/s
> > > 512 2088723 1019Mb/s
> > > 1024 1554162 1571Mb/s
> > > 2048 1019996 1992Mb/s
> > > 3072 762667 2234Mb/s
> > > 4096 109489 427Mb/s
> > > 16384 27389 427Mb/s
> > > 262144 318 79Mb/s
> > > 524288 151 75Mb/s
> > > 1048576 76 76Mb/s
> > yes. better. the 2442 in the gta02 is an ooold arm cpu. it's not too modern.
> > given when the gta02 was released... it'd be like making a pentium4 laptop and
> > releasing it and selling it as new in today's shops.
> > > On desktop (xeon 2GHz):
> > > x64 binary:
> > > 26214400 74 1850 Mb/s
> > > 262144 31971 7992 Mb/s
> > > 256 53994512 13232 Mb/s
> > >
> > > x32 binary:
> > > 26214400 59 1475 Mb/s
> > > 262144 29068 7267 Mb/s
> > > 256 20810406 5080 Mb/s
> > >
> > > Old arm-based device at my work (at91rm9200, 180Mhz)
> > > 2560000 16 39Mb/s
> > > 256000 167 40Mb/s
> > > 256 418044 102Mb/s
> > yup. even that is better. :)
> > > So, we can see that we have a speed of 34Mb/s (it's even only 5 times the
> > > declared 7Mb/s for Glamo!) can someone comment? why memcpy is so slow -
> > well it possibly is partly memcpy itself. i'd have to check but you may be
> > able to improve it with some asm. i know on newer arms you really want to
> > use vfp or neon especially for memcpy's - u can get something like 2x the
> > speed.
> > > it is 2 times slower than an ancient celeron, and on par with a very old
> > > arm-based machine, it is not related to glamo anyhow! We can even skip
> > > the results with cache, where om is 10 times slower than the old machine.
> > correct. never claimed the cpu was fantastic :)
> > > 3. ... e - every N seconds (see config dialogs for what it is set
> > > to there, but let's say 60 seconds) will flush caches. ... and things are
> > > having to be repopulated from disk ...
> > >
> > > From disk?! Is this the cost of having a small memory footprint? This looks
> > > very wrong.
> > where do u think images come from ? the arrows on buttons? the buttons
> > themselves? icons? all that data comes from a disk - from a file on disk.
> > if it isnt needed anymore (it's invisible) and it cycles into speculative
> > cache, it can be flushed. eg.
> > a png icon on disk might be 24kb - in ram it becomes 64kb. those 64kb's add
> > up. this is a tunable and can be disabled - if you want. but this keeps
> > memory usage low. the default cache for e is 4mb of ram. it will keep the
> > most recent decoded images in there. (for images - more caches for other
> > things). it's loaded then kept - dereferenced when not visible. if you want
> > to be sure, strace and see when it's actually opening files.
> > > 3. ... Actually, yes the GTA01 is very noticeably faster in
> > > graphics. ...
> > > Can you expose a bit more details: How much it is faster: x2 times, x3,
> > > x1.5, x1.2?
> > ask kenyoung. he said it:
> > "Actually, yes the GTA01 is very noticeably faster in graphics.
> > I've got both, and I've run 'em side-by-side. The glamo actually
> > is a graphics DEcellerator. That's why GTA02-core is kicking it out."
> > quoted from his mail to this list yesterday. i dont have a gta01 anymore. it
> > went missing at some point, but i'd believe it.
> > > 4. ... for every second spent uploading contents to glamo, you CANNOT
> > > spend it calculating a new frame. ...
> > > Yes, this is bad... But qt works :)
> > efl works too. see above. not comparing apples vs apples.
> > > 5. ... that's because you have 2 processes competing for the cpu to
> > > render. ...
> > > My measurements of parallel memcpy showed that this is negligible.
> > that will be totally wrong. 2 processes doing 2 copies from 2 different
> > source and destination addresses WILL have performance suffer - likely less
> > than half performance as you'd have cache flushing between context switches
> > etc. i'd say the benchmarking method is not right? you can't have 2
> > processes compete for memory and both get all they want when it's a limited
> > resource and 1 cpu only to share around. and my statement was not just cpu,
> > but x and glamo. x will be sharing requests from 2 processes at once trying
> > to draw to it. and... i stated IO. :)
> > > 6. ... you wont find routines for rendering faster in most of the
> > > world. ...
> > > I will. I can recall your previous posts on the topic.
> > go for it. they exist. but not in most of the world. in some corners. find a
> > faster alpha blender or a faster super-sampling/sub-sampling scaler
> > or... :) (i am excluding neon asm here as its off topic and not on gta02,
> > but if you had something with neon you wouldnt have this conversation as
> > performance would be fine)
> > > 7. but.. if i were smart.. i'd not develop apps for the freerunner. it's
> > > a "dead product". it is no longer being produced. it has no evolution
> > > path. there won't be a gta03, 04, 05 etc. everyone quit or was fired/let
> > > go from om that worked on phones.. or worked on pretty much anything.
> > > your future is other devices.. and these don't suck with EFL. i'd not
> > > compromise the future if i were smart.
> > >
> > > Frankly speaking you never developed for GTA02, yes? your aim seems to have always
> > > been in the future, and this is ok. I am sure that for example scrolling
> > > area pre-rendering is good for the future.
> > wait? never? maybe you forget. i WORKED at openmoko before gta02 was
> > released, during and after. i very much did develop on and for it. i was
> > kicking glamo around long before it was sold. there was a REASON why i
> > punted questions to this list like "what would you guys say if we dropped
> > to qvga?" as there was a replacement lcd with the same dimensions but qvga.
> > i stopped fiddling with my gta02 some time late last year/early this year
> > as i got hardware that is much better in my hands.
> > > 8. ... most games i know of are written to work on the highest end
> > > graphics cards at the time. why? ...
> > > The best games are written with other objectives in mind. these games are
> > > really interesting for anyone from time to time and for sure will live
> > > for ages (chess, nethack and so on), our grandchildren will play nethack, be
> > > sure. Is it better to make perfect things?
> > > And optimization is always good - you can feel that 10ms latency and
> > > 100ms latency are different even if both are more than enough for UI, but
> > > you feel that 10ms latency is much better.
> > ok. talking different worlds of games here. i'm talking the ones that come
> > out for ps3, xbox, and the pc games you buy on a shelf - not "chess" or
> > "solitaire". regardless.. there's a multi-billion dollar industry for the
> > quakes of this world. not so much for chess or solitaire :) so maybe i didnt
> > explain that well - i apologize. i was thinking THESE as games, not chess
> > etc. :)
> > > 9. ... BUSINESS CHOICE ...
> > > Everyone here follows their own goals. Carlsen makes E. Others want to do
> > > hardware. Others want to use free hardware. Others want to increase
> > > development skills and hack that HW. Others just have fun reading this
> > > book. Others have this job. Someone is even making money from OM. ;) All
> > > this is ok, and I see nothing bad in making some great E developer
> > > think a bit about optimizations - nobody loses from optimizing E and
> > > writing a bit of technical description :)
> > trust me. optimisation is what i do. i have an xrender engine for evas. it's
> > complete. it does everything. why isn't it used? because my own software
> > rendering code has outperformed xrender year on year. i am still waiting for
> > xrender with its partial hardware or claimed "full hardware acceleration" to
> > beat the software i wrote. i have been waiting for years. i have an OpenGL
> > and GLES engine. i have benchmark suites that compare engines.. apples vs
> > apples. they do the exact same operations. the same drawing (within the
> > limits of their system). and yes - OpenGL on my desktop (Nvidia 8600GTS vs
> > core2-duo 3ghz). opengl... is 2x the speed of software. but considering
> > that's software... that's not too bad. a modern high end dedicated gpu is
> > only doing 2x the software speed. i know something of optimising. i know
> > something of playing tricks to avoid work. in fact evas is avoiding work
> > all over the place. but none of the themes are apples vs apples. i know
> > just where evas has performance problems, and some of them i just chalk up
> > to "well.. it is simply not worth my time and effort to try as frankly..
> > the problem is already solved - newer systems are fast enough where it
> > "doesn't matter"". some others it's more a matter of just not pushing efl so
> > far. if you have to sit and compare. make sure your comparison is fair.
> > apples vs apples.
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler) raster at rasterman.com