Idea for an optimized Emulator

Rashid Kratou rashid at milacom.de
Tue Jun 8 13:34:50 CEST 2010


Don't know if it's really QtMoko related, but hey, the emulator can maybe
be used in QtMoko too :).

Hi guys, I'm trying to port a SNES emulator to the Pandora. The problem
is that at 30 FPS around 60% of the CPU time is spent sending the
data to the Glamo. There is an optimization idea from a very clever guy
who has coded and optimized some emulators for ARM devices. The basic
idea is to share the bus between Glamo and CPU so that the CPU has some
work to do while the bus is blocked by the Glamo.

Like: 
	CPU gets a short time to fetch data from RAM into its cache.
	Glamo takes the bus for a short time and gets data.
	CPU can't access the bus and works with the data in its cache.
	CPU takes the bus back.
	Glamo works with the data in its cache.
	[...]
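
A minimal sketch of the schedule above, in plain C. This is only a
user-space simulation: memcpy stands in for one short DMA burst, and the
names copy_frame_chunked, cpu_work_fn and count_turn are made up for
illustration. Real DMA to the Glamo would need platform-specific setup
that is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: CPU work assumed to run mostly out of cache. */
typedef void (*cpu_work_fn)(void *ctx);

/*
 * Copy `len` bytes from src to dst in pieces of at most `chunk` bytes,
 * calling `work` after every piece. Returns the number of pieces copied.
 * memcpy is a stand-in for one short DMA burst (an assumption of this
 * sketch, not how the Glamo is actually driven).
 */
size_t copy_frame_chunked(unsigned char *dst, const unsigned char *src,
                          size_t len, size_t chunk,
                          cpu_work_fn work, void *ctx)
{
    size_t done = 0, pieces = 0;

    assert(chunk > 0);
    while (done < len) {
        size_t n = len - done;
        if (n > chunk)
            n = chunk;
        memcpy(dst + done, src + done, n);  /* "Glamo takes the bus" */
        done += n;
        pieces++;
        if (work)
            work(ctx);                      /* "CPU works from its cache" */
    }
    return pieces;
}

/* Example CPU work: count how often the CPU was given a turn. */
void count_turn(void *ctx)
{
    (*(unsigned *)ctx)++;
}
```

With a 1000-byte buffer and 256-byte pieces, the copy is done in 4 pieces
and the callback gets 4 turns in between.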

Would this be possible? Can you do it without knowing assembler and from
a normal C program? Or do you have to hack the kernel for it?

Here is the chatlog:

(01:10:28) emulator developer: Anyway, DMA occurs on the SoC itself, so
it's possible to perform it from and to anything on the bus, presumably.

(01:10:36) emulator developer: It was already shown to work with the
Glamo.

(01:10:47) emulator developer: Albeit at much lower bandwidth than
expected.

(01:11:03) emulator developer: The problem is, if you try to DMA one
large chunk at a time the DMA ends up stealing the bus for that entire
duration of time.

(01:11:04) me: but the cpu can work during the DMA operation, right?

(01:11:36) emulator developer: As soon as the CPU ends up requiring the
bus due to a write buffer writeback or cache miss/uncached access it'll
end up being stalled until the DMA finishes.

(01:12:18) emulator developer: But since the DMA can be set up to step
in small chunks it's possible to make it not stall the bus for long,
meaning you let the CPU access the bus without a lot of wait and then
the CPU can go on doing other things. Since the CPU doesn't need the bus
100% of the time it can still get work done.
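
To put rough numbers on the small-chunk idea, here is a
back-of-the-envelope sketch in C. All figures used with it are
placeholders, not measurements: a 256x224 frame at an assumed 16 bits
per pixel (114688 bytes) and an assumed bus throughput of 5 MB/s. The
function names are made up for illustration.

```c
#include <assert.h>

/* Microseconds the bus is held for one burst of chunk_bytes (rounded down). */
unsigned long long burst_stall_us(unsigned long long chunk_bytes,
                                  unsigned long long bus_bytes_per_sec)
{
    assert(bus_bytes_per_sec > 0);
    return chunk_bytes * 1000000ULL / bus_bytes_per_sec;
}

/* Number of bursts needed to move one frame (rounded up). */
unsigned long long bursts_per_frame(unsigned long long frame_bytes,
                                    unsigned long long chunk_bytes)
{
    assert(chunk_bytes > 0);
    return (frame_bytes + chunk_bytes - 1) / chunk_bytes;
}
```

With those placeholder figures, sending the whole frame in one go holds
the bus for about 23 ms, while 1 KiB bursts hold it for about 0.2 ms each
(112 bursts per frame). The total transfer time stays the same; only the
longest single stall shrinks.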

(01:12:59) me: as long as there is some work to do inside the cache, right?

(01:13:22) emulator developer: There's always work to do inside the
cache.

(01:14:24) me: ok but when the glamo uses the whole bus, the cpu can
finish its work inside its cache and then wait until the bus is free,
right? Or did I misunderstand you?

(01:15:29) emulator developer: The Glamo won't use the whole bus if the
DMA is staged like I described.

(01:15:39) emulator developer: It'll give other things a chance to use
it.

(01:16:12) me: trying to understand what you described

(01:17:37) me: do you send only a piece of the frame at a time to the
glamo so the bus isn't used 100% by the glamo?

(01:17:58) emulator developer: A timer is set up to automatically do
that.

(01:18:34) me: hmmm ok

(01:19:05) me: and then the cpu has its cache and keeps working, and the
glamo gets its data and when it's all there it swaps the buffers.

(01:19:17) me: ok if i understand it right, it sounds like a really good
idea :)

(01:20:04) emulator developer: Someone else mentioned it once. But it
hasn't been done, sooo..

(01:20:23) me: so it would be kernel hacking...

(01:20:34) me: or can we do it from a normal program?

(01:22:04) emulator developer: I don't know.

(01:22:11) me: hmm ok

(01:22:25) me: but your idea seems to be really promising

(01:23:32) me: are you sure it would work this way? if it were
possible it would really help other games too :)

(01:24:57) emulator developer: No, am not sure.

(01:25:07) emulator developer: And I'm not going to develop on a
Freerunner, so count that out ;p

(01:25:18) me: *g*

(01:25:27) me: i never expected you would develop something for me

(01:25:49) me: i'm sorry that it sounded like i was searching for a nice
guy to exploit so that he writes something i want for me

(01:25:57) me: in the first post of the board

(01:27:28) me: i learned much. i know now that the glamo bandwidth is the
problem, that the cpu is blocked when the glamo is getting normal data,
and that you can bypass the problem if you use DMA and take care that the
cpu and the glamo share the bus in a way that both have enough data in
their cache to work

(01:29:32) emulator developer: But it's a compromise.

(01:29:44) emulator developer: It's not like you'd ever be able to do
60fps or anything.

(01:29:49) me: that's fine

(01:36:09) emulator developer: You'd be frame skipping.
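
A minimal sketch of what frame skipping could look like in C, assuming
the emulator steps at a fixed rate and the display can only be fed at a
lower one. The function names and the rates used with them are
hypothetical, not taken from any existing emulator.

```c
#include <assert.h>

/*
 * Draw every Nth emulated frame so the displayed rate stays at or
 * below max_display_fps. Returns N (rounded up).
 */
unsigned draw_every_for(unsigned emu_fps, unsigned max_display_fps)
{
    assert(emu_fps > 0 && max_display_fps > 0);
    return (emu_fps + max_display_fps - 1) / max_display_fps;
}

/* Nonzero if this emulated frame should be sent to the display. */
int should_draw(unsigned long frame_index, unsigned draw_every)
{
    assert(draw_every > 0);
    return frame_index % draw_every == 0;
}
```

For example, emulating at 60 frames per second with a display limit of
30 gives N = 2, so every second frame is drawn and the rest are only
emulated.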

(01:36:19) me: that's ok

(01:39:21) me: what do you think can be reached?

(01:39:51) emulator developer: Dunno.

(01:40:19) emulator developer: BTW, there's no OGL ES on Freerunner.

(01:40:31) emulator developer: The Glamo has hardware capability for it
but no drivers. And the company went under.

(01:40:37) me: ahhhh

(01:40:40) me: damn NDA

(01:40:41) emulator developer: But never publicly released a datasheet.

(01:41:02) me: ok it can play mpeg4 movies

(01:41:04) me: that's cool

(01:41:14) me: but the way it was placed on the freerunner really sux

(01:41:46) emulator developer: Yeah it was pretty dumb, I bet the CPU
could do MPEG4 as well on its own.

(01:41:53) emulator developer: If it was attached to a fast framebuffer.



