nWAITing for Glamo
Carsten Haitzler (The Rasterman)
raster at openmoko.org
Wed Jul 16 00:01:58 CEST 2008
On Tue, 15 Jul 2008 20:16:00 +0200 "andrzej zaborowski" <balrogg at gmail.com> babbled:
> 2008/7/14 The Rasterman Carsten Haitzler <raster at openmoko.org>:
> > On Mon, 14 Jul 2008 14:26:59 +0100 Andy Green <andy at openmoko.com> babbled:
> > aye. this gives the nitties of what the glamo is causing in performance
> > issues. basically this is the crux of the "bus bandwidth to the glamo"
> > issue. while reading or writing to the glamo the cpu gets stalled waiting
> > for the glamo - limiting throughput to about 7m/sec. unfortunately for us
> > the cpu is hung waiting on the slow glamo when it could be off doing
> > something more useful with its time (even if we accepted limited write/read
> > rates, if they could be async we'd be much better off). this was the crux
> > behind the DMA experiment dodji did. the problem was that the DMA was
> > on-soc and memory to memory and would hold up the cpu in wait states anyway
> > - so you don't win over using the cpu.
> Ouch. If the DMA has to block the whole SoC when waiting for the
> nWAIT then it's really bad, and it seems to be confirmed by what I saw
> (i.e. if I disabled the clock and tried to write to glamo, the whole
> SoC would hang, even the JTAG wouldn't respond). I'm pretty sure the
> S3C series must be special in this regard, and someone should really
> consider a completely different SoC for the next models. There's so
> many things wrong with them (the timers (16-bit!), the DMA (see
yup. i was mistaken in thinking dma would alleviate our problem. i was indeed
wrong. i was assuming too much goodness on the part of our venerable
samsung2442 and its memory to memory dma engine... :(
> below), the power management, the documentation silently taken away -
> this should be alerting). Their only advantage I see is the hw team
> is already familiar with them. I don't believe the cost is a blocker
> for switching to, say, OMAP.
i'm all for ... in the future ... getting an SOC that is good. what SOC we will
use in future is up in the air and for what model. but things are clear:
1. it must be easy to support - i.e. the vendor needs to have a linux story
already. we dont want to spend the next 12 months getting linux to boot at all
2. it must be OPEN. that means all of it. every hw element needs docs that are
open and allow us to produce open drivers to drive that element as it is
intended to be driven (eg gsm on the end of an at-commandset api is fine,
graphics with a binary-only driver is not. graphics where we only get to use a
small subset of the features (eg basic 2d) as open, with the more advanced
stuff being closed, is not fine either).
3. the more grunt the better
4. good power saving.
5. i think we likely would want arm as thats what we have... and likely will
far into the future.
at least that's my list.
any suggestions are always good - i've taken a sniff at qualcomm's snapdragon.
qualcomm dont have a reputation for being open - but something hinted that it
has an ATI graphics core in it... and that may give me hope. other than that it
looks like one nice little beastie... :)
anyway... back to our programme...
> But here's an idea: we already know that we will have to "nWAIT" for a
> moment before every write to Glamo - we can spend that time just
> nWAITing or perhaps we can do something else on the CPU, and then
> start our transfer just in the right moment for the write to be
> instant or almost instant. The lengths of periods we spend nWAITing
> may have some non-trivial pattern but it's easy to research.
> How to do that: the OMAP dma can be told to wait a couple of clocks
> before every atomic transfer (element, in omap speak), the s3c can't
> do that, I just checked - but it can be synchronised with a timer. In
> Dodji's test the transfers were not synchronised - they were like
> memcpy/memset, i.e. a new transfer starts as soon as the previous one
> finishes. Instead the DMA can stop after every 8, 16 or 32 bits or N
> times that, and wait for the PWM timer. I'll try that some weekend.
hmmmm. interesting idea. this is only possible to do sanely with dma - and even
then we still have the "small dma segments will be more overhead to setup and
kick off than it costs to just blindly memcpy() with the cpu", so there will come
a fragmentation point at which a dma segment size is too small to bother with,
that can be tuned of course, BUT... it could work... maybe...
Carsten Haitzler (The Rasterman) <raster at openmoko.org>