cpu usage way too high (?)

thomasg thomas at gstaedtner.net
Mon Feb 2 03:12:49 CET 2009


On Mon, Feb 2, 2009 at 1:19 AM, Nils Faerber
<nils.faerber at kernelconcepts.de> wrote:
>
> I am sorry but I cannot let this stand.
> There are reasons for not using CPUs with FPU and it is not because ARM
> CPUs are old (or the Samsung flavour of ARM used in the GTA02 would be).
> If you look at the chip layout of a modern x86 CPU (regardless of brand
> and actual type) you will easily find several areas that look similar.
> The largest today is the on-die CPU cache - ever growing because in
> comparison to the CPU cycles the RAM bus is becoming more and more the
> performance bottleneck. The next bigger parts one can see are the fixed
> point CPU and ALU along with a larger chunk for the MMU and finally a
> quite big part, this is the FPU. And here comes the reason for getting
> rid of the FPU: It eats too many transistor gates! Transistor gates mean
> chip size and mean power consumption. An FPU can be emulated in software
> but parts like cache and MMU can not. So in order to reduce power
> consumption the FPU gets stripped first. As easy as that.

You are right, the caches make up a big part of the die size today.
However, while modern x86 quad-cores come with up to 12 MB of on-die
cache, the Freerunner's s3c comes with 32k (that's about 1/380 - ok,
the impact is slightly different because of the 45nm vs. 130nm
processes, but still). So while caches on x86 take up to 2/3 of the
die, the impact on ARM is totally different. The VFP is not _that_
big a part of the die picture (such pictures unfortunately aren't as
public as they are in the x86 world): the ARM9 VFP that would have
been available from ARM for the s3c24xx design takes 1mm^2 in 130nm,
and the s3c2442 package is 14mm^2. The benefit is not only ease of
programming; the VFP also does the job better than a fixed-point
implementation.
Of course it always depends on the intended use, but the s3c2442 is
an application processor for multi-purpose devices - and I haven't
seen any new design (>armv4, except xscale) without an FPU in this
field.
Also, ARM states that the VFP draws <0.16W (max.) at 400 MHz.
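
As a rough back-of-the-envelope check of the numbers above, here is a
small Python sketch. It takes the quoted figures at face value (12 MB
vs. 32k of cache, 1mm^2 for the VFP, 14mm^2 for the s3c2442), and the
process-scaling line assumes ideal area scaling with the square of the
feature size, which real designs only approximate. All variable names
are made up for illustration.

```python
# Figures as quoted in this mail; nothing here is measured.
x86_cache_bytes = 12 * 1024 * 1024   # 12 MB on-die cache (x86 quad-core)
s3c_cache_bytes = 32 * 1024          # 32k on the Freerunner's s3c

# How many times larger the x86 cache is ("about 1/380" the other way round).
cache_ratio = x86_cache_bytes / s3c_cache_bytes

# Share of the quoted 14mm^2 that a 1mm^2 VFP would occupy.
vfp_area_mm2 = 1.0
s3c_area_mm2 = 14.0
vfp_fraction = vfp_area_mm2 / s3c_area_mm2

# ASSUMPTION: idealized scaling, area per transistor ~ (feature size)^2.
density_gain_45_vs_130 = (130 / 45) ** 2

print(f"cache ratio: {cache_ratio:.0f}x")
print(f"VFP share of quoted area: {vfp_fraction:.1%}")
print(f"ideal density gain 130nm -> 45nm: {density_gain_45_vs_130:.1f}x")
```

So even with the process difference taken into account, the cache gap
stays large, while the VFP would be a single-digit percentage of the
quoted area.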

> And this has nothing to do with ARM - other embedded CPUs follow the
> same trick in order to reduce the number of gates, look at MIPS, Hitachi
> SH, embedded PowerPC, AVR32, etc.

Ok, but these are mostly special-purpose processors, like the TI AR7,
which is highly integrated and surely not meant for anything
multimedia related.
The same goes for PowerPC: only the absolute low end comes without an
FPU, mostly because it just isn't needed (no multimedia stuff, and
wherever we use them (telco), the real work is done by some highly
specialized DSPs).

> Multimedia stuff can usually be optimized for fixed point, like it was
> done for OGG with Tremor, MP3 with Madplay, etc. But this kind of
> software is based on a long series of mathematical calculations, lot of
> them involving frequency spectrum analysis like FFT. This can easier be
> developed using floating point. But once you know how your code shall
> look like you can, in most cases, get rid of floating point again. But
> this is extra work and many developers omit that.
> So there is no intrinsic problem with multimedia that forced it to be
> floating point heavy - it is rather the "laziness" of the developers.

Ok, it can be done, but hasn't it been the job of hardware over the
last decades to make life easier for the programmer?
Why go back?
Even Tremor (the fixed-point Vorbis decoder) was only created to bring
Ogg/Vorbis to ultra-low-power portable media devices. Nowadays those
don't come with tiny microcontrollers anymore, e.g. Apple uses
high-end ARM11 SoCs with FPU (and much more) in their iPod nano (same
s3c64xx as in their phone and the high-end players).
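
To make the extra work of a fixed-point port a bit more concrete, here
is a minimal Python sketch of Q16.16 arithmetic, the general kind of
technique integer-only decoders rely on. It is not taken from Tremor
or Madplay; the Q16.16 format and all names (FRAC_BITS, to_fixed,
fixed_mul, fixed_dot) are assumptions chosen for illustration.

```python
FRAC_BITS = 16            # Q16.16: 16 integer bits, 16 fractional bits
ONE = 1 << FRAC_BITS      # fixed-point representation of 1.0


def to_fixed(x: float) -> int:
    """Convert a float to Q16.16 (rounded to the nearest step)."""
    return int(round(x * ONE))


def to_float(q: int) -> float:
    """Convert a Q16.16 value back to a float for inspection."""
    return q / ONE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q16.16 values.

    The raw product has 32 fractional bits, so it is shifted back down.
    Python ints never overflow; in C the intermediate product would
    need a 64-bit type.
    """
    return (a * b) >> FRAC_BITS


def fixed_dot(xs, ys) -> int:
    """Dot product of two Q16.16 sequences (e.g. samples and filter taps).

    Products are accumulated at full precision and shifted once at the
    end, which loses less precision than shifting every term.
    """
    acc = 0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc >> FRAC_BITS


product = to_float(fixed_mul(to_fixed(1.5), to_fixed(2.25)))
samples = [to_fixed(v) for v in (0.5, -0.25, 0.125)]
taps = [to_fixed(0.5)] * 3
filtered = to_float(fixed_dot(samples, taps))
print(product, filtered)
```

With floating point, each of these is a plain `*` or a sum of
products; in fixed point the programmer has to pick the format, track
the scaling by hand and watch for overflow, which is exactly the extra
effort being discussed.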

> ARM is a clever company. If you are a chip manufacturer you can get a
> customized individual CPU design from ARM, this is their business model,
> they sell building blocks and the integration. You can have almost
> everything you like, like MMU, DSP, a large variety of interfaces, and
> yes, if you have the chipspace and power to feed it, even an FPU.

And that's why ARM can be fun if a company wants to provide cool
products with a lot of features (e.g. TI's OMAP proves this on the
BeagleBoard).

> The S3C2442B is not exactly "really old" - but this depends on the point
> of view I guess.

Yeah, that's my point, the s3c isn't that old, but the arm9 design it
is built upon is.

> This is not exactly true. It may look like this from a PC user's point
> of view because you are used to this confusing GHz rally going on
> between Intel and AMD. But the GHz do not tell you anything about the
> real technology underneath - or why do you think Atom CPU systems still
> need a fan? There is ancient cruft in those x86 thingies.
> The embedded CPUs just cause less PR. And they consume way less energy -
> a modern smartphone can run for weeks! How long does your average laptop
> last? Comparisons are difficult in this area but neither are those
> embedded designs old nor are their lifecycles longer. Look at the Intel
> PXA CPUs e.g. - they got replaced almost as fast as new Intel x86 CPUs
> were introduced. For the largest part of this market the average user
> simply does not get to know what happens - and most don't even care
> since in most cases you do not have a choice, either you buy that phone
> with this CPU or you don't but you cannot get the same one with a
> different CPU.

The awesome thing is that modern smartphones can last long even with
a ton of features (from 3D graphics to WiFi, GPS, ...), so there is no
reason to skimp on the FPU here :).
As for the PXA: they were around for a long time under Intel, and I
think they always showed what ARM can do in smartphones and so on.
Five years ago PXAs were already more powerful than the s3c2442,
though they had no FPU either.

Intel never moved to any newer ARM IP, so it was obvious that they
wouldn't hold on to this much longer. Still, there wasn't really a
reason to try to push x86 into low-power computing. The biggest
advantage of x86 is that it can run WinNT, so I think Atom was not
only an Intel initiative.



On Mon, Feb 2, 2009 at 1:31 AM, Werner Almesberger <werner at openmoko.org> wrote:
> In fact, Samsung made some radical changes while we were already in
> the middle of the design. So it was a lot more bleeding edge than we
> had bargained for ;-)

The s3c2442 design could have been released yesterday - and it still
would be outdated :)
All I'm saying is that newer ARM cores would (will) be much more
fun, and with an FPU they would also allow using much more of the
available software without having to re-implement it in fixed-point
arithmetic.


