ASU - out of memory?

Carsten Haitzler (The Rasterman) raster at openmoko.org
Mon Aug 25 03:48:32 CEST 2008


On Fri, 22 Aug 2008 11:13:19 +0200 Tilman Baumann <tilman at baumann.name> babbled:

> Pardon.
> I don't care for the warm and fuzzy feeling you get by having malloc  
> fail on you.

the problem is.. it doesn't say it failed. it says it succeeded. it returns a
pointer. the program now moves on and should be able to safely assume that the
memory it asked for is its to use. overcommit makes success from malloc a
"gamble". it may or may not provide you with usable memory. this makes any kind
of attempt to do error checking pretty useless, as success can also be an
error that is not detectable (until it's too late and you've segv'd).
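
to make the "success is a gamble" point concrete, here is a minimal sketch in c.
it assumes linux with /proc mounted and a malloc() that passes big requests
straight to the kernel (typical for glibc, but an assumption here).
resident_pages() and touch_demo() are made-up helper names, not part of any
library. the idea it shows: malloc() hands back a pointer right away, and real
pages are typically only attached when each page is first written.

```c
#include <stdio.h>
#include <stdlib.h>

/* resident pages of this process, read from /proc/self/statm (linux only).
 * returns -1 if the file can't be read. hypothetical helper. */
long resident_pages(void)
{
    FILE *f = fopen("/proc/self/statm", "r");
    long total = 0, resident = -1;

    if (!f) return -1;
    if (fscanf(f, "%ld %ld", &total, &resident) != 2) resident = -1;
    fclose(f);
    return resident;
}

/* allocate `bytes`, then record resident pages before and after touching it.
 * returns 0 on success, -1 if malloc() itself reported failure. */
int touch_demo(size_t bytes, long *before, long *after)
{
    volatile char *p = malloc(bytes);
    size_t i;

    if (!p) return -1;
    *before = resident_pages();   /* malloc "succeeded", nothing touched yet */
    /* write one byte per 4096 bytes (assumed minimum page size) so every
     * page is touched and the kernel has to find real memory for it */
    for (i = 0; i < bytes; i += 4096) p[i] = 1;
    *after = resident_pages();
    free((void *)p);
    return 0;
}
```

calling touch_demo(64 * 1024 * 1024, &before, &after) will typically show
`after` well above `before`, even though malloc() had already reported success
before any of those pages were backed by ram.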
 
> It does not give you a bit more system stability! The one app
> receiving malloc errors is just one app of many. They all have a
> problem then.
> Imagine, the browser catches a failed malloc, because some other
> stupid app has eaten almost all ram.
> What is the benefit of telling the browser about low mem? It could
> only save itself from crashing. Well done.

and that is what it SHOULD do. programs SHOULD check the return of their
malloc()'s and "unwind" safely and remain in a safe non-crashing state. they may
inform the user of such an issue or may choose some other method to deal with
it, but it is the job of every app to deal with it as gracefully as possible.
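
a minimal sketch of what "check and unwind" can look like in c. the Record
struct and the record_new()/record_free() names are invented for illustration,
not taken from any real app. every allocation is checked, and a failure part
way through frees whatever was already allocated before returning NULL, so the
caller gets either a complete object or nothing.

```c
#include <stdlib.h>
#include <string.h>

/* hypothetical app object made of several allocations */
typedef struct {
    char  *name;
    int   *scores;
    size_t count;
} Record;

/* returns a fully built Record, or NULL with nothing leaked.
 * count is assumed to be > 0 (calloc(0, ...) may legally return NULL). */
Record *record_new(const char *name, size_t count)
{
    Record *r = malloc(sizeof(Record));
    if (!r) return NULL;

    r->name = malloc(strlen(name) + 1);
    if (!r->name) goto fail_name;
    strcpy(r->name, name);

    r->scores = calloc(count, sizeof(int));
    if (!r->scores) goto fail_scores;

    r->count = count;
    return r;

fail_scores:                /* unwind in reverse order of allocation */
    free(r->name);
fail_name:
    free(r);
    return NULL;
}

void record_free(Record *r)
{
    if (!r) return;
    free(r->scores);
    free(r->name);
    free(r);
}
```

a caller that gets NULL back can tell the user, drop a cache, or just refuse
the operation. whichever it picks, the app stays in a known state.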

> And malloc btw. never gives you bad memory. You just have to check,  
> even with overcommit.

with overcommit it CAN give you bad memory. without it, it can't (or shouldn't -
if it did - it'd be a bug). NULL = "no memory could be given to you of the size
you asked for". deal with it. anything else is giving you the memory you asked
for.

with overcommit it SAYS it gives you that memory - but once you try and access
it (read or write) at ANY time, you may or may not segv. because the kernel may
have overcommitted and decided to throw out that memory because it thinks it is
not used (for whatever reason), but now you want to use it.
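
on linux the policy being argued about here is a runtime setting,
vm.overcommit_memory, readable from /proc/sys/vm/overcommit_memory: 0 is the
heuristic default, 1 always overcommits, 2 switches to strict accounting. a
small sketch for checking which one a box is running (the two function names
are made up for illustration):

```c
#include <stdio.h>

/* returns 0, 1 or 2 as reported by the kernel, or -1 if the file
 * can't be read (e.g. not linux, or /proc not mounted) */
int overcommit_mode(void)
{
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    int mode = -1;

    if (!f) return -1;
    if (fscanf(f, "%d", &mode) != 1) mode = -1;
    fclose(f);
    return mode;
}

/* human readable name for a mode value */
const char *overcommit_mode_name(int mode)
{
    switch (mode) {
    case 0:  return "heuristic overcommit (default)";
    case 1:  return "always overcommit";
    case 2:  return "strict accounting, no overcommit";
    default: return "unknown";
    }
}
```

strict accounting can be switched on with `sysctl vm.overcommit_memory=2`. in
that mode the kernel refuses allocations that would go past its commit limit
(swap plus a share of ram set by vm.overcommit_ratio), so the refusal shows up
as a NULL from malloc() instead of later on.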

let me give an example:

my app starts - i alloc 64kb for a quick lookup-table for a dictionary. (this
is a real-life thing i actually do btw.) this lets me find words in a
dictionary file quickly as you type (as opposed to doing a full linear
search of the file). it lets me jump to a subsection (block) of the file
quickly based on the first few letters of a word and then do a limited linear
search. this is low on memory usage as it's a limited lookup table. this gets
set up on app start.

now. let's say you don't type anything for a while - hours
or days. in the meantime you run apps (dialler, address lookup etc.) and just
passively browse the web (no entering fields - just using bookmarks). lots of
memory is allocated and freed, but the pages of ram for the keyboard dictionary
are untouched. it hasn't needed them. now you use your browser and it eats up
gobs of ram.

well, what happens when you now need to type? you bring up the keyboard
and type your name into an entry field. as you do this it may need to look up
the dictionary. the kernel has since overcommitted and thrown away some pages in
the 64kb lookup table for the dictionary. it decided that since they were idle
- they may as well be ditched as it REALLY needed that ram for that background
jpeg in the web page! this was new, urgent and active. well - now as you type
the keyboard code tries to do a lookup.. and finds the pages of its dictionary
gone! kernel took them away. they were successfully allocated and filled, but
now have been nuked. result? keyboard segv's. no bug in kbd. it did everything
right. it gets punished for the memory hogging of another app. THAT is what
overcommit does.
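
for anyone wondering what such a 64kb table might look like, here is a rough
sketch in c. it is NOT the actual keyboard code. the layout (one 4-byte slot
per pair of leading 7-bit ascii characters, 128 * 128 slots = 64kb), the
function names, and the in-memory sorted word array standing in for the
dictionary file are all assumptions made for illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SLOTS    (128 * 128)     /* 16384 slots * 4 bytes = 64kb */
#define NO_BLOCK UINT32_MAX      /* marks "no word starts like this" */

/* slot for the first two characters of a word (7-bit ascii assumed) */
static unsigned slot_of(const char *w)
{
    unsigned a = (unsigned char)w[0] & 0x7f;
    unsigned b = w[0] ? ((unsigned char)w[1] & 0x7f) : 0;
    return a * 128 + b;
}

/* words must be sorted; returns NULL if the table can't be allocated */
uint32_t *table_build(const char **words, uint32_t n)
{
    uint32_t *t = malloc(SLOTS * sizeof(uint32_t));
    uint32_t i;

    if (!t) return NULL;
    for (i = 0; i < SLOTS; i++) t[i] = NO_BLOCK;
    /* walk backwards so each slot ends up pointing at its FIRST word */
    for (i = n; i > 0; i--) t[slot_of(words[i - 1])] = i - 1;
    return t;
}

/* jump to the block, then scan only the words sharing the 2-letter prefix.
 * returns the word's index, or -1 if it isn't in the dictionary. */
int table_find(const uint32_t *t, const char **words, uint32_t n,
               const char *word)
{
    unsigned s = slot_of(word);
    uint32_t i = t[s];

    if (i == NO_BLOCK) return -1;
    for (; i < n && slot_of(words[i]) == s; i++)
        if (strcmp(words[i], word) == 0) return (int)i;
    return -1;
}
```

with a real dictionary file the slots would hold byte offsets into the file
rather than array indexes, but the shape is the same: one table read to find
the block, then a short linear scan inside it.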

> It's just so that you have to be aware that the kernel who is watching  
> over all resources (the evil MCP muahahah)
> may need to lay off some programs in a critical system condition.
> (oom_kill)
> No one wants this, so the strategy is to avoid this. We are talking  
> here about the same thing. We need some userspace helper to clean up  
> before it gets critical.

yes. agreed. and we need to not use overcommit. you can't go punish apps that
have done everything right and by the book. overcommit does this.

> Yes, it sounds bad to have an error handling for out of memory and then
> never use it.
> Ask my mom, she will tell you this is horrible. But my mom does not
> know the whole picture.

it's not just the error handling - the SUCCESS case is also now broken. see
above. you are liable to segv whether you handle the error case or not.

> And once again. Not having overcommit wastes memory. Why do you think  
> this is worth it?

and having it makes apps segv that would not have segv'd before. it wastes
developers' time chasing gdb backtraces and bug reports for something that
doesn't even exist as a bug. see above. you'd get an utterly bizarre crash
somewhere it just doesn't make sense - you'd possibly spend days or weeks
hunting it to no avail as it just isn't what you think it is! the kernel
literally has removed your ram from you! there is no bug to hunt!

> Ah, and btw. Nice analogy.
> 
> If you malloc (money alloc) some money at your bank they will give you  
> some paper.
> You know it is virtual value. You know they vastly overcommit.
> You know, they will give you virtual money even if they can't know if  
> they can keep that promise?

and in the fine print they DO commit to a certain value. (eg 50,000 or
100,000). up to that amount your money is guaranteed. you know what i do? i put
my money in several banks so all my values are below their "commit level".
malloc() has a manual - a contract with you. a guarantee. with overcommit that
contract may be broken arbitrarily. what happens in the real world is..
you then sue the bank! they broke their contract with you.

> Sounds horrible, doesn't it?
> It kind of is.  But they have swap too. And they try to fix it when  
> they become aware that there is a problem.
> If you are lucky the banking kernel finds a solution. If not, we are  
> all screwed.
> 
> Come on, no one would ever do this! If they could not trust the paper  
> they get, how could they know if there is a problem?
> It's a horrible system, let's go back to real values like gold...
> 
> No, we don't. See you at the bank. ;)

i'm well versed with banks. i have multiple accounts in multiple countries in a
whole host of currencies. i read the fine print. :) like with malloc. i read the
manual page - the fine print - and work within what it guarantees. overcommit
breaks the contract. it is bad.

> Am 22.08.2008 um 02:33 schrieb Carsten Haitzler (The Rasterman):
> 
> > On Thu, 21 Aug 2008 18:40:52 +0200 Tilman Baumann  
> > <tilman at baumann.name> babbled:
> >
> >>
> >> Not being able to malloc memory and not having any physical memory left
> >> are just two separate things. At least on modern linux systems.
> >
> > no they are not. i write code.
> >
> > myptr = malloc(mysize);
> >
> > if it returns failure (NULL) then i need to deal with it. if it returns a
> > pointer - it succeeded. i asked for memory - it gave it to me. the problem
> > is with overcommit success returns are lies. they MAY have the memory -
> > they may not. part way through using the pages it returned and SAID i
> > could have, it can just segv - as it is overcommitted and out of pages.
> > this means that suddenly return values can't be trusted for memory
> > allocations anymore. any attempts to handle NULL returns may as well not
> > exist as success cases can be undetectable failures. it's just a stupid
> > policy. sure - it's there to work around stupid userspace code that goes
> > off allocing massive blobs of ram that it then never goes and uses, but the
> > kernel shouldn't go punish all apps for this - those apps being stupid
> > should be punished and have their code fixed.
> >
> >> Memory overcommit saves (physical) memory. And not just a bit.
> >> And with some swap safety net it is reasonably safe to do so.
> >>
> >> And how should it be better if _any_ app gets knocked out by running into
> >> the memory limit? Usually it can't help the situation. Especially if
> >> some other app is eating all the memory.
> >> The effect is more or less the same. Some random poor app has to be
> >> killed. (Since suicide is often the only way to react to malloc fails)
> >>
> >> Turning overcommit off is in my eyes only a poor 'bury one's head in the
> >> sand' solution which effectively improves nothing.
> >
> > i disagree. overcommit is a "bury head in sand" solution. it means you just
> > go and avoid the original problem of allocing much more memory than you
> > really need.
> >
> >> Overcommit and swap is just the winner. Everyone does it so. And in my
> >> eyes rightly so. It is fast and efficient.
> >> And even without swap, i would not want to turn overcommit off.
> >>
> >> Swap and overcommit is just the dream team. Fast and efficient memory.
> >> And if something goes wrong, you have plenty of time (and mem) for
> >> solving the problem.
> >> Contrary to without, because how could you fix any low memory condition
> >> without allocating any more memory? Driving something into swap to be
> >> able to do something about it is just right.
> >>
> >> And just in case you thought so: overcommit is not just lazy "don't care
> >> about errors" behaviour but a really smart optimisation.
> >
> > wrong. it means i can no longer trust a successful malloc() calloc()
> > realloc() or even alloca() return. ever. the return of a valid pointer,
> > which according to the manuals provided ever since for these calls means
> > success, could be an error state too. and the only way to handle it is to
> > have your own sigsegv/bus handlers and, on a segv which accesses a known
> > allocated chunk of memory, realise that overcommit just screwed you and try
> > and save yourself (and frankly by this stage - you may as well just segv,
> > exit or try re-exec yourself freshly).
> >
> > it is not elegant. it is not good. it is a cute hack that makes
> > badly-behaved apps not impact things as much. it just sticks the head in
> > the sand and doesn't go and fix the apps. it means the kernel tries to pick
> > up the badly bloated pieces for you and tends to punish all apps as a
> > result.
> >
> >> For example it saves memory by only copying a page if the page was
> >> written. (copy on write)
> >> For example:
> >> If you fork, the memory space is duplicated. You end up with twice the
> >> memory. But linux does not copy the memory, it overcommits.
> >> Only if a page is really changed does it get duplicated first.
> >>
> >> This is elegant, fast and efficient. But yes, you lose the context of an
> >> error condition if one happens. But i say, if a system runs out of
> >> memory, letting a program know that there is no memory left helps
> >> nothing. The poor program that gets the error is usually neither able
> >> to save much nor even to solve the system wide problem.
> >>
> >> -- 
> >> Drucken Sie diese Mail bitte nur auf Recyclingpapier aus.
> >> Please print this mail only on recycled paper.
> >
> >
> > -- 
> > Carsten Haitzler (The Rasterman) <raster at openmoko.org>
> >
> > _______________________________________________
> > Openmoko community mailing list
> > community at lists.openmoko.org
> > http://lists.openmoko.org/mailman/listinfo/community
> 


-- 
Carsten Haitzler (The Rasterman) <raster at openmoko.org>



