andy git 06/15 suspend/resume observations

Mike (mwester) mwester at dls.net
Thu Jun 19 01:34:11 CEST 2008


Sean McNeil wrote:
> Mike,
> 
> First I'd like to apologize for the tone of the previous email. I
> re-read it and realized it was not as I meant to express things. I do
> have a strong opinion on the subject, but "Gross Hack" was too strong.
> 
> Mike Westerhof wrote:
>> Sean McNeil wrote:
>>  
>>> Mike (mwester) wrote:
>>>    
>>>>  Andy Green wrote:
>>>> Somebody in the thread at some point said:
>>>> | Somebody in the thread at some point said:
>>>> | | Andy,
>>>> | |
>>>> | | I've now confirmed it is from GSM wakeup. If I do not initialize
>>>> | | the GSM then the phone never locks up.
>>>> |
>>>> | EXCELLENT, thanks a lot.
>>>> |
>>>> | Mike can this plug into the serial resume problems?
>>>>
>>>>      
>>>>> I haven't taken a thorough look at the GTA02; with the console safely
>>>>> out of the way on port 2 on this device there should be absolutely no
>>>>> reason for any suspend/resume ordering issue to cause lockup/hang
>>>>> problems.
>>>>
>>>> |
>>>> | How can one provoke GSM wakes then?  Although I am in runlevel 3 I do
>>>> | actually have a SIM card in and am running gsmd -- last night before
>>>> | I went to bed though I put it in suspend, and it woke 100% perfect
>>>> | this morning after 7 - 8 hours suspended.  But it didn't wake before
>>>> | that from GSM.... you really have to ring the phone?
>>>>
>>>>      
>>>>> Depends on your rootfs; the other emails on this thread outline the
>>>>> common things.  One can enable the "nspy" feature to find out.  Echo
>>>>> "1" to the nspy_enable sysfs file to turn it on.  Then when the phone
>>>>> wakes, repeatedly "cat" the nspy_buffer sysfs file to dump out the
>>>>> event buffer.  You'll see the suspend/resume events from the point of
>>>>> view of the serial and neo1973_pm_gsm drivers, and you'll see the data
>>>>> stream from the UART identifying the reason for the wakeup.
>>>>>
>>>>> There is one other possibility -- when the GSM first powers up, it
>>>>> always issues a GSM wakeup interrupt, although it has no data to send.
>>>>> Is there a possibility that the GSM is being unpowered and powering
>>>>> back up?
>>>>
>>>> Hm what's going on here... in the resume:
>>>>
>>>>     /* We must defer the auto flowcontrol because we resume before
>>>>      * the serial driver */
>>>>     if (!schedule_work(&gsmwork))
>>>>         dev_err(&pdev->dev,
>>>>             "Unable to schedule GSM wakeup work\n");
>>>>
>>>> but in the work function there
>>>>
>>>> static void gsm_resume_work(struct work_struct *w)
>>>> {
>>>>     printk(KERN_INFO "%s: waiting...\n", __FUNCTION__);
>>>>     nspy_add(NSPY_TYPE_RESUME, 'W', jiffies);
>>>>     if (gsm_autounlock_delay)                    <=== zero on GTA02
>>>>         msleep(gsm_autounlock_delay);        <=== no delay
>>>>
>>>>      
>>>>> User-space on the GTA02 is expected to explicitly manage the
>>>>> flow-control by means of the sysfs flag (preferred), or by means of
>>>>> changing the modem control lines on the serial port (deprecated, IMO),
>>>>> or as a last resort (and highly discouraged) to set the auto-unlock
>>>>> delay to some non-zero value.
>>>>>         
>>> This is a gross hack. The whole point of device drivers having
>>> suspend/resume callbacks is to put them in a proper state so that they
>>> are there when you come back up. I, for one, do not want to use sysfs
>>> flags at all. I already set the hardware flow control on opening the
>>> device. Are you saying that the user-space must now monitor that it has
>>> been resumed and then write to some sysfs file? I repeat: Gross Hack.
>>>     
>>
>> Feel free to write something better.  There are wheels within wheels
>> here, though.
>>
>> I explained the entire process some time ago (please search the archives
>> to find those series of multi-page emails describing this in absurd
>> detail!), but let me address your concerns in as brief a manner as is
>> possible:
>>
>> 1 - I agree in principle, and have repeatedly suggested that we need a
>> different device driver to support interfacing to the GSM, because the
>> semantics do not quite match with what the serial driver actually does.
>>  I've not had very good (!) response to that proposal, hence workarounds
>> have been applied.
>>
>> 2 - You say that setting hardware flow-control should suffice - and I
>> think you have a right to expect that it remains honored.  I was rather
>> surprised, to say the least, to find the code in the low-level
>> suspend/resume stuff that wraps saving the UARTs state in #ifdefs.  But
>> to modify the serial driver to do this in what I think would be a more
>> rational way would make it very specific to the GTA0x and would probably
>> impact the higher layers that are not specific to the GTA0x devices.
>>
>> 3 - However, even if the previous points are addressed, it does not
>> really solve the problem at a higher level.  Oh, sure you've not lost
>> any data anymore -- it's sitting there in the UART and the kernel
>> buffers waiting for you to read it.  So when your phone wakes up for
>> some other reason hours later, it'll be able to start ringing in
>> response to the SMS message that actually arrived hours earlier, but had
>> the misfortune to be sent from the GSM to the UART in the window between
>> the user-space process being frozen and the serial driver being
>> suspended.
>>
>> So, since the latter point (#3) requires that your application do
>> something to ensure that *somebody* is watching the GSM in order to
>> generate the wakeup interrupt anyway, then it simply seems more
>> reasonable to do the "Gross Hack", as you call it.  I really can't see
>> that rewriting the entire S3C24xx serial driver, and modifying the layer
>> above it, just to solve this problem, is going to result in code that's
>> going upstream.  So all we have left to argue about (unless you submit a
>> solution that is less of a "Gross Hack", of course) is what the means is
>> that we use for the application to signal that it is preparing to
>> suspend, and that the GSM should shut up and interrupt instead of
>> blasting data.
>>
>>   
> 
> This is where I disagree. The user-space has no reason to need to know
> or care that it got the message while it was sleeping.

Correct.

> The radio
> interface layer works just fine on a gta02 as things stand. 

Yes.

> Maybe there
> is some window that needs to be closed where serial data can come in
> during suspend and thus a wakeup interrupt doesn't occur.

This is the higher-level problem.

> In this case,
> the suspend should setup for the wakeup interrupt and then check the
> serial fifo. If something is there then deny the suspend.

It's no longer in the FIFO.  The application was frozen first, data
arrived, and was processed by the still-functional driver.  The data is
waiting in some data structure owned by the generic UART driver above
the S3C24xx serial driver.  One could, I suppose, have that code check
the data structures to see if unread data is present.  But this brings
up the problem of what do you do if the application that is supposed to
be reading the data has gone and borked itself?  Perhaps that data has
been there for hours, and in that case, the serial driver has been
denying every attempt to suspend the system - killing the battery all
the while.  We could write some code to "age" the data, and make sure
that we only refuse the suspend if the data in the buffer is fresh
enough.  Seems a lot of trouble when the application already has special
code to put the GSM "to bed" in preparation for suspend anyway -- might
as well write a "1" to a sysfs file at the same time and be done with
the in-kernel timestamp dancing and all.
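
For illustration only, here is a rough user-space model of the "aging"
check described above.  It is not code from the serial driver; the
struct, the function name and the 2000 ms window are all made up for the
sketch, and a real implementation would have to live in the generic UART
layer and use kernel time sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the unread-data state kept above the UART driver. */
struct rx_state {
    size_t unread_bytes;        /* bytes buffered but not yet read */
    unsigned long last_rx_ms;   /* arrival time of the newest byte, in ms */
};

/* Assumed policy knob: data older than this no longer blocks suspend. */
#define FRESH_WINDOW_MS 2000UL

/*
 * Return true if suspend should be refused: there is unread data and it
 * arrived within FRESH_WINDOW_MS of now_ms.  Stale data (for example,
 * left behind by a reader that has crashed) does not block suspend.
 */
static bool should_refuse_suspend(const struct rx_state *rx,
                                  unsigned long now_ms)
{
    assert(rx != NULL);
    if (rx->unread_bytes == 0)
        return false;
    /* Unsigned subtraction stays correct across counter wraparound. */
    return (now_ms - rx->last_rx_ms) < FRESH_WINDOW_MS;
}
```

Even in this toy form the policy needs a tunable window and a timestamp
per buffer, which is the "timestamp dancing" I would rather avoid.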

> As for a call,
> you might lose the first +CRING or whatever, but you'll wake up on the
> next. So I guess you are only worried about something like an SMS
> message.

I strongly disagree with your implied contention that the first +CRING
is unimportant.  The lost time represented by missing that first
indication may mean the difference between being able to pick up the
call, or missing it.

This is perhaps an area of differing assumptions.  I suppose that if one
is building a PDA or a very small laptop that "happens to have a GSM",
then  perhaps it matters less -- if it usually takes four rings before
it goes to voice mail, but once in a while it's only three, one might
shrug it off.  And if we miss the occasional SMS, no big deal.

I'm looking at this from the point of view of a communications device.
This is a phone that also happens to do other stuff.  So I expect
consistency, and to simply say that my phone may fail to alert me on the
delivery of some small number of SMSs is disconcerting.  More so when we
can easily close that hole.  And it's also frustrating when a call goes
to voicemail early.  For a communications device, effective
communication is its reason for being -- we need to strive for as close
to 100% as we can, and I think to skip this hole because the solution is
considered inelegant would be a poor choice.

Of course right now, someone is reaching for their keyboard to call out
how unlikely the timing of all this is, etc, etc -- but consider that
the new ASU images suspend in 90 seconds, and that other devices (Treo)
are even more aggressive.  Also consider that the gap in question
between the application and the driver suspend is not milliseconds --
it's far more than one second (!) last time I tried to measure it on the
GTA01 with gsmd and apmd.  Additionally, I've observed data arriving in
this interval on two separate occasions during my testing. So anecdotal
evidence seems to strongly suggest that this is not as uncommon a case
as we'd like to think.

> Also, no one in the user-space can generate a wakeup so I'm
> cloudy on this logic.

Yes, user-space is suspended and cannot generate a wake.  What I'm
referring to is aborting the suspend by a driver -- it's
indistinguishable from a "wake" event by application code.

> Finally, the application can't do what you are
> proposing anyway.

I don't know what your application is, so I can't argue about your
application.  But for GSM management applications (Qtopia, gsmd, etc.)
they all need to have a means to inform the GSM that it should cease
sending "uninteresting" unsolicited messages prior to the device
suspending.  If the GSM management application does not do this, it will
be constantly woken by numerous trivial events.  I suppose one could
simply not enable those trivial events in the first place, but that
leaves user-niceties such as signal strength meters to polling
techniques.  So the application not only can do this, the application
already has a logical place to add this code as well.  I put a patch for
Qtopia on the openmoko-devel list last week; it's really quite simple
where the user-space code is patched in, and it's an extremely simple bit
of code too.  For the old gsmd, one doesn't even need to modify the
code.  Instead we simply do the job using shell script with apmd - ugly,
and not production-quality, but a remarkably simple proof-of-concept.
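
To give an idea of how little user-space code is involved, here is a
minimal sketch of the "write a flag to sysfs" step.  This is not the
Qtopia patch itself; write_flag() is a hypothetical helper, and the path
is left to the caller because the actual attribute path is not spelled
out in this message.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write a single flag value ("0" or "1") to a sysfs-style attribute file.
 * Returns 0 on success, -1 on failure.  The path is supplied by the
 * caller; this helper makes no assumption about where the flag lives.
 */
static int write_flag(const char *path, int on)
{
    assert(path != NULL);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int rc = (fputs(on ? "1" : "0", f) == EOF) ? -1 : 0;
    /* fclose flushes; a failed flush means the write did not land. */
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```

The application would call this with a 1 at the point where it already
tells the GSM to stop sending unsolicited messages before suspend.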

> It is timing critical as you point out. So having the
> application say "prepare to suspend" and interrupt me instead of
> blasting data is not possible. Come to think of it, this is exactly what
> suspend/resume in the driver is supposed to do.

Correct.  The application just forces flow-control on the GSM so that
the GSM must interrupt instead of sending data.  A late-suspend function
in the neo1973_pm_gsm driver checks to see if a GSM interrupt occurred,
and if so, it aborts the suspend.  The application ends up resumed, just
as if the GSM data arrived while the device was suspended.
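
The sequence above can be modelled in a few lines.  This is a user-space
toy, not the neo1973_pm_gsm code: the struct and the three function
names are hypothetical, and returning -EBUSY to abort the suspend is an
assumption made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state shared by the steps described above. */
struct gsm_model {
    bool flow_controlled;   /* application has throttled the GSM */
    bool irq_pending;       /* GSM raised its wakeup interrupt */
};

/* Application step: throttle the GSM before asking for suspend. */
static void app_prepare_suspend(struct gsm_model *g)
{
    assert(g != NULL);
    g->flow_controlled = true;
    g->irq_pending = false;
}

/* GSM has something to say; while throttled it can only interrupt. */
static void gsm_has_data(struct gsm_model *g)
{
    assert(g != NULL);
    if (g->flow_controlled)
        g->irq_pending = true;
}

/* Late-suspend check: 0 lets the suspend proceed, -EBUSY aborts it. */
static int late_suspend(const struct gsm_model *g)
{
    assert(g != NULL);
    return g->irq_pending ? -EBUSY : 0;
}
```

If gsm_has_data() runs anywhere between app_prepare_suspend() and
late_suspend(), the suspend is aborted and the application gets to read
the data right away instead of hours later.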

> Now, on the gta01 things are very different I gather. This is because
> the uart is multiplexed as a console as well, no? I'm guessing that when
> switched over to be the GSM more work should be done to make that
> handover complete so that there is no longer any interference with the
> console.

If there were just a single PCB layout change that we could go back in
time and make, muxing the console with the GPS instead of the GSM would
save so much grief...

> 
>> All said, you don't have to do it if it's so repugnant for user-space to
>> take the initiative to manage the GSM in this fashion.  I sense intense
>> resistance on the part of many people on this point, frankly, and I'm
>> utterly perplexed as to why this is the case.  Am I missing something
>> here?  Is there a specification that outlines a permissible percentage
>> for missed GSM events?
>>
>> (I should also mention that the above is the GTA02; if you call the
>> software solution a "Gross Hack" I'm truly curious what words you have
>> for the hardware challenge the GTA01 poses in regard to this!)
>>
>>   
> 
> Again, please accept my apology for such strong wording. So far, I am
> unconvinced there is a problem that should be pushed to the user for this.
> 
>>>>> The GTA01 had different defaults because I thought (at the time) that
>>>>> by using the auto-unlock technique and the horrible hack in the serial
>>>>> driver, we could avoid the overrun problem.  It turns out that just
>>>>> defers it, so there's no longer any point to having GTA01 handled any
>>>>> differently than GTA02.
>>>>>
>>>>> Which (as I mentioned in an earlier email) leaves us with three means
>>>>> to handle flowcontrol of the GSM, which is two too many.  This
>>>>> particular bit of code will be removed, along with the code in the
>>>>> serial driver that does this function from the modem-control code.
>>>>>
>>>>     if (gsm_auto_flowcontrolled) {
>>>>         nspy_add(NSPY_TYPE_SPECIAL, '+', jiffies);
>>>>         if (machine_is_neo1973_gta01())
>>>>             s3c24xx_fake_rx_interrupt(10000);
>>>>         s3c2410_gpio_cfgpin(S3C2410_GPH1, S3C2410_GPH1_nRTS0);
>>>>         gsm_auto_flowcontrolled = 0;
>>>>     }
>>>>     nspy_add(NSPY_TYPE_RESUME, 'Z', jiffies);
>>>>     printk(KERN_INFO "%s: done.\n", __FUNCTION__);
>>>> }
>>>>
>>>> There's no schedule_delayed_work, no msleep, this could execute right
>>>> away, and yet it says in the comment we need to wait for serial driver
>>>> !?!?
>>>>
>>>>      
>>>>> We need to wait for the serial driver in order to avoid data loss --
>>>>> not because of any hangs or lockups that I've ever observed.  The issue
>>>>> is that depending on whether the low-level debug for suspend/resume is
>>>>> enabled in the defconfig, the UART registers may be restored on wake,
>>>>> or they may be reset to the default boot-time settings and then reset
>>>>> to the settings specified in the termio structures.  Because of the
>>>>> shared console on the GTA01, that device actually sets the port to
>>>>> function as a console, then immediately sets it to the desired
>>>>> settings -- it is this "diddling about" with the UART status that
>>>>> requires that we keep the GSM from sending anything until after things
>>>>> have stabilized.
>>>>>
>>> OK, then it should be stabilized in the resume process.
>>>     
>>
>> That's fine with me.  It looks like Andy has put some code together to
>> do that more elegantly.  But IMO it doesn't matter -- the application
>> managing communications with the GSM needs to manage flow-control of the
>> GSM during suspend/resume, so it becomes a moot point.
>>
>>   
> 
> Yes, it is looking promising at this point. God how I wish Linux had a
> generic suspend/resume dependency mechanism. So many issues here.
>>>>  I check the resume ordering
>>>>
>>>> [ 7187.755000] neo1973-pm-gsm neo1973-pm-gsm.0: resuming
>>>> [ 7187.755000] gsm_resume_work: waiting...
>>>> [ 7187.755000] gsm_resume_work: done.
>>>> ...
>>>> [ 7187.755000] s3c2440-uart s3c2440-uart.0: resuming
>>>> [ 7187.755000] s3c24xx_serial_set_mctrl: GSM mctrl=0x00000000
>>>> [ 7187.755000] s3c24xx_serial_set_mctrl: GSM mctrl=0x00000006
>>>> [ 7187.755000] s3c2440-uart s3c2440-uart.1: resuming
>>>> [ 7187.755000] s3c2440-uart s3c2440-uart.2: resuming
>>>>
>>>> Hum what happens when that completes and the uarts aren't up?
>>>>
>>>> -Andy
>>>>        Mike (mwester)
>>>>       
>> Mike (mwester)
>>
>>   

Mike (mwester)



