Suspend/Resume oversight wrt GSM handling

Mike (mwester) mwester at dls.net
Thu May 15 06:42:48 CEST 2008


Mike (mwester) wrote:
> I'll be happy to provide my logic and reasoning for why this is a 
> problem, but if there's something else that guarantees this (hardware 
> somewhere?), perhaps we could avoid needlessly discussing something 
> that's a non-issue -- what is it that would force the GSM to be 
> flow-controlled during suspend/resume processing and while in the 
> suspended state?

Well, OK.  I don't want to waste anyone's time, and I fear that the lack of 
interest this issue has drawn in the several weeks I've been working on it 
means it may be a non-issue.  Here are the details; sorry for the length:

The UART asserts RTS when it has room for data in its FIFO.  It de-asserts RTS 
(that is, it flow-controls the GSM) when the FIFO is almost full.

When the phone is suspending, the FIFO will be empty (otherwise the suspend 
would not have been started in the first place).  So RTS stays asserted: as 
user-space code is frozen and the drivers suspend, the GSM is not flow-controlled.
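
The RTS behavior in the two paragraphs above can be modeled as follows.  This is 
a minimal sketch that assumes an "almost full" trigger level of 12 bytes.  The 
trigger level is illustrative and is not taken from the S3C2410 manual, and 
rts_asserted() is a hypothetical name.

```c
#include <stdbool.h>

/* Hypothetical model of UART auto flow control: RTS follows the receive
 * FIFO level.  The trigger level is an illustrative assumption. */
#define RX_ALMOST_FULL 12u

/* Returns true while the UART asserts RTS, i.e. while the GSM may send. */
static bool rts_asserted(unsigned int rx_fifo_level)
{
    return rx_fifo_level < RX_ALMOST_FULL;
}
```

With an empty FIFO at suspend time, rts_asserted(0) is true.  Nothing in the 
UART itself tells the GSM to hold off.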

When the serial driver suspends, the generic UART code seems to ask the s3c24xx 
driver to de-assert RTS.  However, the request is not fulfilled, because the 
current driver has no code to support it.  Moreover, even if that code were 
implemented in the normal way, there would still be no reason to believe that 
RTS stays de-asserted.  The S3C2410 documentation is silent about the state of 
the UART control signals while the processor is suspended or while the UART 
clocks are disabled.
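
The following sketch shows what honoring that request "in the normal way" might 
look like, on a simulated register.  The bit positions are our reading of the 
S3C2410 UMCONn register: bit 4 enables auto flow control, and bit 0 drives nRTS 
manually.  Treat them as assumptions and check the datasheet.  We also assume 
the generic suspend path reaches the driver as a set_mctrl() call with 
TIOCM_RTS clear.  As the paragraph above says, even this only describes the 
pin while the UART is clocked.

```c
#include <stdint.h>
#include <sys/ioctl.h>   /* TIOCM_RTS */

/* Assumed UMCONn bits (verify against the S3C2410 manual). */
#define UMCON_AFC      (1u << 4)   /* auto flow control enable */
#define UMCON_RTS_LOW  (1u << 0)   /* manual mode: 1 = drive nRTS low */

/* Simulated register; a real driver would access the mapped port. */
static uint32_t umcon = UMCON_AFC;

/* Hypothetical conventional set_mctrl(): leave AFC and drive nRTS high
 * (inactive) when RTS is dropped; hand RTS back to AFC when raised. */
static void sketch_set_mctrl(unsigned int mctrl)
{
    if (mctrl & TIOCM_RTS)
        umcon |= UMCON_AFC;
    else
        umcon &= ~(UMCON_AFC | UMCON_RTS_LOW);
}
```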

It appears that the GSM does become flow-controlled at some point during this 
process, because we do get an interrupt.  That may be due as much to the 
mux's response to a floating input as to undocumented behavior of the SoC 
itself.

There are other potential issues as well, specifically timing.  Race 
conditions are possible in several places in the code.  A big hole is the time 
between the moment the user-space GSM-handling code is frozen and the moment the 
GSM actually ends up flow-controlled.  During this window, incoming data from 
the GSM will neither stop the suspend nor generate a wakeup interrupt.  
Depending on when flow control really takes effect (when the UART clocks stop, 
or when the SoC suspends), the data may or may not reach the serial driver; 
when it does not, the message is lost.  One argument presented earlier was 
that the GSM will simply "ring" again on an incoming call, so this doesn't 
matter.  That is true for incoming calls, but not for SMS.  I observed exactly 
such a missed SMS during my testing, which shows that this event is not as 
unlikely as it might seem.
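
The window described above can be stated as a small classification.  This is a 
hypothetical model: the phase and outcome names are illustrative and do not 
correspond to any kernel symbols.  It only restates the argument: data is safe 
before user space freezes and after RTS is held, but not in between.

```c
/* Hypothetical phases of the suspend sequence described above. */
enum suspend_phase {
    PHASE_RUNNING,          /* user space still handling the GSM */
    PHASE_FROZEN,           /* user space frozen, GSM not yet flow-controlled */
    PHASE_FLOW_CONTROLLED,  /* RTS held de-asserted */
};

enum rx_outcome {
    RX_HANDLED,      /* read normally, or it stops the suspend */
    RX_AT_RISK,      /* neither blocks suspend nor wakes us: may be lost */
    RX_HELD_BY_GSM,  /* GSM waits for RTS and sends after resume */
};

static enum rx_outcome classify_incoming(enum suspend_phase phase)
{
    switch (phase) {
    case PHASE_RUNNING:         return RX_HANDLED;
    case PHASE_FROZEN:          return RX_AT_RISK;
    case PHASE_FLOW_CONTROLLED: return RX_HELD_BY_GSM;
    }
    return RX_AT_RISK;  /* unknown phase: assume the worst */
}
```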

Another area of concern is resume processing.  All the SoC registers are saved 
except those of the UARTs.  I don't know why, but the PM code does not save the 
UART state unless low-level debugging of the PM code is enabled(!)  This means 
the serial driver must completely re-initialize the UARTs on resume.  Unless 
great care is taken, this gives the GSM another opportunity to send data to the 
UART while the serial driver is not set up correctly, again resulting in data 
loss.  I can confirm that the driver does not consistently initialize with 
hardware flow control enabled, although it eventually seems to "get there".

It seems very clear to me at this point that unless we follow the original 
design described by Harald, we end up relying on undocumented behavior of the 
SoC along with poorly understood behavior of the serial driver itself.  Even 
if both of those behaviors are correct, we still have a few very large race 
conditions to contend with.

Harald's solution is utterly simple: when necessary, convert the UART's RTS 
output into a GPIO output.  The GPIO output holds its state while the UART is 
shut down, while the SoC is suspended, and while the serial driver is resetting 
the UART FIFOs and everything else.
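
The pin switch can be sketched on simulated registers.  The sketch assumes an 
S3C2410-style pin-function register with two bits per pin.  It also assumes our 
reading of the GPH bank, in which GPH1 carries nRTS0 and the codes are 00 = 
input, 01 = output, 10 = nRTS0.  Verify both against the datasheet.  Since nRTS 
is active low, "de-asserted" means driving the pin high.  The helper names are 
hypothetical.

```c
#include <stdint.h>

/* Assumed two-bit function codes for the pin carrying nRTS0. */
#define PIN_FUNC_INPUT   0u
#define PIN_FUNC_OUTPUT  1u
#define PIN_FUNC_SPECIAL 2u   /* nRTS0 on GPH1 (assumed) */
#define RTS_PIN          1u   /* hypothetical: GPH1 */

/* Simulated GPHCON/GPHDAT; start with the pin owned by the UART. */
static uint32_t gphcon = PIN_FUNC_SPECIAL << (2 * RTS_PIN);
static uint32_t gphdat;

static void set_pin_func(uint32_t *con, unsigned int pin, uint32_t func)
{
    *con = (*con & ~(3u << (2 * pin))) | (func << (2 * pin));
}

/* Latch nRTS high (inactive) first, then hand the pin to the GPIO block,
 * so it keeps that level whatever the UART does. */
static void rts_hold_off_via_gpio(void)
{
    gphdat |= 1u << RTS_PIN;
    set_pin_func(&gphcon, RTS_PIN, PIN_FUNC_OUTPUT);
}

/* Give the pin back to the UART's automatic flow control. */
static void rts_return_to_uart(void)
{
    set_pin_func(&gphcon, RTS_PIN, PIN_FUNC_SPECIAL);
}
```

The sketch sets the data bit before switching the pin to output, so the pin 
does not briefly drive low in between.  In the kernel of that era, this would 
presumably go through the platform GPIO helpers (for example 
s3c2410_gpio_setpin() and s3c2410_gpio_cfgpin()) rather than through raw 
register writes.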

From an implementation point of view, we need to find the right way to do this.  
I've implemented a fail-safe mechanism in the suspend/resume code for the GSM 
mux: if nothing else has switched the signal to GPIO mode during suspend, this 
code does so.  Resume is a bit trickier, because this driver resumes before the 
serial port is ready.  By using a work thread, though, switching the pin back 
from GPIO can be scheduled for some point after the serial driver is ready.
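
The fail-safe and the deferred release can be sketched as a single-threaded 
model.  In the kernel, the deferred step would be a work item.  Here it is a 
pending flag that a "worker tick" checks and retries until the serial driver is 
ready.  All names are hypothetical, not real driver symbols.

```c
#include <stdbool.h>

struct gsm_pm {
    bool rts_is_gpio;      /* RTS pin currently held by GPIO */
    bool serial_ready;     /* serial driver has finished resuming */
    bool release_pending;  /* deferred "give RTS back" work queued */
};

static void gsm_mux_suspend(struct gsm_pm *s)
{
    if (!s->rts_is_gpio)        /* nothing else flow-controlled the GSM */
        s->rts_is_gpio = true;  /* so switch the pin to GPIO here */
}

static void gsm_mux_resume(struct gsm_pm *s)
{
    /* Too early to release RTS: the UART is not set up yet. */
    if (s->rts_is_gpio)
        s->release_pending = true;
}

/* Worker tick: does nothing until the serial driver is ready. */
static void gsm_release_work(struct gsm_pm *s)
{
    if (!s->release_pending || !s->serial_ready)
        return;                 /* try again on a later tick */
    s->rts_is_gpio = false;
    s->release_pending = false;
}
```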

This fail-safe mechanism falls short in that it cannot prevent the race 
conditions during suspend mentioned above.  So the same code implements a 
second technique, intended to be run from the apmd suspend and resume scripts.  
It exposes a control via sysfs that lets a user-space process explicitly ask 
for the GSM to be flow-controlled or released.
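
A user-space helper for such a hook might look like the sketch below.  The 
attribute path is a parameter because the real sysfs location is not given 
here.  Writing "1" to hold and "0" to release is an assumed convention, and 
gsm_request_flowcontrol() is a hypothetical name.

```c
#include <stdio.h>

/* Ask the kernel to hold (1) or release (0) the GSM via a sysfs
 * attribute.  Returns 0 on success, -1 on failure. */
static int gsm_request_flowcontrol(const char *attr_path, int hold)
{
    FILE *f = fopen(attr_path, "w");
    if (!f)
        return -1;
    int ok = fputs(hold ? "1\n" : "0\n", f) != EOF;
    /* The write reaches the attribute when the stream is flushed, so an
     * error from the store may only show up in fclose(). */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

An apmd suspend hook would call it with hold = 1 before suspending, and the 
resume hook with hold = 0 once the device is back up.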

Frankly, it works pretty well.  It has the side effect of delaying the 
processing of the burst of data from the GSM until after the GTA01 has fully 
resumed.  Without that side effect, the interrupt latency during that initial 
data transfer would be such that the UART would be overrun every time the 
device resumed.
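
The overrun argument can be checked with some back-of-the-envelope arithmetic.  
The numbers are assumptions rather than figures from this thread: 115200 baud, 
8N1 framing (10 bits per character), and a 16-byte receive FIFO.  
fifo_fill_time_us() is a hypothetical helper.

```c
/* Time, in microseconds, for an empty receive FIFO to fill completely. */
static unsigned long fifo_fill_time_us(unsigned long baud,
                                       unsigned int bits_per_char,
                                       unsigned int fifo_bytes)
{
    return (unsigned long)fifo_bytes * bits_per_char * 1000000UL / baud;
}
```

Under those assumptions, fifo_fill_time_us(115200, 10, 16) returns 1388.  A 
full FIFO's worth of data therefore arrives in about 1.4 ms, so without 
effective hardware flow control, interrupt latency much beyond that during the 
resume burst risks overrunning the receiver.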

Qtopia presents another potential problem.  A conversation on IRC indicates 
that it explicitly sets the modem-control signals to flow-control the GSM when 
it prepares to suspend.  That would be a perfectly reasonable approach, were it 
not for the fact that the S3C24xx serial driver completely ignores user-space 
requests to manage the modem-control signals.  So if this is, in fact, how 
Qtopia does it, it will suffer from all the same problems listed above.
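
For reference, the usual way for a Linux user-space program to drop or raise 
RTS on a serial port is the TIOCMBIC / TIOCMBIS ioctl pair.  We are assuming 
that Qtopia uses exactly these calls, and gsm_set_rts() is a hypothetical name.  
With a driver whose set_mctrl() hook does nothing, as described above, the call 
can report success while the pin does not change.

```c
#include <stdbool.h>
#include <sys/ioctl.h>   /* TIOCMBIS, TIOCMBIC, TIOCM_RTS on Linux */

/* Assert (TIOCMBIS) or de-assert (TIOCMBIC) RTS on an open tty.
 * Returns the ioctl() result: 0 on success, -1 with errno set. */
static int gsm_set_rts(int tty_fd, bool asserted)
{
    int bits = TIOCM_RTS;
    return ioctl(tty_fd, asserted ? TIOCMBIS : TIOCMBIC, &bits);
}
```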

The simple solution is a special, partial implementation of the modem-control 
code, in which a user-space request to de-assert RTS converts the pin to a GPIO 
output as described above.  Unfortunately, it seems that user space isn't the 
only thing that diddles the modem-control lines; something else manipulates 
those signals during suspend as well.
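
The partial set_mctrl(), and the trouble other callers cause, can be modeled as 
shown below.  partial_set_mctrl() and struct rts_pin are hypothetical.  We 
assume, without having confirmed it, that user-space requests and the 
kernel-internal suspend/resume paths all reach the same driver hook.  The hook 
cannot tell them apart, so a later request from elsewhere with RTS set releases 
the GSM even though user space still wants it held.

```c
#include <stdbool.h>
#include <sys/ioctl.h>   /* TIOCM_RTS */

struct rts_pin {
    bool parked_as_gpio;    /* pin held as GPIO output, nRTS inactive */
    unsigned int switches;  /* how many times the pin changed owner */
};

/* Dropping RTS parks the pin on GPIO; raising it returns it to the UART.
 * Repeated requests for the current state change nothing. */
static void partial_set_mctrl(struct rts_pin *pin, unsigned int mctrl)
{
    bool want_gpio = !(mctrl & TIOCM_RTS);
    if (want_gpio != pin->parked_as_gpio) {
        pin->parked_as_gpio = want_gpio;
        pin->switches++;
    }
}
```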

So there is more work to do if we want to be sure that the Qtopia technique, as 
I understand it, does what we want without leaving any potential for races or 
dropped data.  I think this is solvable, but I need some time (and the ability 
to actually get a successful Qtopia build).

So that's the entire story as I currently understand it.  I guess the whole 
thing becomes much less of a problem if somebody out there has information 
that I lack.  Perhaps there is an addendum for the SoC that documents what it 
does with the RTS line when the UART clocks are disabled or when the SoC is 
suspended?  Can anybody convince me that what I've outlined is not, in fact, a 
problem?

Regards,
Mike (mwester)

(Are we having fun yet?)




More information about the openmoko-kernel mailing list