Suspend/Resume oversight wrt GSM handling
mwester at dls.net
Thu May 15 06:42:48 CEST 2008
Mike (mwester) wrote:
> I'll be happy to provide my logic and reasoning for why this is a
> problem, but if there's something else that guarantees this (hardware
> somewhere?), perhaps we could avoid needlessly discussing something
> that's a non-issue -- what is it that would force the GSM to be
> flow-controlled during suspend/resume processing and while in the
> suspended state?
Well, ok. I don't want to waste anyone's time, and I fear that the lack of
interest this issue has garnered in the several weeks I've been working on it
means that this may be a non-issue. Here's the detail; sorry for the length:
The UART asserts RTS when it has room for data in its FIFO. It will un-assert RTS
(that is, flow-control the sender) when the FIFO is almost full.
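As a rough illustration of that behavior, here is a toy Python model of a receive FIFO with automatic RTS. The FIFO depth and the trigger level are made-up values chosen for the example, not figures from the S3C24xx documentation, and the class name is hypothetical.

```python
class UartRxModel:
    """Toy model of a UART receive FIFO with automatic RTS flow control."""

    def __init__(self, depth=16, rts_off_level=12):
        # depth and rts_off_level are illustrative assumptions only.
        self.depth = depth
        self.rts_off_level = rts_off_level
        self.fifo = []

    @property
    def rts_asserted(self):
        # RTS stays asserted while the FIFO is below the trigger level.
        return len(self.fifo) < self.rts_off_level

    def receive(self, byte):
        # Returns False if the byte could not be stored (overrun).
        if len(self.fifo) >= self.depth:
            return False
        self.fifo.append(byte)
        return True

    def read(self):
        # The driver draining one byte; returns None if the FIFO is empty.
        return self.fifo.pop(0) if self.fifo else None
```

With an empty FIFO, which is the state at the start of a suspend, `rts_asserted` is true, so nothing in this model tells the sender to stop.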
When the phone is suspending, there will be no data in the FIFO (or else the
suspend would not have been started in the first place). Thus, as the
user-space code is frozen, and the drivers suspend, the GSM is not flow-controlled.
When the serial driver suspends, the generic UART code seems to issue a request
for the s3c24xx driver to un-assert RTS -- however this request is not fulfilled
because there is no code to support this in the current driver. Moreover, even if
the code were present in the normal way, there is still no reason to believe that
RTS would be held in the un-asserted state. The S3C2410 documentation is silent
on the state of the UART control signals while the processor is
suspended, or while the UART clocks are disabled.
It appears that the GSM becomes flowcontrolled at some point during this
process, because we do get an interrupt - but this may well be due to the mux's
response to a floating input as much as it might be due to undocumented behavior
of the SoC itself.
Other potential issues exist - timing, to be specific. There are race
conditions possible in the code in several places. A big hole is in the time
between the user-space gsm-handling code being frozen, and the time when the GSM
actually ends up flow-controlled. During this time, incoming data from the GSM
will not stop the suspend from happening, and neither will it result in a wakeup
interrupt. Depending on when the flowcontrol really happens (when the UART
clocks stop, or when the SoC suspends), we may or may not actually get the data
in the serial driver. The result is that the message is lost. The argument
presented at one point was that the GSM will simply "ring" again on an incoming
call, so this doesn't matter -- this is true for incoming calls, but it is not the
case for SMS. I have observed just such an event involving a missed SMS during
my testing, which confirms that this is not as unlikely an event as it might seem.
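To make the window concrete, here is a small Python sketch that classifies a byte from the GSM by when it arrives relative to the two events described above. The time values are arbitrary units invented for the example, and the function name and labels are hypothetical; the real window depends on when the flow control actually takes effect, which is exactly what is not documented.

```python
def classify_arrival(t, frozen_at, flow_controlled_at):
    """Classify a byte arriving from the modem at time t (arbitrary units)."""
    if t < frozen_at:
        # User space is still running and can read the data.
        return "handled"
    if t < flow_controlled_at:
        # User space is frozen but the modem is still free to send:
        # no suspend abort, no wake-up interrupt, data may be lost.
        return "at risk"
    # The modem is held off and keeps its data until resume.
    return "held by modem"


# Assumed example: user space frozen at t=10, modem flow-controlled at t=25.
timeline = [
    classify_arrival(t, frozen_at=10, flow_controlled_at=25)
    for t in (5, 10, 24, 25)
]
```

Anything that shrinks the gap between the two events to zero, such as flow-controlling the GSM before user space is frozen, removes the "at risk" case.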
Another area of concern is the resume processing. The SoC registers are all
saved, with the exception of the UARTs. I don't know why this is done, but the
PM code will not save the UART state unless low-level debugging of the PM code
is enabled(!). This means that the UARTs need to be completely re-initialized by
the serial driver on resume -- so unless great caution is taken, this provides
another opportunity for the GSM to transmit data to the UART at a time when the
serial driver is not set up correctly, resulting in data loss again. I can
confirm that the driver does not initialize consistently with
hardware-flow-control enabled, although eventually it seems to "get there".
It seems very clear to me at this point that unless we follow the original
design as described by Harald, we end up relying on undocumented behavior of the
SoC along with poorly-understood behavior of the serial driver itself, and even
if both of those behaviors are correct, we still have a few very large race
conditions to contend with.
Harald's solution is utterly simple. Convert the RTS output from the UART to a
GPIO output when necessary. The GPIO output will hold its state even as the
UART is shut down, during the period when the SoC is suspended, and during the
period of time when the serial driver is resetting UART FIFOs and all the rest.
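The idea can be sketched with a toy Python model of the pin. This is only an illustration of the reasoning, not driver code: the class and method names are hypothetical, levels are logical (True means RTS asserted) rather than electrical, and `None` stands for "undefined, because the documentation does not say".

```python
class RtsPinModel:
    """Toy model of a pin that can be driven by the UART or by a GPIO latch."""

    def __init__(self):
        self.mode = "uart"        # "uart" or "gpio"
        self.uart_powered = True  # UART clocks running and state valid
        self.uart_rts = True      # what the UART would drive when powered
        self.gpio_latch = False   # value held while in GPIO output mode

    def level(self):
        if self.mode == "gpio":
            # A GPIO output holds its latched value regardless of the UART.
            return self.gpio_latch
        if not self.uart_powered:
            # UART function selected but UART down: behavior is undocumented.
            return None
        return self.uart_rts

    def hold_off_modem(self):
        # Latch RTS de-asserted, then hand the pin to the GPIO block.
        self.gpio_latch = False
        self.mode = "gpio"

    def release_modem(self):
        # Give the pin back to the UART once the serial driver is ready.
        self.mode = "uart"
```

In this model, calling `hold_off_modem()` before the UART goes down keeps `level()` at a defined, de-asserted value for the whole suspend and re-initialization period.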
From an implementation point-of-view, we need to find the right way to do this.
I've implemented a fail-safe mechanism in the suspend/resume code for the GSM
mux -- if nothing else has converted the signal to GPIO mode during suspend,
this code does so. Resume is a bit more tricky, as this driver resumes before
the serial port is ready - but by using a work thread, the switching back of the
GPIO can be scheduled for some point after the serial driver is ready.
This fail-safe mechanism is lacking in that it cannot prevent the race
conditions mentioned above during suspend. So, the same code implements a
second technique, intended to be run from apmd suspend and resume scripts - we
expose a control via sysfs that allows a user-space process to explicitly ask
that the GSM be flowcontrolled, or released.
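From user space, driving such a control would amount to writing a value to a sysfs attribute. The sketch below is in Python for illustration only; the attribute path and the "1"/"0" values are assumptions made up for the example, since the actual node name and format are not given here.

```python
# Hypothetical attribute path; the real sysfs node is not named in this post.
FLOWCONTROL_ATTR = "/sys/devices/platform/example-gsm/flowcontrolled"


def set_gsm_flowcontrolled(enabled, attr_path=FLOWCONTROL_ATTR):
    """Ask the kernel to hold off (True) or release (False) the GSM."""
    # Assumed convention: "1" means flow-controlled, "0" means released.
    with open(attr_path, "w") as attr:
        attr.write("1" if enabled else "0")
```

A suspend script would call `set_gsm_flowcontrolled(True)` before user space is frozen, and a resume script would call `set_gsm_flowcontrolled(False)` once everything is running again.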
Frankly, it works pretty well. It has the side-effect that it delays the
processing of the burst of data from the GSM until after the GTA01 is fully
resumed; without that side-effect the interrupt latency during that initial data
transfer would be such that the UART would be overrun each and every time the
Qtopia presents another potential problem. A conversation over IRC indicates
that it explicitly sets the modem-control signals to flow-control the GSM when
it prepares to suspend. This is a quite reasonable approach -- were it not for
the fact that the S3C24xx serial driver completely ignores user-space requests
to manage the modem-control signals. So if this is, in fact, how Qtopia does
it, it will suffer from all the same problems listed above.
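For reference, one common way for a Linux user-space program to request a change of RTS is the TIOCMBIS/TIOCMBIC ioctls on the open tty. The Python sketch below shows that general mechanism; it is not taken from the Qtopia source, and the helper names are hypothetical.

```python
import fcntl
import struct
import termios


def rts_request(assert_rts):
    """Build the ioctl request and argument for setting or clearing RTS."""
    # TIOCMBIS sets the given modem-control bits, TIOCMBIC clears them.
    op = termios.TIOCMBIS if assert_rts else termios.TIOCMBIC
    arg = struct.pack("I", termios.TIOCM_RTS)
    return op, arg


def set_rts(fd, assert_rts):
    """Ask the tty driver behind fd to assert or de-assert RTS."""
    op, arg = rts_request(assert_rts)
    fcntl.ioctl(fd, op, arg)
```

The call can succeed from the program's point of view while the pin never changes, if the serial driver underneath does not act on the request.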
The simple solution is to do a special partial implementation of the modem
control code, in which a request from user-space to de-assert RTS would convert
the pin to a GPIO output as described above. Unfortunately, it seems that
user-space isn't the only thing that diddles the modem-control lines; during
suspend something else is manipulating those signals as well.
So there is more work to do if we wish to ensure that the Qtopia technique, as I
understand it, will do what we want without leaving any potential for races or
dropped data. I think this is solvable, but I need some time (and the ability
to actually get a successful Qtopia build).
So that's the entire story as I currently understand it. I guess the whole
thing becomes much less of a problem if somebody out there has additional
information that I lack -- perhaps an addendum on the SoC that documents what it
does with the RTS line when the UART clocks are disabled, or when the SoC is
suspended? Is there anybody who can convince me that what I've outlined is not,
in fact, a problem?
(Are we having fun yet?)