Duped Messages (OffTopic but important to lists.openmoko.org operations)
Joel Newkirk
freerunner at newkirk.us
Tue Sep 2 05:02:59 CEST 2008
Sarton O'Brien wrote:
> On Monday 01 September 2008 23:23:21 Joel Newkirk wrote:
>
>> Rui Miguel Silva Seabra wrote:
>>
>>> Since we're talking about the mailing lists, I still receive (randomly),
>>> three repeated mails, two repeated mails, etc...
>>>
>>>
>> per day, I suspect that either spam filtering or some testing for ML
>> (IE, 'is this sender permitted to post?') takes place AFTER receiving
>> the message but BEFORE telling the sending server that message was
>> received, and is periodically taking longer than the sending server's
>> SMTP timeout, so the sender gives up on the connection and tries again -
>> meanwhile the exim server handling the ML eventually accepts the
>> message. My suspicion is that the load on the (virtual?) server hosting
>> the ML is getting to a level where processing messages sometimes takes
>> longer than some sending servers are willing to wait. Properly, in such
>>
>
> Sounds like a decent theory. I've experienced abnormalities when pre-
> processing facilities (greylisting/virus scanning/spam filtering) are
> configured incorrectly or terminate abruptly or incorrectly.
>
> I imagine just looking at the logs around the time of duplication would give a
> fair indication as to where the issue is. Otherwise system load sounds
> reasonable, swap in anything time critical is not ideal.
>
> Sarton
>
Tested, insofar as I am able to test it being on the 'outside'. I
telnetted to sita.openmoko.org port 25 and manually sent a few
help-request emails to community-request at lists.openmoko.org.
After the final period (a period by itself on a line ends an SMTP
message transfer) it sits there thinking for 12-15 seconds in silence,
then responds. That is slightly more delay than I'm used to in such
tests, but not unusual, particularly if filtering is performed at that
stage (which would allow unacceptable emails to be rejected instead of
having to bounce, and is better for spam control that way). In one test
it took only 7.5 seconds.
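
The same test can be scripted instead of typed into telnet. What follows is only a rough sketch using Python's standard smtplib. The HELO name "probe.example", the sender address, and the function names are made up for illustration, and nothing here reflects how the list server itself is set up. It times how long the server takes to answer after the final period.

```python
import smtplib
import time


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed_seconds)."""
    start = time.monotonic()
    result = fn(*args)
    return result, time.monotonic() - start


def probe_data_reply(host, sender, recipient, message, timeout=600):
    """Send one message and time the reply to the final "." line.

    timeout is the socket timeout in seconds; 600 matches the 10 minute
    DATA termination value quoted from RFC 2821.
    """
    with smtplib.SMTP(host, 25, timeout=timeout) as smtp:
        smtp.helo("probe.example")  # hypothetical HELO name
        smtp.mail(sender)
        smtp.rcpt(recipient)
        # smtp.data() sends DATA, the message and the terminating ".",
        # then blocks until the server replies or the socket times out
        # (in which case smtplib raises an exception).
        (code, reply), elapsed = timed(smtp.data, message)
    return code, reply, elapsed


# Example call (needs network access, so it is left commented out):
# code, reply, seconds = probe_data_reply(
#     "sita.openmoko.org", "sender@example.org",
#     "community-request@lists.openmoko.org",
#     "Subject: Help\r\n\r\nhelp\r\n")
# print(code, reply, round(seconds, 1))
```

Running the probe a few dozen times and logging the elapsed values would show how often the delay exceeds whatever timeout a given sending server uses.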
Then there was the test (same procedure, same entries, second test)
where after the period-enter I waited 4 minutes with no '250 OK'
response. Lacking that response code, the sending mailserver eventually
presumes the communication to have failed, and queues the message for
another delivery attempt. I finally killed my telnet session, and sure
enough the message "The results of your email commands" came back to me
from Mailman, so the server had accepted the message even though it
never acknowledged it to me.
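
To make the mechanism concrete, here is a toy model of it, not a description of what any real mailserver does internally. It assumes, as the test above suggests, that the receiver finishes processing and delivers the message even if the sender has already given up. The function name, the 60 second sender timeout, and the processing times are all hypothetical.

```python
def copies_delivered(processing_times, sender_timeout):
    """Count copies the list server ends up accepting for one message.

    processing_times: seconds the receiver spends after the final "."
        on each successive delivery attempt (hypothetical figures).
    sender_timeout: seconds the sender waits for "250 OK" before it
        drops the connection and queues the message for retry.
    """
    copies = 0
    for seconds in processing_times:
        # The receiver already holds the full message, so (by assumption)
        # it delivers it even if the sender has hung up in the meantime.
        copies += 1
        if seconds <= sender_timeout:
            # The sender saw "250 OK" in time and stops retrying.
            break
    return copies


print(copies_delivered([12], 60))            # 1: normal delivery
print(copies_delivered([240, 15], 60))       # 2: one timeout, then success
print(copies_delivered([240, 240, 7.5], 60)) # 3: two timeouts, then success
```

Under these assumptions, each attempt that outlasts the sender's patience adds one extra copy, which is consistent with list members seeing some messages two or three times.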
***HOWEVER: RFC-2821 specifies a minimum timeout of 10 minutes while
awaiting the '250 OK' response, so strictly speaking this is a
misconfiguration or bug in the SENDING mailserver, NOT in Exim 4.63 on
sita.openmoko.org.***
But I wish you luck if you try to convince Google that their mailservers
are broken... This RFC was drafted in 2001, expanding (ESMTP) on the
1982 RFC-821 (SMTP), at which time waiting 10 minutes for a
500-character data transfer to be acknowledged may have seemed
reasonable. But with today's common end-user bandwidth, computer
speeds, and user expectations, that's just not sensible. Particularly
the last: imagine users waiting up to 10 minutes for Outlook Express to
finish sending a one-line email... if their outbound SMTP server did
this there'd be a lot of phone calls.
So the gist of it is that the mailserver sita.openmoko.org is confirmed
as the source of the dupes, though it is not strictly at fault. Whether
the cause is resource scarcity on the server itself or a problem
elsewhere (e.g. the DNS server it uses for RBL lookups could be barfing)
I can't determine. Misconfiguration is a possibility, if it leads to
resource overcommitment. Strictly speaking the problem is at the sending
servers, but realistically it's unlikely to be fixed on most of them, so
if I were postmaster I'd be checking logs and looking for a local fix.
j
{successful dialog pasted here, failed was identical except for lack of
'250 OK' response - my entries prepended here with '>'}
newkirk at blakbox-k:/usr/share/misc$ telnet sita.openmoko.org 25
Trying 88.198.124.203...
Connected to sita.openmoko.org.
Escape character is '^]'.
220 sita.openmoko.org ESMTP Exim 4.63 Tue, 02 Sep 2008 03:20:42 +0200
>HELO newkirk.us
250 sita.openmoko.org Hello rrcs-70-62-125-137.midsouth.biz.rr.com [70.62.125.137]
>MAIL FROM: freerunner at newkirk.us
250 OK
>RCPT TO: community-request at lists.openmoko.org
250 Accepted
>DATA
354 Enter message, ending with "." on a line by itself
>SUBJECT: Help
>
>
>help
>testing
>123
>
>
>.
250 OK id=1KaKZp-00035l-IE
>quit
221 sita.openmoko.org closing connection
Connection closed by foreign host.
newkirk at blakbox-k:/usr/share/misc$
{excerpt from RFC 2821 http://www.faq.org/rfcs/rfc2821.html }
DATA Termination: 10 minutes.
This is while awaiting the "250 OK" reply. When the receiver gets
the final period terminating the message data, it typically
performs processing to deliver the message to a user mailbox. A
spurious timeout at this point would be very wasteful and would
typically result in delivery of multiple copies of the message,
since it has been successfully sent and the server has accepted
responsibility for delivery. See section 6.1 for additional
discussion.
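
For reference, the DATA termination value sits alongside the other minimum client timeouts in RFC 2821 section 4.5.3.2. The snippet below is just a convenience sketch for checking a sender's configured timeout against those minimums. The dictionary and function names are invented here, and the 240 second figure is only an example taken from the 4 minute wait in the test above.

```python
# Minimum per-command client timeouts that RFC 2821 section 4.5.3.2
# says a sender SHOULD use, in seconds.
RFC2821_CLIENT_TIMEOUTS = {
    "initial 220 message": 5 * 60,
    "MAIL command": 5 * 60,
    "RCPT command": 5 * 60,
    "DATA initiation": 2 * 60,
    "data block": 3 * 60,
    "DATA termination": 10 * 60,
}


def too_impatient(phase, configured_seconds):
    """Return True if a sender's timeout is below the RFC minimum."""
    return configured_seconds < RFC2821_CLIENT_TIMEOUTS[phase]


# A sender that gives up after 4 minutes at DATA termination is below
# the 10 minute minimum; one that waits the full 10 minutes is not.
print(too_impatient("DATA termination", 240))  # True
print(too_impatient("DATA termination", 600))  # False
```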