Duped Messages (OffTopic but important to lists.openmoko.org operations)

Joel Newkirk freerunner at newkirk.us
Tue Sep 2 05:02:59 CEST 2008


Sarton O'Brien wrote:
> On Monday 01 September 2008 23:23:21 Joel Newkirk wrote:
>   
>> Rui Miguel Silva Seabra wrote:
>>     
>>> Since we're talking about the mailing lists, I still receive (randomly),
>>> three repeated mails, two repeated mails, etc...
>>>
>>>       
>> per day, I suspect that either spam filtering or some testing for ML
>> (IE, 'is this sender permitted to post?') takes place AFTER receiving
>> the message but BEFORE telling the sending server that message was
>> received, and is periodically taking longer than the sending server's
>> SMTP timeout, so the sender gives up on the connection and tries again -
>> meanwhile the exim server handling the ML eventually accepts the
>> message.  My suspicion is that the load on the (virtual?) server hosting
>> the ML is getting to a level where processing messages sometimes takes
>> longer than some sending servers are willing to wait.  Properly, in such
>>     

>
> Sounds like a decent theory. I've experienced abnormalities when pre-
> processing facilities (greylisting/virus scanning/spam filtering) are 
> configured incorrectly or terminate abruptly or incorrectly.
>
> I imagine just looking at the logs around the time of duplication would give a 
> fair indication as to where the issue is. Otherwise system load sounds 
> reasonable, swap in anything time critical is not ideal.
>
> Sarton
>   
Tested, insofar as I can test it from the 'outside'. I 
telnetted to sita.openmoko.org port 25 and manually sent a few 
help-request emails to community-request at lists.openmoko.org. 

After that final period (a period by itself on a line ends an SMTP 
message transfer) it sits there thinking for 12-15 seconds in silence, 
then responds.  That is slightly more delay than I'm used to in such 
tests, but not unusual, particularly if filtering is performed at that 
stage, which would allow unacceptable emails to be rejected instead of 
having to bounce (better for spam control that way).  In one test it 
took only 7.5 seconds.
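
For anyone who wants to repeat the measurement without typing into 
telnet, here is a minimal Python sketch of the same procedure.  It is 
not what I ran; the function names (read_reply, time_data_ack) and the 
host and addresses in the usage comment are made up for illustration, 
and it assumes no body line begins with a period.

```python
import socket
import time


def read_reply(rfile):
    """Read one SMTP reply (possibly multi-line) and return (code, text)."""
    lines = []
    while True:
        raw = rfile.readline()
        if not raw:
            raise ConnectionError("server closed the connection")
        line = raw.decode("ascii", "replace").rstrip("\r\n")
        lines.append(line)
        # "250-text" means more lines follow; "250 text" is the last line.
        if len(line) < 4 or line[3] != "-":
            return int(line[:3]), "\n".join(lines)


def time_data_ack(sock, helo, sender, rcpt, body_lines):
    """Send one message over an already connected socket.

    Returns the seconds between sending the final "." and the server's
    reply.  Assumes no body line starts with "." (no dot-stuffing here).
    """
    rfile = sock.makefile("rb")

    def expect(code):
        got, text = read_reply(rfile)
        if got != code:
            raise RuntimeError(f"expected {code}, got: {text}")

    def command(text, code):
        sock.sendall(text.encode("ascii") + b"\r\n")
        expect(code)

    expect(220)                      # greeting banner
    command(f"HELO {helo}", 250)
    command(f"MAIL FROM:<{sender}>", 250)
    command(f"RCPT TO:<{rcpt}>", 250)
    command("DATA", 354)
    sock.sendall("\r\n".join(body_lines).encode("ascii") + b"\r\n")
    start = time.monotonic()
    command(".", 250)                # this is the wait being measured
    elapsed = time.monotonic() - start
    command("QUIT", 221)
    return elapsed


# Usage sketch (hypothetical host and addresses):
# with socket.create_connection(("mail.example.org", 25), timeout=600) as s:
#     print(time_data_ack(s, "client.example.org", "me@example.org",
#                         "list-request@example.org",
#                         ["Subject: Help", "", "help"]))
```

The socket timeout set by the caller decides how long the script waits 
for the acknowledgement before giving up, which is exactly the knob a 
sending mailserver has.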

Then there was the test (same procedure, same entries, second test) 
where after the period-enter I waited 4 minutes with no '250 OK' 
response.  Lacking that response code, the sending mailserver eventually 
presumes the communication to have failed, and queues the message for 
another delivery attempt.  I finally killed my telnet session, and sure 
enough the message "The results of your email commands" came back to me 
from Mailman, so the message had been accepted even though I never saw 
the acknowledgement.
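
To make the mechanism concrete, here is a toy Python model of it (no 
network involved).  The function name and the 120-second sender timeout 
in the example are hypothetical; the one assumption built in is what the 
test above suggests, namely that the server still accepts the message 
after the sender has given up on the connection.

```python
def deliveries(server_delays, sender_timeout):
    """Model repeated delivery attempts of one message.

    server_delays: seconds the server takes to answer the final "." on
    each successive attempt.
    sender_timeout: seconds the sender waits for "250 OK" before it
    drops the connection and queues a retry.

    Returns (copies_accepted_by_server, attempts_made).
    """
    copies = 0
    attempts = 0
    for delay in server_delays:
        attempts += 1
        # Assumption: the server finishes its processing and accepts the
        # message even if the sender has already hung up.
        copies += 1
        if delay <= sender_timeout:
            # Sender saw "250 OK" in time and stops retrying.
            break
    return copies, attempts


# Server takes 4 minutes once, then 12 seconds on the retry.
print(deliveries([240, 12], sender_timeout=120))   # impatient sender
print(deliveries([240, 12], sender_timeout=600))   # sender waits 10 minutes
```

With the impatient sender the list gets two copies from two attempts; 
with the 10-minute timeout the first attempt is acknowledged and only 
one copy exists.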

***HOWEVER: RFC-2821 specifies that the timeout while awaiting the '250 
OK' response should be at least 10 minutes, so strictly speaking this is 
a misconfiguration or bug in the SENDING mailserver, NOT Exim 4.63 on 
sita.openmoko.org.*** 

But I wish you luck if you try to convince Google that their mailservers 
are broken...  This RFC was drafted in 2001, expanding (ESMTP) on the 
1982 RFC-821 (SMTP), at which time waiting 10 minutes for a 
500-character data transfer to be acknowledged may have seemed 
reasonable.  But with today's common end-user bandwidth, computer 
speeds, and user expectations, that's just not sensible.  (Particularly 
the last - imagine users waiting up to 10 minutes for Outlook Express to 
finish sending a one-line email... if their outbound SMTP server did 
this there'd be a lot of phone calls.)

So the gist of it is that the mailserver sita.openmoko.org is confirmed 
as the source of the dupes - though not strictly its fault.  Whether 
that comes from resource scarcity on the machine itself or a problem 
elsewhere (e.g. the DNS server it uses for RBL lookups could be barfing) 
I can't determine.  Misconfiguration is a possibility, if it leads to 
resource overcommitment.  Strictly speaking the problem is at the 
sending servers, but realistically it's unlikely to be fixed on most of 
them, so if I were postmaster I'd be checking logs and looking for a 
local fix.

j

{successful dialog pasted here; the failed one was identical except for 
the lack of a '250 OK' response - my entries are prefixed here with '>'}

newkirk at blakbox-k:/usr/share/misc$ telnet sita.openmoko.org 25
Trying 88.198.124.203...
Connected to sita.openmoko.org.
Escape character is '^]'.
220 sita.openmoko.org ESMTP Exim 4.63 Tue, 02 Sep 2008 03:20:42 +0200
 >HELO newkirk.us
250 sita.openmoko.org Hello rrcs-70-62-125-137.midsouth.biz.rr.com [70.62.125.137]
 >MAIL FROM: freerunner at newkirk.us
250 OK
 >RCPT TO: community-request at lists.openmoko.org
250 Accepted
 >DATA
354 Enter message, ending with "." on a line by itself
 >SUBJECT: Help
 >
 >
 >help
 >testing
 >123
 >
 >
 >.
250 OK id=1KaKZp-00035l-IE
 >quit
221 sita.openmoko.org closing connection
Connection closed by foreign host.
newkirk at blakbox-k:/usr/share/misc$


{excerpt from RFC 2821 http://www.faq.org/rfcs/rfc2821.html }

   DATA Termination: 10 minutes.
      This is while awaiting the "250 OK" reply.  When the receiver gets
      the final period terminating the message data, it typically
      performs processing to deliver the message to a user mailbox.  A
      spurious timeout at this point would be very wasteful and would
      typically result in delivery of multiple copies of the message,
      since it has been successfully sent and the server has accepted
      responsibility for delivery.  See section 6.1 for additional
      discussion.
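
The excerpt above is one entry in the list of minimum per-command 
client timeouts in RFC 2821 section 4.5.3.2.  As a small Python sketch, 
the whole list can be written down as a table and used to check whether 
a sender gave up too soon.  The dictionary keys and the function name 
are my own labels, not anything from the RFC or from Exim.

```python
# Minimum client timeouts from RFC 2821 section 4.5.3.2, in seconds.
RFC2821_MIN_TIMEOUTS = {
    "initial 220 greeting": 5 * 60,
    "MAIL command": 5 * 60,
    "RCPT command": 5 * 60,
    "DATA initiation": 2 * 60,
    "data block": 3 * 60,
    "DATA termination": 10 * 60,
}


def gave_up_too_early(phase, waited_seconds):
    """True if abandoning `phase` after this wait is below the RFC minimum."""
    return waited_seconds < RFC2821_MIN_TIMEOUTS[phase]


# A sender that drops the connection 4 minutes after the final "."
print(gave_up_too_early("DATA termination", 4 * 60))
# A sender that waits the full 10 minutes
print(gave_up_too_early("DATA termination", 10 * 60))
```

By this table, a sender that abandons the connection after the 4 minutes 
I waited in the failed test is well short of the 10-minute minimum for 
DATA termination.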






