Duped Messages (was Re: Please split this list!)

Sarton O'Brien roguemoko at roguewrt.org
Tue Sep 2 02:30:55 CEST 2008


On Monday 01 September 2008 23:23:21 Joel Newkirk wrote:
> Rui Miguel Silva Seabra wrote:
> > Since we're talking about the mailing lists, I still receive (randomly),
> > three repeated mails, two repeated mails, etc...
> >
> > This mail from Vasco... I received it three times already! :)
> >
> > Rui
> >
> > On Mon, Sep 01, 2008 at 12:24:03PM +0100, vasco.nevoa at sapo.pt wrote:
>
> If you examine the email headers you can see that the 'Received' header
> documenting where the mail server handling the list (sita.openmoko.org
> running exim 4.63) received the message from the sender's mailserver
> (i.e., their ISP's mailserver) differs from one copy to the next. This
> means that either the sending server is failing to recognize that the
> message has been delivered and resends, or the receiving server is
> failing to send the acknowledgement of receipt at the end of the SMTP
> transaction (at least in a timely fashion), so the sending server
> automatically retries.
>
> Received: from relay1.ptmail.sapo.pt ([212.55.154.21] helo=sapo.pt)
> 	by sita.openmoko.org with smtp (Exim 4.63)
> 	(envelope-from <vasco.nevoa at sapo.pt>) id 1Ka7yh-0002Ow-Ha
> 	for community at lists.openmoko.org; Mon, 01 Sep 2008 13:55:37 +0200
>
> Received: from relay1.ptmail.sapo.pt ([212.55.154.21] helo=sapo.pt)
> 	by sita.openmoko.org with smtp (Exim 4.63)
> 	(envelope-from <vasco.nevoa at sapo.pt>) id 1Ka7fK-0005dL-Pf
> 	for community at lists.openmoko.org; Mon, 01 Sep 2008 13:55:37 +0200
>
> Received: from relay1.ptmail.sapo.pt ([212.55.154.21] helo=sapo.pt)
> 	by sita.openmoko.org with smtp (Exim 4.63)
> 	(envelope-from <vasco.nevoa at sapo.pt>) id 1Ka7ZO-0003bU-8Q
> 	for community at lists.openmoko.org; Mon, 01 Sep 2008 13:55:21 +0200
>
>
> Notice that the Exim message ID (the 'id' field in each Received header
> above, not the Message-ID header) differs between the three copies.
> This tells us that this is the point (when sapo.pt mailserver delivers
> to sita.openmoko.org mailserver) where the failure occurs.  If it were
> the mailinglist server sending dupes, this header would be identical
> among all copies.
>
> Also, it doesn't depend on sending mailserver software, I've noted it
> happening with qmail as sender, exim, gmail.com, and others.  (even
> mail.openmoko.org sometimes, such as Andy Green's reply to the '3G
> modem' thread)  Based on past experience as admin of a cluster of
> mailservers that sometimes exceeded 1 million incoming SMTP connections
> per day, I suspect that either spam filtering or some testing for ML
> (i.e., 'is this sender permitted to post?') takes place AFTER receiving
> the message but BEFORE telling the sending server that message was
> received, and is periodically taking longer than the sending server's
> SMTP timeout, so the sender gives up on the connection and tries again -
> meanwhile the exim server handling the ML eventually accepts the
> message.  My suspicion is that the load on the (virtual?) server hosting
> the ML is getting to a level where processing messages sometimes takes
> longer than some sending servers are willing to wait.  Properly, in such
> a situation, the sending server is supposed to resend, and the receiving
> server is supposed to discard the message it failed to fully receive
> before the connection was broken.
>
> Unfortunately my mailserver experience is with surgemail, qmail,
> sendmail, and some exchange (ick), but I've never worked with Exim, so I
> can't suggest anything specific to check in the server config.  (When
> I've seen this caused by receiving mailserver it was most often qmail,
> and was caused by improperly configured/limited spawning that exceeded
> available RAM instead of deferring excess inbound connections - once
> dipping into swap, all bets are off regarding timely responses)

Sounds like a decent theory. I've experienced abnormalities when
preprocessing facilities (greylisting/virus scanning/spam filtering) are
misconfigured or terminate abruptly or uncleanly.

I imagine just looking at the logs around the time of duplication would give a 
fair indication as to where the issue is. Otherwise system load sounds like a 
reasonable explanation; swap in anything time-critical is not ideal.
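
For anyone who wants to check their own duplicates before digging into the
server logs, here is a rough Python sketch of the header comparison Joel
describes above. It groups copies by their Message-ID header and collects the
Exim ids that the list host wrote into its 'Received' header. The function
name, the trimmed sample headers and the Message-ID value are all made up for
illustration; only the three Exim ids and the host names come from the headers
quoted above.

```python
import re
from collections import defaultdict
from email import message_from_string

# Exim records its own id in the Received header as "id XXXXXX-XXXXXX-XX".
EXIM_ID = re.compile(r"\bid\s+([0-9A-Za-z]{6}-[0-9A-Za-z]{6}-[0-9A-Za-z]{2})")


def list_server_ids(raw_copies, list_host="sita.openmoko.org"):
    """Map each Message-ID to the set of Exim ids assigned by list_host."""
    ids = defaultdict(set)
    for raw in raw_copies:
        msg = message_from_string(raw)
        for received in msg.get_all("Received", []):
            # Only look at the hop where the list host accepted the mail.
            if f"by {list_host}" not in received:
                continue
            match = EXIM_ID.search(received)
            if match:
                ids[msg["Message-ID"]].add(match.group(1))
    return dict(ids)


def make_copy(exim_id):
    """Build a trimmed, hypothetical copy of the duplicated message."""
    return (
        "Received: from relay1.ptmail.sapo.pt ([212.55.154.21] helo=sapo.pt)\n"
        "\tby sita.openmoko.org with smtp (Exim 4.63)\n"
        f"\tid {exim_id}; Mon, 01 Sep 2008 13:55:37 +0200\n"
        "Message-ID: <hypothetical-1@sapo.pt>\n"  # made-up value
        "Subject: test\n"
        "\n"
        "body\n"
    )


copies = [
    make_copy(exim_id)
    for exim_id in ("1Ka7yh-0002Ow-Ha", "1Ka7fK-0005dL-Pf", "1Ka7ZO-0003bU-8Q")
]
result = list_server_ids(copies)

for message_id, exim_ids in result.items():
    if len(exim_ids) > 1:
        verdict = "accepted more than once by the list host"
    else:
        verdict = "accepted once; any dupes were created later"
    print(message_id, sorted(exim_ids), verdict)
```

If one Message-ID maps to several Exim ids, the list host accepted the same
message more than once, which points at the sender-to-list hop rather than at
the list software fanning out duplicates.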

Sarton



