My environment: Exchange 2000 Server Version 6 Build 4417.6, installed on Windows 2000 Server 5.00.2195 SP4. The server itself sits behind a Watchguard Firebox 700 with its SMTP Proxy enabled.
Problem: When sending mail to an external domain, the server tries to deliver to the domain's primary MX record. If the corresponding IP address accepts traffic on port 25, the message is delivered without a problem. If, however, the IP address does not accept SMTP traffic, two things occur:
1. The server tries to send the message to the secondary MX record. I can see in the SMTP logs that my server sends an EHLO and receives a "250-Requested+mail+action+okay,+completed" response. My server then sends a QUIT and ends the session without ever attempting to transfer the message.
2. At this point, I can find the message stuck in an outgoing SMTP queue in System Manager, in a "Retry" state. The server is set to retry every 15 minutes for 48 hours, but the message never makes it out.
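For comparison, the behavior I would expect can be sketched in a few lines of Python. This only illustrates standard MX fallback (lowest preference value first) and is not Exchange's actual logic; `deliver_with_fallback`, the `try_host` callback, and the preference values 10 and 20 are hypothetical and chosen for the example.

```python
from typing import Callable, Iterable, Optional, Tuple


def deliver_with_fallback(
    mx_records: Iterable[Tuple[int, str]],
    try_host: Callable[[str], bool],
) -> Optional[str]:
    """Try each MX host in preference order; return the host that accepted, else None."""
    # Lower preference value means higher priority, so a plain sort is enough.
    for _preference, host in sorted(mx_records):
        if try_host(host):
            return host
    # No host accepted the message: it stays queued for the next retry.
    return None


# Hypothetical preference values; the hostnames mirror the example domain.
records = [(20, "mx.example.com"), (10, "mail.example.com")]

# Simulate a primary that refuses SMTP and a secondary that accepts.
accepted_by = deliver_with_fallback(records, lambda host: host == "mx.example.com")
print(accepted_by)
```

In my case the second step is where things go wrong: the secondary answers, but the message is never handed over.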
For example: let's say I'm trying to send email to email@example.com, and their primary MX record is mail.example.com - 22.214.171.124, and their secondary MX record is mx.example.com - 126.96.36.199. If their primary record accepts mail, then the message goes through fine. If not, this is what my SMTP logs look like:
[DATE] [TIME] 188.8.131.52 OutboundConnectionResponse SMTPSVC1 MYSERVER - 25 - - 421+SMTP+service+not+available,+closing+transmission+channel 0 0 60 0 10187 SMTP - - - -
[DATE] [TIME] 184.108.40.206 OutboundConnectionCommand SMTPSVC1 MYSERVER - 25 QUIT - - 0 0 4 0 10187 SMTP - - - -
[DATE] [TIME] 220.127.116.11 OutboundConnectionResponse SMTPSVC1 MYSERVER - 25 - - 220+mx.example.com+Microsoft+ESMTP+MAIL+Service,+Version:+6.0.3790.3959+ready+at++Wed,+17+Jun+2009+21:58:10++0300+ 0 0 117 0 687 SMTP - - - -
[DATE] [TIME] 18.104.22.168 OutboundConnectionCommand SMTPSVC1 MYSERVER - 25 EHLO - mail.mydomain.com 0 0 4 0 687 SMTP - - - -
[DATE] [TIME] 22.214.171.124 OutboundConnectionResponse SMTPSVC1 MYSERVER - 25 - - 250-Requested+mail+action+okay,+completed 0 0 41 0 1172 SMTP - - - -
[DATE] [TIME] 126.96.36.199 OutboundConnectionCommand SMTPSVC1 MYSERVER - 25 QUIT - - 0 0 4 0 1172 SMTP - - - -
[DATE] [TIME] 188.8.131.52 OutboundConnectionResponse SMTPSVC1 MYSERVER - 25 - - 221+2.0.0+mx.example.com+Service+closing+transmission+channel 0 0 64 0 1390 SMTP - - - -
These same logs repeat every 15 minutes, for each queued retry attempt.
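To read the log excerpt above as a plain command/response conversation, a small Python sketch like the following can help. It assumes the field positions seen in these sample lines (remote IP in the 3rd field, event type in the 4th, SMTP verb in the 9th, argument or reply text in the 11th); a log with a different set of enabled fields would need different indexes. `SmtpLogEvent` and `parse_smtp_log_line` are names made up for this example.

```python
from typing import NamedTuple


class SmtpLogEvent(NamedTuple):
    remote_ip: str
    direction: str  # "command" (sent by my server) or "response" (from the remote host)
    text: str


def parse_smtp_log_line(line: str) -> SmtpLogEvent:
    """Extract the remote IP, direction, and SMTP text from one log line."""
    fields = line.split()
    remote_ip, event = fields[2], fields[3]
    verb, argument = fields[8], fields[10]
    if event.endswith("Command"):
        # Commands carry the verb in one field and an optional argument in another.
        text = verb if argument == "-" else f"{verb} {argument}"
        return SmtpLogEvent(remote_ip, "command", text)
    # Responses are logged with '+' in place of spaces.
    return SmtpLogEvent(remote_ip, "response", argument.replace("+", " ").strip())


sample = [
    "[DATE] [TIME] 18.104.22.168 OutboundConnectionCommand SMTPSVC1 MYSERVER - 25 EHLO - mail.mydomain.com 0 0 4 0 687 SMTP - - - -",
    "[DATE] [TIME] 22.214.171.124 OutboundConnectionResponse SMTPSVC1 MYSERVER - 25 - - 250-Requested+mail+action+okay,+completed 0 0 41 0 1172 SMTP - - - -",
    "[DATE] [TIME] 126.96.36.199 OutboundConnectionCommand SMTPSVC1 MYSERVER - 25 QUIT - - 0 0 4 0 1172 SMTP - - - -",
]

events = [parse_smtp_log_line(line) for line in sample]
for event in events:
    arrow = ">>" if event.direction == "command" else "<<"
    print(arrow, event.text)
```

Read this way, the session is just EHLO, one logged 250 reply line, then QUIT, with no MAIL FROM in between.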
SPECIAL NOTE: To throw a monkey wrench into the whole thing, if there is one message stuck in the queue for example.com's domain and I send a second message, then when the queue retries, the original (first) message gets delivered to the secondary MX record just fine after it fails to the primary MX record, and the second message is then stuck in the queue. In essence, the second message "pushes" the first message through.
Troubleshooting: I am able to telnet from my mail server to the domain's secondary MX record, and successfully send mail. I don't believe this is a DNS issue.
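A scripted version of that telnet check may be useful, since it sends EHLO the same way Exchange does and returns every line of the reply rather than the single line the log shows. This is a minimal sketch using Python's standard `smtplib`; `probe_ehlo` is a made-up helper name, and the default `helo_name` is simply the name my server announces in the logs.

```python
import smtplib
from typing import List, Tuple


def probe_ehlo(
    host: str,
    port: int = 25,
    helo_name: str = "mail.mydomain.com",
    timeout: float = 15.0,
) -> Tuple[int, List[str]]:
    """Connect, send EHLO, and return (reply code, reply lines)."""
    # Leaving the 'with' block sends QUIT and closes the connection.
    with smtplib.SMTP(host, port, local_hostname=helo_name, timeout=timeout) as smtp:
        code, message = smtp.ehlo()
        return code, message.decode("ascii", errors="replace").splitlines()


# Example call (needs outbound access on port 25, so it is left commented out):
# code, lines = probe_ehlo("mx.example.com")
# print(code, lines)
```

Running it once from the mail server and once from a machine that is not behind the Firebox would show whether the EHLO reply differs between the two paths.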
I've been banging my head on this issue for a few weeks now, and I can't honestly say that I know what is causing it. I've been able to reproduce this with 3 separate domains (2 of which have since corrected their MX records, so it's no longer an issue for them). The problem is either with Exchange, or something to do with my Watchguard firewall appliance/policy.
I inherited the server configuration and firewall my company had in place when I took the job, and all of these items are going to get replaced "soon", but as a jack of all trades and master of not much... I'd really like to be able to resolve this issue before moving forward.