mcadek -> Very strange behavior , routing & SMTP service crash and restart (4.Dec.2006 1:16:47 PM)
Let me outline what happened. Here is the layout of our servers:
- 1 front-end server with OWA; our PDAs also connect to it for ActiveSync
- 1 back-end server with 1 information store for the entire corporation in the US
- 1 back-end server with 1 information store for our overseas office
- The US back end is 2003 Enterprise SP2, the overseas BE is 2003 Standard SP2, and the front end is 2003 Standard SP2

- This weekend, all of a sudden, our company PDAs stopped getting emails.
- OWA also went down with a 503 error (service unavailable, I think).
- Internally, Exchange was working but would not send or receive mail (a new message would sit in the Outbox). I am assuming the SMTP service was not functioning properly.
- Since I did not have much time for diagnosis, I chose to reboot both the FE and BE. Everything started working again, but...
- Immediately after the reboot I was getting IIS crash/illegal operation errors related to the w3... worker process (w3we.exe or something like that; I do not have the screencap right here). It crashed about 10 times and then stopped coming up.
- Everything has worked fine since, except that I get very strange errors.

On the FE, I get event ID 3007 for every user with a PDA/ActiveSync, multiple times at random intervals:

Exchange mailbox server response timeout: Server: "servername", User: "username". Exchange ActiveSync server failed to communicate with the Exchange mailbox server in a timely manner. Verify that the Exchange mailbox server is working correctly and is not overloaded.

It definitely is NOT overloaded, and it seems to work just fine for the 150+ users that use it, even the PDAs, even though those errors pop up in the application event log!
On the back-end server I get these IDs that show up in blocks; usually these 5 or 6 IDs all pop up within 1 second in the application event log:
- 1005 - msexchangetransport - RE service has been started, Version: 6.5.7638.138.1
- 1008 - msexchangetransport - RE service instance 1 has been started
- 332 - msexchangetransport - SMTP service has been started, initializing queues.
- 334 - msexchangetransport - SMTP service instance 1 has been started.
- 10302 - msexchangeactivesync - OMA categorizer successfully started
- 101 - DAVEX - DAVEX to be shutdown, sometimes followed by event ID 100, DAVEX has successfully started

In the system log, the following entries provide a more general overview of what is happening:

Event Type: Warning
Event Source: W3SVC
Event Category: None
Event ID: 1013
Description: A process serving application pool 'ExchangeApplicationPool' exceeded time limits during shut down. The process id was '3132'.

Event Type: Information
Event Source: W3SVC
Event Category: None
Event ID: 1082
Description: A worker process with pid '3132' that serves application pool 'ExchangeApplicationPool' has been determined to be unhealthy (see previous event log message), but because a debugger is attached to it, the World Wide Web Publishing Service will ignore the error.

Event Type: Error
Event Source: Service Control Manager
Event Category: None
Event ID: 7031
Description: The IIS Admin Service service terminated unexpectedly. It has done this 12 time(s). The following corrective action will be taken in 1 milliseconds: Run the configured recovery program.

Event Type: Error
Event Source: Service Control Manager
Event Category: None
Event ID: 7034
Date: 12/4/2006
Description: The Microsoft Exchange Routing Engine service terminated unexpectedly. It has done this 12 time(s).

Event Type: Error
Event Source: Service Control Manager
Event Category: None
Event ID: 7034
Date: 12/4/2006
Description: The Simple Mail Transfer Protocol (SMTP) service terminated unexpectedly. It has done this 12 time(s).

Event Type: Warning
Event Source: W3SVC
Event Category: None
Event ID: 1009
A process serving application pool 'ExchangeApplicationPool' terminated unexpectedly. The process id was '5852'. The process exit code was '0xffffffff'.

Then the affected services restart on their own and all is well for another 18 minutes (or more; it seems random, the last period was 18 minutes...).

I expect my Exchange organization will die again shortly. I searched the logs and this behavior started 3 days ago, so possibly in 3 more days it will lock up again...

Any ideas? I would be very grateful for any help you can give me. Thank you!
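
To check whether the restart interval really is random, the 7031/7034 timestamps from the system log could be grouped into incidents and the gaps between them measured. Below is a minimal sketch in Python, assuming the log entries have already been exported as (timestamp, event ID) pairs. The function name `crash_gaps`, the timestamp format, the 60-second grouping window, and the sample timestamps are all hypothetical illustrations, not values taken from the actual logs.

```python
from datetime import datetime

# Event IDs from the system log that mark an unexpected service termination.
CRASH_IDS = {7031, 7034}
# Assumed timestamp format of the exported log (hypothetical).
TIME_FORMAT = "%m/%d/%Y %H:%M:%S"


def crash_gaps(events, window_seconds=60):
    """Return the gaps, in minutes, between consecutive crash incidents.

    events: iterable of (timestamp_string, event_id) pairs.
    Crash events within window_seconds of the start of an incident are
    treated as part of that incident, since several services fail together.
    """
    times = sorted(
        datetime.strptime(stamp, TIME_FORMAT)
        for stamp, event_id in events
        if event_id in CRASH_IDS
    )
    incidents = []
    for moment in times:
        # Start a new incident only when this event is outside the window.
        if not incidents or (moment - incidents[-1]).total_seconds() > window_seconds:
            incidents.append(moment)
    return [
        (later - earlier).total_seconds() / 60
        for earlier, later in zip(incidents, incidents[1:])
    ]


# Made-up sample data for illustration only.
sample = [
    ("12/04/2006 12:10:05", 7031),
    ("12/04/2006 12:10:05", 7034),
    ("12/04/2006 12:10:06", 1009),  # not a crash ID, ignored
    ("12/04/2006 12:28:05", 7031),
    ("12/04/2006 12:28:06", 7034),
    ("12/04/2006 12:50:05", 7034),
]

print(crash_gaps(sample))
```

If the gaps turn out to cluster around one value, that would point toward something scheduled or timer-driven rather than a random fault.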