Your SOP






catzodellamarina -> Your SOP (5.Feb.2010 10:30:10 AM)

This is the 2nd Exchange cluster I've administered, now at my 2nd company. In my previous job I had complete freedom to fail over and use the cluster's high availability whenever I needed to for administrative purposes. As we all know, an Exchange failover causes a slight burp in Outlook, which is easily recoverable and often goes unnoticed. Not enough to count as downtime.

In the current company, mgmt won't allow a manual cluster failover except after hours or on weekends. To them, any slight interruption of Outlook is a denial of service. Does anyone else live with this SOP?




mark@mvps.org -> RE: Your SOP (5.Feb.2010 10:39:20 AM)

If it's CCR you never fail back. The whole idea is that the two nodes are identical and you carry on going until the next Patch Tuesday or the next failover. CCR has two-way log replication so you don't have to re-seed unless something went wrong with a database.

If it's SCC you never fail back during business hours. Never. It's a scheduled outage at an acceptable time.




catzodellamarina -> RE: Your SOP (5.Feb.2010 10:49:56 AM)

Yes, it's CCR. I rarely need to fail over, but sometimes it's for patch installs that got stuck overnight and need a server reboot. The failover takes 10 seconds.




mark@mvps.org -> RE: Your SOP (5.Feb.2010 10:52:15 AM)

Unless you're doing CCR badly (i.e. wrong), you should never fail back.




catzodellamarina -> RE: Your SOP (5.Feb.2010 10:55:36 AM)

So let me get this straight. CCR never needs manual failover? Regardless of the situation? This means we should be pushing out our monthly updates to 1 cluster node and letting it replicate to the 2nd? No rebooting needed even if it's asking for one?




mark@mvps.org -> RE: Your SOP (5.Feb.2010 11:05:36 AM)

I said nothing of the sort.
If CCR fails over you go and fix it. At no point do you NEED to fail it back, because it's in the same data centre on identical hardware, with identical disk behind it, and has the right backup solution on it.
If the node fails over you fix the failed one, make sure that replication is still working (reseed what you need to), and then you wait until the node it's on fails over again. Failover is automatic. Firstly, failback should never be done; and if you do need to make the other node active, the proper term is fail over, not fail back.

Stop overthinking this.




catzodellamarina -> RE: Your SOP (5.Feb.2010 1:05:20 PM)

OK. I had to step back and think "normal" again. You're right: under normal circumstances it's a cluster, and I should be able to run operationally on either node. In my unique situation, I need to run on a designated primary node or my 3rd-party backup product doesn't run the nightly brick-level jobs. I won't get into all of the details, but let's just say it's not as cluster-aware as it should be. Sorry for the confusion. I misunderstood your direction, and I also wasn't telling my whole story.

In the event you need to manually fail over, would you always classify it as an after-hours task? Just looking for some feedback.




mark@mvps.org -> RE: Your SOP (5.Feb.2010 1:57:08 PM)

In which case the failover is obviously automatic but the failback to the desired production node would be done manually during an agreed change control window. Always.



