thatdarnguy -> Database corrupt, please help! (20.Dec.2010 9:24:03 AM)
Hi guys, have a major problem on my hands and could use some assistance!
For a bit of background: our Exchange 2007 server is a virtual machine on an ESX 3.0.2 platform, and it is replicated to our Disaster Recovery server every 4 hours by a third-party app called vReplicator. On Thursday last week I got a late call to say that our mail server was offline. I checked it over and found that the VM had frozen during a replication, so I killed the processes in the console, deleted the snapshots and put the server back on its feet.
Now, this is where the problems begin. Of course our backup (BackupExec 11d) failed on Thursday night, as the mail server was offline. However, the backup also failed on Friday night, this time on the mailbox database, with the following message:
Backup- \\Mailserver\Microsoft Information Store\First Storage Group WARNING: "\\STEXUK01\Microsoft Information Store\First Storage Group\Mailbox Database" is a corrupt file.
This file cannot verify.
Verify- \\Mailserver\Microsoft Information Store\First Storage Group WARNING: "Mailbox Database" is a corrupt file.
This file cannot verify.
...insert several expletives here.
Now, on checking over the Exchange server, I can see that Event ID 473 has been appearing in the Application log ever since the crash on Thursday night:
MSExchangeIS (3296) First Storage Group: The database page read from the file "G:\EXCHSRVR\MDBDATA\Mailbox Database.edb" at offset 158994898944 (0x0000002504d5a000) (database page 19408556 (0x12826AC)) for 8192 (0x00002000) bytes failed verification due to a page checksum mismatch. The expected checksum was 1496906767379812099 (0x14c614c6238d4f03) and the actual checksum was 1695912612894263757 (0x178917893daac9cd). The read operation will fail with error -1018 (0xfffffc06). If this condition persists then please restore the database from a previous backup. This problem is likely due to faulty hardware. Please contact your hardware vendor for further assistance diagnosing the problem.
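For anyone who wants to cross-check the figures in that event, here is a minimal sketch of the arithmetic. All the numbers are copied from the message above. The "minus one" in the page mapping is only what these particular figures imply (offset / page size comes out one higher than the reported page number), so treat it as an assumption rather than a documented rule. The function names are made up for illustration.

```python
# Figures copied from the Event 473 text above.
OFFSET = 158994898944      # byte offset of the bad read
PAGE_SIZE = 8192           # bytes per database page, per the message
PAGE_NUMBER = 19408556     # database page reported in the message
ERROR_CODE = -1018         # error reported for the read


def page_from_offset(offset, page_size):
    """Map a byte offset to a page number.

    Assumption: one page-sized slot sits ahead of page 0, which is what
    the figures in this event imply.
    """
    slot, remainder = divmod(offset, page_size)
    if remainder:
        raise ValueError("offset is not aligned to a page boundary")
    return slot - 1


def as_unsigned32(value):
    """Show a negative error code as the 32-bit hex value in the log."""
    return value & 0xFFFFFFFF


print(page_from_offset(OFFSET, PAGE_SIZE))
print(hex(OFFSET), hex(PAGE_NUMBER), hex(as_unsigned32(ERROR_CODE)))
```

The decimal and hex values in the event agree with each other, so the message itself is internally consistent and points at one specific 8 KB page.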
I've read into this error further and found this KB:
This is what it says:
Recovering from -1018 Errors
Exchange treats a page that fails with a -1018 error as completely unreadable to prevent action on random data from causing further problems in the database.
A page that fails with a -1018 error cannot be repaired or salvaged. It must be expunged from the database. There are three methods that you can use to expunge the page from the database:
Restore the database from an online backup.
Use the Eseutil.exe /D switch to do an offline defragmentation of the database.
Use the Eseutil.exe /P switch to repair the database.
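To make the two Eseutil options in that list concrete, this is roughly what the command lines would look like against my database, plus a /K checksum pass to see how many pages are actually bad. It is only a sketch: it builds and prints the commands and does not run anything, the helper names are hypothetical, and I'm assuming the database has to be dismounted before any of these are run for real. The path is the one from the event log entry.

```python
# Path taken from the Event 473 text; adjust for your own server.
EDB = r"G:\EXCHSRVR\MDBDATA\Mailbox Database.edb"

# Eseutil modes discussed here. /K is a read-only checksum pass;
# /D and /P are the two options from the KB list.
ESEUTIL_SWITCHES = {
    "checksum": "/K",
    "defrag": "/D",
    "repair": "/P",
}


def eseutil_command(action, edb_path=EDB):
    """Return the argument list for one Eseutil run (nothing is executed)."""
    return ["eseutil", ESEUTIL_SWITCHES[action], edb_path]


def as_command_line(argv):
    """Quote arguments containing spaces so the line can be pasted into cmd."""
    return " ".join(f'"{arg}"' if " " in arg else arg for arg in argv)


for action in ESEUTIL_SWITCHES:
    print(as_command_line(eseutil_command(action)))
```

Keeping the commands as data rather than running them straight away is deliberate, since /P in particular throws away whatever it cannot read and I would not want to fire it off by accident.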
Now here's what I'm not sure of... I was off work last week due to the snow and the backup tapes did not get changed in my absence, so the last good backup I have of the Exchange database is from the 10th of this month. That's 10 days ago... Insert more expletives here.
Is there any way to restore that backup into a Recovery Storage Group and roll it forward through the 10 days' worth of mail in the transaction logs while the main database stays in production, then dismount the corrupt database and mount the repaired one in its place, thus avoiding masses of downtime? Or is this just stupidly optimistic?
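As I understand it, a roll-forward like that can only work if every transaction log written since the backup of the 10th is still on disk, with no gaps in the sequence. Here is a rough sketch of how I'd check that from a directory listing. The file name pattern (a prefix such as E00 followed by 5 to 8 hex digits and .log) is my assumption about how the logs are named, the sample names are invented, and the function names are hypothetical.

```python
import re


def log_generations(filenames, prefix="E00"):
    """Extract log generation numbers from names like E00000A1E.log.

    Assumed naming: prefix + 5 to 8 hex digits + ".log". Files such as
    E00.log or E00tmp.log do not match and are ignored.
    """
    pattern = re.compile(
        rf"^{re.escape(prefix)}([0-9A-Fa-f]{{5,8}})\.log$", re.IGNORECASE
    )
    generations = []
    for name in filenames:
        match = pattern.match(name)
        if match:
            generations.append(int(match.group(1), 16))
    return sorted(generations)


def missing_generations(generations):
    """Return the generation numbers absent between the lowest and highest."""
    if not generations:
        return []
    present = set(generations)
    return [g for g in range(generations[0], generations[-1] + 1)
            if g not in present]


# Invented sample listing with one log missing from the run.
sample = ["E0000000A1E.log", "E0000000A1F.log", "E0000000A21.log",
          "E00.log", "E00tmp.log"]
gaps = missing_generations(log_generations(sample))
print([f"{g:08X}" for g in gaps])
```

In practice the listing would come from something like os.listdir() on the storage group's log folder, and an empty result would mean the sequence is unbroken between the oldest and newest log found.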
Downtime is a big issue and naturally I want to minimise it, but not at the expense of data integrity. So the priority is data integrity first, then minimising downtime. With this in mind, what are my options? My back's against the wall here, guys, and I'd really appreciate any advice.
Sorry for the massive post too!!