RAC/CRS/Voting disk failover Tests

Posted By Sagar Patil

| Failure Scenario | How to Test it? | Oracle Recovery |
|---|---|---|
| Private Network Failure between Nodes | Pull the PRIVATE port network cable out of RAC node 1 or 2 | Oracle will move all connections from node 1 to node 2, or vice versa. If the application is certified against RAC, the failover is unnoticeable to end users. |
| Public Network Failure | Pull the PUBLIC port network cable out of RAC node 1 and 2 | The application will return an error, as it won’t be able to connect to the database. |

RMAN Backup/Restore/Recovery:

I will put a script in place to back up the entire database with archive logs for 2 days. The backup sets will be stored in the ASM storage area for a quick restore if needed.
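As a rough illustration of what such a backup script might contain, the sketch below builds an RMAN command file in Python. It assumes that "2 days" means a two-day recovery window, and that the backups land in the ASM-based recovery area because no explicit destination is given. The function name `build_backup_script` is hypothetical and not part of the original plan.

```python
def build_backup_script(retention_days: int = 2) -> str:
    """Return RMAN command-file text for a full backup plus archive logs.

    Assumption: "archive logs for 2 days" is read as a recovery window
    of `retention_days` days; adjust if a different policy is intended.
    """
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    commands = [
        # Keep enough backups to recover to any point in the window.
        f"CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF {retention_days} DAYS;",
        # Full database backup together with the archived redo logs.
        "BACKUP DATABASE PLUS ARCHIVELOG;",
        # Remove backups that fall outside the retention policy.
        "DELETE NOPROMPT OBSOLETE;",
    ]
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    print(build_backup_script(), end="")
```

The generated text could be saved to a file and run through RMAN's command-file option from a scheduler; the schedule itself is left out here because the post does not specify one.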

RMAN Restore:

Assumption: A complete RMAN backup set is available in the FLASH_RECOVERY_AREA

| Scenario | Database Crash | How to Test it? | Oracle Recovery |
|---|---|---|---|
| Loss of Control file | NO | Delete a control file from ASM storage | The control file is multiplexed, so deleting one copy won’t bring the Oracle database down. |
| Loss of Redo Log | NO | Delete a redo log from ASM storage | The redo logs are multiplexed, so deleting one log won’t bring the Oracle database down. |
| Loss of non-SYSTEM Data file | NO | Delete a data file from ASM storage | Oracle will raise an alert and continue to function with the data file set OFFLINE. If application data resides in the unavailable data file, users will receive an Oracle error such as “file XYZ is offline”. We can restore the file from the latest RMAN backup. |
| Loss of SYSAUX data file | NO | Delete the data files used for the tablespace | Losing the SYSAUX tablespace does not crash the database. We can restore the files from the latest RMAN backup and recover the entire database to a point in time. |
| Loss of SYSTEM data file | YES | Delete the data file used for the tablespace | This will bring the entire RAC system down. We can restore the files from the latest RMAN backup and recover the entire database. |
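The data file scenarios above end in an RMAN restore and recovery. The Python sketch below lists the RMAN commands one might run in the two cases: a non-SYSTEM file restored while the database stays open, and a SYSTEM file restored with the database mounted after the crash. The function name and the file numbers are hypothetical, and the mounted case assumes the instance is already down and is started from within RMAN.

```python
from typing import List


def datafile_restore_commands(file_no: int, database_open: bool = True) -> List[str]:
    """Return the RMAN commands to restore and recover one data file.

    file_no is the data file number (hypothetical values in the demo).
    database_open=True  -> non-SYSTEM file, database keeps running.
    database_open=False -> SYSTEM file, database must be mounted first.
    """
    restore = [f"RESTORE DATAFILE {file_no};", f"RECOVER DATAFILE {file_no};"]
    if database_open:
        return [
            # Make sure the damaged file is offline before restoring it.
            f"SQL 'ALTER DATABASE DATAFILE {file_no} OFFLINE';",
            *restore,
            # Bring the recovered file back for the application.
            f"SQL 'ALTER DATABASE DATAFILE {file_no} ONLINE';",
        ]
    # SYSTEM file: restore with the database mounted, then open it.
    return ["STARTUP MOUNT;", *restore, "ALTER DATABASE OPEN;"]


if __name__ == "__main__":
    print("\n".join(datafile_restore_commands(7)))
    print("\n".join(datafile_restore_commands(1, database_open=False)))
```

The point-in-time recovery mentioned for SYSAUX would need an additional target time or SCN, which the post does not specify, so it is not shown here.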

Cluster Component failure:

| Scenario | Database Crash | How to Test it? | Oracle Recovery |
|---|---|---|---|
| Loss of Voting Disk | NO | Disable the SAN volume used for the Voting Disk | RAC will continue to function as long as the private interconnect between the RAC nodes is working. I will schedule a backup of the voting disk every 4 hours. The voting disk contains transient data, so even an old backup is fine for a restore. |
| Loss of Cluster Registry | YES | Disable the SAN volume used for the Cluster Registry | The cluster registry (OCR) is multiplexed across SAN volumes. If all OCR volumes become inaccessible, we need to restore it. |

Temporary loss of SAN storage

If the SAN is completely lost, the entire RAC system will crash. I am assuming that all SAN volumes used for data/backup/OCR/voting disk are lost, so the chances of data corruption are minimal: corruption is possible when nodes evict each other and overwrite each other's data blocks, but with no access to SAN storage the nodes won’t be able to carry out any tasks.

Check the alert log messages for each instance, kill any dangling Oracle processes, and restart the clusterware/RAC instances once the SAN is back.
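The alert log check can be partly automated. The Python sketch below pulls out lines that carry an ORA- error code so they can be reviewed before the instances are restarted. The function name `find_ora_errors` is hypothetical, and the sample lines are made up for illustration rather than taken from a real alert log.

```python
import re
from typing import Iterable, List

# Oracle error codes look like ORA- followed by five digits.
ORA_ERROR = re.compile(r"\bORA-\d{5}\b")


def find_ora_errors(lines: Iterable[str]) -> List[str]:
    """Return the lines that contain an ORA-nnnnn error code."""
    return [line.rstrip("\n") for line in lines if ORA_ERROR.search(line)]


if __name__ == "__main__":
    # Illustrative lines only, not real alert log output.
    sample = [
        "Thread 1 advanced to log sequence 42\n",
        "ORA-00204: error in reading control file\n",
        "Completed checkpoint\n",
    ]
    for hit in find_ora_errors(sample):
        print(hit)
```

In practice the function would be fed an open file handle for each instance's alert log; the log location depends on the installation and is not given in the post.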

Complete loss of SAN storage

If the entire SAN array is blown away, there is no easy way to recover it. We will have to re-install the Oracle RAC software and restore the old database/OCR/voting disks from tape. We have to use RMAN to rebuild the ASM data structures.
