Thursday, November 1, 2012

RAC and Instance or Crash Recovery


When an instance fails and the failure is detected by another instance, the second instance performs the following recovery steps:
1. During the first phase of recovery, Global Enqueue Services (GES) remasters the enqueues.
2. The Global Cache Services (GCS) remasters its resources. The GCS processes remaster only those resources that lose their masters. During this time, all GCS resource requests and write requests are temporarily suspended. However, transactions can continue to modify data blocks as long as these transactions have already acquired the necessary resources.
3. After enqueues are reconfigured, one of the surviving instances can grab the Instance Recovery enqueue. Therefore, at the same time as GCS resources are remastered, SMON determines the set of blocks that need recovery. This set is called the recovery set. With Cache Fusion, an instance ships the contents of its blocks to the requesting instance without writing the blocks to disk, so the on-disk version of a block may not contain the changes made by either instance. This implies that SMON needs to merge the content of all the online redo logs of each failed instance to determine the recovery set, because the redo of a single failed thread might contain a hole in the changes that need to be applied to a particular block. For the same reason, redo threads of failed instances cannot be applied serially. Redo threads of surviving instances are not needed for recovery, because SMON can use the past or current images in their buffer caches.
4. Buffer space for recovery is allocated, and the resources that were identified in the previous reading of the redo logs are claimed as recovery resources. This is done to prevent other instances from accessing those resources.
5. All resources required for subsequent processing have been acquired and the Global Resource Directory (GRD) is now unfrozen. Any data blocks that are not in recovery can now be accessed. Note that the system is already partially available.
Then, assuming that there are past images or current images of blocks to be recovered in other caches in the cluster database, the most recent image is the starting point of recovery for these particular blocks. If neither the past image buffers nor the current buffer for a data block is in any of the surviving instances’ caches, then SMON performs a log merge of the failed instances. SMON recovers and writes each block identified in step 3, releasing the recovery resources immediately after block recovery so that more blocks become available as recovery proceeds. Refer to the section “Global Cache Coordination: Example” in this lesson for more information about past images.
6. After all blocks have been recovered and the recovery resources have been released, the system is again fully available.
In summary, the recovered portions of the database become available before the entire recovery sequence has completed. This makes the system available sooner and makes recovery more scalable.
Note: The performance overhead of a log merge is proportional to the number of failed instances and to the amount of redo written in the redo logs of each instance.
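The log merge described in step 3 can be pictured with a small Python sketch. This is only a toy model under stated assumptions, not Oracle's internal implementation: the RedoRecord type, its fields, and the function names are hypothetical, and every block touched by redo is treated as part of the recovery set, whereas the real first pass also excludes blocks whose changes are already on disk.

```python
import heapq
from typing import NamedTuple

class RedoRecord(NamedTuple):
    """Hypothetical, simplified redo record (not Oracle's real format)."""
    scn: int      # system change number; orders changes across threads
    thread: int   # redo thread, one per instance
    block: int    # address of the data block the change applies to

def merge_redo_threads(threads):
    """Merge per-thread redo streams, each already in SCN order,
    into a single stream ordered by SCN."""
    return heapq.merge(*threads, key=lambda record: record.scn)

def recovery_set(threads):
    """Return the sorted block addresses touched by any failed thread."""
    return sorted({record.block for record in merge_redo_threads(threads)})

# Two failed threads whose changes to block 100 interleave. Applying
# thread 1 and then thread 2 serially would apply SCN 30 before SCN 20.
thread1 = [RedoRecord(10, 1, 100), RedoRecord(30, 1, 100)]
thread2 = [RedoRecord(20, 2, 100), RedoRecord(40, 2, 200)]

merged = list(merge_redo_threads([thread1, thread2]))
print([record.scn for record in merged])   # [10, 20, 30, 40]
print(recovery_set([thread1, thread2]))    # [100, 200]
```

The cost of the merge grows with the number of input threads and the number of records in each, which matches the note above about log merge overhead.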

There are basically two types of failure in a RAC environment: instance and media. Instance failure involves the loss of one or more RAC instances, whether due to node failure or connectivity failure. Media failure involves the loss of one or more of the disk assets used to store the database files themselves.
If a RAC database undergoes instance failure, the first surviving node that detects the failed instance or instances performs instance recovery on all of them, using the failed instances' redo logs and the SMON process of the surviving instance. The redo logs for all RAC instances are located either on an OCFS shared disk asset or on a RAW file system that is visible to all the other RAC instances. This allows any other node to perform recovery for a failed RAC node in the event of instance failure.
Recovery using redo logs allows committed transactions to be completed. Non-committed transactions are rolled back and their resources released.
There are experts with over a dozen years of experience with Oracle databases who have yet to see an instance failure result in a non-recoverable situation. Generally speaking, an instance failure in RAC or in single-instance Oracle requires no active participation from the DBA other than restarting the failed instance when the node becomes available again.
If, for some reason, the recovering instance cannot see all of the datafiles accessed by the failed instance, an error will be written to the alert log. To verify that all datafiles are available, the ALTER SYSTEM CHECK DATAFILES command can be used to validate proper access.
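As a rough illustration, the check could be issued from a script through any Python DB-API cursor connected to the instance. The sketch below assumes nothing about the driver: check_datafiles and RecordingCursor are hypothetical names, and RecordingCursor is a stand-in that only records the statement so the example runs without a database. The GLOBAL and LOCAL scope keywords come from the Oracle SQL syntax for this command and are not mentioned in the text above.

```python
def check_datafiles(cursor, scope="GLOBAL"):
    """Issue ALTER SYSTEM CHECK DATAFILES through a DB-API style cursor.

    scope is GLOBAL (check from all instances) or LOCAL (this instance only).
    Returns the statement text that was executed.
    """
    if scope not in ("GLOBAL", "LOCAL"):
        raise ValueError("scope must be GLOBAL or LOCAL")
    statement = f"ALTER SYSTEM CHECK DATAFILES {scope}"
    cursor.execute(statement)
    return statement

class RecordingCursor:
    """Stand-in cursor that records statements instead of contacting a database."""
    def __init__(self):
        self.statements = []

    def execute(self, statement):
        self.statements.append(statement)

cursor = RecordingCursor()
check_datafiles(cursor)
print(cursor.statements)   # ['ALTER SYSTEM CHECK DATAFILES GLOBAL']
```

With a real connection, the cursor would come from the driver in use, and the session would need the ALTER SYSTEM privilege.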
Instance recovery involves nine distinct steps.  The Oracle manual only lists eight, but in this case, the actual instance failure has been included:
1. Normal RAC operation, all nodes are available.
2. One or more RAC instances fail.
3. Node failure is detected.
4. Global Cache Service (GCS) reconfigures to distribute resource management to the surviving instances.
5. The SMON process in the instance that first discovers the failed instance(s) reads the redo logs of the failed instance(s) to determine which blocks have to be recovered.
6. SMON issues requests for all of the blocks it needs to recover. Once all of these blocks are made available to the SMON process doing the recovery, all other database blocks are available for normal processing.
7. Oracle performs roll-forward recovery against the blocks, applying all transactions recorded in the redo logs.
8. Once the redo has been applied, all undo records are applied, which rolls back uncommitted transactions.
9. Database is now fully available to surviving nodes.
9. Database is now fully available to surviving nodes.
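Steps 7 and 8 above (roll forward, then roll back) can be sketched in a few lines of Python. This is a deliberately simplified model and not how Oracle stores redo or undo: recover_blocks and its tuple layout are hypothetical, each block holds a single value, and no two transactions are assumed to change the same block.

```python
def recover_blocks(on_disk, redo, committed):
    """Roll forward all redo, then roll back uncommitted changes.

    on_disk:   dict mapping block number to the value found on disk
    redo:      list of (txn, block, old_value, new_value) in SCN order
    committed: set of transaction ids that committed before the failure
    """
    blocks = dict(on_disk)
    # Roll forward: apply every recorded change, committed or not.
    for txn, block, old_value, new_value in redo:
        blocks[block] = new_value
    # Roll back: undo the changes of uncommitted transactions, newest first.
    for txn, block, old_value, new_value in reversed(redo):
        if txn not in committed:
            blocks[block] = old_value
    return blocks

on_disk = {1: "a0", 2: "b0"}
redo = [
    ("T1", 1, "a0", "a1"),   # T1 committed before the crash
    ("T2", 2, "b0", "b1"),   # T2 was still in flight at the crash
]
recovered = recover_blocks(on_disk, redo, committed={"T1"})
print(recovered)   # {1: 'a1', 2: 'b0'}
```

The committed change from T1 survives, while the in-flight change from T2 is removed, which is the outcome the two steps are meant to guarantee.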
Instance recovery is automatic, and other than the performance hit to surviving instances and the disconnection of users who were using the failed instance, recovery is invisible to the other instances. If RAC failover and transparent application failover (TAF) technologies are properly utilized, the only users that should see a problem are those with in-flight transactions. The following listing shows what the other instance sees in its alert log during a reconfiguration.
Sat Feb 15 16:39:09 2003
Reconfiguration started
List of nodes: 0,
 Global Resource Directory frozen
one node partition
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
 Resources and enqueues cleaned out
 Resources remastered 1977
 2381 GCS shadows traversed, 1 cancelled, 13 closed
 1026 GCS resources traversed, 0 cancelled
 3264 GCS resources on freelist, 4287 on array, 4287 allocated
 set master node info
 
 Submitted all remote-enqueue requests
 Update rdomain variables
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 2381 GCS shadows traversed, 0 replayed, 13 unopened
 Submitted all GCS remote-cache requests
 0 write requests issued in 2368 GCS resources
 2 PIs marked suspect, 0 flush PI msgs
Sat Feb 15 16:39:10 2003
Reconfiguration complete
 Post SMON to start 1st pass IR
Sat Feb 15 16:39:10 2003
Instance recovery: looking for dead threads
Sat Feb 15 16:39:10 2003
Beginning instance recovery of 1 threads
Sat Feb 15 16:39:10 2003
Started first pass scan
Sat Feb 15 16:39:11 2003
Completed first pass scan
 208 redo blocks read, 6 data blocks need recovery
Sat Feb 15 16:39:11 2003
Started recovery at
 Thread 2: logseq 26, block 14, scn 0.0
Recovery of Online Redo Log: Thread 2 Group 4 Seq 26 Reading mem 0
  Mem# 0 errs 0: /oracle/oradata/ault_rac/ault_rac_raw_rdo_2_2.log
Recovery of Online Redo Log: Thread 2 Group 3 Seq 27 Reading mem 0
  Mem# 0 errs 0: /oracle/oradata/ault_rac/ault_rac_raw_rdo_2_1.log
Sat Feb 15 16:39:12 2003
Completed redo application
Sat Feb 15 16:39:12 2003
Ended recovery at
 Thread 2: logseq 27, block 185, scn 0.5479311
 6 data blocks read, 8 data blocks written, 208 redo blocks read
Ending instance recovery of 1 threads
SMON: about to recover undo segment 11
SMON: mark undo segment 11 as available
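A listing like the one above can be summarized with a short script. The following Python sketch is one possible approach, assuming the alert log uses exactly the timestamp and message formats shown in the listing and an English locale for day and month names; summarize_recovery and the keys of its result are hypothetical names chosen for illustration.

```python
import re
from datetime import datetime

TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %Y"   # e.g. Sat Feb 15 16:39:09 2003
SCAN_PATTERN = re.compile(r"(\d+) redo blocks read, (\d+) data blocks need recovery")

def summarize_recovery(lines):
    """Extract first pass scan counts and elapsed time from alert log lines."""
    summary = {}
    last_timestamp = None
    for raw in lines:
        line = raw.strip()
        try:
            # Timestamp lines apply to the messages that follow them.
            last_timestamp = datetime.strptime(line, TIMESTAMP_FORMAT)
            continue
        except ValueError:
            pass
        if line == "Reconfiguration started":
            summary["started"] = last_timestamp
        elif line.startswith("Ending instance recovery"):
            summary["ended"] = last_timestamp
        else:
            match = SCAN_PATTERN.match(line)
            if match:
                summary["redo_blocks_read"] = int(match.group(1))
                summary["blocks_needing_recovery"] = int(match.group(2))
    if summary.get("started") and summary.get("ended"):
        elapsed = summary["ended"] - summary["started"]
        summary["elapsed_seconds"] = elapsed.total_seconds()
    return summary

# A few lines copied from the listing above.
sample = """Sat Feb 15 16:39:09 2003
Reconfiguration started
Sat Feb 15 16:39:11 2003
Completed first pass scan
 208 redo blocks read, 6 data blocks need recovery
Sat Feb 15 16:39:12 2003
Ended recovery at
 6 data blocks read, 8 data blocks written, 208 redo blocks read
Ending instance recovery of 1 threads
""".splitlines()

result = summarize_recovery(sample)
print(result["redo_blocks_read"], result["blocks_needing_recovery"])   # 208 6
print(result["elapsed_seconds"])                                       # 3.0
```

For this listing, the reconfiguration and instance recovery together took about three seconds and the first pass scan found six blocks needing recovery.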
One word of caution: during testing for this listing, an instance could not be brought back up after failure, which is a rare occurrence. A kill -9 was done on the SMON process on AULTLINUX1, within the Linux/RAC/RAW environment. AULTLINUX2 continued to operate and recovered the failed instance; however, an attempted restart of the instance on AULTLINUX1 yielded a Linux Error: 24: Too Many Files Open error. This was actually caused by something blocking the SPFILE link. Once the instance was pointed to the proper SPFILE location during startup, it restarted with no problems.
