RAC and Instance or Crash Recovery
When an instance fails and the failure is detected by another instance,
the second instance performs the following recovery steps:
1. During the first phase of recovery, Global Enqueue Services (GES)
remasters the enqueues.
2. The Global Cache Services (GCS) remasters its resources. The GCS
processes remaster only those resources that lose their masters. During this
time, all GCS resource requests and write requests are temporarily suspended.
However, transactions can continue to modify data blocks as long as these
transactions have already acquired the necessary resources.
3. After the enqueues are reconfigured, one of the surviving instances can
acquire the Instance Recovery enqueue. Therefore, while GCS resources are
being remastered, SMON determines the set of blocks that need recovery. This
set is called the recovery set. With Cache Fusion, an instance ships the
contents of its blocks to the requesting instance without writing the blocks
to disk, so the on-disk version of a block may not contain the changes made by
either instance. SMON therefore needs to merge the content of all the online
redo logs of each failed instance to determine the recovery set, because one
failed thread might contain a hole in the redo that needs to be applied to a
particular block. For that reason, redo threads of failed instances cannot be
applied serially. Redo threads of surviving instances are not needed for
recovery because SMON can use past or current images from their corresponding
buffer caches.
4. Buffer space for recovery is allocated, and the resources that were
identified in the previous reading of the redo logs are claimed as recovery
resources. This prevents other instances from accessing those resources.
5. All resources required for subsequent processing have been acquired, and
the Global Resource Directory (GRD) is now unfrozen. Any data blocks that are
not in recovery can now be accessed. Note that the system is already partially
available.
Then, assuming that there are past images or current images of blocks to be recovered in other caches in the cluster database, the most recent image is the starting point of recovery for these particular blocks. If neither the past image buffers nor the current buffer for a data block is in any of the surviving instances’ caches, then SMON performs a log merge of the failed instances. SMON recovers and writes each block identified in step 3, releasing the recovery resources immediately after block recovery so that more blocks become available as recovery proceeds. Refer to the section “Global Cache Coordination: Example” in this lesson for more information about past images.
6. After all blocks have been recovered and the recovery resources have been
released, the system is again fully available.
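The choice of starting point described in step 5 can be sketched in a few lines of Python. This is only an illustrative model, not Oracle's implementation: the BufferImage type, its fields, and the recovery_start function are hypothetical names introduced here, and a real SCN is more than a plain integer.

```python
from typing import List, NamedTuple, Optional


class BufferImage(NamedTuple):
    """Hypothetical model of a block image held in a surviving instance's cache."""
    instance: str  # surviving instance that holds the image
    kind: str      # "PI" (past image) or "CURRENT"
    scn: int       # changes up to this SCN are contained in the image


def recovery_start(images: List[BufferImage]) -> Optional[BufferImage]:
    """Return the most recent cached image of a block, or None.

    None means no surviving cache holds a past or current image, so
    recovery of that block has to rely on the log merge instead.
    """
    return max(images, key=lambda image: image.scn, default=None)


cached = [
    BufferImage("inst1", "PI", 1040),
    BufferImage("inst3", "CURRENT", 1075),
]
print(recovery_start(cached))  # the CURRENT image on inst3
print(recovery_start([]))      # None: fall back to the log merge
```

The later the starting image, the less redo has to be applied to that block, which is why the most recent image is preferred.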
In summary, the recovered database, or the recovered portions of the
database, become available before the entire recovery sequence has completed.
This makes the system available sooner and makes recovery more scalable.
Note: The performance overhead of a log merge is proportional to the
number of failed instances and to the amount of redo written in the redo logs
for each instance.
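To make the log merge concrete, here is a minimal Python sketch of why the redo threads of failed instances are merged rather than applied one after another. It assumes each thread is already ordered by SCN and models a redo record as a simple tuple. RedoRecord, merge_redo_threads, and recovery_set are hypothetical names for illustration, not Oracle internals.

```python
import heapq
from typing import Iterable, List, NamedTuple


class RedoRecord(NamedTuple):
    """Hypothetical, simplified redo record."""
    scn: int     # orders changes across all threads
    thread: int  # redo thread, one per instance
    block: int   # data block the change applies to


def merge_redo_threads(threads: Iterable[List[RedoRecord]]) -> List[RedoRecord]:
    """Merge SCN-ordered redo threads into one SCN-ordered stream."""
    return list(heapq.merge(*threads, key=lambda record: record.scn))


def recovery_set(merged: List[RedoRecord]) -> List[int]:
    """Blocks touched by the merged redo, that is, candidates for recovery."""
    return sorted({record.block for record in merged})


# Two failed instances both changed block 100, interleaved in time.
thread1 = [RedoRecord(10, 1, 100), RedoRecord(30, 1, 101)]
thread2 = [RedoRecord(20, 2, 100), RedoRecord(40, 2, 102)]

merged = merge_redo_threads([thread1, thread2])
print([record.scn for record in merged])  # [10, 20, 30, 40]
print(recovery_set(merged))               # [100, 101, 102]
```

Applying thread 1 in full and then thread 2 would apply the change at SCN 30 before the change at SCN 20, which is the kind of hole the text describes. The work in the merge grows with the number of threads and the number of records in each, in line with the note above.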
There are two basic types of failure in a RAC environment: instance and
media. Instance failure involves the loss of one or more RAC instances,
whether due to node failure or connectivity failure. Media failure involves
the loss of one or more of the disk assets used to store the database files
themselves.
If a RAC database undergoes instance failure, the first surviving node that
detects the failed instance or instances performs instance recovery on all
failed instances, using the failed instances' redo logs and the SMON process
of the surviving instance. The redo logs for all RAC instances are located
either on an OCFS shared disk asset or on a RAW file system that is visible to
all the other RAC instances. This allows any other node to perform recovery
for a failed RAC node in the event of instance failure.
Recovery using redo logs allows
committed transactions to be completed. Non-committed transactions are rolled
back and their resources released.
There are experts with over a dozen years of experience with Oracle
databases who have yet to see an instance failure result in a non-recoverable
situation. Generally speaking, an instance failure in RAC or in normal Oracle
requires no active participation from the DBA other than restarting the failed
instance when the node becomes available again.
If, for some reason, the recovering instance cannot see all of the
datafiles accessed by the failed instance, an error will be written to the
alert log. To verify that all datafiles are available, the ALTER SYSTEM CHECK
DATAFILES command can be used to validate proper access.
Instance
recovery involves nine distinct steps. The Oracle manual only lists
eight, but in this case, the actual instance failure has been included:
1. Normal RAC operation; all nodes are available.
2. One or more RAC instances fail.
3. Node failure is detected.
4. Global Cache Service (GCS) reconfigures to distribute resource management
to the surviving instances.
5. The SMON process in the instance that first discovers the failed
instance(s) reads the redo logs of the failed instance(s) to determine which
blocks have to be recovered.
6. SMON issues requests for all of the blocks it needs to recover. Once all
of these blocks are made available to the SMON process doing the recovery, all
other database blocks are available for normal processing.
7. Oracle performs roll forward recovery against the blocks, applying all
transactions recorded in the redo logs.
8. Once the redo has been applied, all undo records are applied, which
eliminates non-committed transactions.
9. The database is now fully available to the surviving nodes.
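Steps 7 and 8 above, roll forward followed by roll back, can be illustrated with a toy Python model. This is a sketch under strong simplifying assumptions: each block holds a single value, every redo record carries both the old and the new value, and the set of committed transactions is known. The Change type and the recover function are hypothetical names, not Oracle structures.

```python
from typing import Dict, List, NamedTuple, Set


class Change(NamedTuple):
    """Hypothetical logged change to a block."""
    txn: str    # transaction that made the change
    block: int  # block that was changed
    old: str    # value before the change (what undo restores)
    new: str    # value after the change (what redo reapplies)


def recover(blocks: Dict[int, str], redo: List[Change],
            committed: Set[str]) -> Dict[int, str]:
    """Roll forward all logged changes, then roll back uncommitted ones."""
    recovered = dict(blocks)
    # Roll forward: reapply every change in log order, committed or not.
    for change in redo:
        recovered[change.block] = change.new
    # Roll back: undo uncommitted changes, newest first.
    for change in reversed(redo):
        if change.txn not in committed:
            recovered[change.block] = change.old
    return recovered


on_disk = {1: "a", 2: "x"}
redo_log = [
    Change("T1", 1, "a", "b"),  # T1 committed before the failure
    Change("T2", 2, "x", "y"),  # T2 was still in flight
    Change("T2", 1, "b", "c"),
]
print(recover(on_disk, redo_log, committed={"T1"}))  # {1: 'b', 2: 'x'}
```

The committed change made by T1 survives, while both changes made by the in-flight transaction T2 are removed, matching the behavior described for committed and non-committed transactions.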
Instance
recovery is automatic, and other than the performance hit to surviving
instances and the disconnection of users who were using the failed instance,
recovery is invisible to the other instances. If RAC failover and transparent
application failover (TAF) technologies are properly utilized, the only users
that should see a problem are those with in-flight transactions. The following
listing shows what the other instance sees in its alert log during a
reconfiguration.
Sat Feb 15 16:39:09 2003
Reconfiguration started
List of nodes: 0,
Global Resource Directory frozen
one node partition
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Resources and enqueues cleaned out
Resources remastered 1977
2381 GCS shadows traversed, 1 cancelled, 13 closed
1026 GCS resources traversed, 0 cancelled
3264 GCS resources on freelist, 4287 on array, 4287 allocated
set master node info
Submitted all remote-enqueue requests
Update rdomain variables
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
2381 GCS shadows traversed, 0 replayed, 13 unopened
Submitted all GCS remote-cache requests
0 write requests issued in 2368 GCS resources
2 PIs marked suspect, 0 flush PI msgs
Sat Feb 15 16:39:10 2003
Reconfiguration complete
Post SMON to start 1st pass IR
Sat Feb 15 16:39:10 2003
Instance recovery: looking for dead threads
Sat Feb 15 16:39:10 2003
Beginning instance recovery of 1 threads
Sat Feb 15 16:39:10 2003
Started first pass scan
Sat Feb 15 16:39:11 2003
Completed first pass scan
208 redo blocks read, 6 data blocks need recovery
Sat Feb 15 16:39:11 2003
Started recovery at
Thread 2: logseq 26, block 14, scn 0.0
Recovery of Online Redo Log: Thread 2 Group 4 Seq 26 Reading mem 0
Mem# 0 errs 0: /oracle/oradata/ault_rac/ault_rac_raw_rdo_2_2.log
Recovery of Online Redo Log: Thread 2 Group 3 Seq 27 Reading mem 0
Mem# 0 errs 0: /oracle/oradata/ault_rac/ault_rac_raw_rdo_2_1.log
Sat Feb 15 16:39:12 2003
Completed redo application
Sat Feb 15 16:39:12 2003
Ended recovery at
Thread 2: logseq 27, block 185, scn 0.5479311
6 data blocks read, 8 data blocks written, 208 redo blocks read
Ending instance recovery of 1 threads
SMON: about to recover undo segment 11
SMON: mark undo segment 11 as available
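When many such reconfigurations need to be reviewed, the counters in a listing like this one can be pulled out with a short script. The following Python sketch matches only the three line formats that appear in the listing above. The function name and the dictionary keys are hypothetical, and alert log wording may differ between Oracle versions.

```python
import re
from typing import Dict

# Patterns taken from lines that appear in the listing above.
REMASTERED = re.compile(r"Resources remastered (\d+)")
FIRST_PASS = re.compile(
    r"(\d+) redo blocks read, (\d+) data blocks need recovery")
SUMMARY = re.compile(
    r"(\d+) data blocks read, (\d+) data blocks written, "
    r"(\d+) redo blocks read")


def recovery_stats(log_text: str) -> Dict[str, int]:
    """Collect instance recovery counters from alert log text."""
    stats: Dict[str, int] = {}
    for line in log_text.splitlines():
        match = REMASTERED.search(line)
        if match:
            stats["resources_remastered"] = int(match.group(1))
        match = FIRST_PASS.search(line)
        if match:
            stats["redo_blocks_scanned"] = int(match.group(1))
            stats["blocks_needing_recovery"] = int(match.group(2))
        match = SUMMARY.search(line)
        if match:
            stats["data_blocks_read"] = int(match.group(1))
            stats["data_blocks_written"] = int(match.group(2))
    return stats


sample = """Resources remastered 1977
208 redo blocks read, 6 data blocks need recovery
6 data blocks read, 8 data blocks written, 208 redo blocks read
"""
print(recovery_stats(sample))
```

For the listing above, this reports 1977 remastered resources, 208 redo blocks scanned in the first pass, 6 blocks needing recovery, and 6 data blocks read with 8 written during recovery.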
One word of caution: during testing for this listing, an instance could
not be brought back up after failure, which is a rare occurrence. A kill -9
was done on the SMON process on AULTLINUX1, within the Linux/RAC/RAW
environment. AULTLINUX2 continued to operate and recovered the failed
instance; however, an attempted restart of the instance on AULTLINUX1 yielded
a Linux Error: 24: Too Many Files Open error. This was actually caused by
something blocking the SPFILE link. Once the instance was pointed towards the
proper SPFILE location during startup, it restarted with no problems.