Oracle RAC: RAC concepts

Showing posts with label RAC concepts. Show all posts

Tuesday, March 11, 2014

Global Resource Background Processes

Atomic Control File to Memory Service (ACMS): In a RAC environment, the ACMS per-instance process is an agent that contributes to ensuring that a distributed SGA memory update is either globally committed on success or globally aborted if a failure occurs.
Global Enqueue Service Monitor (LMON): The LMON process monitors global enqueues and resources across the cluster and performs global enqueue recovery operations.
Global Enqueue Service Daemon (LMD): The LMD process manages incoming remote resource requests within each instance.
Global Cache Service Process (LMS): The LMS process maintains records of the data file statuses and each cached block by recording information in the GRD. The LMS process also controls the flow of messages to remote instances and manages global data block access and transmits block images between the buffer caches of different instances. This processing is part of the cache fusion feature.
Instance Enqueue Process (LCK0): The LCK0 process manages noncache fusion resource requests such as library and row cache requests.
Global Cache/Enqueue Service Heartbeat Monitor (LMHB): LMHB monitors LMON, LMD, and LMSn processes to ensure that they are running normally without blocking or spinning.
Result Cache Background Process (RCBG): This process is used for handling invalidation and other messages generated by server processes attached to other instances in Oracle RAC.

RAC Concepts

A cluster comprises multiple interconnected computers or servers that appear as if they are one server to end users and applications. Oracle RAC enables you to cluster an Oracle database. Oracle RAC uses Oracle Clusterware for the infrastructure to bind multiple servers so they operate as a single system.

Oracle Clusterware is a portable cluster management solution that is integrated with Oracle Database. Oracle Clusterware is also a required component for using Oracle RAC. In addition, Oracle Clusterware enables both noncluster Oracle databases and Oracle RAC databases to use the Oracle high-availability infrastructure. Oracle Clusterware enables you to create a clustered pool of storage to be used by any combination of noncluster and Oracle RAC databases.

Oracle Clusterware is the only clusterware that you need for most platforms on which Oracle RAC operates. You can also use clusterware from other vendors if the clusterware is certified for Oracle RAC.

Noncluster Oracle databases have a one-to-one relationship between the Oracle database and the instance. Oracle RAC environments, however, have a one-to-many relationship between the database and instances. An Oracle RAC database can have up to 100 instances all of which access one database.

Objectives

Explain the necessity of global resources
Describe global cache coordination

Topics

Refer the links below:

Introduction to Oracle RAC
Oracle Real Application Clusters (RAC) 11g Release 2

Object Affinity and Dynamic Remastering

In addition to dynamic resource reconfiguration, the GCS, which is tightly integrated with the buffer cache, enables the database to automatically adapt and migrate resources in the GRD. This is called dynamic remastering. The basic idea is to master a buffer cache resource on the instance where it is mostly accessed. In order to determine whether dynamic remastering is necessary, the GCS essentially keeps track of the number of GCS requests on a per-instance and per-object basis. This means that if an instance, compared to another, is heavily accessing blocks from the same object, the GCS can take the decision to dynamically migrate all of that object’s resources to the instance that is accessing the object most.

The upper part of the graphic shows you the situation where the same object has master resources spread over different instances. In that case, each time an instance needs to read a block from that object whose master is on the other instance, the reading instance must send a message to the resource’s master to ask permission to use the block.

The lower part of the graphic shows you the situation after dynamic remastering occurred. In this case, blocks from the object have affinity to the reading instance, which no longer needs to send GCS messages across the interconnect to ask for access permissions.

Note: The system automatically moves mastership of undo segment objects to the instance that owns the undo segments.

Dynamic Reconfiguration

When one instance departs the cluster, the GRD portion of that instance needs to be redistributed to the surviving nodes. Similarly, when a new instance enters the cluster, the GRD portions of the existing instances must be redistributed to create the GRD portion of the new instance.

Instead of remastering all resources across all nodes, RAC uses an algorithm called lazy remastering to remaster only a minimal number of resources during reconfiguration. This is illustrated in the slide. For each instance, a subset of the GRD being mastered is shown along with the names of the instances to which the resources are currently granted. When the second instance fails, its resources are remastered on the surviving instances. As the resources are remastered, they are cleared of any reference to the failed instance.

Write to Disk Coordination Example

The scenario described in the slide illustrates how an instance can perform a checkpoint at any time or replace buffers in the cache as a response to free buffer requests. Because multiple versions of the same data block with different changes can exist in the caches of instances in the cluster, a write protocol managed by the GCS ensures that only the most current version of the data is written to disk. It must also ensure that all previous versions are purged from the other caches. A write request for a data block can originate in any instance that has the current or past image of the block. In this scenario, assume that the first instance holding a past image buffer requests that the Oracle server write the buffer to disk:

The first instance sends a write request to the GCS.
The GCS forwards the request to the second instance, which is the holder of the current version of the block.
The second instance receives the write request and writes the block to disk.
The second instance records the completion of the write operation with the GCS.
After receipt of the notification, the GCS orders all past image holders to discard their past images. These past images are no longer needed for recovery.

Note: In this case, only one I/O is performed to write the most current version of the block to disk.

Global Cache Coordination Example

The scenario described in the slide assumes that the data block has been changed, or dirtied, by the first instance. Furthermore, only one copy of the block exists clusterwide, and the content of the block is represented by its SCN.

The second instance attempting to modify the block submits a request to the GCS.
The GCS transmits the request to the holder. In this case, the first instance is the holder.
The first instance receives the message and sends the block to the second instance. The first instance retains the dirty buffer for recovery purposes. This dirty image of the block is also called a past image of the block. A past image block cannot be modified further.
On receipt of the block, the second instance informs the GCS that it holds the block.

Note: The data block is not written to disk before the resource is granted to the second instance.

Past Image

To maintain data integrity, a new concept of past image was introduced in the 9i version of RAC. A past image (PI) of a block is kept in memory before the block is sent and serves as an indication of whether it is a dirty block. In the event of failure, GCS can reconstruct the current version of the block by reading PIs. This PI is different from a CR block, which is needed to reconstruct read-consistent images. The CR version of a block represents a consistent snapshot of the data at a point in time.

For example, Transaction-A of Instance-A has updated row-2 on block-5, and later another Transaction-B of Instance-B has updated row-6 on the same block-5. Block-5 has been transferred from Instance-A to B. At this time, the past image (PI) for block-5 is created on Instance-A.

Global Resources Coordination

Cluster operations require synchronization among all instances to control shared access to resources.

RAC uses the Global Resource Directory (GRD) to record information about how resources are used within a cluster database. The Global Cache Services (GCS) and Global Enqueue Services (GES) manage the information in the GRD.

Each instance maintains a part of the GRD in its System Global Area (SGA). The GCS and GES nominate one instance to manage all information about a particular resource. This instance is called the resource master. Also, each instance knows which instance masters which resource.

Maintaining cache coherency is an important part of RAC activity. Cache coherency is the technique of keeping multiple copies of a block consistent between different Oracle instances. GCS implements cache coherency by using what is called the Cache Fusion algorithm.

The GES manages all non–Cache Fusion interinstance resource operations and tracks the status of all Oracle enqueuing mechanisms. The primary resources of the GES controls are dictionary cache locks and library cache locks. The GES also performs deadlock detection to all deadlock-sensitive enqueues and resources.

Global Resources Coordination Example
Write to Disk coordination Example

Cache Fusion and Resource Coordination

Because each node in a Real Application Cluster has its own memory (cache) that is not shared with other nodes, RAC must coordinate the buffer caches of different nodes while minimizing additional disk I/O that could reduce performance.

Cache Fusion is the technology that uses high-speed interconnects to provide cache-to-cache transfers of data blocks between instances in a cluster.

Cache Fusion functionality allows direct memory writes of dirty blocks to alleviate the need to force a disk write and reread (or ping) of the committed blocks. This is not to say that disk writes do not occur; disk writes are still required for cache replacement and when a checkpoint occurs. Cache Fusion addresses the issues involved in concurrency between instances: concurrent reads on multiple nodes, concurrent reads and writes on different nodes, and concurrent writes on different nodes.

Oracle only reads data blocks from disk if they are not already present in the buffer caches of any instance. Because data block writes are deferred, they often contain modifications from multiple transactions. The modified data blocks are written to disk only when a checkpoint occurs. Before I go further, you need to be familiar with a couple of concepts introduced in Oracle 9i RAC: resource modes and resource roles. Because the same data blocks can concurrently exist in multiple instances, there are two identifiers that help to coordinate these blocks:

Resource mode The modes are null, shared, and exclusive. The block can be held in different modes, depending on whether a resource holder intends to modify data or merely read it.
Resource role The roles are locally managed and globally managed.

Global Resource Directory (GRD) is not a database. It is a collection of internal structures and is used to find the current status of the data blocks. Whenever a block is transferred out of a local cache to another instance’s cache, the GRD is updated. The following information about a resource is available in GRD:

Data Block Identifiers (DBA)
Location of most current versions
Data blocks modes (N, S, X)
Data block roles (local or global)

Global Cache Service

GCS (Global Cache Service) and GES (Global Enqueue Service) (which are basically RAC processes) play the key role in implementing Cache Fusion.

GCS ensures a single system image of the data even though the data is accessed by multiple instances. The GCS and GES are integrated components of Real Application Clusters that coordinate simultaneous access to the shared database and to shared resources within the database and database cache.

GES and GCS together maintain a Global Resource Directory (GRD) to record information about resources and enqueues. GRD remains in memory and is stored on all the instances. Each instance manages a portion of the directory. This distributed nature is a key point for fault tolerance of the RAC.

The coordination of concurrent tasks within a shared cache server is called synchronization. Synchronization uses the private interconnect and heavy message transfers. The following types of resources require synchronization: data blocks and enqueues.

GCS maintains the modes for blocks in the global role and is responsible for block transfers between instances. LMS processes handle the GCS messages and do the bulk of the GCS processing.

An enqueue is a shared memory structure that serializes access to database resources. It can be local or global. Oracle uses enqueues in three modes: null (N) mode, share (S) mode, and exclusive (X) mode. Blocks are the primary structures for reading and writing into and out of buffers. An enqueue is often the most requested resource.

GES maintains or handles the synchronization of the dictionary cache, library cache, transaction locks, and DDL locks. In other words, GES manages enqueues other than data blocks. To synchronize access to the data dictionary cache, latches are used in exclusive (X) mode and in single-node cluster databases. Global enqueues are used in cluster database mode.

Necessity of Global Resources

In single-instance environments, locking coordinates access to a common resource such as a row in a table. Locking prevents two processes from changing the same resource (or row) at the same time.

In RAC environments, internode synchronization is critical because it maintains proper coordination between processes on different nodes, preventing them from changing the same resource at the same time. Internode synchronization guarantees that each instance sees the most recent version of a block in its buffer cache.

Note: The slide shows what would happen in the absence of cache coordination. RAC prevents this problem.

Monday, March 10, 2014

Clusters and Scalability

If your application scales transparently on SMP machines, it is realistic to expect it to scale well on RAC, without having to make any changes to the application code.

RAC eliminates the database instance, and the node itself, as a single point of failure, and ensures database integrity in the case of such failures.

The following are some scalability examples:

Allow more simultaneous batch processes.
Allow larger degrees of parallelism and more parallel executions to occur.
Allow large increases in the number of connected users in online transaction processing (OLTP) systems.

Benefits of Using RAC

Oracle Real Application Clusters (RAC) enables high utilization of a cluster of standard, low-cost modular servers such as blades.

RAC offers automatic workload management for services. Services are groups or classifications of applications that comprise business components corresponding to application workloads. Services in RAC enable continuous, uninterrupted database operations and provide support for multiple services on multiple instances. You assign services to run on one or more instances, and alternate instances can serve as backup instances. If a primary instance fails, the Oracle server moves the services from the failed instance to a surviving alternate instance. The Oracle server also automatically load-balances connections across instances hosting a service.

RAC harnesses the power of multiple low-cost computers to serve as a single large computer for database processing, and provides the only viable alternative to large-scale symmetric multiprocessing (SMP) for all types of applications.

RAC, which is based on a shared disk architecture, can grow and shrink on demand without the need to artificially partition data among the servers of your cluster.

RAC also offers a single-button addition of servers to a cluster. Thus, you can easily add a server to or remove a server from the database.

Oracle RAC