
Tuesday, February 25, 2014

Oracle Automatic Storage Management - ASM


  • ASM is a volume manager and file system.
  • ASM operates efficiently in both clustered and nonclustered environments.
  • ASM is installed in the Grid Infrastructure home, separate from the Oracle Database home.

ASM Key Features and Benefits
  • Stripes files rather than logical volumes
  • Provides redundancy on a file basis
  • Enables online disk reconfiguration and dynamic rebalancing
  • Significantly reduces the time needed to resynchronize after a transient failure by tracking changes while the disk is offline
  • Provides adjustable rebalancing speed (see the example after this list)
  • Is cluster-aware
  • Supports reading from mirrored copy instead of primary copy for extended clusters
  • Is automatically installed as part of the Grid Infrastructure
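
For example, the rebalancing speed can be adjusted per disk group from the ASM instance, and disk groups can be listed with asmcmd. The disk group name DATA below is only an example; substitute one that exists in your environment:

$ asmcmd lsdg
SQL> ALTER DISKGROUP DATA REBALANCE POWER 8;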

GPnP Architecture Overview


GPnP Service

  • The GPnP service is collectively provided by all the GPnP agents. 
  • It is a distributed method of replicating profiles. 
  • The service is instantiated on each node in the domain as a GPnP agent. 
  • The service is peer-to-peer; there is no master process. This allows high availability because any GPnP agent can crash and new nodes will still be serviced. 
  • GPnP requires the standard IP multicast protocol (provided by mDNS) to locate peer services. Using multicast discovery, GPnP locates peers without configuration. This is how a GPnP agent on a new node locates another agent that may have a profile it should use (see the example below).
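
To see the profile that the local GPnP agent is currently serving, the gpnptool utility in the Grid Infrastructure home can be used; the exact XML returned varies from cluster to cluster:

$ gpnptool get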

Name Resolution

A name defined within a GPnP domain is resolvable in the following cases:

  • Hosts inside the GPnP domain use normal DNS to resolve the names of hosts outside of the GPnP domain. They contact the regular DNS service and proceed. They may obtain the address of the DNS server from global configuration or from DHCP.
  • Within the GPnP domain, host names are resolved using mDNS. This requires an mDNS responder on each node that knows the names and addresses used by this node, and operating system client library support for name resolution using this multicast protocol. Given a name, a client executes gethostbyname, resulting in an mDNS query. If the name exists, the responder on the node that owns the name will respond with the IP address.

The client software may cache the resolution for the given time-to-live value.

  • Machines outside the GPnP domain cannot resolve names in the GPnP domain by using multicast. To resolve these names, they use their regular DNS. The provisioning authority arranges the global DNS to delegate a subdomain (zone) to a known address that is in the GPnP domain. GPnP creates a service called GNS to resolve the GPnP names on that fixed address.
The node on which the GNS server is running listens for DNS requests. On receipt, it translates the requests and forwards them to mDNS, collects the responses, translates them, and sends them back to the outside client. GNS is “virtual” because it is stateless. Any node in the multicast domain may host the server. The only GNS configuration is global:
    • The address on which to listen, on the standard DNS port 53
    • The name(s) of the domains to be serviced
There may be as many GNS entities as needed for availability reasons. Oracle-provided GNS may use CRS to ensure availability of a single GNS provider.
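
For example, if the provisioning authority delegated a subdomain such as cluster01.example.com (a made-up name used here only for illustration) to GNS, an outside client resolves cluster names through its regular DNS, which forwards the request to GNS on the fixed address:

$ nslookup cluster01-scan.cluster01.example.com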

SCAN and Local Listeners

When a client submits a connection request, the SCAN listener, listening on a SCAN IP address and the SCAN port, is contacted on the client’s behalf. 

Because all services on the cluster are registered with the SCAN listener, the SCAN listener replies with the address of the local listener on the least-loaded node where the service is currently being offered. 

Finally, the client establishes a connection to the service through the listener on the node where service is offered. All these actions take place transparently to the client without any explicit configuration required in the client.

During installation, listeners are created on nodes for the SCAN IP addresses. Oracle Net Services routes application requests to the least loaded instance providing the service. 

Because the SCAN addresses resolve to the cluster, rather than to a node address in the cluster, nodes can be added to or removed from the cluster without affecting the SCAN address configuration.
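
The SCAN and SCAN listener configuration created during installation can be reviewed with srvctl, for example:

$ srvctl config scan
$ srvctl config scan_listener
$ srvctl status scan_listener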

How GPnP Works: Cluster Node Startup

  • IP addresses are negotiated for public interfaces using DHCP:
    • VIPs
    • SCAN VIPs
  • A GPnP agent is started from the node’s Clusterware home.
  • The GPnP agent either gets its profile locally or from one of the peer GPnP agents that responds.
  • Shared storage is configured to match profile requirements.
  • Service startup is specified in the profile, which includes:
    • Grid Naming Service (GNS) for external name resolution
    • Single-client access name (SCAN) listener 

How GPnP Works: Client Database Connections



In a GPnP environment, the database client no longer has to use the TNS address to contact the listener on a target node. Instead, it can use the EZConnect method to connect to the database. 

When resolving the address listed in the connect string, the DNS forwards the resolution request to the GNS, which returns the SCAN VIP address for the chosen SCAN listener; the connect string also carries the name of the database service that is desired. In EZConnect syntax, this would look like:

scan-name.cluster-name.company.com/ServiceName, where the service name might be the database name. The GNS responds to the DNS server with the IP address matching the name given; this address is then used by the client to contact the SCAN listener. The SCAN listener uses its connection load-balancing system to pick an appropriate local listener, whose name it returns to the client in an Oracle Net redirect message. The client reconnects to the selected listener, resolving the name through a call to the GNS.

The SCAN listeners must be known to all the database listener nodes and clients. The database instance nodes cross-register only with known SCAN listeners, also sending them per-service connection metrics. The SCAN known to the database servers may come from the GPnP profile data or may be stored in the OCR.
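
As an illustration, with a hypothetical SCAN of cluster01-scan in the delegated domain cluster01.example.com and a database service named orcl, an EZConnect connection from a client would look like this:

$ sqlplus system@//cluster01-scan.cluster01.example.com/orcl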


Controlling Oracle Clusterware

The crsctl utility is used to invoke certain OHASD functions.

To start or stop Oracle Clusterware on all nodes:
# crsctl start cluster
# crsctl stop cluster

To enable or disable Oracle Clusterware for automatic startup on a specific node:
# crsctl enable crs
# crsctl disable crs

To check the status of the Oracle Clusterware stack on the local node:

# crsctl check cluster
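
The cluster commands above operate on the Oracle Clusterware stack. To start or stop the complete stack on the local node, including the Oracle High Availability Services daemon, use:

# crsctl start crs
# crsctl stop crs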

Verifying the Status of Oracle Clusterware

$ crsctl check cluster -all
***********************************************************
host01:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
***********************************************************
host02:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
***********************************************************
host03:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
***********************************************************

Viewing the High Availability Services Stack
$ crsctl stat res -init -t
---------------------------------------------------------------
NAME           TARGET  STATE        SERVER        STATE_DETAILS      
---------------------------------------------------------------
Cluster Resources
---------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       host01        Started            
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       host01                                      
ora.crf
      1        ONLINE  ONLINE       host01                                      
ora.crsd
      1        ONLINE  ONLINE       host01                                      
ora.cssd
      1        ONLINE  ONLINE       host01                                      
ora.cssdmonitor
      1        ONLINE  ONLINE       host01                                      
ora.ctssd
      1        ONLINE  ONLINE       host01        OBSERVER          
ora.evmd
      1        ONLINE  ONLINE       host01                                      
...

Oracle Clusterware Initialization



During the installation of Oracle Clusterware, the init.ohasd startup script is copied to /etc/init.d. This wrapper script is responsible for setting up environment variables and then starting the Oracle Clusterware daemons and processes.

The Oracle High Availability Services daemon (ohasd) is responsible for starting in the proper order, monitoring, and restarting other local Oracle daemons, including the crsd daemon, which manages clusterwide resources. When init starts ohasd on Clusterware startup, ohasd starts orarootagent, cssdagent, and oraagent. Some of the high availability daemons run under the root user with real-time priority, and others run under the Clusterware owner with user-mode priorities after they are started. When a command is used to stop Oracle Clusterware, the daemons are stopped, but the ohasd process remains running.

When a cluster node boots, or Clusterware is started on a running cluster node, the init process starts ohasd. The ohasd process then initiates the startup of the processes in the lower, or Oracle High Availability Services (OHASD), stack. 
  • The cssdagent process is started, which, in turn, starts cssd. The cssd process discovers the voting disk either in ASM or on shared storage, and then joins the cluster. The cssdagent process monitors the cluster and provides I/O fencing. This service was formerly provided by the Oracle Process Monitor Daemon (oprocd). A cssdagent failure may result in Oracle Clusterware restarting the node. 
  • The orarootagent is started. This process is a specialized oraagent process that helps crsd start and manage resources owned by root, such as the network and the grid virtual IP address.

  • The oraagent process is started. It is responsible for starting processes that do not need to be run as root. 
  • The oraagent process extends Clusterware to support Oracle-specific requirements and complex resources. It runs server callout scripts when FAN events occur. This process was known as RACG in Oracle Clusterware 11g Release 1 (11.1). 
  • The cssdmonitor is started and is responsible for monitoring the cssd daemon.
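
A simple way to confirm that these daemons and agents are running after startup is to check the process list, for example:

$ ps -ef | egrep 'ohasd|cssd|crsd|evmd|agent' | grep -v grep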

Oracle Local Registry

  • The Oracle Local Registry (OLR) is a registry similar to OCR and is located on each node in a cluster, but contains information specific to each node. 
  • It contains manageability information about Oracle Clusterware, including dependencies between various services. 
  • Oracle High Availability Services uses this information. 
  • OLR is located on local storage on each node in a cluster. 
  • Its default location is in the path Grid_home/cdata/host_name.olr, where Grid_home is the Oracle Grid Infrastructure home, and host_name is the host name of the node. 

To check the OLR, execute the ocrcheck -local command on the desired node.

$ ocrcheck -local

To view the contents of the OLR, execute the ocrdump -local command, redirecting the output to stdout:

$ ocrdump -local -stdout

# ocrcheck -local
Status of Oracle Local Registry is as follows :
   Version                  :          3
   Total space (kbytes)     :     262120
   Used space (kbytes)      :       2644
   Available space (kbytes) :     259476
   ID                       :  250248496
   Device/File Name         : /u01/app/11.2.0/grid/cdata/host01.olr
         Device/File integrity check succeeded
         Local registry integrity check succeeded
         Logical corruption check succeeded
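
The OLR can also be backed up manually with ocrconfig; as root on the node in question:

# ocrconfig -local -manualbackup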

CSS Voting Disk Function


  • CSS is the service that determines which nodes in the cluster are available and provides cluster group membership and simple locking services to other processes. 
  • CSS typically determines node availability via communication through a dedicated private network, with a voting disk used as a secondary communication mechanism. This is done by sending heartbeat messages through the network and the voting disk. 
  • The voting disk is a file on a clustered file system that is accessible to all nodes in the cluster. 
  • Its primary purpose is to help in situations where the private network communication fails. The voting disk is then used to communicate the node state information used to determine which nodes go offline. 
  • Without the voting disk, it can be difficult for an isolated node to determine whether it is experiencing a network failure or whether the other nodes are no longer available. 
  • It would then be possible for the cluster to enter a state where multiple subclusters of nodes would have unsynchronized access to the same database files. Consider what happens when Node3 can no longer send heartbeats to the other members of the cluster. When the others can no longer see Node3’s heartbeats, they decide to evict that node by using the voting disk. When Node3 reads the removal message or “kill block,” it generally reboots itself to ensure that all outstanding write I/Os are lost. 
  • Oracle Clusterware supports up to 15 redundant voting disks.
  • Note: The voting disk or file is usually known as the quorum disk in vendor clusterware.
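
The voting disks currently in use can be listed with crsctl. As a sketch, they could also be moved to an ASM disk group with the replace command; the disk group name +DATA is only an example:

$ crsctl query css votedisk
# crsctl replace votedisk +DATA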

Oracle Cluster Registry - OCR

  • Cluster configuration information is maintained in the OCR. 
  • The OCR relies on a distributed shared-cache architecture for optimizing queries and performing clusterwide atomic updates against the cluster registry. 
  • Each node in the cluster maintains an in-memory copy of the OCR, and the CRSD process on each node accesses its OCR cache. Only one of the CRSD processes actually reads from and writes to the OCR file on shared storage. This process is responsible for refreshing its own local cache, as well as the OCR cache on other nodes in the cluster. 
  • For queries against the cluster registry, the OCR clients communicate directly with the local CRS daemon (CRSD) process on the node from which they originate. When clients need to update the OCR, they communicate through their local CRSD process to the CRSD process that is performing input/output (I/O) for writing to the registry on disk. 
  • The main OCR client applications are OUI, SRVCTL, Enterprise Manager (EM), the Database Configuration Assistant (DBCA), the Database Upgrade Assistant (DBUA), Network Configuration Assistant (NETCA), and the ASM Configuration Assistant (ASMCA). 
  • The installation process for Oracle Clusterware gives you the option of automatically mirroring OCR. This creates a second OCR file, which is called the OCR mirror file, to duplicate the original OCR file, which is called the primary OCR file. Although it is recommended to mirror your OCR, you are not forced to do it during installation. 

The Oracle Grid Infrastructure installation defines three locations for the OCR, and Oracle Clusterware supports up to five. New installations on raw devices are no longer supported.
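
The OCR locations and their integrity can be checked with ocrcheck, recent backups listed with ocrconfig, and an additional OCR location added as root; the disk group name +DATA2 below is only an example:

# ocrcheck
# ocrconfig -showbackup
# ocrconfig -add +DATA2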