Clusterware Components, Processes and Agents
Overview
- Oracle Clusterware Version 11g Release 2 introduces the concept of the agent.
- Agents are multi-threaded daemon programs that provide start, stop, check,
  and cleanup actions for different resource types.
- For example, the oraagent for crsd starts ASM, the Oracle listener, and the
  SCAN listener.
- Agents can also receive, process, and forward events to clients.
- The standard agents in Oracle Clusterware 11g Release 2 are oraagent,
  orarootagent, and cssdagent. Additionally, there can be application and
  script agents.
- Agents create their own log files. These log files are stored under
  ORA_CRS_HOME, in a directory associated with the name of the agent.
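The action model described above can be illustrated with a small sketch. This is not Oracle's agent framework: the names ToyAgent, ToyResource, and State are hypothetical, and the resource is just an in-memory object. It only shows the idea of one agent exposing start, stop, check, and clean actions for a resource.

```python
from enum import Enum


class State(Enum):
    ONLINE = "ONLINE"
    OFFLINE = "OFFLINE"


class ToyResource:
    """Hypothetical in-memory stand-in for a managed resource (e.g. a listener)."""

    def __init__(self, name):
        self.name = name
        self.state = State.OFFLINE


class ToyAgent:
    """Hypothetical agent that maps action names to handler methods."""

    def __init__(self):
        self._actions = {
            "start": self.start,
            "stop": self.stop,
            "check": self.check,
            "clean": self.clean,
        }

    def start(self, resource):
        resource.state = State.ONLINE
        return resource.state

    def stop(self, resource):
        resource.state = State.OFFLINE
        return resource.state

    def check(self, resource):
        # A check only reports the state; it never changes it.
        return resource.state

    def clean(self, resource):
        # Cleanup forces the resource offline regardless of its prior state.
        resource.state = State.OFFLINE
        return resource.state

    def run(self, action, resource):
        if action not in self._actions:
            raise ValueError(f"unsupported action: {action}")
        return self._actions[action](resource)
```

For example, ToyAgent().run("start", ToyResource("listener")) brings the toy resource online, and a later "check" reports its state without altering it.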
A number of different processes are associated with Oracle Clusterware. These
processes are grouped into several Clusterware components. The following table
lists the components, their associated processes, and a description of the
function of each component and its process(es):
| Component | Process | Description |
|---|---|---|
| Oracle High Availability Services (OHAS) | ohasd | Responsible for starting the rest of the Oracle Clusterware stack on a given node. Ohasd is a new cluster startup framework in Oracle Clusterware 11g Release 2 that replaces the old init scripts. |
| Cluster Ready Service (CRS) | crsd | See the section titled CRS below for more information on this component and the crsd process. |
| Cluster Synchronization Service (CSS) | ocssd, cssdmonitor, cssdagent | See the section titled CSS below for more information on this component and its processes. |
| Event Manager (EVM) | evmd, evmlogger | Responsible for publishing Clusterware events. |
| Cluster Time Synchronization Service (CTSS) | octssd | Provides time synchronization services in an Oracle 11g Release 2 cluster. |
| Oracle Notification Service (ONS) | ons, eons | A publish-and-subscribe service responsible for communicating Fast Application Notification (FAN) events. |
| Oracle Agent | oraagent | Used in conjunction with FAN to run scripts when specific FAN events occur. |
| Oracle Root Agent | orarootagent | Helps CRSD manage resources that are owned by root. |
| Grid Naming Service (GNS) | gnsd | Provides gateway services between the multicast domain name service (which allows DNS requests) and external DNS services. GNS provides for name resolution within a cluster. |
| Grid Plug and Play (GPnP) | gpnpd | Supports Grid Plug and Play services, new in Oracle Clusterware 11g Release 2. GPnP provides services that allow you to easily add or remove nodes from a given cluster. |
| Multicast domain name service (mDNS) | mdnsd | Services DNS requests. |
CRS
CRS is responsible for managing HA options within the cluster. The crsd
process manages CRS operations. CRS manages two kinds of resources:
- Cluster resources
- Local resources
A cluster resource is a resource that is cluster aware and is managed over the
entire cluster via the crsctl command. Cluster resources are subject to
cross-node switchover and failover. This means that a resource can be assigned
to one or more nodes, but may be re-assigned to a different node (or failed
over to a different node) on demand. Cluster resources are managed with the CRS
daemon (crsd). The OCR is used by CRS to manage the resource.
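As a rough illustration of failover for a cluster resource, the sketch below picks a node for a resource from an ordered list of candidates. The function name place_resource and the node names are hypothetical, and the placement rule (first live candidate wins) is an assumption made for the example, not the policy crsd actually applies.

```python
def place_resource(candidate_nodes, live_nodes, current=None):
    """Return the node a resource should run on, or None if no candidate is up.

    candidate_nodes: ordered list of nodes the resource may run on (assumed).
    live_nodes: set of nodes currently considered cluster members.
    current: node the resource is running on now, if any.
    """
    # Stay put while the current node is still a cluster member.
    if current in live_nodes:
        return current
    # Otherwise fail over to the first candidate that is still alive.
    for node in candidate_nodes:
        if node in live_nodes:
            return node
    return None
```

With candidates ["node1", "node2"], a resource running on node1 stays there while node1 is alive and moves to node2 once node1 leaves the set of live nodes.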
A local resource runs on
each node of the cluster. Examples of cluster resources are RAC instances and
listeners. CRS can control these services, starting them, stopping them and
restarting them in the event of a failure.
CSS
CSS is a service that is responsible for determining which nodes of the
cluster are available to the cluster. CSS also supports other cluster processes
by providing node membership information and locking services. CSS uses the
private interconnect for communications, as well as the Clusterware voting
disks. Through a combination of heartbeat messages over the interconnect and
the voting disks, CSS determines the status of each node of the cluster.
CSS is also responsible
for interfacing with any third-party Clusterware vendors. In these
configurations CSS will interface with the vendor Clusterware and maintain the
node membership information.
The CSS service is
critical to Clusterware operations as it fences the operations of the nodes of
the cluster. For example, if the interconnect fails on a given node then the
failed node will no longer be able to communicate with the rest of the cluster.
Without CSS controlling the situation, the isolated node could cause severe
issues on the cluster including corruption of database data. This is what is
known as a split-brain condition.
To avoid split-brain
conditions CSS sends heartbeat messages across the cluster interconnect. If a
node fails (say the interconnect fails or the node freezes) then that node will
no longer send heartbeat messages. The surviving nodes will detect that the
heartbeat messages from the node are no longer being sent. CSS then uses
the voting disks to determine which node has gone offline. CSS will then work
with Oracle Clusterware to evict the missing node from the cluster.
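The heartbeat check described above can be sketched as a simple timeout test. This is a toy model only: the function name find_silent_nodes, the node names, and the 30-second timeout are assumptions for illustration, and the voting-disk step is deliberately left out.

```python
def find_silent_nodes(last_heartbeat, now, timeout):
    """Return the nodes whose last heartbeat is older than `timeout` seconds.

    last_heartbeat: dict mapping node name -> time of the last heartbeat seen.
    now: current time, in the same units as the heartbeat timestamps.
    timeout: seconds of silence tolerated before a node is suspected.
    """
    return sorted(
        node
        for node, seen_at in last_heartbeat.items()
        if now - seen_at > timeout
    )


# Hypothetical timestamps, in seconds.
heartbeats = {"node1": 100.0, "node2": 95.0, "node3": 60.0}
suspects = find_silent_nodes(heartbeats, now=101.0, timeout=30.0)
```

Here node3 has been silent for 41 seconds, so it is the only suspect. In the real stack, this is the point at which the voting disks are consulted before a node is evicted.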
CSS uses several different processes. Failure of these processes will result
in the restart of the cluster. The CSS processes are:
- CSS daemon (ocssd) – Manages
cluster node membership information. It’s also used in non-RAC installs to
provide Group Services (GS). ASM uses GS to register itself and its disk
groups.
- CSS Agent (cssdagent) – Monitors
the cluster and provides fencing services (this was the oprocd daemon in
previous versions). The CSS Agent is also responsible for monitoring vendor
Clusterware.
- CSS Monitor (cssdmonitor) – This
process monitors for node hangs, monitors OCSSD processes for hangs, and
is also responsible for monitoring vendor Clusterware.
Oracle Clusterware 11g Release 2 changes the way that Clusterware is started.
In a Linux install, Clusterware is now started with one init script,
init.ohasd, which replaces a number of scripts that were previously used. The
ohasd daemon sets off a cascade of processes as outlined in the following
graphic:
Note: This graphic only
summarizes the processes started by Oracle Clusterware.
You can control the
startup or shutdown of the cluster via the crsctl command. For
example, use crsctl start cluster to start the cluster
and crsctl stop cluster to stop the cluster. You can also use
the crsctl check cluster command to check on the status of the
cluster. See the section titled “Managing Oracle Clusterware” for more
information on crsctl and managing Oracle Clusterware.
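If you want to script these commands, a thin wrapper is one option. The sketch below is an assumption-laden example rather than a supported tool: the helper names are hypothetical, it assumes crsctl is on the PATH and that the caller has the privileges the command requires, and it only covers the three cluster subcommands mentioned above.

```python
import subprocess

# The only actions this sketch accepts, taken from the commands named above.
CLUSTER_ACTIONS = ("start", "stop", "check")


def crsctl_cluster_command(action):
    """Build the argument list for `crsctl <action> cluster`."""
    if action not in CLUSTER_ACTIONS:
        raise ValueError(f"unsupported cluster action: {action}")
    return ["crsctl", action, "cluster"]


def run_crsctl_cluster(action):
    """Run the command and return the CompletedProcess.

    Assumes crsctl is installed and on the PATH; output parsing is left
    to the caller because the output format is not covered here.
    """
    return subprocess.run(
        crsctl_cluster_command(action),
        capture_output=True,
        text=True,
        check=False,
    )
```

Calling run_crsctl_cluster("check") runs crsctl check cluster and returns its exit code and captured output for inspection.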