Cache Coherency
The Global Cache Service (GCS) synchronizes global cache access, allowing only
one instance at a time to modify a given block. Cache coherency is thus
maintained in the RAC system by coordinating the buffer caches located on
separate instances.
GCS ensures that the data blocks cached in the buffer caches of different
instances are kept consistent globally. That is why some people prefer
to call cache fusion a ‘diskless cache coherency’ mechanism. This is true in
a sense, because its predecessor, Oracle Parallel Server (OPS), relied on
‘forced disk writes’ to maintain cache coherency.
Global Cache Service
· GCS is the main controlling process for cache fusion.
· It tracks the location and status (mode and role) of the data blocks, as
well as the access privileges of the various instances.
· GCS guarantees data integrity by employing global access levels.
· It maintains block modes for data blocks in the global role.
· It is also responsible for block transfers between instances.
In a RAC system, users can connect to multiple
instances to run database queries. Typically, users will be connected to
different nodes but access the same set of data or data blocks. This situation
demands that data consistency, formerly confined to a single instance, be
effectively extended to multiple instances. Therefore, buffer cache coherence
across multiple instances must be maintained.
Instances require three main types
of concurrency:
· Concurrent reads on multiple instances: users on two different instances
need to read the same set of blocks.
· Concurrent reads and writes on different instances: a user intends to read
a data block that was recently modified, and the read can be for either the
current version of the block or a read-consistent previous version.
· Concurrent writes on different instances: the same set of data blocks is
modified by different users on different instances.
· Cache coherency demands that block consistency be maintained even though
there are multiple instances (each with a separate db_cache_size data buffer
region) in which data blocks can reside or be brought in.
· Oracle RAC achieves this through inter-instance block transfers using the
Cache Fusion mechanism.
· The Global Cache Service (GCS), which is implemented as a set of processes,
coordinates these transfers.
· GCS also ensures that only one instance modifies the block at any given
time. Even when the same data block is cached in different instances at the
same time, global consistency is maintained.
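The single-writer rule described above can be pictured with a small toy model. This is only an illustrative sketch in Python, not Oracle's implementation: the class `ToyGlobalCacheDirectory`, its method names, and the instance names are all hypothetical, and real GCS tracks far more state than this.

```python
class ToyGlobalCacheDirectory:
    """Toy model: which instances cache a block, and which one may modify it."""

    def __init__(self):
        self.readers = {}  # block_id -> set of instance names holding a copy
        self.writer = {}   # block_id -> the single instance allowed to modify it

    def request_read(self, block_id, instance):
        # Any number of instances may cache the same block for reading.
        self.readers.setdefault(block_id, set()).add(instance)

    def request_write(self, block_id, instance):
        # Granting write access to one instance takes it away from the
        # previous writer, so at most one instance can modify the block.
        previous = self.writer.get(block_id)
        self.writer[block_id] = instance
        self.readers.setdefault(block_id, set()).add(instance)
        return previous  # the instance that gave up write access, if any


directory = ToyGlobalCacheDirectory()
directory.request_read(42, "inst1")
directory.request_read(42, "inst2")
directory.request_write(42, "inst1")
displaced = directory.request_write(42, "inst2")
```

After the second write request, both instances still cache block 42, but only `inst2` is recorded as the writer; `inst1` is returned as the instance that had to give up write access.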
Data Block Writing Method
Oracle relies on two concepts here: the dirty block and the past image of a
block. Let’s look at what they are.
Whenever a server process changes or modifies a data
block, it becomes a dirty block. Once a server process makes changes to the
data block, the user may commit transactions, or transactions may not be
committed for quite some time. In either case, the dirty block is not
immediately written back to disk.
Writing dirty blocks to
disk takes place under the following two conditions:
· When a server process cannot find a clean, reusable buffer after scanning
a threshold number of buffers, the database writer process writes the dirty
blocks to disk.
· When a checkpoint takes place, the database writer process writes the
dirty blocks to disk.
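The two write triggers above can be sketched as a toy buffer cache. This is a simplified illustration under stated assumptions, not how the database writer is actually implemented: `ToyBufferCache`, `scan_threshold`, and the `written` list standing in for disk are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    block_id: int
    dirty: bool = False


class ToyBufferCache:
    def __init__(self, buffers, scan_threshold):
        self.buffers = buffers
        self.scan_threshold = scan_threshold  # hypothetical scan limit
        self.written = []  # block ids flushed to "disk", in order

    def _write_dirty(self):
        # Stand-in for the database writer: flush every dirty buffer.
        for buf in self.buffers:
            if buf.dirty:
                self.written.append(buf.block_id)
                buf.dirty = False

    def find_free_buffer(self):
        # Trigger 1: no clean buffer found within the scan threshold.
        for buf in self.buffers[: self.scan_threshold]:
            if not buf.dirty:
                return buf
        self._write_dirty()
        return self.buffers[0]  # now clean and reusable

    def checkpoint(self):
        # Trigger 2: a checkpoint flushes the dirty buffers.
        self._write_dirty()


cache = ToyBufferCache(
    [Buffer(1, dirty=True), Buffer(2, dirty=True), Buffer(3)],
    scan_threshold=2,
)
free = cache.find_free_buffer()  # both scanned buffers are dirty, so flush
cache.buffers[2].dirty = True    # block 3 is modified later
cache.checkpoint()               # the checkpoint writes it out
```

Note that committing a transaction appears nowhere in the sketch: as the text says, a commit by itself does not cause the dirty block to be written.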
As we are aware, a typical data
block is not written to the disk immediately, even after it becomes dirty as
the result of an update.
When the same dirty data block is
requested by another instance for write or read purposes, the owning instance
creates an image of the block and retains it, and the block itself is shipped
to the requesting instance. This backup image of the block is called the past
image (PI) and is kept in memory.
In the event of instance failure,
Oracle can reconstruct the current version of the block by reading the PIs from
memory. There can also be more than one past image in memory,
depending on how many times the data block was requested while it was dirty.
The process of writing the blocks back to the I/O device (disk storage unit)
depends on the checkpoint schedule defined by the DBA for the RAC cluster. Once
the checkpoint interval is reached, Oracle’s Database Writer (DBWR) process
initiates an asynchronous write of the dirty blocks to disk.
When the write takes place, a message is sent across
Cache Fusion to change the status of the block in the other instances, and the
past images (PIs) on all other instances are invalidated and discarded.
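The life cycle of a past image described above can be traced with a toy model. This is a minimal sketch, assuming the simplified case where the block is handed over for writing and the owner keeps only the PI; `ToyInstance`, `ship_block`, and `write_to_disk` are hypothetical names, and block versions are plain integers.

```python
class ToyInstance:
    def __init__(self, name):
        self.name = name
        self.current = {}      # block_id -> (version, dirty flag)
        self.past_images = {}  # block_id -> list of versions kept as PIs


def ship_block(owner, requester, block_id):
    # The owner keeps a past image if the block is dirty, then ships the block.
    version, dirty = owner.current.pop(block_id)
    if dirty:
        owner.past_images.setdefault(block_id, []).append(version)
    requester.current[block_id] = (version, dirty)


def write_to_disk(writer, all_instances, block_id):
    # Once the block is written, every PI of it in the cluster is discarded.
    version, _ = writer.current[block_id]
    writer.current[block_id] = (version, False)
    for inst in all_instances:
        inst.past_images.pop(block_id, None)
    return version


a, b, c = ToyInstance("A"), ToyInstance("B"), ToyInstance("C")
a.current[7] = (1, True)   # A modifies block 7
ship_block(a, b, 7)        # A keeps a PI of version 1
b.current[7] = (2, True)   # B modifies the block again
ship_block(b, c, 7)        # B keeps a PI of version 2

pi_count_before = sum(len(i.past_images.get(7, [])) for i in (a, b, c))
write_to_disk(c, [a, b, c], 7)
pi_count_after = sum(len(i.past_images.get(7, [])) for i in (a, b, c))
```

Two past images exist before the write (one on A, one on B), which matches the point that the number of PIs depends on how many times the dirty block was requested; none remain afterwards.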
For more details, refer to Oracle
Metalink Document Note # 139436.1 titled, “Understanding 9i Real Application
Clusters Cache Fusion.”
Internal Lock Messaging in RAC
Remember, Oracle uses a
lock escalation mechanism to maintain cache coherency. Only one copy of a
block can be buffered in the “xcur” (exclusive current) state in the cluster at
any one time, and to modify a block, an instance must first place the buffer
containing the block in the xcur state.
For example, if another instance requests the most current version of the same block for reading, Oracle sends a message to change the access mode from exclusive to shared and keeps a past image (PI) buffer if the buffer contained a dirty (changed) block. It then sends a “current read” version of the block to the requesting instance. The original instance keeps a copy in current mode, but the overall status of the block becomes global. Unlike xcur, multiple copies in shared current (scur) mode can be cached across the cluster at any time.
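The downgrade from exclusive to shared can be sketched as a small state change. This is an illustration only, assuming each instance is modeled as a plain dictionary of buffer state names per block; the function names and the "pi" label for a past image buffer are choices made for this sketch, not Oracle's messaging protocol.

```python
def serve_current_read(holder, requester, block_id, dirty):
    """Downgrade the holder from xcur to scur and give the requester a copy.

    holder and requester map block_id -> list of buffer state names.
    """
    states = holder.get(block_id, [])
    if "xcur" not in states:
        raise ValueError("holder does not have the block in xcur state")
    # Exclusive becomes shared; the holder keeps its current copy.
    states[states.index("xcur")] = "scur"
    if dirty:
        # A past image is kept only if the buffer held a changed block.
        states.append("pi")
    # The requester receives a shared current copy.
    requester.setdefault(block_id, []).append("scur")


def count_xcur(instances, block_id):
    # Number of exclusive current copies of the block across the cluster.
    return sum(inst.get(block_id, []).count("xcur") for inst in instances)


inst1 = {100: ["xcur"]}  # inst1 has modified block 100
inst2 = {}
xcur_before = count_xcur([inst1, inst2], 100)
serve_current_read(inst1, inst2, 100, dirty=True)
xcur_after = count_xcur([inst1, inst2], 100)
```

After the read is served, both instances hold the block in scur, the original holder also keeps a past image, and no xcur copy remains until some instance asks to modify the block again.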
In early versions of
Oracle OPS, one master instance kept track of the lock status, so if the master
instance crashed, the entire OPS system went down. Obviously, this was a
serious shortcoming, remedied in RAC. In later versions of OPS and RAC, only
the uncommitted transactions on the instance that goes down are lost. The other
instances stay active.
In RAC there is still a master
node, but although the first node to start up becomes the “master” node, this
is strictly a bookkeeping method, and there are no repercussions to the cluster
if the master node dies. The Cache Fusion mechanisms for the Global Cache
Service (GCS) and the Global Enqueue Service (GES) are global resources, running
on all nodes in the cluster, serving to maintain copies of the global dictionary.
Now that we understand the RAC
block updating process, we are ready to move even deeper into RAC internals.
Our next installment will examine RAC invalidation mechanisms.