In this section of Oracle Database Tutorial:RAC we will talk about RAC(Real application clusters)
CONCEPTS AND FUNCTIONAL OVERVIEW
RAC is the ability to have two or more instances connect to the same database. These instances reside on separate servers, thereby allowing each instance to make full use of the processing ability of each server. As these instances are accessing the same database they need to be able to communicate with each other, this is done through physical interconnects that join the servers together. All the servers in the configuration make up the cluster.
All instances in a cluster share access to common database resources, this access needs to be coordinated between the instances in order to maintain the overall integrity of the database. In order to coordinate this access RAC databases have a Global Resource Directory. This Global Resource Directory is RAC specific and is not required on single instance systems.
Global Resource Directory(GRD)
Global Resource Directory provides an extra layer of control in order to allow all instances in a cluster to share access to shared database resources. The Global Resource Directory resides in the SGA of each instance in the cluster, with each instance maintaining its own portion. The main role of the Global Resource Directory is to ensure that access and changes to common resources is controlled between the instances in order to maintain the integrity of the database.
Global Enqueue Service(GES) and the Global Cache Service(GCS) maintain the information in the Global Resource Directory. Although the Global Resource Directory is split amongst all instances in the cluster, the Global Cache Service(GCS) and Global Enqueue Service(GES) nominate one instance to manage all information about a particular database resource. This instance is called the resource master. This ownership is periodically checked and changed accordingly between instances, this is done to reduce interconnect traffic and resource acquisition time.
Global Cache Service(GCS)
Global Cache Service is responsible for cache fusion i.e. transmitting data blocks between the instances. The main features of this processing are:
-The lms processes are the Global Cache Service(GCS) background processes in the instance
-Blocks can exist in more than one instance at a time.
-If an instance is requested to transfer a dirty block (a dirty block is a block that has been modified but not yet written to disk) to another instance in exclusive mode it keeps a past image of the block. This past image is basically a copy of the block as the holding instance last knew it before it transferred it to the requesting instance. This past image is used for recovery. Once the most recent copy (the master copy) of the dirty block is written to disk by dbwr the past images can and will be discarded. Note that PI’s can be used for a consistent read of a block as this saves having to build a copy from the rollback segment. The important thing to note is that an instance will always create a PI version of a dirty block before sending it to another instance if the reqeusting instance wants it in exclusive mode. If an instance requests the master block for read (consistent or current) there is no need for the holding instance to keep a PI as the requesting instance is not going to change the block
-The most recent copy (the master copy or current block) of a block contains all changes made to it by transactions, regardless of which instance the change occurred on and whether the transaction(s) has committed or not
-A block is assigned a role and mode and a status (clean or dirty)
-The block is held in a local role if it is only held in one SGA, it is held in a global role if it is held in more than one SGA
-The block can be held in null, shared or exclusive mode. Null mode means that the instance has no access rights on the blocks, shared mode means that the instance can read the block and exclusive mode means that the instance can write to the block. Many instances can have the same block in null or shared mode, but only one can have it in exclusive mode (as exclusive mode implies the instance wants to modify the block). To view the current mode of the block in an instance view v$bh.status. The following applies to the state of the blocks:
Global Enqueue Service(GES)
Global Enqueue Service(GES) manages all non-cache fusion resource requests and tracks the status of all enqueuing mechanisms. The GES only does this for resources that are accessed by more than one instance. The primary resources that the GES controls are dictionary cache locks and library cache locks. The GES manages the interinstance communication that occurs between the instances for these resources. These resources exist in single instance, the difference being that in RAC these must be coordinated between all instances in the cluster.
Dictionary cache locks – The data dictionary must be consistent across all nodes, if a table definition is changed on one instance the Global Enqueue Service ensures that the definition is propagated to the dictionary cache on all the other instances.
Library cache locks – These locks are held by sessions whilst they parse or execute SQL or PLSQL statements. These locks prevent other sessions from modifying the definitions of objects in the data dictionary that are being referenced by the statement that is currently parsing or executing.
BACKGROUND PROCESSES IN RAC
Each instance in a RAC configuration has the same background processes as a single instance database, however there are extra background processes evident in a RAC enabled instance, these are mainly related to the Global Cache Service(GCS) and the Global Enqueue Service(GES):
LMS processes – These are the processes for the Global Cache Service. These processes are responsible for transferring blocks between instances and maintaining the Global Resource Directory to reflect the modes and roles the blocks are held in on each instance. A block can exist in more than one instance at a time, but the Global Cache Service controls who has what version of the block thereby ensuring that the most up to date block (the master copy) is always the one that is updated. All other versions of the block will be past images or read consistent versions of the block.
LMON – Global Enqueue Service Monitor
This process monitors global enqueues and resources across the cluster and performs global enqueue recovery operations. LMON also handles recovery associated with global resources and can detect instance failure of other nodes
LMD – Global Enqueue Service Daemon
This process manages global enqueue and global resource access. Within each instance, the LMD process manages incoming remote resource requests
LCK – This is the lock process and makes up part of the Global Enqueue Service. It manages non-Cache Fusion resource requests such as library and dictionary cache requests
DIAG – This is diagnostic daemon.Captures diagnostic information related to instance process failures.
To view all the background processes (RAC and non-RAC) evident in an instance:
SQL> select * from v$bgprocess where paddr ‘00’
STORAGE SYSTEM FOR RAC
In a RAC environment all instances need to be able to write to the same datafiles simultaneously. There are 2 ways to do this, use RAW devices or use a Cluster Filesystem.
Datafiles and tempfiles
All datafiles and tempfiles must reside on shared disks. The first instance to start will verify that it can read all datafiles identified in the controlfile. This must be done so that the first instance to start can determine whether instance or media recovery is required or not, this behaviour is no different to single instance. However, instances that join the cluster at a later date can operate even if they cannot access all the files, they will simply raise an error when an attempt is made to access the file.
The control files must be on shared disks and must be accessible by all instances at startup time as determined in the parameter file.
Redo log files and archived logs
On RAC each instance must write to its own set of redo logs. This set is called a thread of redo. All threads of redo must reside on shared disks. The instance gets its thread of redo at startup time as determined by the thread parameter. If an instance cannot get its thread of redo it will fail to open. Each redo group will still be uniquely numbered at the database level and will be multiplexed or mirrored, just as in single instance. The only difference is that in RAC each redo group belongs to a thread, and only the instance specifying that thread number at startup time will write to the redo groups in that thread.
Each instance can, however, read all threads of redo. This is to facilitate instance recovery i.e if instance a fails then instance b will read instance a’s thread of redo to recover the failure. This must happen so that the consistency and integrity of the database is maintained if one instance fails. In order to facilitate instance recovery all redo files must reside on shared disks.
Archived log files are generated by each thread of redo and are uniquely identified by the thread number that we include in the log_archive_format and the sequence number which is unique for each instance. Archived_log can be on the local filesystem or shared filesystem.
Sequences are held on disk. Even in single instance many DBA’s cache sequence numbers to avoid contention for the sequence. We cache most sequences in RAC to avoid contention on the sequence. If you have a high cumulative wait time in v$enqueue_stat on the SQ enqueue (the sequence number enqueue) then you should consider caching enqueues. RAC does support CACHING and ORDERING of sequence numbers.
Undo/rollback datafiles must reside on RAW devices. If you use MANUAL undo then each instance must specify unique rollback segments in the instance specific parameter file. If you use AUTOMATIC undo then each instance must specify a separate tablespace, this tablespace must be available and of type UNDO. All instances in a RAC cluster must run in the same UNDO mode i.e you can’t have one running AUTOMATIC undo and another running MANUAL undo. If you are using AUTOMATIC undo monitor v$undostat for statistics.
To see if the oracle home is RAC enabled issue the following SQL:
select * from dba_registry
where comp_id = ‘RAC’;
To relink an oracle home with RAC disabled or enabled:
make –f ins_rdbms.mk rac_off install OR
make –f ins_rdbms.mk rac_on install
Important RAC wait events
SQL> select event from v$system_event
2 where event like ‘%global%’
3 order by event
buffer busy global CR
buffer busy global cache
ges global resource directory to be frozen – no
ges global resource directory to be unfrozen – no
global cache busy
global cache cr request
global cache domain validation – no
global cache null to s
global cache null to x
global cache open s
global cache open x
global cache s to x
buffer busy global cache
This wait event falls under the umbrella of ‘global buffer busy events’. This wait event occurs when a user is waiting for a block that is currently held by another session on the same instance and the blocking session is itself waiting on a global cache transfer.
buffer busy global CR
This wait event falls under the umbrella of ‘global buffer busy events’. This wait event occurs when multiple CR requests for the same block are submitted from the same instance before the first request completes, users may queue up behind it
global cache busy
This wait event falls under the umbrella of ‘global buffer busy events’. This wait event means that a user on the local instance attempts to acquire a block globally and a pending acquisition or release is already in progress.
global cache cr request
This wait event falls under the umbrella of ‘global cache events’. This wait event determines that an instance has requested a consistent read version of a block from another instance and is waiting for the block to arrive
global cache null to s and global cache null to x
This wait event falls under the umbrella of ‘global cache events’. These events are waited for when a block was used by an instance, transferred to another instance, and then requested back again.
global cache open s and global cache open x
This wait event falls under the umbrella of ‘global cache events’. These events are used when an instance has to read a block from disk into cache as the block does not exist in any instances cache. High values on these waits may be indicative of a small buffer cache, therefore you may see a low cache hit ratio for your buffer cache at the same time as seeing these wait events.
global cache s to x
This wait event falls under the umbrella of ‘global cache events’. This event occurs when a session converts a block from shared to exclusive mode.
To find locks in RAC
SELECT inst_id,DECODE(request,0,’Holder: ‘,’Waiter: ‘)||sid sess,
id1, id2, lmode, request, type
WHERE (id1, id2, type) IN
(SELECT id1, id2, type FROM gV$LOCK WHERE request>0)
ORDER BY id1, request
Some Imp points
1. Cache Fusion:Cache Fusion is a new parallel database architecture for exploiting clustered computers to achieve scalability of all types of applications. Cache Fusion is a shared cache architecture that uses high speed low latency interconnects available today on clustered systems to maintain database cache coherency. Database blocks are shipped across the interconnect to the node where access to the data is needed. This is accomplished transparently to the application and users of the system. As Cache Fusion uses at most a 3 point protocol, this means that it easily scales to clusters with a large numbers of nodes
2.The LMD and LMS processes are critical RAC processes that should not be blocked on CPU by queuing up behind other scheduled CPU events
3 v$ges_statistics view returns various statistics on the Global Enqueue Service.
4.gv$lock view will show all the locks held by all the instances
Infrastructure for Oracle Real Application clusters
Oracle RAC require clusterware for its functioning.Clusterware has two main functions; to provide node membership services and to provide internode communications.
Oracle RAC provides oracle clusterware software for Oracle RAC.
Oracle Clusterware includes a number of background processes that are
implemented in Linux/Unix as daemons, including the following:
•Cluster Synchronization Service (CSS)
•Cluster Ready Services (CRS)
•Event Management (EVM)
These background processes communicate with similar components on other instances in the same database cluster. They also enable communication between Oracle Clusterware and the Oracle database. Under Linux, each daemon can have multiple threads, each of which appears as a separate operating system process.
On Unix platforms an additional daemon process called OPROCD is configured. This process is locked into memory to monitor the cluster and provide I/O fencing. It provides similar functionality to the hangcheck timer on Linux. OPROCD performs its check, stops running, and if the wake up is beyond the expected time, then OPROCD reboots the node. An OPROCD failure results in Oracle Clusterware restarting the node.