How ASM Failure Groups and CSS provide high availability

ASM Failure Groups and Redundancy

1) For systems that do not use external redundancy, ASM provides its own internal redundancy mechanism and additional high availability by way of ASM failure groups.
2) A failure group is a collection of disks and is a subset of a disk group.
3) Disk group redundancy can be:
Normal – two-way mirroring, requiring at least two failure groups (default)
High – three-way mirroring, requiring at least three failure groups

Disk Group Type	Supported Mirroring Levels	Default Mirroring Level
External redundancy	Unprotected (None)	Unprotected (None)
Normal redundancy	Two-way, Three-way, Unprotected (None)	Two-way
High redundancy	Three-way	Three-way

4) Once a disk group is created, its redundancy cannot be changed. The only way to change the redundancy is to create a new disk group with the required redundancy and move the datafiles to it using RMAN or DBMS_FILE_TRANSFER.
5) ASM mirrors extents, not disks. When ASM allocates the primary copy of a file extent on one disk in the disk group, it allocates the mirror copy of that extent on another disk in the disk group.
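
To confirm which redundancy an existing disk group was created with, the TYPE column of V$ASM_DISKGROUP can be queried. The query below is a minimal sketch; it assumes you are connected to the ASM instance with a privileged account (SYSASM on 11g and later, SYSDBA on 10g).

```sql
-- List each disk group with its redundancy type.
-- For the three types described above, TYPE reports EXTERN, NORMAL, or HIGH.
SELECT group_number,
       name,
       type,
       state
FROM   v$asm_diskgroup
ORDER  BY group_number;
```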

What are ASM Failure Groups

Failure groups are used to store mirror copies of data. When ASM allocates an extent for a normal redundancy file, ASM allocates a primary copy and a secondary copy. ASM chooses the disk on which to store the secondary copy so that it is in a different failure group than the primary copy. Each copy is on a disk in a different failure group so that the simultaneous failure of all disks in a failure group does not result in data loss.
A failure group is a subset of the disks in a disk group, which could fail at the same time because they share hardware. The failure of common hardware must be tolerated.

There are always failure groups even if they are not explicitly created. If you do not specify a failure group for a disk, then Oracle automatically creates a new failure group containing just that disk. A normal redundancy disk group must contain at least two failure groups. A high-redundancy disk group must contain at least three failure groups. However, Oracle recommends using several failure groups. A small number of failure groups, or failure groups of uneven capacity, can create allocation problems that prevent full use of all of the available storage.
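
One way to spot a small number of failure groups, or failure groups of uneven capacity, is to aggregate V$ASM_DISK by its FAILGROUP column. This is a sketch only; group number 1 is an assumption, so look up the real value in V$ASM_DISKGROUP first.

```sql
-- One row per failure group: number of disks and capacity in MB.
-- group_number = 1 is an assumed value for illustration.
SELECT failgroup,
       COUNT(*)      AS disks,
       SUM(total_mb) AS total_mb,
       SUM(free_mb)  AS free_mb
FROM   v$asm_disk
WHERE  group_number = 1
GROUP  BY failgroup
ORDER  BY failgroup;
```

If the TOTAL_MB figures differ widely between failure groups, the smallest failure group is likely to limit how much of the larger ones can hold mirrored data.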

ASM Fast Disk Resync

Disk loss in ASM can have a number of causes, such as the loss of a controller card, a cable failure, or a power-supply error. In many cases, the disk itself is still intact.
To allow sufficient time to recover from failures that do not involve the disk media itself, ASM provides the ASM fast disk resync feature.

By default, when a disk in an ASM disk group fails, the disk is taken offline automatically and is dropped 3.6 hours later. As a result, you have only 3.6 hours by default to respond to a disk outage. If you correct the problem within that window and the physical disk media is not corrupted, ASM fast disk resync re-synchronizes the disk quickly when it comes back online.
To use this feature, the disk group compatibility attributes must be set to 11.1 or higher.
If a disk goes offline while the ASM instance is in rolling upgrade mode, the disk remains offline until the rolling upgrade has ended, and the timer for dropping the disk is stopped until the ASM cluster is out of rolling upgrade mode. See “ASM Rolling Upgrade”.

You can change how long Oracle waits before it automatically drops the disk by setting the disk_repair_time attribute for an individual disk group with the alter diskgroup command. This example sets the disk_repair_time attribute to 10 hours:

SQL> alter diskgroup dgroup1 set attribute 'disk_repair_time'='10h';
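
As a follow-up sketch, the statements below check the value that is currently in effect and show how the repair window can be overridden for a single planned outage with the DROP AFTER clause. The disk group name matches the example above; the disk name DGROUP1_0001 is hypothetical.

```sql
-- Show the current repair window for the DGROUP1 disk group.
SELECT g.name AS diskgroup,
       a.name AS attribute,
       a.value
FROM   v$asm_attribute a
       JOIN v$asm_diskgroup g ON g.group_number = a.group_number
WHERE  g.name = 'DGROUP1'
AND    a.name = 'disk_repair_time';

-- Take one disk offline for maintenance. DROP AFTER overrides
-- disk_repair_time for this operation only.
-- DGROUP1_0001 is a hypothetical disk name.
ALTER DISKGROUP dgroup1 OFFLINE DISK dgroup1_0001 DROP AFTER 5h;
```

The time can be given in minutes (m) or hours (h).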

Example

SQL> create diskgroup TESTDG disk '/dev/raw/raw1','/dev/raw/raw2' attribute 'compatible.rdbms'='11.1','compatible.asm'='11.1';
Diskgroup created.
SQL> select group_number,name from v$asm_diskgroup where group_number=1;
GROUP_NUMBER NAME
------------ ------
           1 TESTDG
SQL> select name,value from v$asm_attribute where group_number=1;
NAME             VALUE
---------------- ----------
disk_repair_time 3.6h
au_size          1048576
compatible.asm   11.1.0.0.0
compatible.rdbms 11.1.0.0.0

The default disk repair time is 3.6 hours.

Connect to the DB instance and create a tablespace in the new disk group:
SQL> create tablespace test datafile '+TESTDG' size 20m;
Tablespace created.

Shut down the DB instance, then dismount the ASM disk group from the ASM instance:

SQL> alter diskgroup TESTDG dismount;
Diskgroup altered.

Change the ownership of /dev/raw/raw1 to root to simulate the disk loss:

[root@11g ~]# chown root:root /dev/raw/raw1
[root@11g ~]# ls -ltr /dev/raw/raw1
crw-rw---- 1 root root 162, 1 May 8 01:47 /dev/raw/raw1

SQL> alter diskgroup TESTDG mount;
alter diskgroup TESTDG mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "0" is missing

SQL> alter diskgroup TESTDG mount force;
Diskgroup altered.

SQL> select path,name,repair_timer from v$asm_disk where group_number=1;
PATH          NAME        REPAIR_TIMER
------------- ----------- ------------
              TESTDG_0000        12960
/dev/raw/raw2 TESTDG_0001            0

Disk groups mounted with the FORCE option have one or more disks offline if those disks are not available at the time of the mount. You must take corrective action to restore those devices before DISK_REPAIR_TIME expires.
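
The REPAIR_TIMER value is reported in seconds, so the 12960 shown for TESTDG_0000 is the full default window (12960 / 3600 = 3.6 hours). A small sketch of a query that shows the remaining time in hours, assuming group number 1 as in this example:

```sql
-- REPAIR_TIMER is the number of seconds left before ASM drops the disk
-- (0 for a disk that has not failed).
SELECT name,
       mode_status,
       repair_timer,
       ROUND(repair_timer / 3600, 2) AS hours_left
FROM   v$asm_disk
WHERE  group_number = 1
ORDER  BY name;
```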

Connect to DB Instance and add new datafile to the tablespace.

SQL> alter tablespace test add datafile '+TESTDG' size 20m;
Tablespace altered.

Because only one disk is available in this normal-redundancy disk group, the new extents have no mirror copy until the lost disk is accessible to the oracle user again and is brought online with alter diskgroup ... online, or until a new disk is added to the disk group.

[root@11g ~]# chown oracle:dba /dev/raw/raw1
SQL> alter diskgroup TESTDG online disk TESTDG_0000;
Diskgroup altered.

SQL> select group_number,operation,state,power from v$asm_operation;
GROUP_NUMBER OPERA STAT POWER
------------ ----- ---- -----
           1 ONLIN RUN      1

ASM fast resync keeps track of pending changes to extents on an OFFLINE disk during an outage. The extents are resynced when the disk is brought back online or replaced.

SQL> select path,header_status,mount_status from v$asm_disk where group_number=1;
PATH          HEADER_STATU MOUNT_S
------------- ------------ -------
/dev/raw/raw2 MEMBER       CACHED
/dev/raw/raw1 MEMBER       CACHED

ASM Preferred Mirror Read

Your ASM configuration may involve remote mirroring to disks that are a fair distance away. When some of your mirrors are far away, those disks may not be the best ones for a given instance to read from. For example, you might have a Real Application Clusters (RAC) database with local and remote mirrored disks. In this case, you want each RAC instance to read primarily from its local disks to ensure the best performance.
ASM preferred mirror read lets you tell Oracle which failure group an instance should prefer when it reads mirrored data.

Preferred mirror read is generally used with clustered ASM instances in a RAC environment, because that is where local and remote failure groups typically exist, but a cluster is not a strict requirement.
To take advantage of ASM preferred mirror read, configure each failure group with the set of disks at one geographic location. Then use the asm_preferred_read_failure_groups parameter, introduced in Oracle 11g and set in each ASM instance, to list the failure groups that should be preferred for reads on that node.
Each value of the asm_preferred_read_failure_groups parameter has the format diskgroup_name.failure_group_name, where diskgroup_name is the name of the disk group that the failure group belongs to and failure_group_name is the name of the preferred failure group.

If ASM cannot read from the preferred failure group, it reads from the non-preferred failure groups. To determine whether a given disk belongs to a preferred read failure group, use the PREFERRED_READ column of the V$ASM_DISK view.
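
The following sketch shows how the parameter might be set for a two-node cluster whose failure groups correspond to two sites, and how to check the result. The disk group DATA, the failure groups SITEA and SITEB, the SIDs +ASM1 and +ASM2, and group number 1 are all hypothetical; SCOPE=BOTH assumes the ASM instances use an spfile.

```sql
-- Run in the ASM instance. Each node prefers the failure group at its own site.
-- DATA, SITEA, SITEB, +ASM1 and +ASM2 are illustrative names.
ALTER SYSTEM SET asm_preferred_read_failure_groups = 'DATA.SITEA' SCOPE=BOTH SID='+ASM1';
ALTER SYSTEM SET asm_preferred_read_failure_groups = 'DATA.SITEB' SCOPE=BOTH SID='+ASM2';

-- PREFERRED_READ is Y for disks in a preferred read failure group of
-- the instance you are connected to, N for other disks, and U when the
-- disk group has no preferred read failure group.
SELECT name,
       failgroup,
       preferred_read
FROM   v$asm_disk
WHERE  group_number = 1
ORDER  BY failgroup, name;
```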

Example of failure groups

Creating diskgroups for high and normal redundancy

CREATE DISKGROUP DATA NORMAL REDUNDANCY
FAILGROUP DATA_FAILURE_group_1 DISK
'/dev/5855d' , '/dev/6476d'
FAILGROUP DATA_FAILURE_group_2 DISK
'/dev/5853d' , '/dev/5854d' ;

For two-way mirroring, the disk group must contain at least two failure groups, so that each extent of a file is written to two locations.

CREATE DISKGROUP DATA HIGH REDUNDANCY
FAILGROUP DATA_FAILURE_group_1 DISK
'/dev/5851d', '/dev/5852d'
FAILGROUP DATA_FAILURE_group_2 DISK
'/dev/5853d', '/dev/5854d'
FAILGROUP DATA_FAILURE_group_3 DISK
'/dev/5855d', '/dev/5856d';

For three-way mirroring, the disk group must contain at least three failure groups, so that each extent of a file is written to three locations.
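
Once files exist in such a disk group, the mirroring level that each file actually received can be checked in V$ASM_FILE. This is a minimal sketch that assumes the disk group is named DATA, as in the examples above.

```sql
-- Mirroring level of each file in the DATA disk group.
-- REDUNDANCY is UNPROT (no mirroring), MIRROR (two-way), or HIGH (three-way).
SELECT f.file_number,
       f.type,
       f.redundancy
FROM   v$asm_file f
       JOIN v$asm_diskgroup g ON g.group_number = f.group_number
WHERE  g.name = 'DATA'
ORDER  BY f.file_number;
```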

Cluster Synchronization Services – CSS

1) CSS is required for ASM to operate.
2) CSS maintains synchronization between the ASM and database instances. CSS, which is a component of Oracle's Cluster Ready Services (CRS), is automatically installed on every node that runs Oracle Database 10g ASM and starts automatically at server boot. In RAC 10g environments, the full Oracle Clusterware (CRS) is installed on every RAC node.
3) Because CSS provides cluster management and node monitoring, it inherently monitors ASM and its shared storage components (disks and disk groups). On startup, ASM registers itself and all the disk groups it has mounted with CSS. This allows CSS to keep disk group metadata in sync across all RAC nodes. Any new disk groups that are created are also dynamically registered and broadcast to the other nodes in the cluster.
4) As with the database, inter-node communication is used to synchronize activities in ASM instances. CSS is used to heartbeat the health of the ASM instances. ASM inter-node messages are initiated by structural changes that require synchronization, for example adding a disk. Thus, ASM uses the same integrated lock management infrastructure that the database uses for efficient synchronization.

When the ASM Oracle home is changed in a single-instance ASM configuration, CSS must be reconfigured. Run the command below from the new home:

$ORACLE_HOME/bin/localconfig reset

Related Articles

Oracle ASM (Automatic Storage Management ) Introduction and How it works

ASM Initialization Parameters: ASM_DISKSTRING,ASM_DISKGROUPS

How Oracle ASM Rebalance works

Oracle ASM Diskgroups : Create and Alter diskgroup
