Oracle uses a simple heartbeat mechanism to detect the failure of an instance on a node in an Oracle RAC cluster. The cluster manager software monitors the heartbeat of the other instances, and if it stops receiving one, the recovery process begins. From what I can find, this heartbeat is provided by the udlm package on Sun and by MC/ServiceGuard on HP. Fundamentally, because all instances in a cluster can read all threads of redo, a surviving instance performs recovery on behalf of the failed instance. This comprises two main steps. First, the Global Resource Directory must be remastered so that all resources that were mastered on the failed node are remastered to the surviving nodes. This remastering is performed by lmon. Second, the redo thread of the failed node is read so that changes to blocks on the failed node that were not written to disk (dirty blocks) can be applied to the datafiles. The RAC Concepts Guide outlines the following steps that are followed after node failure has occurred:
(1) During the first phase of recovery, which is the GES reconfiguration, Oracle first reconfigures the GES enqueues. Then Oracle reconfigures the GCS resources. During this time, all GCS resource requests and write requests are temporarily suspended. However, processes and transactions can continue to modify data blocks as long as these processes and transactions have already acquired the necessary enqueues.
(2) After the reconfiguration of enqueues that the GES controlled, a log read and the remastering of GCS resources occur in parallel. At the end of this step the block resources that need to be recovered have been identified. Only the redo of the dead instances must be read.
(3) Buffer space for recovery is allocated and the resources that were identified in the previous reading of the log are claimed as recovery resources. Then, assuming that there are past images (PIs) of blocks to be recovered in other caches in the cluster database, resource buffers are requested from other instances. The resource buffers are the starting point of recovery for a particular block.
(4) All resources and enqueues required for subsequent processing have been acquired and the Global Resource Directory is now unfrozen. Any data blocks that are not in recovery can now be accessed. Note that the system is already partially available.
(5) The cache layer recovers and writes each block identified in step 2, releasing the recovery resources immediately after block recovery so that more blocks become available as cache recovery proceeds.
(6) After all blocks have been recovered and the recovery resources have been released, the system is again fully available. Recovered blocks are available after recovery completes.
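The steps above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not how Oracle implements recovery: the Global Resource Directory is a plain dictionary, remastering is round-robin, each block carries a made-up version number, and the names `remaster` and `recover_blocks` are hypothetical. It shows resources being remastered away from the failed instance, the blocks to recover being identified from the failed instance's redo thread, and each recovery resource being released as soon as its block is recovered.

```python
def remaster(directory, failed, survivors):
    """Reassign resources mastered on the failed instance to the survivors.

    Round-robin assignment is an assumption made for this sketch.
    """
    orphaned = sorted(res for res, master in directory.items() if master == failed)
    for i, res in enumerate(orphaned):
        directory[res] = survivors[i % len(survivors)]


def recover_blocks(datafile, redo_thread):
    """Apply the failed instance's redo block by block.

    datafile maps block id -> (version, value); redo_thread is an ordered
    list of (block id, version, value) records. Both are toy structures.
    """
    # Step 2: read the redo thread to identify the blocks needing recovery.
    to_recover = {block_id for block_id, _, _ in redo_thread}
    # Step 3: claim those blocks as recovery resources.
    recovery_resources = set(to_recover)
    # Step 5: recover each block, then release its recovery resource at once.
    for block_id in sorted(to_recover):
        for rec_block, version, value in redo_thread:
            # Only apply changes that are newer than what is already on disk.
            if rec_block == block_id and version > datafile[block_id][0]:
                datafile[block_id] = (version, value)
        recovery_resources.remove(block_id)
        yield block_id, sorted(recovery_resources)


# Three instances; inst2 fails.
directory = {"blk1": "inst1", "blk2": "inst2", "blk3": "inst2", "blk4": "inst3"}
remaster(directory, failed="inst2", survivors=["inst1", "inst3"])

# blk2 has dirty changes that never reached disk; blk3 is already current.
datafile = {"blk2": (5, "old"), "blk3": (7, "current")}
redo_thread = [("blk2", 6, "new"), ("blk3", 7, "current"), ("blk2", 8, "newer")]
progress = list(recover_blocks(datafile, redo_thread))

for block_id, still_locked in progress:
    print(f"recovered {block_id}; still in recovery: {still_locked}")
```

Because `recover_blocks` is a generator, the caller sees each block become available as soon as it is recovered, which mirrors the gradual return to full availability described in steps 5 and 6.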
If all instances in a cluster fail, then crash recovery is performed by the first instance in the cluster to start, just as in a single-instance database.
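The heartbeat-based failure detection described at the start of this section can be sketched in a few lines. This is a minimal illustration, assuming a fixed timeout and explicit timestamps; the `HeartbeatMonitor` class and its methods are hypothetical and do not correspond to any API of udlm, MC/ServiceGuard, or Oracle's cluster manager.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each instance (hypothetical model)."""

    timeout: float  # seconds of silence before an instance is presumed failed
    last_seen: dict = field(default_factory=dict)

    def record(self, instance: str, now: float) -> None:
        """Note that a heartbeat from `instance` arrived at time `now`."""
        self.last_seen[instance] = now

    def failed_instances(self, now: float) -> list:
        """Return the instances whose heartbeat has been silent too long."""
        return sorted(
            name
            for name, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


monitor = HeartbeatMonitor(timeout=3.0)
monitor.record("inst1", now=0.0)
monitor.record("inst2", now=0.0)
monitor.record("inst1", now=2.0)
monitor.record("inst1", now=4.0)  # inst2 has stopped sending heartbeats

failed = monitor.failed_instances(now=4.5)
print(failed)  # instances for which recovery would be started
```

Passing the clock in as `now` rather than reading the system time keeps the sketch deterministic; a real cluster manager would also have to deal with network partitions, which this model ignores.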