Monday, June 24, 2013

Oracle RAC : one node went down


I am new to Oracle RAC. Can anyone tell me why Oracle RAC reboots the other node when the heartbeat link has not responded for more than 30 seconds?
Loss of heartbeat --> split-brain --> node eviction 
Search for the terms above for more information.
http://dba-oracle.com/t_oracle_rac_node_eviction_tips.htm 
http://www.oracle-training.cc/grid_139.htm 
Hope this helps, 
Maybe an example would help. 

Create a TWO node cluster, then kill the heartbeat for some reason.
This creates a split-brain state, but why that's bad for data
consistency needs an explanation.

A cluster works because it is two or more computers COOPERATING
in working on the database or problem presented. Without a
heartbeat, they can't make sure that one node has finished working
on a data record before the next node starts.

Let's say you have a database like this.
"FirstName LastName Balance Adjustment"
Bob Cat 100 0

Node1 updates like this.
Bob Cat 120 +20

Node2 updates like this.
Bob Cat 140 +40

Split brain creates a situation where the database is like:
Bob Cat 140 0

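Because Node2 was NOT aware that Node1 was updating, it started
from the old balance of 100 and overwrote Node1's change. With
both adjustments applied, the balance should have been 160.
That's called a LOST UPDATE, and it's a big NO-NO in the
DBA world.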
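
If it helps, here is a toy Python sketch of the Bob Cat example above. It is
not Oracle code, and the function names are made up for illustration. It only
shows the arithmetic: when both nodes read the balance before either one
writes, the last write wins and the other adjustment is lost.

```python
# Toy simulation of the lost update; the function names are hypothetical.

def uncoordinated_updates(start_balance, adjustments):
    """Every node reads the same starting balance, then writes its own result."""
    stale_reads = [start_balance for _ in adjustments]  # all nodes read first
    balance = start_balance
    for seen, adjustment in zip(stale_reads, adjustments):
        balance = seen + adjustment  # each write overwrites the previous one
    return balance


def coordinated_updates(start_balance, adjustments):
    """Each node waits for the previous node to finish before it reads."""
    balance = start_balance
    for adjustment in adjustments:
        balance = balance + adjustment
    return balance


print(uncoordinated_updates(100, [20, 40]))  # 140, Node1's +20 is lost
print(coordinated_updates(100, [20, 40]))    # 160, both adjustments applied
```

The second function is what the heartbeat makes possible: each node knows the
other is done before it touches the record.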
This is normal behavior for a clustering system, to guarantee data
consistency. When the heartbeat link is broken, the cluster manager daemon
detects that the cluster is divided into two or more parts. In HACMP
terminology this is called cluster partitioning or split brain. What you see
in RAC is a similar concept: it prevents data access from nodes that are
outside the cluster manager's control.
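
A rough Python sketch of the detection step, for illustration only. This is
not how Oracle Clusterware is implemented; the function name, node names, and
timestamps are assumptions, and the 30 second timeout is simply the figure
from the question. It shows one node deciding which peers have been silent
for too long and are therefore candidates for eviction.

```python
# Hypothetical sketch of heartbeat-timeout detection; not Oracle Clusterware code.
HEARTBEAT_TIMEOUT_SECONDS = 30.0  # the 30 seconds mentioned in the question


def nodes_missing_heartbeat(last_heartbeat, now, timeout=HEARTBEAT_TIMEOUT_SECONDS):
    """Return the nodes whose last heartbeat is older than `timeout` seconds.

    last_heartbeat maps a node name to the time (in seconds) at which the
    local node last heard from it.
    """
    return sorted(
        node for node, seen_at in last_heartbeat.items()
        if now - seen_at > timeout
    )


# As seen from node1: node2 was last heard from 45 seconds ago, node3 5 seconds ago.
last_heartbeat = {"node2": 955.0, "node3": 995.0}
eviction_candidates = nodes_missing_heartbeat(last_heartbeat, now=1000.0)
print(eviction_candidates)  # ['node2']
```

A real cluster manager also has to decide which side of the split survives;
that part is left out here.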


