clustermode eligibility and epsilon

In clustermode a node can be part of a cluster and is only regarded as healthy if
it is eligible. Being eligible means that it is part of the replication ring for
the cluster information in /mroot/etc/cluster_config/rdb.
Each rdb-unit has a ringmaster. This ringmaster can be any node in the cluster,
and each rdb-unit can be managed by a different ringmaster. One node in the cluster
has a tiebreaker functionality for all rdb-units: epsilon.

In a two-node cluster, switch epsilon to false on both nodes!

To view eligibility, run the following from admin mode:
cl1::> cluster show
Node                  Health  Eligibility
--------------------- ------- ------------
cl1-01                true    true
cl1-03                true    true
2 entries were displayed.

You can turn off eligibility for a particular node, but
only if that node does not hold epsilon; move epsilon to
another node first. (Epsilon is the functionality that
provides the deciding vote as a tiebreaker in a two-node
split-brain scenario.)

View epsilon in advanced mode:

cl1::> set advanced

Warning: These advanced commands are potentially dangerous; use them only when
directed to do so by NetApp personnel.
Do you want to continue? {y|n}: y

cl1::*> cluster show
Node                 Health  Eligibility  Epsilon
-------------------- ------- ------------ ------------
cl1-01               true    true         false
cl1-03               true    true         true
2 entries were displayed.
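
If you want to check eligibility and epsilon from a script, the table above is easy to parse. The following is only a minimal Python sketch: the function name parse_cluster_show is made up for this example, and it assumes that node names contain no spaces and that every value column is either true or false, as in the output shown here.

```python
def parse_cluster_show(text):
    """Parse `cluster show` output into a list of dicts, one per node."""
    header = None
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, CLI prompts, the dashed ruler and the summary line.
        if (not line or "::" in line or line.startswith("-")
                or line.endswith("displayed.")):
            continue
        fields = line.split()
        if fields[0] == "Node":
            # Column names become dict keys: health, eligibility, epsilon.
            header = [name.lower() for name in fields]
            continue
        if header is None or len(fields) != len(header):
            continue  # not a data row this sketch understands
        row = {"node": fields[0]}
        for name, value in zip(header[1:], fields[1:]):
            row[name] = value == "true"
        rows.append(row)
    return rows


sample = """\
cl1::*> cluster show
Node                 Health  Eligibility  Epsilon
-------------------- ------- ------------ ------------
cl1-01               true    true         false
cl1-03               true    true         true
2 entries were displayed.
"""

for row in parse_cluster_show(sample):
    print(row)
```

The same function also works on the admin mode output; the rows then simply have no epsilon key.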

To switch off eligibility or change epsilon, keep the following in mind:
In a two-node cluster, both nodes must be eligible.
When switching off eligibility for a node, run the command
from another node in the cluster.
A node that is not eligible cannot hold epsilon.
In a running cluster you can move epsilon to another node:
first set epsilon to false on the node that currently holds it,
then set epsilon to true on the other node.

cl1::*> cluster modify -epsilon false -node cl1-03
cl1::*> cluster modify -epsilon true -node cl1-01
cl1::*> cluster show
Node                 Health  Eligibility  Epsilon
-------------------- ------- ------------ ------------
cl1-01               true    true         true
cl1-03               true    true         false
2 entries were displayed.
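
The order of the two modify commands can be captured in a small helper. This is a sketch only: plan_epsilon_move is a hypothetical name, it does not talk to the cluster, and it merely builds the command strings used above after checking the rule that an ineligible node cannot hold epsilon.

```python
def plan_epsilon_move(nodes, target):
    """Return the `cluster modify` commands needed to move epsilon to target.

    nodes is a list of dicts with the keys node, eligibility and epsilon.
    """
    by_name = {n["node"]: n for n in nodes}
    if target not in by_name:
        raise ValueError(f"unknown node: {target}")
    if not by_name[target]["eligibility"]:
        raise ValueError(f"{target} is not eligible and cannot hold epsilon")
    if by_name[target]["epsilon"]:
        return []  # nothing to do, target already holds epsilon

    commands = []
    # First clear epsilon on the node that currently holds it ...
    for n in nodes:
        if n["epsilon"]:
            commands.append(f"cluster modify -epsilon false -node {n['node']}")
    # ... then assign it to the new node.
    commands.append(f"cluster modify -epsilon true -node {target}")
    return commands


nodes = [
    {"node": "cl1-01", "eligibility": True, "epsilon": False},
    {"node": "cl1-03", "eligibility": True, "epsilon": True},
]

for command in plan_epsilon_move(nodes, "cl1-01"):
    print(command)
```

For the cluster state shown before the change, this prints the same two commands as in the transcript above, in the same order.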

Scenario:
cl1-01: eligibility is true, epsilon is false
cl1-03: eligibility is true, epsilon is true
If you lose cl1-03, you lose the node that holds epsilon.
This leaves cl1-01 in an unhealthy state, and it will be
regarded as offline in the cluster. Even though the
node is still running, its volumes are no longer
available.
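
The scenario can be reproduced with a toy voting model. The sketch below is a simplification and not how the cluster implements it internally: it assumes that a group of reachable nodes keeps quorum when it contains more than half of the eligible nodes, or exactly half of them plus the epsilon holder. The name has_quorum is made up for this example.

```python
def has_quorum(nodes, reachable):
    """Toy model: does the set of reachable node names keep quorum?"""
    eligible = [n for n in nodes if n["eligibility"]]
    up = [n for n in eligible if n["node"] in reachable]
    if 2 * len(up) > len(eligible):
        return True  # clear majority
    if 2 * len(up) == len(eligible):
        # Exactly half: epsilon decides the tie.
        return any(n["epsilon"] for n in up)
    return False


nodes = [
    {"node": "cl1-01", "eligibility": True, "epsilon": False},
    {"node": "cl1-03", "eligibility": True, "epsilon": True},
]

# cl1-03 is lost: cl1-01 holds half the votes but not epsilon.
print(has_quorum(nodes, {"cl1-01"}))
# cl1-01 is lost: cl1-03 holds half the votes plus epsilon.
print(has_quorum(nodes, {"cl1-03"}))
```

In this model the surviving cl1-01 loses the tie, which matches the behaviour described above, while losing cl1-01 instead would leave cl1-03 in quorum.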

*needs more work*
