1. go to the systemshell
set diag
systemshell -node cl1-02
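a full session with prompts would look roughly like this (the prompt strings are assumptions; the diag user must already be unlocked, and set diag asks you to confirm):
cl1::> set diag
cl1::*> systemshell -node cl1-02
cl1-02%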
2. unmount mroot
cd /etc
./netapp_mroot_unmount
logout
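to confirm that mroot is really gone, run this from the systemshell before you log out (standard FreeBSD mount command; no output means /mroot is no longer mounted):
mount | grep mroot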
3. run cluster show a couple of times and watch the health of the broken node change to false
cluster show
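the output should look roughly like this (exact layout may differ per ONTAP version):
cl1::*> cluster show
Node                  Health  Eligibility
--------------------- ------- ------------
cl1-01                true    true
cl1-02                false   true
2 entries were displayed.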
4. run cluster ring show to see that the M-host (the mgmt ring unit) is offline
cluster ring show
Node      UnitName Epoch    DB Epoch DB Trnxs Master    Online
--------- -------- -------- -------- -------- --------- ---------
cl1-01    mgmt     6        6        699      cl1-01    master
cl1-01    vldb     7        7        84       cl1-01    master
cl1-01    vifmgr   9        9        20       cl1-01    master
cl1-01    bcomd    7        7        22       cl1-01    master
cl1-02    mgmt     0        6        692      -         offline
cl1-02    vldb     7        7        84       cl1-01    secondary
cl1-02    vifmgr   9        9        20       cl1-01    secondary
cl1-02    bcomd    7        7        22       cl1-01    secondary
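cl1-02's mgmt unit shows Epoch 0 and no master because mgwd on that node has lost access to its replicated database, which lives under /mroot. to watch a single ring you can filter the output (assuming your version supports filtering on -unitname):
cl1::*> cluster ring show -unitname mgmt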
5. try to create a volume and see that the status of the aggregate
cannot be determined if you pick an aggregate from the node with the broken M-host.
after a while vldb on that node will go offline as well.
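a minimal sketch of the volume create attempt, assuming a vserver vs1 and an aggregate aggr1_cl1_02 owned by the broken node (both names are hypothetical, and the exact error text varies per version):
cl1::*> volume create -vserver vs1 -volume testvol -aggregate aggr1_cl1_02 -size 100m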
6. remount mroot by starting mgwd from the systemshell
set diag
systemshell -node cl1-02
/sbin/mgwd -z &
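once mgwd is back it remounts mroot; you can verify from the systemshell with the standard FreeBSD mount command:
mount | grep mroot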
7. run cluster ring show again; it should now show vldb offline
cl1::*> cluster ring show
Node      UnitName Epoch    DB Epoch DB Trnxs Master    Online
--------- -------- -------- -------- -------- --------- ---------
cl1-01    mgmt     6        6        738      cl1-01    master
cl1-01    vldb     7        7        87       cl1-01    master
cl1-01    vifmgr   9        9        24       cl1-01    master
cl1-01    bcomd    7        7        22       cl1-01    master
cl1-02    mgmt     6        6        738      cl1-01    secondary
cl1-02    vldb     0        7        84       -         offline
cl1-02    vifmgr   0        9        20       -         offline
cl1-02    bcomd    7        7        22       cl1-01    secondary
Notice that vifmgr has gone offline as well.
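to check just the broken units you can filter again (same -unitname assumption as above):
cl1::*> cluster ring show -unitname vldb
cl1::*> cluster ring show -unitname vifmgr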
8. start vldb by running spmctl -s -h vldb
or run /sbin/vldb directly.
in this case, do the same for vifmgr.
(this opens the databases again and the cluster becomes healthy)
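finally, confirm the recovery; every unit should again show master or secondary, and both nodes should report health true:
cl1::*> cluster ring show
cl1::*> cluster show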