cDOT mhost troubleshooting

1. go to the systemshell
set diag
systemshell -node cl1-01

2. unmount mroot
cd /etc
./netapp_mroot_unmount
logout

3. run cluster show a couple of times and see that the health of the node is false
cluster show
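As a side note, a quick way to spot the unhealthy node in a captured session is to parse the command output. This is a minimal sketch (not part of the original walkthrough) that assumes the typical cluster show column layout of Node, Health, Eligibility; the sample text is an assumed example:

```python
# Minimal sketch: find nodes whose Health column reads "false" in
# captured "cluster show" output. The sample below is an assumed
# example of the command's usual column layout.
sample = """\
Node                  Health  Eligibility
--------------------- ------- ------------
cl1-01                true    true
cl1-02                false   true
"""

def unhealthy_nodes(output: str) -> list[str]:
    """Return node names whose Health column is 'false'."""
    nodes = []
    for line in output.splitlines():
        fields = line.split()
        # data rows have 'true'/'false' in the Health column;
        # header and separator rows do not
        if len(fields) >= 3 and fields[1] in ("true", "false"):
            if fields[1] == "false":
                nodes.append(fields[0])
    return nodes

print(unhealthy_nodes(sample))  # -> ['cl1-02']
```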

4. run cluster ring show to see that M-host is offline
cluster ring show
Node UnitName Epoch DB Epoch DB Trnxs Master Online
--------- -------- -------- -------- -------- --------- ---------
cl1-01 mgmt 6 6 699 cl1-01 master
cl1-01 vldb 7 7 84 cl1-01 master
cl1-01 vifmgr 9 9 20 cl1-01 master
cl1-01 bcomd 7 7 22 cl1-01 master
cl1-02 mgmt 0 6 692 - offline
cl1-02 vldb 7 7 84 cl1-01 secondary
cl1-02 vifmgr 9 9 20 cl1-01 secondary
cl1-02 bcomd 7 7 22 cl1-01 secondary
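The offline rows in output like the above can also be picked out programmatically. This is a minimal sketch (not part of the original walkthrough) that parses the same seven-column layout shown above and lists the (node, unit) pairs whose Online column reads offline:

```python
# Minimal sketch: find offline replication rings in captured
# "cluster ring show" output, using the seven-column layout
# (Node, UnitName, Epoch, DB Epoch, DB Trnxs, Master, Online).
sample = """\
Node      UnitName Epoch    DB Epoch DB Trnxs Master    Online
--------- -------- -------- -------- -------- --------- ---------
cl1-01    mgmt     6        6        699      cl1-01    master
cl1-02    mgmt     0        6        692      -         offline
cl1-02    vldb     7        7        84       cl1-01    secondary
"""

def offline_rings(output: str) -> list[tuple[str, str]]:
    """Return (node, unit) pairs whose Online column is 'offline'."""
    pairs = []
    for line in output.splitlines():
        fields = line.split()
        # data rows have exactly seven fields; header/separator rows do not
        if len(fields) == 7 and fields[-1] == "offline":
            pairs.append((fields[0], fields[1]))
    return pairs

print(offline_rings(sample))  # -> [('cl1-02', 'mgmt')]
```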

5. try to create a volume and see that the status of the aggregate
cannot be determined if you pick an aggregate owned by the node with the broken M-host.

6. now vldb will also be offline.

7. remount mroot by starting mgwd from the systemshell
set diag
systemshell -node cl1-01
/sbin/mgwd -z &

8. when you run cluster ring show, it should show vldb offline
cl1::*> cluster ring show
Node UnitName Epoch DB Epoch DB Trnxs Master Online
--------- -------- -------- -------- -------- --------- ---------
cl1-01 mgmt 6 6 738 cl1-01 master
cl1-01 vldb 7 7 87 cl1-01 master
cl1-01 vifmgr 9 9 24 cl1-01 master
cl1-01 bcomd 7 7 22 cl1-01 master
cl1-02 mgmt 6 6 738 cl1-01 secondary
cl1-02 vldb 0 7 84 - offline
cl1-02 vifmgr 0 9 20 - offline
cl1-02 bcomd 7 7 22 cl1-01 secondary

Notice that vifmgr has gone offline as well.

9. start vldb by running spmctl -s -h vldb
or by running /sbin/vldb directly.
since vifmgr is offline as well, do the same for vifmgr.
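The recovery commands used above can be summarized as a small lookup. This is a minimal sketch (not part of the original walkthrough) mapping each offline unit to the restart command this post uses for it; it only covers the units the walkthrough actually restarts:

```python
# Minimal sketch: map the ring units this walkthrough restarts to the
# systemshell commands used above. Units not in the mapping are skipped.
RESTART_COMMANDS = {
    "mgmt": "/sbin/mgwd -z &",        # restarting mgwd also remounts mroot
    "vldb": "spmctl -s -h vldb",
    "vifmgr": "spmctl -s -h vifmgr",  # same approach as for vldb
}

def restart_plan(offline_units: list[str]) -> list[str]:
    """Return the systemshell commands for each known offline unit."""
    return [RESTART_COMMANDS[u] for u in offline_units if u in RESTART_COMMANDS]

print(restart_plan(["vldb", "vifmgr"]))
# -> ['spmctl -s -h vldb', 'spmctl -s -h vifmgr']
```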

(this will open the databases again and the cluster will be healthy)
