I am a seasoned IT professional with a background in VMware, storage, backup, Unix, and project liaison work. I have held positions working with technologies such as NetApp, EMC, IBM, and Cohesity storage and backup, supporting SAN and NAS environments. I have held roles of IT administrator, engineer, team lead, and project liaison. This blog is for storage and backup professionals, and the content is derived from vendor material as well as my own experience.
............................................................................................................................................................................
I recently performed a cluster unjoin and removed nodes from a multi-node cluster running cDOT 9.3Px without any service disruption.
............................................................................................................................................................................
## Visibility to the system is the KEY. Stay logged into the SP or console of the nodes throughout this activity; it helps if you run into unforeseen or unpredicted situations.
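For example, from a separate session you can reach each node's console through its Service Processor and then open the system console (the SP address and node name below are placeholders, not values from this environment):
ssh admin@<sp-ip-of-node2>
SP node2> system console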
Prerequisites:
- Disable storage failover between the HA nodes (verify with storage failover show).
- Migrate the home node and home port of data LIFs off the nodes in question over to other healthy nodes/ports with net interface modify (sample commands follow this list).
- Remove the ports of these nodes from broadcast domains / failover groups, leaving only the ports that will remain after the nodes are removed.
- Delete the intercluster LIFs, and remove the intercluster ports from the corresponding broadcast domain.
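As a rough sketch, the prerequisite steps map to commands along these lines. The node, SVM, LIF, port, and broadcast-domain names (node1, node2, svm1, data_lif1, e0c, Default, cluster1, ic_lif_node2) are placeholders for illustration only; substitute the values from your own environment.
cluster::> storage failover modify -node node2 -enabled false
cluster::> network interface modify -vserver svm1 -lif data_lif1 -home-node node1 -home-port e0c
cluster::> network interface revert -vserver svm1 -lif data_lif1
cluster::> network port broadcast-domain remove-ports -ipspace Default -broadcast-domain Default -ports node2:e0c
cluster::> network interface delete -vserver cluster1 -lif ic_lif_node2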
Actual Steps:
Step 1.
Log into advanced privilege mode on the cluster:
cluster::> set advanced
Warning: These advanced commands are potentially dangerous; use them only when directed to do so by NetApp personnel.
Do you want to continue? {y|n}: (Press y)
Step 2.
Confirm whether either of the nodes in question is serving as the master node; if it is, make it ineligible. (Caution: once you have made a node ineligible, you will need to reboot it to make it eligible again, should you need to.) You can check this with the cluster ring status from advanced mode.
cluster::*> cluster ring show
Verify that there are no remaining LIFs or data on the nodes in question:
net int show -home-node node1
net int show -home-node node2
cluster::*> cluster modify -node node2 -eligibility false
Double-check by running the command below to ensure there are no LIF dependencies left other than the cluster ports and/or node management port/LIF; it will complain if something is still left on the nodes being removed.
cluster::*> cluster ring show
(This will list node2 as offline, since it has been made ineligible. The other node will display as master.)
(Run the same for both nodes in the HA pair.)
Node UnitName Epoch DB Epoch DB Trnxs Master Online
--------- -------- ------- -------- -------- --------- ---------
node2 mgmt 0 25 736267 offline
node2 vldb 0 23 1913 offline
node2 vifmgr 0 25 11031 offline
node2 bcomd 0 26 4 offline
node2 crs 0 23 1 offline
Step 3. Verify:
cluster::*>storage failover show
cluster::*>cluster ring show
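As an illustration, storage failover show should confirm that takeover is no longer possible on the HA pair being removed (node names and the exact wording of the state description below are examples and may differ):
Node           Partner        Possible State Description
-------------- -------------- -------- --------------------------
node1          node2          false    Takeover is not possible: Storage failover is disabled
node2          node1          false    Takeover is not possible: Storage failover is disabled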
Step 4.
Now run the actual unjoin command.
cluster::*> cluster unjoin -node node2
(This will display some warnings, but as long as the prerequisite checklist and the other preparations are complete, proceed.)
(At this point, you can wipe the disk data by selecting option 4 from the boot menu, which you reach by pressing Ctrl-C during the reboot. You will have visibility to the console if you have been accessing the node via the console or Service Processor. It is strongly advised that the nodes in question be accessed via the SP in a separate session from the one where the cluster unjoin is being executed.)
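For reference, the boot menu you land on after Ctrl-C looks roughly like the following (the exact entries can vary slightly by ONTAP release); option 4 is the one that cleans the configuration and initializes all disks:
Please make a selection:
(1) Normal Boot.
(2) Boot without /etc/rc.
(3) Change password.
(4) Clean configuration and initialize all disks.
(5) Maintenance mode boot.
(6) Update flash from backup config.
(7) Install new software first.
(8) Reboot node.
Selection (1-8)?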
Step 5.
Halt the nodes in question by logging into the individual nodes, if applicable. If you chose disk initialization, you can uninstall the hardware after it completes; otherwise, it is safe to remove the cables and uninstall the hardware.
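If you still have the SP session open, one way to power a node off once it has halted is from the SP CLI (shown purely as an illustration; node2 is a placeholder):
SP node2> system power off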
You are welcome :)
Helpful link: https://www.youtube.com/watch?v=D9TyL5hygMo