Thursday, December 17, 2020

Cohesity: In Azure, how to destroy Azure cluster nodes and repurpose them by adding them to a running cluster.


Learn Storage, Backup, Virtualization,  and Cloud. AWS, GCP & AZURE.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

1. Stop the cluster.

2. Destroy the cluster.

3. Wipe config on freed Nodes.

4. Add Nodes to Cluster.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Cluster running code version 6.3.1g:

[cohesity@shcstypazbk003-000d3a317381-node-1 ~]$ iris_cli cluster stop

[cohesity@shcstypazbk003-000d3a317381-node-1 ~]$ iris_cli cluster status

[cohesity@shcstypazbk003-000d3a317381-node-1 ~]$ iris_cli cluster destroy id=<Cluster_ID>

[cohesity@shcstypazbk003-000d3a317381-node-1 ~]$ iris_cli cluster

[cohesity@ClusterName--node-1 ~]$ ps -ef | grep iris_cli

After the cluster destroy completes, log into each individual node and run iris_cli node status.

[cohesity@ClusterName--node-1 ~]$ iris_cli node status

NODE ID                       : 123456789107

NODE IPS                      : 10.10.9.100, fe80::20d:3aff:fe31:7c6f

NODE IN CLUSTER               : false

CLUSTER ID                    : -1

CLUSTER INCARNATION ID        : -1

SOFTWARE VERSION              : 

LAST UPGRADED TIME            : 

NODE UPTIME                   : 

ACTIVE OPERATION              : 

MESSAGE                       : Node is not part of a cluster.

(If the response says the node is not part of a cluster, it is good to go.)

(But if the node reports that it is still part of a cluster, you may need to wipe its data and config manually using the prepackaged rescue script shown below.)

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

[support@azure-cohesity--node-1 bin]$ pwd

/home/cohesity/bin

[cohesity@azure-cohesity--node-1 bin]$ cd rescue/

[cohesity@azure-cohesity--node-1 rescue]$ ls

breakfix_nvme_ssd.sh  clean_node.sh  defsh.sh  erase_disk.sh  make_bootable_device.sh  reset_linux_users.sh  rollback_upgrade.sh

[cohesity@azure-cohesity--node-1 rescue]$

[cohesity@ClusterName--node-1 rescue]$ ./clean_node.sh

CLEAN NODE IS A DESTRUCTIVE OPERATION.  DO YOU WANT TO PROCEED? (Y/N): y

Cleaning...

[cohesity@ClusterName--node-1 rescue]$ reboot.sh 

RECEIVED REQUEST TO REBOOT NODE. DO YOU WANT TO PROCEED? (Y/N): y

Rebooting...

Connection to 10.249.8.135 closed by remote host.

By now, the node is free and not part of any cluster.

At this step, you can move on to the node add.

1. Log into the Cluster where you want to join new nodes.

2. iris_cli

3. admin@127.0.0.1> cluster cloud-join node-ips=10.10.9.100,10.10.9.101,10.10.9.102

(These are the IPs of the recently freed nodes.)


Monitor the node-add progress in Siren and/or the GUI.
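If you prefer the CLI, a quick way to confirm the join is to re-run the same status commands used earlier. A minimal sketch, assuming passwordless SSH and the three freed node IPs shown above:

# On each freed node, confirm it now reports membership in the cluster
for ip in 10.10.9.100 10.10.9.101 10.10.9.102; do
  ssh cohesity@$ip "iris_cli node status | egrep 'NODE IN CLUSTER|CLUSTER ID'"
done

# On the running cluster, confirm the new nodes are listed
iris_cli cluster status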

Thursday, October 1, 2020

Cohesity: How to expand the cluster and how to remove a node from the cluster, with examples.


Learn Storage, Backup, Virtualization,  and Cloud. AWS, GCP & AZURE.

 ..........................................................................................................................................................................

Expand a Cluster

Perform the following steps before adding new nodes to the cluster. Two methods are available for configuring the non-native VLAN on the new node (a consolidated sketch follows these steps):
1. Use the iris_cli.
a. Use the iris_cli vlan add command to set up the non-native VLAN to be used for the node-add workflow. Example:
iris_cli vlan add if-name=bond0 id=101 subnet-mask-bits=8
b. Use the following command to set the non-native VLAN logical bond interface as primary. Replace vlan_id with the ID of the VLAN you added.
iris_cli ip config interface-name=<bond0.vlan_id> interface-role=primary
Alternatively, to configure the IP on a new node and access the node using that IP (not required if using Avahi to discover all nodes), use this command:
iris_cli ip config interface-name=<bond0.vlan_id> iface-ips=xx subnet-gateway=yy subnet-mask-bits=zz mtu=qq
2. Or use the configure_network.sh script.
a. Use configure_network.sh option 10.
Location: /home/cohesity/bin/network/configure_network.sh
3. Restart the Nexus service:
sudo service nexus restart
4. Run ifconfig and ensure Avahi runs on the non-native VLAN bonded interface.
5. On any node in the existing cluster, start the node-add workflow from the UI and provide cluster IPs from the configured non-native VLAN.
NOTE: If necessary, you can configure cluster IPs and VIPs from the non-native VLAN and keep the IPMI in the native VLAN or some other subnet.
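Putting the iris_cli method together, here is a minimal end-to-end sketch for a new node (VLAN 101 on bond0 and the IP values are examples only; adjust them to your environment):

# Create the non-native VLAN and make its bond interface primary
iris_cli vlan add if-name=bond0 id=101 subnet-mask-bits=8
iris_cli ip config interface-name=bond0.101 interface-role=primary

# Optionally assign an IP directly (not needed when Avahi discovery is used)
# iris_cli ip config interface-name=bond0.101 iface-ips=10.19.65.60 subnet-gateway=10.19.65.1 subnet-mask-bits=23 mtu=1500

# Restart Nexus and confirm the VLAN interface is up before starting the node add from the UI
sudo service nexus restart
ifconfig bond0.101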
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Remove a Node from Cluster. 
This is the clean way. A trickier alternative is to fail the node and let the cluster rebuild in the background, provided the right redundancy settings are in place (these can be checked under the Storage Domain configuration).
  1. Log into Cluster/node
        > iris_cli cluster status
            (It lists Node ID with IP and Serial Numbers)
        > iris_cli node rm -id=<serial number of node>
            (It will prompt cluster username (admin) and Password, followed by message—
                “Success: Node ID: <Serial Number> marked for removal successfully.”)

Note: There is no way to track the removal progress using the CLI. However, if you log in to the Siren page and go to Scribe, it shows the KRemoveNode process and the count of metadata/replicas held by that node steadily decreasing, which indicates the node is being removed. The Scribe service tracks and manages metadata; metadata removal and removal of data from the disks owned by the node in question run in parallel, but the metadata portion finishes quickly. Once the data has been reshuffled across the other nodes, logging into the node and running the same commands as above will show the message that the node is not part of a cluster, and the node's password is reset to the default admin password rather than the one changed for the entire cluster.
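Even though the CLI does not show removal progress, it can confirm completion once the reshuffle finishes. A minimal sketch using the same commands as above:

# From any remaining node: the removed node should no longer be listed
iris_cli cluster status

# On the removed node itself: expect "Node is not part of a cluster."
iris_cli node status | grep -i "part of"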


You are Welcome :)

Friday, September 25, 2020

Cohesity: Network-related troubleshooting during the initial cluster build


Learn Storage, Backup, Virtualization,  and Cloud. AWS, GCP & AZURE.
 
 ..........................................................................................................................................................................
Network Related Troubleshooting during Initial Cluster Build With Examples

1. To begin with, start by looking at the 10G interfaces.
bond0 is used by Cohesity nodes.
By default, the 10G interfaces are included in bond0.
ens802f0/ens802f1 are the 10G interfaces on the C2xxx, 4xxx, and 6xxx series.
Mode 1 (active-backup policy)
Mode 4 (IEEE 802.3ad dynamic link aggregation, i.e. LACP) - recommended

2. Verify which interface is set as the primary interface in the running configuration.
[cohesity@node ~]$ primary_interface_name.sh
bond0 (this should be listed in the result)
[cohesity@node ~]$ allssh.sh 'ip a | grep bond'
(This lists which interfaces are members of bond0.)

3. Ensure 10G ports are connected.
[cohesity@node ~]$ sudo ethtool ens802f0
Settings for ens802f0:
  Supported ports: [ FIBRE ]
  Supported link modes:   10000baseT/Full
  Supported pause frame use: Symmetric
       ...
  Speed: 10000Mb/s
  Duplex: Full
  Port: Direct Attach Copper
  PHYAD: 0
  Transceiver: internal
  Auto-negotiation: off
       ...
  Link detected: yes
[cohesity@node ~]$ sudo ethtool ens802f1
Settings for ens802f1:
  Supported ports: [ FIBRE ]
  Supported link modes:   10000baseT/Full
  Supported pause frame use: Symmetric
       ...
  Speed: 10000Mb/s
  Duplex: Full
  Port: Direct Attach Copper
  PHYAD: 0
  Transceiver: internal
  Auto-negotiation: off
       ...
  Link detected: yes
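To run the same link check on every node at once, allssh.sh (used above) can wrap ethtool. A minimal sketch, assuming the 10G interfaces are named ens802f0/ens802f1 as on these models:

# Summarize link state for both 10G interfaces across all nodes
allssh.sh 'for nic in ens802f0 ens802f1; do echo "== $nic =="; sudo ethtool $nic | egrep "Speed|Duplex|Link detected"; done'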

4. Ensure the LLDP service is enabled on the switch port.
[cohesity@node ~]$ sudo lldpctl ens802f0
-----------------------------------------------------
LLDP neighbors:
------------------------------------------------
Interface:    ens802f0, via: LLDP, RID: 3, Time: 6 days, 02:38:24
(This should list the chassis/serial number and other details of the connected switch.)
[cohesity@node ~]$ sudo lldpctl ens802f1
-----------------------------------------------------
LLDP neighbors:
-----------------------------------------------------
Interface:    ens802f1, via: LLDP, RID: 4, Time: 6 days, 02:42:21

5. Ensure the bond ports are up, active, and showing no issues.
[cohesity@node ~]$ cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
Slave Interface: ens802f0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: a4:bf:01:2d:7f:56
Aggregator ID: 3
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
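Rather than reading the whole file, a quick health grep of the bond (field names taken from the output above) might look like this:

# Show the mode, per-slave MII status, and failure counts only
egrep "Bonding Mode|MII Status|Slave Interface|Link Failure Count" /proc/net/bonding/bond0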

6. When a native VLAN is used, ensure the port config is correct in the switch running config.
nexus-1
interface port-channel 101
description vpc 101 cohesity-node1-ens802f0
switchport mode trunk
switchport trunk allowed vlan 50
switchport trunk native vlan 50
spanning-tree port type edge trunk
vpc 101
interface Ethernet1/5
description vpc 101 cohesity-node1-ens802f0
switchport mode trunk
switchport trunk allowed vlan 50
switchport trunk native vlan 50
channel-group 101 mode active
nexus-2
interface port-channel 101
description vpc 101 cohesity-node1-ens802f1
switchport mode trunk
switchport trunk allowed vlan 50
switchport trunk native vlan 50
spanning-tree port type edge trunk
vpc 101
interface Ethernet1/5
description vpc 101 cohesity-node1-ens802f1
switchport mode trunk
switchport trunk allowed vlan 50
switchport trunk native vlan 50
channel-group 101 mode active

7. Map the node's MAC address to its IP using the arp tool. If a duplicate IP address is in use, this helps identify it.
[cohesity@node-1 ~]$ hostips
10.19.65.50 10.19.65.51 10.19.65.52 10.19.65.53
[cohesity@node-1 ~]$ ping 10.19.65.50
PING 10.19.65.50 (10.19.65.50) 56(84) bytes of data.
64 bytes from 10.19.65.50: icmp_seq=1 ttl=64 time=0.112 ms
64 bytes from 10.19.65.50: icmp_seq=2 ttl=64 time=0.071 ms
^C
[cohesity@node-1 ~]$ arp -na |grep 10.19.65.50
? (10.19.65.50) at 00:1e:67:9c:49:90 [ether] on bond0.101
[cohesity@node-1 ~]$ ifconfig bond0.101
bond0.101: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.19.65.50  netmask 255.255.254.0  broadcast 10.5.65.255
        inet6 fe80::21e:67ff:fe9c:4990  prefixlen 64  scopeid 0x20<link>
        ether 00:1e:67:9c:49:90  txqueuelen 1000  (Ethernet)
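If a duplicate IP is suspected, arping's duplicate address detection mode can confirm it, provided the iputils arping tool is installed (the interface and IP below are the example values from this step):

# Run before assigning the IP, or from a node that does not hold it;
# a reply (non-zero exit) means another host already answers for this address
sudo arping -D -I bond0.101 -c 3 10.19.65.50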

8. Check the ethtool error counters. If there are errors, especially CRC errors, check the SFP, cable, or connection.
[cohesity@node ~]$ sudo ethtool -S ens802f0 |egrep "dropped|error"
    rx_errors: 0
    tx_errors: 0
    rx_dropped: 0
    tx_dropped: 0
    rx_over_errors: 0
    rx_crc_errors: 0
    rx_frame_errors: 0
    rx_fifo_errors: 0
    rx_missed_errors: 24758
    tx_aborted_errors: 0
    tx_carrier_errors: 0
    tx_fifo_errors: 0
    tx_heartbeat_errors: 0
    rx_long_length_errors: 0
    rx_short_length_errors: 0
    rx_csum_offload_errors: 864
    rx_fcoe_dropped: 0
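Error counters only matter if they are increasing, so a quick way to see movement is to sample them twice a short time apart. A minimal sketch:

# Take two samples 30 seconds apart and compare the CRC/missed counters
sudo ethtool -S ens802f0 | egrep "crc|missed" > /tmp/eth_errs_1
sleep 30
sudo ethtool -S ens802f0 | egrep "crc|missed" > /tmp/eth_errs_2
diff /tmp/eth_errs_1 /tmp/eth_errs_2 && echo "No change in counters"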

9. New cluster install with a non-native VLAN trunk port configuration.
This example shows a customer wanting to use VLAN 101; VLAN 101 must be configured as the primary interface.
[cohesity@node-1 ~]$ primary_interface_set.sh bond0.101
[cohesity@node-1 ~]$ primary_interface_name.sh
bond0.101
When ./configure_network.sh is used, select option 10, which allows you to customize VLANs and IP info.

10. After the cluster is created, validate the VLANs for the VIPs:
[cohesity@optimus-64-11 ~]$ iris_cli vlan ls

11. Once all of these have been verified and checked out, it is possible that Cohesity's Nexus service, which is responsible for networking, also needs a restart.
for i in `seq 180 184`; do ssh 10.123.23.$i date; done
for i in `seq 180 184`; do ssh 10.123.23.$i sudo systemctl stop nexus; done
for i in `seq 180 184`; do ssh 10.123.23.$i sudo systemctl restart nexus; done
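After the restarts, it is worth confirming the service came back on every node. A minimal sketch using the same IP range as above:

# Verify nexus is active again on each node
for i in `seq 180 184`; do ssh 10.123.23.$i "sudo systemctl is-active nexus"; done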

12. These are the log areas to examine; they may point to the underlying issue.
[cohesity@node ~]$ less nexus_exec.FATAL
[cohesity@node ~]$ less nexus_exec.INFO
[cohesity@node ~]$ less logs/nexus_proxy_exec.INFO (displays Nexus service issues)
[cohesity@node ~]$ ls -ltr  logs/*FATAL*

[cohesity@node ~]$ cat /etc/sysconfig/network-scripts/ifcfg-bond0


You are Welcome :)

Cohesity: How to create a new Cohesity cluster, with examples

Learn Storage, Backup, Virtualization,  and Cloud. AWS, GCP & AZURE.
.......................................................................................................................................................................... 

This is the method used to create a new cluster using IPMI.

This applies to C6xxx models. (If you were to use it for a C25xx or 4xxx model, you would set a value of "3" instead.)
C6xxx uses username: admin and password: administrator for IPMI.
  1. Console into the very first node.
It will take you to a black screen.

[cohesity@node ~]$ sh (type sh and press Enter)
UserName: cohesity
Password: Cohe$1ty

(This takes you to the cluster shell.) Run the commands below:
sudo ipmitool lan print 1
sudo ipmitool lan set 1 ipsrc static
sudo ipmitool lan set 1 ipaddr 10.123.123.20
sudo ipmitool lan set 1 defgw ipaddr 10.123.123.1
sudo ipmitool lan set 1 access on
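To confirm the settings took effect, print the LAN configuration again and check the relevant fields. A quick sketch (channel 1, as above):

# Expect the static source, the IP, and the gateway set above
sudo ipmitool lan print 1 | egrep "IP Address|Default Gateway|Source"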
  2. Now that you have enabled IPMI, you can use the IP address in a URL and access the KVM remotely.
Once logged in to the KVM:
[cohesity@node ~]$ cd bin/network
$ ls
(This lists the available scripts.)
  3. Select the configure_network.sh script.
[cohesity@node ~]$ ./configure_network.sh
(It lists 12 options. Select option 7 to configure LACP bonding across the two 10G ports on the Cohesity side. You must have 10G LACP configured the same way on the switch side too.)
(LACP config on Switch Side should look like this:
SwitchA
interface Ethernet1/5
description  cohesity-node1-ens802f0
switchport mode trunk
switchport trunk allowed vlan 50
switchport trunk native vlan 50
channel-group 101 mode active
mtu 9216
SwitchB:
interface Ethernet1/5
description cohesity-node1-ens802f1
switchport mode trunk
switchport trunk allowed vlan 50
switchport trunk native vlan 50
channel-group 101 mode active
mtu 9216
  4. In the event the BMC/IPMI port becomes unresponsive, log into IPMI from another node and run this to reset it:
ipmitool -I lanplus -U admin -P administrator -H 10.123.123.20 mc reset cold
(If an IPMI interface is frozen, you can use this to reset it from a different node.)
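After the cold reset, the BMC can take a minute or two to come back; you can poll it from the other node with the same credentials. A rough sketch:

# Wait for the BMC to respond again after the cold reset
until ipmitool -I lanplus -U admin -P administrator -H 10.123.123.20 mc info > /dev/null 2>&1; do
  sleep 10
done
echo "BMC is responding again"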
  5. Part of ./configure_network.sh assigns the node IP. You can now SSH into that node IP (e.g. 10.123.123.40).
  6. Once SSHed into the node IP:
[cohesity@node ~]$ cat /proc/net/bonding/bond0 (This shows what kind of bond config is in place.)
It shows something like this:
[cohesity@node ~]$ cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
Slave Interface: ens802f0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: a4:bf:01:2d:7f:56
Aggregator ID: 3
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
  7. [cohesity@node ~]$ avahi-browse -tarp
(This discovers all the nodes connected for the cluster over IPv6 via Avahi. If it does not see any nodes, that needs to be investigated.)
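If the full avahi-browse output is noisy, the parsable form can be trimmed down to just the discovered addresses. A sketch, assuming the standard semicolon-separated -p layout in which the resolved address is the eighth field:

# List only the resolved addresses of discovered nodes (field position is an assumption)
avahi-browse -tarp | awk -F';' '/^=/{print $8}' | sort -u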

  8. At this stage, you can use the node IP in a URL; it should discover all the nodes so you can start creating the Cohesity cluster.
This is an interactive session in which you assign node IPs, VIPs, SMTP, DNS, and NTP servers.
At the end of the interactive session, it gives a message notifying you that the cluster has been created and that you can use the provided URL with the admin user.
Username: admin
Password: admin
Note: If you want to update gflags and other settings, you may do so at this point.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Validation Steps for Cluster Settings:
1. Now that the cluster is up, you can run this on any node. The MII status should show up on all the nodes that are part of the cluster.
[cohesity@node ~]$ allssh.sh 'cat /proc/net/bonding/bond0' | grep MII
MII Status: up
MII Polling Interval (ms): 100
MII Status: up
MII Status: up
MII Status: up
MII Polling Interval (ms): 100
MII Status: up
MII Status: up
MII Status: up
MII Polling Interval (ms): 100
MII Status: up
MII Status: up
MII Status: up
MII Polling Interval (ms): 100
MII Status: up
MII Status: up
MII Status: up
MII Polling Interval (ms): 100
MII Status: up
MII Status: up
  2. [cohesity@node ~]$ allssh.sh 'cat /proc/net/bonding/bond0' | grep Mode
(This should list the link aggregation mode. Mode 4, i.e. LACP, is dynamic link aggregation.)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
  3. [cohesity@node ~]$ iris_cli node status
  4. [cohesity@node ~]$ iris_cli cluster status
  5. [cohesity@node ~]$ allssh.sh hostips
(This lists all node IPs in the cluster.)
  6. [cohesity@node ~]$ less logs/iris_proxy_exec.FATAL (lists any FATALs related to the Iris service)
You are Welcome :)

Thursday, February 13, 2020

Netapp: Excessive DNS queries by the Netapp Harvest server against a monitored Netapp cluster

Learn Storage, Backup, Virtualization,  and Cloud. AWS, GCP & AZURE.
..........................................................................................................................................................................

Synopsis: NetApp NAbox, when deployed from the 2.5 OVA image (the GA release), is good in terms of use case. However, the server generates thousands of DNS queries against the monitored cluster in less than 24 hours.

I ran into a problem where the DNS server got choked by the NAbox server, which generated about 70K DNS requests (both A and AAAA lookups) in less than 24 hours.

Resolution:
1. Use the IP address of the cluster (i.e. the cluster management IP) as the source for the cluster to be monitored.
2. Use the beta version of the newer 2.6 release, which bundles dnsmasq as a local resolver.
I went with option 2, as it was seamless and keeps the benefit of using hostnames rather than IP addresses.
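If you want to quantify the query volume yourself before deciding, a packet capture on the DNS server gives a quick count. A rough sketch, where the interface name and NAbox IP are placeholders:

# Count DNS queries from the NAbox server over a 10-minute window (eth0 and 10.0.0.50 are placeholders)
sudo timeout 600 tcpdump -i eth0 -nn "port 53 and src host 10.0.0.50" 2>/dev/null | wc -l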


You are Welcome :)