Friday, September 25, 2020

Cohesity: Network-Related Troubleshooting During Initial Cluster Build


Learn Storage, Backup, Virtualization, and Cloud. AWS, GCP & AZURE.
 
Network-Related Troubleshooting During Initial Cluster Build, with Examples

1. Start by looking at the 10G interfaces.
Cohesity nodes use bond0, and by default the 10G interfaces are members of bond0.
ens802f0 and ens802f1 are the 10G interfaces on the C2xxx, C4xxx, and C6xxx series.
Bonding modes:
Mode 1 (active-backup)
Mode 4 (active-active, LACP / IEEE 802.3ad), recommended

2. Confirm which interface is set as the primary interface.
[cohesity@node ~]$ primary_interface_name.sh
bond0   (bond0 should be returned)
[cohesity@node ~]$ allssh.sh 'ip a | grep bond'
(This lists, on every node, the interfaces that are members of bond0.)
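To pull just the member names out of that `ip a` output, a small filter can help. This is a sketch that assumes the usual `ip a` header format, where a member line carries `master bond0`; `bond_members` is a hypothetical helper name, not a Cohesity tool.

```shell
# Print the interfaces enslaved to a bond (default bond0), reading `ip a` output on stdin.
# Hypothetical helper: assumes member lines carry the "master <bond>" attribute.
bond_members() {
  local bond="${1:-bond0}"
  awk -v bond="$bond" '
    {
      is_member = 0
      name = ""
      for (i = 1; i < NF; i++) {
        # The interface name is the field just before the "<FLAGS>" block.
        if (name == "" && $(i + 1) ~ /^</) name = $i
        if ($i == "master" && $(i + 1) == bond) is_member = 1
      }
      if (is_member && name != "") {
        sub(/:$/, "", name)   # drop the trailing colon
        sub(/@.*/, "", name)  # drop an "@parent" suffix, if any
        print name
      }
    }'
}
```

Usage: `ip a | bond_members bond0` should print ens802f0 and ens802f1 on a node with the default bond configuration.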

3. Confirm that both 10G ports are connected and have link.
[cohesity@node ~]$ sudo ethtool ens802f0
Settings for ens802f0:
  Supported ports: [ FIBRE ]
  Supported link modes:   10000baseT/Full
  Supported pause frame use: Symmetric
       ...
  Speed: 10000Mb/s
  Duplex: Full
  Port: Direct Attach Copper
  PHYAD: 0
  Transceiver: internal
  Auto-negotiation: off
       ...
  Link detected: yes
[cohesity@node ~]$ sudo ethtool ens802f1
Settings for ens802f1:
  Supported ports: [ FIBRE ]
  Supported link modes:   10000baseT/Full
  Supported pause frame use: Symmetric
       ...
  Speed: 10000Mb/s
  Duplex: Full
  Port: Direct Attach Copper
  PHYAD: 0
  Transceiver: internal
  Auto-negotiation: off
       ...
  Link detected: yes
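When checking several ports or nodes, the two lines that matter most in this output are `Speed` and `Link detected`. The hypothetical helper `check_link` reads `ethtool` output on stdin and reports whether the link is up at the expected speed; the 10000Mb/s default is an assumption matching the 10G ports described in this post.

```shell
# Report whether `ethtool <iface>` output (on stdin) shows link up at the expected speed.
# Hypothetical helper; the 10000Mb/s default assumes the 10G ports discussed in this post.
check_link() {
  local want="${1:-10000Mb/s}"
  awk -v want="$want" '
    /^[ \t]*Speed:/         { speed = $2 }
    /^[ \t]*Link detected:/ { link = $3 }
    END {
      if (link != "yes") { print "link down"; exit 1 }
      if (speed != want) { print "speed " speed " (expected " want ")"; exit 1 }
      print "ok"
    }'
}
```

Usage: `sudo ethtool ens802f0 | check_link` prints `ok` for a healthy 10G link and exits non-zero otherwise.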

4. Confirm that LLDP is enabled on the switch ports.
[cohesity@node ~]$ sudo lldpctl ens802f0
-----------------------------------------------------
LLDP neighbors:
------------------------------------------------
Interface:    ens802f0, via: LLDP, RID: 3, Time: 6 days, 02:38:24
(The output should list the Chassis and serial number details of the connected switch.)
[cohesity@node ~]$ sudo lldpctl ens802f1
-----------------------------------------------------
LLDP neighbors:
-----------------------------------------------------
Interface:    ens802f1, via: LLDP, RID: 4, Time: 6 days, 02:42:21
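An empty neighbor list means no LLDP frames were received on that port, which points at LLDP being disabled on the switch side or at a missing link. A minimal check, assuming `lldpctl` prints an `Interface: ..., via: LLDP` line per neighbor and a `SysName:` line in the chassis section; `lldp_neighbor` is a hypothetical helper name.

```shell
# Print the LLDP neighbor's system name from `lldpctl <iface>` output on stdin,
# or exit non-zero when no neighbor was learned. Hypothetical helper.
lldp_neighbor() {
  awk '
    /Interface:.*via: LLDP/ { found = 1 }
    /SysName:/              { name = $2 }
    END {
      if (!found) { print "no LLDP neighbor"; exit 1 }
      if (name != "") print name
      else print "neighbor found (no SysName)"
    }'
}
```

Usage: `sudo lldpctl ens802f0 | lldp_neighbor` prints the switch name, or exits non-zero if no neighbor is seen.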

5. Confirm that the bond member ports are up and healthy.
[cohesity@node ~]$ cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
Slave Interface: ens802f0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: a4:bf:01:2d:7f:56
Aggregator ID: 3
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
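On a healthy node every `MII Status` line reads `up` and the bonding mode matches what the switch expects (802.3ad for LACP). Since the file is long, a summary can help; this is a sketch assuming the `/proc/net/bonding/bond0` layout shown, with `bond_summary` as a hypothetical name.

```shell
# Summarize /proc/net/bonding/<bond> (on stdin): bonding mode plus MII status of the bond
# and each member. Exits non-zero if any status is not "up". Hypothetical helper.
bond_summary() {
  awk -F': ' '
    /^Bonding Mode:/    { print "mode: " $2 }
    /^Slave Interface:/ { slave = $2 }
    /^MII Status:/ {
      who = (slave == "" ? "bond" : slave)   # lines before the first member describe the bond itself
      print who ": " $2
      if ($2 != "up") bad = 1
    }
    END { exit bad + 0 }'
}
```

Usage: `bond_summary < /proc/net/bonding/bond0` prints the mode and one status line for the bond and for each member interface.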

6. When a native VLAN is used, verify the port configuration in the switch running config.
nexus-1:
interface port-channel 101
  description vpc 101 cohesity-node1-ens802f0
  switchport mode trunk
  switchport trunk allowed vlan 50
  switchport trunk native vlan 50
  spanning-tree port type edge trunk
  vpc 101
interface Ethernet1/5
  description vpc 101 cohesity-node1-ens802f0
  switchport mode trunk
  switchport trunk allowed vlan 50
  switchport trunk native vlan 50
  channel-group 101 mode active
nexus-2:
interface port-channel 101
  description vpc 101 cohesity-node1-ens802f1
  switchport mode trunk
  switchport trunk allowed vlan 50
  switchport trunk native vlan 50
  spanning-tree port type edge trunk
  vpc 101
interface Ethernet1/5
  description vpc 101 cohesity-node1-ens802f1
  switchport mode trunk
  switchport trunk allowed vlan 50
  switchport trunk native vlan 50
  channel-group 101 mode active
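One thing worth checking in a saved running config is that each trunk's native VLAN also appears in its allowed VLAN list; otherwise untagged traffic from the node will typically be dropped. The hypothetical `check_native_vlan` helper checks each interface block; it assumes the allowed list is a plain comma-separated list, so ranges such as `50-60` and `allowed vlan add` lines are not handled.

```shell
# For each interface in a saved NX-OS config (on stdin), check that the trunk's native VLAN
# is in its allowed VLAN list. Hypothetical helper; handles plain comma lists only
# (no "50-60" ranges, no "allowed vlan add" lines).
check_native_vlan() {
  awk '
    function report(    n, i, list) {
      if (iface == "") return
      if (native == "") { print iface ": no native vlan"; return }
      n = split(allowed, list, ",")
      for (i = 1; i <= n; i++)
        if (list[i] == native) { print iface ": ok"; return }
      print iface ": native vlan " native " not in allowed list (" allowed ")"
      bad = 1
    }
    /^[ \t]*interface / {
      report()                                  # finish the previous interface block
      iface = $0
      sub(/^[ \t]*interface[ \t]+/, "", iface)
      allowed = ""; native = ""
      next
    }
    /switchport trunk allowed vlan/ { allowed = $NF }
    /switchport trunk native vlan/  { native = $NF }
    END { report(); exit bad + 0 }'
}
```

Usage: `check_native_vlan < nexus-1-running-config.txt` (the file name is hypothetical) prints one line per interface, for example `port-channel 101: ok`.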

7. Use arp to map each node's IP address to its MAC address. This helps identify a duplicate IP address: the MAC address in the ARP table should match the MAC address (ether) of the node's own interface.
[cohesity@node-1 ~]$ hostips
10.19.65.50 10.19.65.51 10.19.65.52 10.19.65.53
[cohesity@node-1 ~]$ ping 10.19.65.50
PING 10.19.65.50 (10.19.65.50) 56(84) bytes of data.
64 bytes from 10.19.65.50: icmp_seq=1 ttl=64 time=0.112 ms
64 bytes from 10.19.65.50: icmp_seq=2 ttl=64 time=0.071 ms
^C
[cohesity@node-1 ~]$ arp -na |grep 10.19.65.50
? (10.19.65.50) at 00:1e:67:9c:49:90 [ether] on bond0.101
[cohesity@node-1 ~]$ ifconfig bond0.101
bond0.101: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.19.65.50  netmask 255.255.254.0  broadcast 10.19.65.255
        inet6 fe80::21e:67ff:fe9c:4990  prefixlen 64  scopeid 0x20<link>
        ether 00:1e:67:9c:49:90  txqueuelen 1000  (Ethernet)
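The comparison in this step, ARP entry versus the interface's own `ether` address, can be scripted. A sketch with `arp_mac_for_ip` and `check_ip_owner` as hypothetical helper names, assuming the `arp -na` format shown. Note that the ARP table keeps only one MAC per IP, so a conflicting device shows up only if it was the last to answer; repeat the ping a few times if you suspect a duplicate.

```shell
# Print the MAC that `arp -na` output (on stdin) holds for an IP, e.g.
# "? (10.19.65.50) at 00:1e:67:9c:49:90 [ether] on bond0.101"
arp_mac_for_ip() {
  awk -v ip="$1" '$2 == "(" ip ")" { print $4; exit }'
}

# Compare the ARP entry for IP ($1) with the MAC of the node that owns it ($2).
# Hypothetical helpers; MACs are compared as lowercase strings.
check_ip_owner() {
  local seen want
  seen=$(arp_mac_for_ip "$1" | tr 'A-F' 'a-f')
  want=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  if [ -z "$seen" ]; then
    echo "no ARP entry for $1"
    return 1
  fi
  if [ "$seen" = "$want" ]; then
    echo "ok"
  else
    echo "MAC mismatch for $1: ARP has $seen, expected $want (possible duplicate IP)"
    return 1
  fi
}
```

Usage: `arp -na | check_ip_owner 10.19.65.50 00:1e:67:9c:49:90` prints `ok` when the ARP entry matches the node's interface.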

8. Check the ethtool error counters. If there are errors, especially CRC errors, check the SFP, cable, and connection.
[cohesity@node ~]$ sudo ethtool -S ens802f0 |egrep "dropped|error"
    rx_errors: 0
    tx_errors: 0
    rx_dropped: 0
    tx_dropped: 0
    rx_over_errors: 0
    rx_crc_errors: 0
    rx_frame_errors: 0
    rx_fifo_errors: 0
    rx_missed_errors: 24758
    tx_aborted_errors: 0
    tx_carrier_errors: 0
    tx_fifo_errors: 0
    tx_heartbeat_errors: 0
    rx_long_length_errors: 0
    rx_short_length_errors: 0
    rx_csum_offload_errors: 864
    rx_fcoe_dropped: 0
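In the sample output, `rx_crc_errors` is 0, while `rx_missed_errors` and `rx_csum_offload_errors` are non-zero. To spot such counters quickly, a sketch that prints only the non-zero counters from `ethtool -S`; `nonzero_counters` is a hypothetical name. A single snapshot shows totals accumulated over time (typically since the driver was loaded), so running it twice a few minutes apart tells you whether errors are still increasing.

```shell
# Print only the non-zero counters from `ethtool -S <iface>` output on stdin, as name=value.
# Hypothetical helper.
nonzero_counters() {
  awk -F': *' '
    NF == 2 && $2 ~ /^[0-9]+$/ && $2 + 0 > 0 {
      name = $1
      sub(/^[ \t]+/, "", name)   # drop the indentation ethtool adds
      print name "=" $2
    }'
}
```

Usage: `sudo ethtool -S ens802f0 | nonzero_counters`; on the sample output in this step it would print `rx_missed_errors=24758` and `rx_csum_offload_errors=864`.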

9. New cluster install with a non-native VLAN trunk port configuration.
In this example the customer wants to use VLAN 101, so bond0.101 must be configured as the primary interface.
[cohesity@node-1 ~]$ primary_interface_set.sh bond0.101
[cohesity@node-1 ~]$ primary_interface_name.sh
bond0.101
When running ./configure_network.sh, select option 10, which lets you customize the VLAN and IP settings.
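Because the VLAN interface name encodes the VLAN ID (`bond0.101` is VLAN 101 on bond0), a quick sanity check before running `primary_interface_set.sh` can catch typos. This is a sketch; `vlan_id_from_iface` is a hypothetical helper that only accepts names of the form `bond0.<ID>` with an ID between 1 and 4094.

```shell
# Extract and validate the VLAN ID from a bond0 VLAN interface name such as bond0.101.
# Hypothetical helper: accepts only "bond0.<ID>" with 1 <= ID <= 4094.
vlan_id_from_iface() {
  local id
  case "$1" in
    bond0.*) id="${1#bond0.}" ;;
    *) echo "not a bond0 VLAN interface: $1" >&2; return 1 ;;
  esac
  case "$id" in
    ''|*[!0-9]*) echo "invalid VLAN ID: $id" >&2; return 1 ;;
  esac
  if [ "$id" -lt 1 ] || [ "$id" -gt 4094 ]; then
    echo "VLAN ID out of range: $id" >&2
    return 1
  fi
  echo "$id"
}
```

Usage: `vid=$(vlan_id_from_iface bond0.101) && primary_interface_set.sh "bond0.$vid"` only calls the Cohesity script when the name is valid.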

10. After the cluster is created, validate the VLANs configured for the VIPs.
[cohesity@optimus-64-11 ~]$ iris_cli vlan ls

11. If all of the above checks out and the problem persists, Cohesity’s NEXUS service, which is responsible for networking, might need to be restarted as well.
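# Print each node's date (10.123.23.180 to 10.123.23.184) to confirm SSH access
for i in $(seq 180 184); do ssh 10.123.23.$i date; done
# Stop the nexus service on every node, then restart it on every node
for i in $(seq 180 184); do ssh 10.123.23.$i sudo systemctl stop nexus; done
for i in $(seq 180 184); do ssh 10.123.23.$i sudo systemctl restart nexus; done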
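These loops stop nexus on every node before restarting it anywhere. If you would rather restart one node at a time and confirm the service is active before moving on, a rolling variant is sketched here. The function name, the 10-second wait, and the use of `systemctl is-active` as the health check are assumptions, not Cohesity guidance; `DRY_RUN=1` prints the commands without running them.

```shell
# Restart nexus one node at a time: PREFIX FIRST LAST, e.g. 10.123.23 180 184.
# Hypothetical helper. DRY_RUN=1 prints the commands instead of running them.
# The 10-second wait before the health check is an arbitrary assumption.
restart_nexus_rolling() {
  local prefix="$1" first="$2" last="$3" i host
  for i in $(seq "$first" "$last"); do
    host="$prefix.$i"
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "ssh $host sudo systemctl restart nexus"
      echo "ssh $host systemctl is-active nexus"
      continue
    fi
    ssh "$host" sudo systemctl restart nexus || { echo "restart failed on $host" >&2; return 1; }
    sleep 10
    # is-active prints the unit state and exits non-zero unless it is "active"
    ssh "$host" systemctl is-active nexus || { echo "nexus not active on $host" >&2; return 1; }
  done
}
```

Usage: run `DRY_RUN=1 restart_nexus_rolling 10.123.23 180 184` first to review the commands, then run it without `DRY_RUN`.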

12. Check these logs for entries that may indicate the cause of the issue:
[cohesity@node ~]$ less nexus_exec.FATAL
[cohesity@node ~]$ less nexus_exec.INFO
[cohesity@node ~]$ less logs/nexus_proxy_exec.INFO (Displays NEXUS service issues.)
[cohesity@node ~]$ ls -ltr logs/*FATAL*

[cohesity@node ~]$ cat /etc/sysconfig/network-scripts/ifcfg-bond0 (Displays the bond0 interface configuration.)
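When there are many log files, it can be quicker to list only the FATAL logs written recently. A sketch using GNU `find`; `recent_fatal_logs` is a hypothetical name, and the `~/logs` default and 60-minute window are assumptions. The `-L` option makes `find` judge a symlinked `*.FATAL` file by its target's age.

```shell
# List FATAL log files in a directory (default ~/logs) modified in the last N minutes (default 60).
# Hypothetical helper; -L follows symlinks so a *.FATAL link is judged by its target's age.
recent_fatal_logs() {
  find -L "${1:-$HOME/logs}" -maxdepth 1 -name '*FATAL*' -mmin "-${2:-60}" -print | sort
}
```

Usage: `recent_fatal_logs ~/logs 30` lists FATAL logs touched in the last 30 minutes, which you can then open with `less`.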


You are Welcome :)

Cohesity: How to Create a New Cohesity Cluster, with Examples


This is the method for creating a new cluster using IPMI.

This procedure applies to C6xxx models. (If you use it for a C25xx or C4xxx model, set the value to “3” instead.)
C6xxx models use the IPMI username admin and password administrator.
1. Open a console on the first node.
It opens to a black screen.

[cohesity@node ~]$ sh   (type sh and press Enter)
Username: cohesity
Password: Cohe$1ty

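(This takes you to the cluster shell.) Run the following commands:
sudo ipmitool lan print 1                            # show the current IPMI LAN settings for channel 1
sudo ipmitool lan set 1 ipsrc static                 # use a static IP address instead of DHCP
sudo ipmitool lan set 1 ipaddr 10.123.123.20         # IPMI IP address
sudo ipmitool lan set 1 defgw ipaddr 10.123.123.1    # default gateway
sudo ipmitool lan set 1 access on                    # enable LAN access on channel 1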
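The commands in this step do not set a subnet mask, which `ipmitool` also accepts (`lan set <channel> netmask <mask>`). A sketch that wraps the sequence with the channel and netmask as parameters; `ipmi_set_static` is a hypothetical name, and reading the “3” mentioned for C25xx/C4xxx as the LAN channel number is an assumption, so check with `ipmitool lan print` first. `DRY_RUN=1` prints the commands instead of running them.

```shell
# Configure a static IPMI address: CHANNEL IP NETMASK GATEWAY.
# Hypothetical helper. DRY_RUN=1 prints the commands instead of running them.
ipmi_set_static() {
  if [ $# -ne 4 ]; then
    echo "usage: ipmi_set_static CHANNEL IP NETMASK GATEWAY" >&2
    return 2
  fi
  local ch="$1" ip="$2" mask="$3" gw="$4" run=""
  if [ "${DRY_RUN:-0}" = 1 ]; then run="echo"; fi
  # Stop at the first failing command; print the resulting settings at the end.
  $run sudo ipmitool lan set "$ch" ipsrc static &&
    $run sudo ipmitool lan set "$ch" ipaddr "$ip" &&
    $run sudo ipmitool lan set "$ch" netmask "$mask" &&
    $run sudo ipmitool lan set "$ch" defgw ipaddr "$gw" &&
    $run sudo ipmitool lan set "$ch" access on &&
    $run sudo ipmitool lan print "$ch"
}
```

Usage: `DRY_RUN=1 ipmi_set_static 1 10.123.123.20 <netmask> 10.123.123.1`, with your own subnet mask in place of `<netmask>`.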
2. With IPMI enabled, open the IPMI IP address in a browser to access the KVM console remotely.
Once logged in to the KVM console:
[cohesity@node ~]$ cd bin/network
$ ls
(This lists the available scripts.)
3. Run the configure_network.sh script.
[cohesity@node ~]$ ./configure_network.sh
(The script lists 12 options. Select option 7 to configure LACP bonding across the two 10G ports on the Cohesity side. LACP must also be configured the same way on the corresponding 10G switch ports.)
The LACP configuration on the switch side should look like this:
SwitchA:
interface Ethernet1/5
  description cohesity-node1-ens802f0
  switchport mode trunk
  switchport trunk allowed vlan 50
  switchport trunk native vlan 50
  channel-group 101 mode active
  mtu 9216
SwitchB:
interface Ethernet1/5
  description cohesity-node1-ens802f1
  switchport mode trunk
  switchport trunk allowed vlan 50
  switchport trunk native vlan 50
  channel-group 101 mode active
  mtu 9216
4. If the BMC/IPMI port becomes unresponsive, log in to another node and run the following to cold-reset the frozen BMC remotely:
ipmitool -I lanplus -U admin -P administrator -H 10.123.123.20 mc reset cold
5. Part of ./configure_network.sh configures the node IP. You can now SSH into that node IP (for example, 10.123.123.40).
6. Once you have SSHed into the node IP, check which bond configuration is in place:
[cohesity@node ~]$ cat /proc/net/bonding/bond0
The output looks something like this:
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
Slave Interface: ens802f0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: a4:bf:01:2d:7f:56
Aggregator ID: 3
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
7. [cohesity@node ~]$ avahi-browse -tarp
(This discovers the nodes connected to the cluster over the internal IPv6 network. If it does not list any nodes, investigate before continuing.)
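With several nodes, the raw `avahi-browse` output is hard to scan. In parsable mode (`-p`), resolved entries start with `=` and use `;` as a separator, with the hostname in the seventh field. A sketch that lists the unique hostnames, assuming that layout; `avahi_hosts` is a hypothetical name, and it does not filter by service type because the Cohesity service name is not given here.

```shell
# List unique hostnames from `avahi-browse -tarp` output on stdin.
# Assumes parsable mode: resolved lines start with "=" and fields are ';'-separated:
# =;iface;protocol;name;type;domain;hostname;address;port;txt
# Hypothetical helper; it does not filter by service type.
avahi_hosts() {
  awk -F';' '$1 == "=" && $7 != "" { print $7 }' | sort -u
}
```

Usage: `avahi-browse -tarp | avahi_hosts | wc -l` should roughly match the number of nodes you expect to see in discovery.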

8. At this stage, open the node IP in a browser. All nodes should appear in discovery, so you can start creating the Cohesity cluster.
This is an interactive session in which you assign the node IPs, VIPs, and the SMTP, DNS, and NTP servers.
At the end of the session, a message confirms that the cluster has been created, and you can log in to the provided URL as the admin user:
Username: admin
Password: admin
Note: If you want to update gflags or other settings, you can do so at this point.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Validation Steps for Cluster Settings:
1. Now that the cluster is up, you can run this from any node. MII Status should show up on every node in the cluster.
[cohesity@node ~]$ allssh.sh 'cat /proc/net/bonding/bond0' | grep MII
MII Status: up
MII Polling Interval (ms): 100
MII Status: up
MII Status: up
MII Status: up
MII Polling Interval (ms): 100
MII Status: up
MII Status: up
MII Status: up
MII Polling Interval (ms): 100
MII Status: up
MII Status: up
MII Status: up
MII Polling Interval (ms): 100
MII Status: up
MII Status: up
MII Status: up
MII Polling Interval (ms): 100
MII Status: up
MII Status: up
2. [cohesity@node ~]$ allssh.sh 'cat /proc/net/bonding/bond0' | grep Mode
(This lists the bonding mode. Mode 4, that is LACP, is shown as IEEE 802.3ad dynamic link aggregation.)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
3. [cohesity@node ~]$ iris_cli node status
4. [cohesity@node ~]$ iris_cli cluster status
5. [cohesity@node ~]$ allssh.sh hostips
(This lists the IPs of all nodes in the cluster.)
6. [cohesity@node ~]$ less logs/iris_proxy_exec.FATAL (Lists any FATAL entries related to the iris service.)
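Counting the `MII Status` lines is quicker than reading them: with five nodes, each with a bond and two member ports, a healthy cluster gives 15 `up` lines, as in validation step 1. A sketch with `count_mii` as a hypothetical name; feed it the un-grepped `allssh.sh` output, because if grep colors its output the escape codes can split `MII Status` and break the match.

```shell
# Count "MII Status" lines (bond and members) in un-grepped
# `allssh.sh 'cat /proc/net/bonding/bond0'` output; exit non-zero if any is not "up".
# Hypothetical helper.
count_mii() {
  awk '/MII Status:/ { if ($NF == "up") up++; else down++ }
       END { printf "up=%d down=%d\n", up, down; exit (down > 0) }'
}
```

Usage: `allssh.sh 'cat /proc/net/bonding/bond0' | count_mii` prints `up=15 down=0` for the five-node cluster shown when every link is up.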