Friday, September 25, 2020

Cohesity: Network-Related Troubleshooting During Initial Cluster Build


Network Related Troubleshooting during Initial Cluster Build With Examples

1. Begin by looking at the 10G interfaces.
Cohesity nodes use bond0.
By default, the 10G interfaces are included in bond0.
ens802f0/ens802f1 are the 10G interfaces on the C2xxx, 4xxx, and 6xxx series.
Mode 1 (active-backup policy)
Mode 4 (802.3ad, active-active policy): LACP (recommended)

2. Check which primary interface is currently in use.
[cohesity@node ~]$ primary_interface_name.sh
bond0 (This should be listed in the result.)
[cohesity@node ~]$ allssh.sh 'ip a | grep bond'
(This lists which interfaces are members of bond0.)
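
The membership check in step 2 can also be scripted. The following is a minimal sketch, assuming the node's `ip` command prints the usual one-line-per-interface format for `ip -o link show`; the function name `bond_members` is hypothetical and not part of the Cohesity tooling.

```shell
# bond_members BOND
# Reads `ip -o link show` output on stdin and prints the name of every
# interface enslaved to BOND (lines that contain "master BOND").
bond_members() {
  awk -v bond="$1" '
    {
      for (i = 1; i < NF; i++) {
        if ($i == "master" && $(i + 1) == bond) {
          name = $2
          sub(/:$/, "", name)   # strip the trailing colon, e.g. "ens802f0:"
          sub(/@.*/, "", name)  # strip any "@parent" suffix
          print name
        }
      }
    }'
}
```

On a node this would be run as `ip -o link show | bond_members bond0`; with the default configuration described above, both 10G interfaces should appear in the output.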

3. Ensure the 10G ports are connected and the link is up.
[cohesity@node ~]$ sudo ethtool ens802f0
Settings for ens802f0:
  Supported ports: [ FIBRE ]
  Supported link modes:   10000baseT/Full
  Supported pause frame use: Symmetric
       ...
  Speed: 10000Mb/s
  Duplex: Full
  Port: Direct Attach Copper
  PHYAD: 0
  Transceiver: internal
  Auto-negotiation: off
       ...
  Link detected: yes
[cohesity@node ~]$ sudo ethtool ens802f1
Settings for ens802f1:
  Supported ports: [ FIBRE ]
  Supported link modes:   10000baseT/Full
  Supported pause frame use: Symmetric
       ...
  Speed: 10000Mb/s
  Duplex: Full
  Port: Direct Attach Copper
  PHYAD: 0
  Transceiver: internal
  Auto-negotiation: off
       ...
  Link detected: yes
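
To avoid reading the full ethtool output by eye, the two fields that matter here (Speed and Link detected) can be extracted. This is a rough sketch only; `link_summary` is a hypothetical helper name, and it assumes the ethtool field labels shown in the output above.

```shell
# link_summary
# Reads `ethtool <iface>` output on stdin, prints "speed=... link=...",
# and returns non-zero when the link is not detected.
link_summary() {
  awk '
    $1 == "Speed:"                    { speed = $2 }
    $1 == "Link" && $2 == "detected:" { link = $3 }
    END {
      printf "speed=%s link=%s\n", speed, link
      exit (link == "yes" ? 0 : 1)
    }'
}
```

Example use on a node: `sudo ethtool ens802f0 | link_summary || echo "check cabling on ens802f0"`.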

4. Ensure LLDP is enabled on the switch port.
[cohesity@node ~]$ sudo lldpctl ens802f0
-----------------------------------------------------
LLDP neighbors:
------------------------------------------------
Interface:    ens802f0, via: LLDP, RID: 3, Time: 6 days, 02:38:24
(This should list the chassis and serial number details of the connected switch.)
[cohesity@node ~]$ sudo lldpctl ens802f1
-----------------------------------------------------
LLDP neighbors:
-----------------------------------------------------
Interface:    ens802f1, via: LLDP, RID: 4, Time: 6 days, 02:42:21
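
When several nodes need checking, it can help to reduce the lldpctl output to just the interfaces that actually see a neighbor. A minimal sketch, assuming the plain-text `Interface:` line format shown above; the function name `lldp_interfaces` is hypothetical.

```shell
# lldp_interfaces
# Reads plain-text `lldpctl` output on stdin and prints each local
# interface that has an LLDP neighbor entry.
lldp_interfaces() {
  awk '$1 == "Interface:" { name = $2; sub(/,$/, "", name); print name }'
}
```

Run as `sudo lldpctl | lldp_interfaces`. If a 10G interface is missing from the output, LLDP is probably not enabled on its switch port, or the port is not connected.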

5. Ensure the bond ports are up and active, with no link failures or churn.
[cohesity@node ~]$ cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
Slave Interface: ens802f0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: a4:bf:01:2d:7f:56
Aggregator ID: 3
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
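
The per-slave state can be pulled out of the bonding file in one pass. The sketch below assumes the `/proc/net/bonding/bond0` layout shown above, where each `Slave Interface:` line is followed by that slave's `MII Status:` line; `bond_slave_status` is a hypothetical name.

```shell
# bond_slave_status
# Reads /proc/net/bonding/<bond> content on stdin and prints
# "<slave> <MII status>" for each slave interface. The bond-level
# MII Status line (before the first slave) is skipped.
bond_slave_status() {
  awk '
    /^Slave Interface:/ { slave = $3; next }
    /^MII Status:/ && slave != "" { print slave, $3; slave = "" }
  '
}
```

Run as `bond_slave_status < /proc/net/bonding/bond0`. Every slave should report `up`; a slave reporting `down` points back to the link checks in step 3.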

6. When a native VLAN is used, ensure the port configuration is correct in the switch running config.
nexus-1
interface port-channel 101
  description vpc 101 cohesity-node1-ens802f0
  switchport mode trunk
  switchport trunk allowed vlan 50
  switchport trunk native vlan 50
  spanning-tree port type edge trunk
  vpc 101
interface Ethernet1/5
  description vpc 101 cohesity-node1-ens802f0
  switchport mode trunk
  switchport trunk allowed vlan 50
  switchport trunk native vlan 50
  channel-group 101 mode active
nexus-2
interface port-channel 101
  description vpc 101 cohesity-node1-ens802f1
  switchport mode trunk
  switchport trunk allowed vlan 50
  switchport trunk native vlan 50
  spanning-tree port type edge trunk
  vpc 101
interface Ethernet1/5
  description vpc 101 cohesity-node1-ens802f1
  switchport mode trunk
  switchport trunk allowed vlan 50
  switchport trunk native vlan 50
  channel-group 101 mode active

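7. Map the MAC address to the node IP using the arp tool. This helps identify a duplicate IP address: if the MAC that arp reports for an IP does not match the MAC of the node's own interface, another device is using that IP.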
[cohesity@node-1 ~]$ hostips
10.19.65.50 10.19.65.51 10.19.65.52 10.19.65.53
[cohesity@node-1 ~]$ ping 10.19.65.50
PING 10.19.65.50 (10.19.65.50) 56(84) bytes of data.
64 bytes from 10.19.65.50: icmp_seq=1 ttl=64 time=0.112 ms
64 bytes from 10.19.65.50: icmp_seq=2 ttl=64 time=0.071 ms
^C
[cohesity@node-1 ~]$ arp -na |grep 10.19.65.50
? (10.19.65.50) at 00:1e:67:9c:49:90 [ether] on bond0.101
[cohesity@node-1 ~]$ ifconfig bond0.101
bond0.101: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.19.65.50  netmask 255.255.254.0  broadcast 10.19.65.255
        inet6 fe80::21e:67ff:fe9c:4990  prefixlen 64  scopeid 0x20<link>
        ether 00:1e:67:9c:49:90  txqueuelen 1000  (Ethernet)
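
The comparison between the arp entry and the interface MAC can be automated. This is a sketch under stated assumptions: it expects the `arp -na` line format shown above, and `unexpected_macs` is a hypothetical helper, not a Cohesity command.

```shell
# unexpected_macs IP EXPECTED_MAC
# Reads `arp -na` output on stdin. Prints every MAC recorded for IP that
# differs from EXPECTED_MAC (case-insensitive) and returns non-zero if
# any such entry is found. Incomplete entries are ignored.
unexpected_macs() {
  awk -v ip="($1)" -v want="$2" '
    $2 == ip && $3 == "at" && $4 != "<incomplete>" &&
    tolower($4) != tolower(want) { print $4; bad = 1 }
    END { exit (bad ? 1 : 0) }'
}
```

For the example above, `arp -na | unexpected_macs 10.19.65.50 00:1e:67:9c:49:90` prints nothing and returns 0. Any MAC it prints belongs to another device answering for that IP.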

8. Check the ethtool error counters. If any are non-zero, especially CRC errors, check the SFP, cable, or connection.
[cohesity@node ~]$ sudo ethtool -S ens802f0 |egrep "dropped|error"
    rx_errors: 0
    tx_errors: 0
    rx_dropped: 0
    tx_dropped: 0
    rx_over_errors: 0
    rx_crc_errors: 0
    rx_frame_errors: 0
    rx_fifo_errors: 0
    rx_missed_errors: 24758
    tx_aborted_errors: 0
    tx_carrier_errors: 0
    tx_fifo_errors: 0
    tx_heartbeat_errors: 0
    rx_long_length_errors: 0
    rx_short_length_errors: 0
    rx_csum_offload_errors: 864
    rx_fcoe_dropped: 0
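
Most counters are normally zero, so it is often quicker to list only the non-zero ones. A minimal sketch, assuming the `name: value` layout of `ethtool -S` shown above; `nonzero_counters` is a hypothetical name.

```shell
# nonzero_counters
# Reads `ethtool -S <iface>` output on stdin and prints only the error
# and drop counters whose value is greater than zero, as name=value.
nonzero_counters() {
  awk '
    $1 ~ /(error|dropped)/ && $2 + 0 > 0 {
      name = $1
      sub(/:$/, "", name)   # strip the trailing colon from the counter name
      print name "=" $2
    }'
}
```

Run as `sudo ethtool -S ens802f0 | nonzero_counters`. Against the sample output above it would report only `rx_missed_errors=24758` and `rx_csum_offload_errors=864`.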

9. New cluster install with a non-native VLAN trunk port configuration.
In this example the customer wants to use VLAN 101, so bond0.101 must be set as the primary interface.
[cohesity@node-1 ~]$ primary_interface_set.sh bond0.101
[cohesity@node-1 ~]$ primary_interface_name.sh
bond0.101
When running ./configure_network.sh, select option 10, which lets you customize the VLANs and IP information.

10. After the cluster is created, validate the VLANs for the VIPs.
[cohesity@optimus-64-11 ~]$ iris_cli vlan ls

11. If everything above checks out, Cohesity’s NEXUS service, which is responsible for networking, might also need a restart.
# Confirm that each node (10.123.23.180 to 10.123.23.184 in this example) answers over SSH.
for i in $(seq 180 184); do ssh "10.123.23.$i" date; done
# Stop the nexus service on every node, then restart it.
for i in $(seq 180 184); do ssh "10.123.23.$i" sudo systemctl stop nexus; done
for i in $(seq 180 184); do ssh "10.123.23.$i" sudo systemctl restart nexus; done

12. Check the following logs, which may point to the problem.
[cohesity@node ~]$ less nexus_exec.FATAL
[cohesity@node ~]$ less nexus_exec.INFO
[cohesity@node ~]$ less logs/nexus_proxy_exec.INFO (Displays NEXUS service issues.)
[cohesity@node ~]$ ls -ltr logs/*FATAL*

To review the bond0 interface configuration file:
[cohesity@node ~]$ cat /etc/sysconfig/network-scripts/ifcfg-bond0


You are Welcome :)
