Tuesday, November 26, 2019

Netapp: How to remove Nodes from multi-node cluster running CDOT with Examples


I am a seasoned IT professional with a background in VMware, Storage, Backup, Unix, and project liaison experience. I have held positions working on technologies like Netapp, EMC, IBM, and Cohesity storage and backup, supporting SAN and NAS environments. I have held roles of IT administrator, engineer, team lead, and project liaison. This blog is for storage and backup professionals, and the content is derived from vendors as well as my own experience.
............................................................................................................................................................................
I recently performed a cluster unjoin and removed nodes from a multi-node cluster running CDOT 9.3Px without any service disruption; the activity went smoothly.

##Visibility into the system is the KEY. Stay logged into the SP or console of the nodes throughout this activity; it helps if you run into an unforeseen or unpredicted situation.


Prerequisites (example commands follow this list):
  1. Disable storage failover between the HA nodes (verify with storage failover show, or cf status from the nodeshell).
  2. Migrate the data LIFs' home-node and home-port from the nodes in question over to other healthy nodes/ports with network interface modify.
  3. Remove the ports of the nodes in question from broadcast domains / failover groups, leaving only the ones that will remain after node removal.
  4. Delete the intercluster LIFs, and remove the intercluster ports from the corresponding broadcast domain.
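
For reference, the prerequisites above map roughly onto commands like the following (a hedged sketch; the node, SVM, LIF, port, and broadcast-domain names are illustrative assumptions, not values from any particular cluster):

cluster::> storage failover modify -node node2 -enabled false
cluster::> network interface modify -vserver svm1 -lif data_lif1 -home-node node3 -home-port e0c
cluster::> network interface revert -vserver svm1 -lif data_lif1
cluster::> network port broadcast-domain remove-ports -ipspace Default -broadcast-domain Default -ports node2:e0d
cluster::> network interface delete -vserver cluster1 -lif node2_icl1
cluster::> network port broadcast-domain remove-ports -ipspace Default -broadcast-domain Default -ports node2:e0e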


Actual Steps:

Step 1. 
Enter advanced privilege mode on the cluster:
cluster::> set advanced
Warning: These advanced commands are potentially dangerous; use them only when directed to do so by NetApp personnel.
Do you want to continue? {y|n}: (Press y)

Step 2. 
Confirm whether either of the nodes in question is serving as a master node. If it is, make it ineligible. (Caution: once you have made a node ineligible, you would need to reboot it to make it eligible again, should you need to.) This can be checked via the cluster ring status from advanced mode.
cluster*::>cluster ring show

Verify that no LIFs or data remain on the nodes in question:
net int show -home-node node1

net int show -home-node node2

cluster*::> cluster modify -node node2 -eligibility false
Double-check by running the command below to ensure there are no remaining LIF dependencies other than the cluster ports and/or the node-management port/LIF. The unjoin will complain if anything is still left on the nodes being removed.

cluster*::>cluster ring show
(This lists node2 as offline, since it has been made ineligible. The other node will display as master.)
(Run the same check for both nodes in the HA pair.)


Node      UnitName  Epoch   DB Epoch  DB Trnxs  Master    Online
--------- --------  ------- --------  --------  --------- ---------
node2     mgmt      0       25        736267              offline
node2     vldb      0       23        1913                offline
node2     vifmgr    0       25        11031               offline
node2     bcomd     0       26        4                   offline
node2     crs       0       23        1                   offline

Step 3.  Verify:
cluster::*>storage failover show
cluster::*>cluster ring show

Step 4. 
Now run the actual unjoin:
cluster::*> cluster unjoin -node node2

(This will print some warnings, but as long as the prerequisite checklist and the checks above are complete, proceed.)

(At this time, you can optionally wipe the disk data by pressing Ctrl-C during the reboot to interrupt into the boot menu and choosing option 4. You will only have visibility into this if you have been accessing the node via its console or Service Processor. It is strongly advised that the nodes in question be accessed via the SP in a different session than the one where the cluster unjoin is being executed.)
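
If you opt to wipe the disks, the boot menu selection looks roughly like this (a hedged sketch; exact menu text varies by ONTAP release):

(1) Normal Boot.
(2) Boot without /etc/rc.
(3) Change password.
(4) Clean configuration and initialize all disks.
(5) Maintenance mode boot.
...
Selection (1-9)? 4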

Step 5. 
Halt the nodes in question by logging into them individually, if applicable. You can uninstall the hardware after the disk initialization completes (if chosen); otherwise, it is safe to remove the cables and uninstall the hardware.
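
If a node being removed is still booted into ONTAP and reachable, it can be halted with a command along these lines (a hedged example; run it from the node's own prompt or console, and confirm applicability for your situation):

::> system node halt -node node2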



You are welcome :)


  

Thursday, November 21, 2019

Cohesity: What are chunks, Erasure Coding (EC) and Replication Factor (RF) ?

............................................................................................................................................................................
If you are in the Cohesity domain, you will hear a lot about data resiliency, fault tolerance, and distributed workloads.
  • All of these revolve around the smallest unit of data, which Cohesity calls a chunk, the form in which data is written to disk.
  • Chunks, combined with drive- and node-level redundancy and resiliency, result in a highly available, resilient backup solution.

Chunk: The unit of storage data that Cohesity uses for protection. A chunk file can be considered a collection of pieces of data from one or more client objects (files, VMs, etc.) packaged together into a single large unit. Cohesity takes a blob of storage, which can be a collection of one or more client objects, divides it into variable-sized, deduplicated chunks, compresses and encrypts them, and puts them in a chunk file. Usually, chunks from the same large client (user) file are combined into the same chunk file. This happens in most cases when the client file or VM writes are sequential and can be stored together. There may also be several smaller client files that are not large enough to form a single chunk file, in which case chunks from such client files can be packed together to form a chunk file.


A chunk file can be protected using either EC or RF schemes. Cohesity provides configurable resiliency against HDD or node failures. A single large file could be part of several different chunk files and will end up distributed evenly across all the nodes of the cluster, as defined by the Cohesity erasure coding settings.


Replication Factor (RF) refers to the number of replicas of a unit of data. The unit of replication is a chunk file, and a chunk file is mirrored onto either one or two other nodes depending on the replication factor chosen. An RF2 scheme provides resilience against a single data unit failure, and an RF3 scheme provides resilience against two data unit failures.
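
As an illustrative example (my numbers, not from Cohesity documentation): with RF2, every chunk file is written to two nodes, so 10 TB of post-dedupe data consumes roughly 20 TB of raw capacity and can survive one drive or node failure; with RF3 the same data consumes roughly 30 TB and can survive two such failures.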


Erasure Coding (EC) refers to a scheme where a number of usable data stripe units are protected from failures using code stripe units, which are in turn derived from the usable data stripe units. A single code stripe unit can protect against one data (or code) stripe unit failure, and two code stripe units can protect against two data (or code) stripe unit failures.
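
As an illustrative comparison (again, my numbers): in a hypothetical 4+2 EC stripe, four data stripe units are protected by two code stripe units, so the stripe tolerates any two unit failures with a capacity overhead of only 50% (two extra units per four usable), versus 200% overhead for RF3 at a comparable fault-tolerance level.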




You are Welcome :)


Cohesity: Architecture Concept and Terminology...

............................................................................................................................................................................
What's up with the Cohesity architecture?
  • Uses the Paxos algorithm for read consistency, a mechanism that ensures a read returns the most recently written value, which matters especially in a distributed filesystem.
  • Consistent Hashing to spread data across all nodes in a cluster.
  • Data distribution using selected Erasure Coding (EC) or Replication Factor (RF) factor.
  • Strict consistency: non-disruptive upgrades and non-disruptive service delivery in the event of disk or node failures, i.e. strict consistency to support backup, restore, application data consistency, and so on.
  • SpanFS is the underlying web-scale, fully distributed file system on which Cohesity's software-defined backup and recovery application, DataProtect, runs. SpanFS exposes the NFS, SMB, and S3 interfaces and manages the I/O for all data written to or read from the system.
  • Distributed Lock Manager: manages concurrent access to the data repository and metadata.
  • Data Repository: stores the actual client data, such as network files, VMs, and databases, in a deduplicated, compressed, and encrypted form.
  • Metadata Store: keeps track of all file data sitting across nodes. The metadata store is based on a distributed key-value store that incorporates a fully redundant, consistent, distributed NoSQL store for fast I/O operations at scale.
  • SnapTree is Cohesity's built-in capability that provides unlimited, frequent snapshots via a distributed metadata structure based on B+ tree concepts.
  • Data Journaling: the SpanFS file system constantly looks at incoming requests and tries to estimate the I/O pattern. The journal absorbs I/Os and acts as a write cache that can be committed to disks later, helping keep data crash-consistent. It is part of the metadata and is replicated along with the File Metadata Store.
  • Distributed Metadata Manager: on each node, the underlying SpanFS file system is used to write to disks. All file data is stored in the Distributed File Data Store, while the Distributed Metadata Manager maintains all metadata.


Pictorial Depiction Below:



You are Welcome :)



Source: https://info.cohesity.com/Cohesity-Fault-Tolerance-White-Paper.html

Wednesday, November 20, 2019

Netapp: How to partition SSD disk/tray to an aggregate that already has partitioned disks

............................................................................................................................................................................
Problem Synopsis: I ran into a situation where the logical partition size of newly added disks did not match the partition size of the existing partitioned disks in the aggregate, which shows up when you check the partition size of the spare drives.

Potential Result: if a drive were to fail in the current aggregate, the spare partitioned drive won't kick in, or even if it does, it may not be in a healthy state.

Solution: Validate the current partition size first; then, when adding a new tray of SSDs (or just drives), partition them by copying that partition size.

**This applies only when you are adding additional disks/trays to an already existing aggregate**
Step 1: All new drives, or the entire tray of disks, should be assigned to one node. If they get partitioned by default, you need to unpartition the disks first. After the disks are unpartitioned, partition them from the nodeshell (a hedged example follows).
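
A hedged example of the unpartition step (this assumes the diag-level disk unpartition command is available in your release and that 0d.11.0 stands in for your disk; verify the syntax for your ONTAP version before running it):

node> priv set diag
node*> disk unpartition 0d.11.0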

Step 2: Get the actual raw size of the P3 partition from the root aggregate. Remember that the highest partition number is for root, e.g. P3 is a partition usable only for root; no data aggregate can be created even when you have many spare P3 partitions available. (This is what I found a little off, i.e. not being able to use the entire space.)

** Whether to add the drives unpartitioned and leave two whole drives for parity and dparity, or to partition the drives and leave the P3 partitions in the spare pool, is a judgement call. I found that partitioning is still better in terms of total usable SSD capacity than a non-partitioned disk add, once added into the pool.**

  1. Go to diag mode from a node where you already have root aggregate (partitioned) and gather info on P3 partition raw size.
  • node>priv set diag
  • node*>raid_config info listdisk XX.XX.XXP3 (locate the rawsize field)
... rawsize=28246976, used=28239680, rightsize=28244928, ... (remaining fields omitted)

Step 3: Now, from the list of unpartitioned disks that were just added, pick the one you want to partition, and partition it.
  • node*> disk partition -n 3 -i 3 -b 28246976 0d.11.0 (n = number of partitions, b = raw size from the existing partition)
  disk partition: 0d.11.0 partitioned successfully 


Step 4: Validate the partitioned disk.
node*> disk show -n

DISK          OWNER          POOL   SERIAL NUMBER   HOME          DR HOME
------------  -------------  -----  --------------  ------------  ------------
0d.11.0P1     Not Owned      NONE
0d.11.0P2     Not Owned      NONE
0d.11.0P3     Not Owned      NONE

Step 5: Now assign the partitions to the respective node (-o takes the owner node name).
node*> disk assign 0d.11.0P3 -o

This leaves the remaining partitions matching what was previously on the existing partitioned disks.


You are Welcome :)
Source: www.netapp.com








Thursday, November 14, 2019

SAN CISCO: NPV and NPIV Concepts, Configuration with examples


............................................................................................................................................................................
Why use NPV, and what is NPIV? Use cases with examples.


·      In fabric mode, each switch that joins a SAN is assigned a domain ID. Each SAN (or VSAN) supports a maximum of 239 domain IDs, so the SAN has a limit of 239 switches.

·      NPV alleviates the domain ID limit by sharing the domain ID of the core switch among multiple edge switches.

·      In NPV mode, the edge switch relays all traffic to the core switch, which provides the Fibre Channel switching capabilities. The edge switch shares the domain ID of the core switch.

·      Server interfaces are F ports on the edge switch that connect to the servers. 

·      A server interface may support multiple end devices by enabling the N port identifier virtualization (NPIV) feature. NPIV provides a means to assign multiple FC IDs to a single N port, which allows the server to assign unique FC IDs to different applications.

·      All interfaces from the edge switch to the core switch are configured as proxy N ports (NP ports). An NP uplink is a connection from an NP port on the edge switch to an F port on the core switch.

EXTERNAL INTERFACE: NP Port (That connects to the Core Switch F-Port and does fabric logins)
SERVER INTERFACE: F Port (That connects to Client hosts)
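
For completeness, NPIV is enabled on the NPIV core switch so that its F ports can accept multiple fabric logins from the NPV edge. A minimal NX-OS sketch (interface numbers are illustrative assumptions):

core-switch# configure terminal
core-switch(config)# feature npiv
core-switch(config)# interface fc1/1
core-switch(config-if)# switchport mode F
core-switch(config-if)# no shutdown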

Enabling NPV

Step 1
  Command: switch# configure terminal
           switch(config)#
  Purpose: Enters configuration mode.

Step 2
  Command: switch(config)# npv enable
  Purpose: Enables NPV mode. The switch reboots, and it comes back up in NPV mode.
  Note: A write-erase is performed during the initialization.

Step 3
  Command: switch(config-npv)# no npv enable
           switch(config)#
  Purpose: Disables NPV mode, which results in a reload of the switch.



Configuring NPV Interfaces

After you enable NPV, you should configure the NP uplink interfaces and the server interfaces. To configure an NP uplink interface, perform this task:

Step 1
  Command: switch# configure terminal
           switch(config)#
  Purpose: Enters configuration mode.

Step 2
  Command: switch(config)# interface fc slot/port
  Purpose: Selects an interface that will be connected to the core NPV switch.

Step 3
  Command: switch(config-if)# switchport mode NP
           switch(config-if)# no shutdown
  Purpose: Configures the interface as an NP port. Brings up the interface.

To configure a server interface, perform this task:

Step 1
  Command: switch# configure terminal
           switch(config)#
  Purpose: Enters configuration mode.

Step 2
  Command: switch(config)# interface { fc slot/port | vfc vfc-id }
  Purpose: Selects a server interface.

Step 3
  Command: switch(config-if)# switchport mode F
           switch(config-if)# no shutdown
  Purpose: Configures the interface as an F port. Brings up the interface.

NPV Traffic Maps

An NPV traffic map associates one or more NP uplink interfaces with a server interface.

To configure a traffic map, perform this task:
Step 1
  Command: switch# config t
           switch(config)#
  Purpose: Enters configuration mode on the NPV switch.

Step 2
  Command: switch(config)# npv traffic-map server-interface { fc slot/port | vfc vfc-id } external-interface fc slot/port
  Purpose: Configures a mapping between a server interface (or range of server interfaces) and an NP uplink interface (or range of NP uplink interfaces).

  Command: switch(config)# no npv traffic-map server-interface { fc slot/port | vfc vfc-id } external-interface fc slot/port
  Purpose: Removes the mapping between the specified server interfaces and NP uplink interfaces.
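
Putting the pieces together, a minimal end-to-end sketch on the NPV edge switch might look like the following (interface numbers are illustrative assumptions; remember that npv enable reboots the switch and performs a write-erase):

switch# configure terminal
switch(config)# npv enable
switch(config)# interface fc1/1
switch(config-if)# switchport mode NP
switch(config-if)# no shutdown
switch(config-if)# exit
switch(config)# interface fc1/10
switch(config-if)# switchport mode F
switch(config-if)# no shutdown
switch(config-if)# exit
switch(config)# npv traffic-map server-interface fc1/10 external-interface fc1/1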






Verifying NPV

Command: switch# show npv flogi-table [ all ]
Purpose: Displays the NPV configuration, i.e. a list of the devices on a server interface and their assigned NP uplinks.

To display the status of the server interfaces and the NP uplink interfaces:
switch# show npv status
npiv is enabled

External Interfaces:
====================
Interface: fc2/1, VSAN: 1, FCID: 0x1c0000, State: Up
Interface: fc2/2, VSAN: 1, FCID: 0x040000, State: Up
Interface: fc2/3, VSAN: 1, FCID: 0x260000, State: Up
Interface: fc2/4, VSAN: 1, FCID: 0x1a0000, State: Up
Number of External Interfaces: 4

Server Interfaces:
==================
Interface: vfc3/1, VSAN: 1, NPIV: No, State: Up

Number of Server Interfaces: 1


Verifying NPV Traffic Management
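
The configured traffic map can be verified with the show npv traffic-map command (no sample output shown here):

switch# show npv traffic-map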

You are Welcome :)
Source: www.cisco.com