Wednesday, December 18, 2019

Netapp: XCP tool for copying high-file-count volumes

Learn Storage, Backup, Virtualization,  and Cloud. AWS, GCP & AZURE.
............................................................................................................................................................................


Troubleshooting Workflow: NetApp XCP Migration Tool errors
Document ID BR519
Answer ID 1005557
Published Date 12/12/2019
Symptom
Symptom 1
-bash: ./xcp: Permission denied
Symptom 2
xcp: ERROR: This license has expired
Symptom 3
xcp: This copy is not licensed.
Symptom 4
xcp: ERROR: XCP not activated, run 'activate' first
Symptom 5
xcp: ERROR: License file /opt/NetApp/xFiles/xcp/license not found.
Symptom 6
xcp: ERROR: Failed to activate license: Server unreachable
Symptom 7
xcp usage error: show: requires at least one server
Symptom 8
xcp: ERROR: show '10.61.73.94:/vol/nfsvol1': invalid hostname
Symptom 9
xcp usage error: scan: missing source path
Symptom 10
xcp usage error: scan: invalid path '10.61.73.94'
Symptom 11
xcp: ERROR: Catalog inaccessible: Cannot mount nfs_server:/export[:subdirectory]
Symptom 12
xcp: compare1 'n.txt': ERROR: nfs3 LOOKUP 'n.txt' in '10.61.73.113:/target': nfs3 error 2: no such file or directory
Symptom 13
xcp: ERROR: Failed to open catalog id '3': nfs3 LOOKUP '3' in '10.61.73.94:/vol/nfsvol1/catalog/indexes': nfs3 error 2: no such file or directory
Symptom 14
xcp: ERROR: Empty or invalid index '10.61.73.94:/vol/nfsvol1/catalog/indexes/2/2.index'
Symptom 15
xcp: copying 'file3': WARNING: 10.193.67.237:/ntfs/topdir_3/subdir_52/file3: nfs3 SETATTR '10.193.67.237:/ntfs/topdir_3/subdir_52/file3' mode 00777, uid 0, gid 0, size 27, atime Thu Feb 23 10:15:00 2017, mtime Tue Feb  7 11:51:54 2017: nfs3 error 1: not owner
Symptom 16
Sync fails in different areas due to the following warning:
xcp: mount 10.61.73.94:/vol/nfsvol1: WARNING: This NFS server only supports 1-second timestamp granularity. This may cause sync to fail because changes will often be undetectable.
Symptom 17
XCP fails to transfer ACLs and produces the following ERROR:
ERROR failed to obtain fallback security principal "". Please check if the principal with the name "" exists on "".
Cause
The cause and procedure to be performed to resolve these issues are described in the Solution section below.

Cause #1: Improper file permissions on the XCP binary.
Solution: Modify the permissions by running the "chmod 755" command. Ensure the XCP binary can be executed by the root user (or via sudo) on the designated XCP Linux client host.
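For example, on the Linux client host, assuming the binary was extracted to the current directory:

```shell
# Make the xcp binary executable (rwxr-xr-x) and confirm the mode bits.
chmod 755 ./xcp
ls -l ./xcp    # should now show -rwxr-xr-x
```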

Cause #2: This error occurs when the free 90-day XCP evaluation license has expired.
Solution: Renew or obtain a new XCP license from https://xcp.netapp.com

Cause #3: XCP is not licensed.
Solution: Download and apply the XCP license on the Linux client host system.
Cause #4: XCP license is not activated.
Solution: Obtain the appropriate XCP license file. Copy the XCP license to the /opt/NetApp/xFiles/xcp/ directory on the XCP Linux client host, then run the "xcp activate" command to activate the license.
Cause #5: XCP license is not activated.
Solution: Download the XCP license from https://xcp.netapp.com. Copy the XCP license file to the XCP Linux client host at /opt/NetApp/xFiles/xcp/. After copying, run the "xcp activate" command to activate the XCP license.
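As a sketch (run as root on the XCP Linux client host; the downloaded license filename and its location here are assumptions, use whatever file you received from xcp.netapp.com):

```shell
# Put the downloaded license where XCP looks for it, then activate.
mkdir -p /opt/NetApp/xFiles/xcp
cp ~/Downloads/license /opt/NetApp/xFiles/xcp/license
./xcp activate
```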
Cause #6: License file not found at /opt/NetApp/xFiles/xcp/license
Solution: Register for an XCP license at https://xcp.netapp.com. Download and copy the XCP license file to the /opt/NetApp/xFiles/xcp/ directory on the XCP Linux client host.
Cause #7: Incomplete parameters specified for the xcp show command.
Solution: Re-run the command with the server name or IP address specified. Correct syntax:
./xcp show abc.nfsserver.com
Cause #8: A valid hostname was not specified for the ./xcp show command.
Solution: Run the ./xcp show command with a valid hostname. Exact command syntax:
./xcp show 10.61.73.94
or
./xcp show localhost
Cause #9: Incomplete parameters specified for the xcp scan command.
Solution: To resolve this error, run the xcp scan command with the complete source NFSv3 export path. Correct syntax:
xcp scan 10.61.73.94:/vol/nfsvol1
Cause #10: An incomplete source path (server IP without an export path) was specified for the xcp scan command.
Solution: Run the same command again, specifying the complete NFSv3 source export path. Correct syntax:
xcp scan 10.61.73.94:/vol/nfsvol1
Cause #11: Catalog path is not specified in the xcp.ini configuration file.
Solution: Open a text editor on the XCP Linux client host and update the XCP configuration file with the proper catalog location. The XCP config file is located at /opt/NetApp/xFiles/xcp/xcp.ini
Sample entries of config file:
[root@localhost linux]# cat /opt/NetApp/xFiles/xcp/xcp.ini
# Sample xcp config:
[xcp]
catalog = 10.61.73.94:/vol/nfsvol1

Cause #12: The verify operation did not find the source file(s) on the target NFS export.
Solution: Run the xcp sync command to copy the incremental updates from the source to the destination.

Cause #13: XCP could not locate the specified catalog index number.
Solution: Verify the index number of the previous operation. To determine the exact catalog path, run the cat /opt/NetApp/xFiles/xcp/xcp.ini command. Catalog indexes are located in the catalog/indexes directory under that export. After locating the index number, run the same command again, specifying the correct index number or name.
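One way to see which index names exist is to mount the catalog export from xcp.ini and list the indexes directory (a hedged sketch; the mount point /mnt/catalog is illustrative, and the catalog export below is the sample value from the config file shown in this article):

```shell
mount -t nfs 10.61.73.94:/vol/nfsvol1 /mnt/catalog
ls /mnt/catalog/catalog/indexes    # one entry per index number or name
```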
Cause #14: A previous copy operation was interrupted before the files were indexed.
Solution: Run the xcp copy operation again, specifying the -newid option. If you must interrupt the copy, do so only after the 'indexed' counter appears on the console session:
88,126 scanned, 42,058 copied, 41,469 indexed, 150 MiB in (3.60 MiB/s), 41.9 MiB out (2.88 MiB/s), 16s
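A hedged sketch of the re-run (the source and target paths are taken from the examples in this article; the index name "copy2" is illustrative):

```shell
# Re-run the copy with a fresh catalog index so XCP builds a new,
# complete index for later sync/verify operations.
./xcp copy -newid copy2 10.61.73.94:/vol/nfsvol1 10.61.73.113:/target
```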
Cause #15: This happens on NTFS security style volumes: XCP attempts to chown a file and cannot, because UNIX operations fail on NTFS security style volumes by default.
Solution: The same failure can be reproduced manually; a chown attempt on the NTFS security style volume is rejected:
# chown root newfile
chown: changing ownership of ‘newfile’: Operation not permitted

In a packet trace, the following is displayed:
3380 7.642295   10.193.67.237   10.193.67.233   NFS  214  V3 SETATTR Reply (Call In 3338) Error: NFS3ERR_PERM
The error is benign, as the owner would be set from a Windows client after the fact anyway.
The following are the two ways to avoid this:
  • Change the export policy rule to ignore with the following command: export-policy rule modify -vserver DEMO -policyname default -ruleindex 1 -ntfs-unix-security-ops ignore
  • Copy to a mixed security style or UNIX security style volume. After the copy, change the security style to NTFS.
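The first workaround, sketched from the cluster shell (the vserver and policy names are from the bullet above; the -fields form of the show command is an assumption, check your ONTAP release):

```
cluster::> export-policy rule modify -vserver DEMO -policyname default -ruleindex 1 -ntfs-unix-security-ops ignore
cluster::> export-policy rule show -vserver DEMO -policyname default -ruleindex 1 -fields ntfs-unix-security-ops
```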
A third possible way would be for XCP not to attempt setattr after copying; however, the current version of the copy command offers no way to do this. Perhaps that option could go into a future release.
  • copy: Recursively copy everything from source to target
  • -newid <name>: Catalog name for a new index
  • -md5: Checksum the files (also save the checksums when indexing) (default: False)
  • -edupe: Include dedupe estimate in reports (see documentation for details)
  • -nonames: Do not look up user and group names for file listings or reports
  • -bs <n>: Read/write blocksize (default: 64k)
  • -dircount <n>: Request size for reading directories (default: 64k)
  • -parallel <n>: Maximum concurrent batch processes (default: 7)
  • -noId: Disable the creation of a default index (default: False)
Cause #16: This has been seen when the source volume is a SLES or RHEL host with an ext3 file system.
Solution: The issue is documented here: 1181841 XCP Sync: Timestamp granularity
  • Workaround: Keep access time enabled. xcp compares the timestamps of files on the source against the index (which contains all the file metadata found on the source); in this use case the differences may be < 1s, and xcp will not apply the changes.
  • This is only a warning and can be ignored.
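The underlying problem can be illustrated locally (assumes GNU coreutils touch/stat): two modifications less than a second apart yield the same whole-second mtime, so change detection limited to 1-second granularity cannot see the second change.

```shell
f=$(mktemp)
touch -d "2019-12-18 10:00:00.100" "$f"
t1=$(stat -c %Y "$f")            # mtime in whole epoch seconds
touch -d "2019-12-18 10:00:00.900" "$f"
t2=$(stat -c %Y "$f")            # same whole second, despite the change
[ "$t1" -eq "$t2" ] && echo "change invisible at 1-second granularity"
```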
Cause #17: This may happen when:
  • The target netbios name/machine account cannot be resolved to an IP
  • The target machine cannot resolve the fallback-user or fallback-group to a SID
Solution:
  • For cause 1, add a mapping in the hosts file (c:\windows\system32\drivers\etc\hosts) for the netbios name/machine account name of the target.
  • For cause 2, select a fallback-user/group that can be resolved by the target system. This can either be a local user to the target machine, or a domain user resolvable by the target machine.
Related Links:
  • 1098120: NetApp XCP Frequently Asked Questions and Resources
  • 1015592: Triage Template - How to troubleshoot NetApp XCP Migration Tool issues
  • 1015613: How to transition 7-Mode NFSv3 exports to clustered Data ONTAP using NetApp XCP Migration Tool 1.0
  • XCP 1.5
  • XCP Best Practices TR-4808
You are Welcome :)

Monday, December 9, 2019

Netapp: How to perform Disk Erasure, Disk Clearing and Configuration Wipe on CDOT Netapp disks.


Depending on the scenario applicable to your environment, there are two ways to run data erasure and disk configuration wipe activity. Simply pushing a data disk to the spare pool after removing it from an aggregate may not satisfy every data erasure requirement. Degaussing a drive in a traditional degauss machine, which relies on a magnetic field, might still leave data owners in a questionable position about their comfort before disk disposal. Here are two standard-practice scenarios that Netapp supports without intervention from third-party software or applications for disk data erasure and configuration wiping.

Scenario 1 Usage: Disk Initialize
If there is flexibility to reboot the node, and all data on the disks needs to be wiped or the configuration reset, then "disk initialize" is the right option.

Prerequisites:
1. If the disks are part of an aggregate and/or hold volumes, the volumes must be taken offline and destroyed, followed by taking the aggregate offline and deleting it.
2. Disks must be in the spare pool, but can be owned by nodes.
3. Only root aggregate disks should be present.

Actual Action:
Step 1. Boot each node, from the console or the SP (if configured), to the LOADER/CFE prompt and ensure that the variables below are set. These variables remove the cluster RDBs, CDBs, and the varfs from mroot, the boot device, and NVRAM.
setenv bootarg.init.boot_clustered true
setenv bootarg.factory_init_completed true
setenv bootarg.init.clearvarfsnvram true
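To confirm the variables before booting, the LOADER printenv command lists the current environment (a hedged sketch; output format varies by platform):

```
LOADER> printenv
(look for the three bootarg.* variables above in the listing)
```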

Step 2. Run boot_ontap from the loader prompt, and while the node reboots, press CTRL + C to go to the special boot menu.

Step 3. Of the 8 special boot menu options, do not select any option yet. Instead, type "wipeconfig" on each node.

Step 4. Then select option no. 4, "Clean configuration and initialize all disks".
(This will prompt whether you want to zero disks, reset the config, and install a new file system. Type "yes".)

Step 5. This runs the disk initialize operation in the background, indicated by dots (…….) filling the screen until it is done. Each drive gets initialized, and upon completion you are taken to a prompt asking whether you want to create or join a cluster or a new filesystem.

At this time, it is safe to power down the controller head and disk shelves. The disks have been reset and the data completely erased.


Scenario 2 Usage: Disk Sanitization
In Data ONTAP 8.0 and earlier, the disk sanitization feature required a disk sanitization license.
In Data ONTAP 8.1 and later, you just need to enable the feature per step 1, under Actual Action below.
If you only need to erase a few drives in a stack, or just one shelf from a set of stacked shelves, you cannot use the scenario 1 solution for complete data erasure, since you are not erasing the entire array, only selected disks.
Disk sanitization is the process of physically obliterating data by overwriting disks with specified byte patterns or random data so that recovery of the original data becomes impossible. You use the sanitization process to ensure that no one can recover the data on the disks. This functionality is available through the nodeshell.

Pre-Requisites:
1. The disks in question must be in the spare pool, but can be owned by nodes.

How does disk sanitization work?
The disk sanitization process uses three successive default or user-specified byte overwrite patterns for up to seven cycles per operation. The random overwrite pattern is repeated for each cycle.
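For example, from the nodeshell, the patterns and cycle count can be given explicitly (the disk name and byte patterns here are illustrative; check the disk sanitize start syntax for your ONTAP release):

```
node::> disk sanitize start -p 0x55 -p 0xaa -p 0x3c -c 2 0b.10
```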

**Sanitization consists of two phases:
a. Formatting phase
b. Pattern overwrite phase

**The Disk Sanitization feature is applied at the storage system level, and once it is enabled, it cannot be disabled.

Actual Action:
Step 1. Go to the nodeshell from the cluster shell and enable the feature.
node::>options nodescope.reenabledoptions licensed_feature.disk_sanitization.enable
node::>options licensed_feature.disk_sanitization.enable on

Step 2. Start disk sanitize on the disk or disk list.
node::> disk sanitize start disk_list

Step 3. Check the disk sanitize status.
node::> disk sanitize status disk_list

Step 4. After disk sanitization is complete, return the sanitized disk to the spare pool; it will not be sent to the spare pool automatically.
node::> disk sanitize release disk_name

Step 5. Exit from the nodeshell and return to the cluster shell.
node::> CTRL + D
cluster::>

Step 6. Verify the disk has been properly placed in the spare pool.
cluster::> storage disk show -container-type spare

By now the disk is sanitized with no data and sits in the hot spare pool, ready to be used.

**At this time, you can degauss or physically crush the drive**

**Some of the Industry Standards on how to run disk sanitization or data erasure procedure**

https://kb.netapp.com/app/answers/answer_view/a_id/1034565/~/how-to-use-disk-sanitize-to-meet-department-of-defense-5220.22m-
https://kb.netapp.com/app/answers/answer_view/a_id/1072424/~/how-to-perform-disk-erasure%2C-disk-clearing-and-disk-sanitization-

You are Welcome :)