Thursday, November 21, 2019

Cohesity: What are Chunks, Erasure Coding (EC), and Replication Factor (RF)?

If you work with Cohesity, you will hear a lot about data resiliency, fault tolerance, and distributed workloads.
  • All of these revolve around the smallest unit of data, which Cohesity calls a chunk: the form in which data is written to disk.
  • Chunks, combined with drive-level and node-level redundancy, result in a highly available, resilient backup solution.

Chunk: Chunks are packaged into chunk files, the unit of stored data that Cohesity uses for protection. A chunk file is a collection of pieces of data from one or more client objects (files, VMs, etc.) packaged together into a single large unit. Cohesity takes a blob of storage, which can be a collection of one or more client objects, divides it into variable-sized, deduplicated chunks, compresses and encrypts them, and puts them in a chunk file. Usually, chunks from the same large client (user) file end up in the same chunk file. This happens in most cases when the client file or VM writes are sequential and can be stored together. Smaller client files that are not large enough to fill a chunk file on their own can have their chunks packed together into a shared chunk file.
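The chunk-to-chunk-file flow described above can be sketched in a few lines of Python. This is only an illustration of the general idea (variable-sized split, fingerprint-based deduplication, compression, packing), not Cohesity's actual algorithm. The function names, the boundary rule, and the size limits are all hypothetical, and the encryption step is left out to keep the sketch within the standard library.

```python
import hashlib
import zlib


def split_variable(data: bytes, min_size: int = 16,
                   mask: int = 0x0F, max_size: int = 64) -> list[bytes]:
    """Cut data into variable-sized chunks using a toy content-defined rule."""
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        # Toy rolling value; real systems use stronger rolling hashes.
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        length = i - start + 1
        hit_boundary = length >= min_size and (rolling & mask) == 0
        if hit_boundary or length >= max_size:
            chunks.append(data[start:i + 1])
            start, rolling = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def pack_chunk_file(blob: bytes, seen: set[str]) -> list[tuple[str, bytes]]:
    """Return (fingerprint, compressed chunk) pairs for chunks not stored yet."""
    packed = []
    for chunk in split_variable(blob):
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint in seen:
            continue  # deduplicated: an identical chunk is already stored
        seen.add(fingerprint)
        # Encryption would follow compression here; omitted in this sketch.
        packed.append((fingerprint, zlib.compress(chunk)))
    return packed


seen_fingerprints: set[str] = set()
blob = b"backup data from a VM disk " * 20
chunk_file = pack_chunk_file(blob, seen_fingerprints)
print(len(chunk_file), "new chunks packed")
print(len(pack_chunk_file(blob, seen_fingerprints)), "new chunks on a repeat backup")
```

Packing the same blob a second time yields no new chunks, which is the effect deduplication is meant to have on repeated backups.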


A chunk file can be protected using either an EC or an RF scheme. Cohesity provides configurable resiliency against HDD or node failures. A single large file can span several chunk files, which end up distributed evenly across all the nodes of the cluster, as defined in the Cohesity erasure coding settings.


Replication Factor (RF) refers to the number of replicas of a unit of data. The unit of replication is a chunk file, and a chunk file is mirrored onto either one or two other nodes, depending on the replication factor chosen. RF2 provides resilience against a single data unit failure, and RF3 provides resilience against two data unit failures.
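The RF arithmetic is simple enough to write down. The helper below is a hypothetical illustration, not a Cohesity API: with RF n there are n copies in total, so n - 1 of them can be lost, and every usable byte consumes n bytes of raw capacity (before any deduplication or compression savings).

```python
def rf_summary(rf: int) -> dict:
    """Summarise what a replication factor implies (illustrative helper)."""
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    return {
        "copies": rf,                    # original plus mirrors
        "failures_tolerated": rf - 1,    # copies that can be lost
        "raw_per_usable": float(rf),     # raw bytes consumed per usable byte
    }


for rf in (2, 3):
    print(f"RF{rf}:", rf_summary(rf))
```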


Erasure Coding (EC) refers to a scheme in which a number of usable data stripe units are protected from failures by code stripe units, which are in turn derived from the data stripe units. A single code stripe unit can protect against one data (or code) stripe unit failure, and two code stripe units can protect against two data (or code) stripe unit failures.
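To make the single-code-unit case concrete, here is a minimal sketch that uses XOR parity as the code stripe unit. XOR parity is the classic way to survive one lost unit; it is used here purely for illustration and is not claimed to be the code Cohesity uses. Protecting against two failures needs a stronger code (for example Reed-Solomon), which this sketch does not cover. The stripe contents and the 4:2 layout in the overhead example are made up.

```python
from functools import reduce


def xor_units(units: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized stripe units."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*units))


def ec_raw_per_usable(data_units: int, code_units: int) -> float:
    """Raw bytes consumed per usable byte for a data:code stripe layout."""
    return (data_units + code_units) / data_units


# Three data stripe units and one code stripe unit derived from them.
data_units = [b"AAAA", b"BBBB", b"CCCC"]
code_unit = xor_units(data_units)

# Simulate losing the second data unit and rebuild it from the survivors.
survivors = [data_units[0], data_units[2], code_unit]
rebuilt = xor_units(survivors)
assert rebuilt == data_units[1]

print("rebuilt unit:", rebuilt)
print("raw per usable byte for a 4:2 layout:", ec_raw_per_usable(4, 2))
```

The overhead helper shows why EC is attractive compared with replication: a layout with four data units and two code units tolerates two failures while consuming 1.5 raw bytes per usable byte, whereas three full copies consume 3.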




You are Welcome :)

