Working Principles
The SmartDedupe and SmartCompression features deduplicate and compress data, respectively, to reduce redundant data and save storage space. The features also reduce both the amount of data written to SSDs and the number of write operations, which prolongs the service life of SSDs.
Basic Concepts
The basic concepts of SmartDedupe and SmartCompression are as follows:
- Deduplication data block size: Specifies the granularity of data that will be deduplicated in a storage system.
- Compression data block size: Specifies the granularity of data that will be compressed in a storage system.
- Similarity-based deduplication: Indicates that a storage system deduplicates the similar data written to a LUN based on a fixed deduplication data block size during the deduplication process.
- Fingerprint: Represents a data block. The fingerprint is a fixed-length binary numeric value. OceanStor Dorado V6 series storage systems use a weak hash algorithm to calculate the fingerprints of data blocks. In a storage system, all the mappings between data block fingerprints and data storage locations are stored in the fingerprint table.
- Opportunity table: Saves data blocks' fingerprint and location information for identifying hot fingerprints.
- Byte-by-byte comparison: Compares the data of data blocks byte by byte if the fingerprints of these data blocks are the same.
- Deduplication metadata: Saves information about deduplication. For example, the metadata saves the fingerprints of data blocks and the storage locations of data after deduplication is executed.
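To illustrate how the fingerprint table and byte-by-byte comparison fit together, the following minimal Python sketch may help. It is not the storage system's implementation: the class name `FingerprintTable`, its methods, and the use of CRC32 as the weak hash are all assumptions made for illustration.

```python
import zlib


class FingerprintTable:
    """Hypothetical fingerprint table: fingerprint -> stored blocks sharing it."""

    def __init__(self):
        # fingerprint -> list of (storage location, block data)
        self.entries = {}

    @staticmethod
    def fingerprint(block: bytes) -> int:
        # Assumption: CRC32 stands in for the unspecified weak hash.
        # It yields a fixed-length (32-bit) value for any block.
        return zlib.crc32(block)

    def lookup_duplicate(self, block: bytes):
        """Return the location of an identical stored block, or None."""
        for location, stored in self.entries.get(self.fingerprint(block), []):
            # A weak hash can collide, so equal fingerprints are only a hint.
            # The byte-by-byte comparison makes the final decision.
            if stored == block:
                return location
        return None

    def add(self, block: bytes, location: int) -> None:
        """Record the mapping between a block's fingerprint and its location."""
        self.entries.setdefault(self.fingerprint(block), []).append((location, block))
```

In this sketch, a lookup that returns a location means the new block can reference the existing copy, while `None` means the block has to be stored and added to the table.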
SmartDedupe
Figure 1-1 shows the similarity-based deduplication process.
Step 1:
- The storage system divides newly written data into blocks. The block size is the value of Application Request Size on the LUN.
- The storage system uses a similar fingerprint algorithm to calculate the similar fingerprints (SFPs) of the newly written data blocks.
- The storage system writes the data blocks to disks and records the data blocks' SFPs and storage locations in the opportunity table.
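Step 1 can be sketched roughly as follows. The functions `similarity_fingerprint` and `write_blocks` are hypothetical, and the min-hash over sliding windows is only a stand-in for the actual, unspecified similar fingerprint algorithm. The disk is modeled as a Python list whose index serves as the storage location.

```python
import zlib
from collections import defaultdict


def similarity_fingerprint(block: bytes, window: int = 16) -> int:
    """Assumed SFP: the minimum CRC32 over all sliding windows of the block.

    Blocks that share most of their content share most of their windows,
    so they are likely (not guaranteed) to produce the same minimum.
    """
    if len(block) <= window:
        return zlib.crc32(block)
    return min(
        zlib.crc32(block[i:i + window])
        for i in range(len(block) - window + 1)
    )


def write_blocks(data: bytes, block_size: int, disk: list, opportunity_table: dict) -> None:
    """Split data into blocks, store each block, and record SFP -> locations."""
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        sfp = similarity_fingerprint(block)
        disk.append(block)              # write the block to "disk"
        location = len(disk) - 1        # list index acts as the storage location
        opportunity_table[sfp].append(location)


disk = []
opportunity_table = defaultdict(list)
write_blocks(b"x" * 8192 + b"y" * 8192 + b"x" * 8192, 8192, disk, opportunity_table)
```

After this call, the two identical `x` blocks share one opportunity table entry, which is what the periodic check in Step 2 looks for.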
Step 2:
1. The storage system periodically checks whether identical SFPs exist in the opportunity table.
   - If yes, go to 2.
   - If no, continue the periodic check.
2. The storage system compares the similar data blocks byte by byte to check whether they are identical.
   - If yes, the storage system considers the new data block redundant and deletes it. Then, the storage system points the fingerprint and storage location of the new data block to those of the existing one in the fingerprint table.
   - If no, the storage system performs delta compression on the new data block, records its fingerprint in the fingerprint table, updates the fingerprint in the metadata of the data block, and reclaims the storage space of the data block.
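The decision flow in Step 2 can be sketched as below. This is an illustration under stated assumptions, not the product's algorithm: `dedupe_pass`, `delta_compress`, and `delta_decompress` are hypothetical names, the disk is a dictionary of location to block data, and delta compression is approximated with zlib's preset dictionary (`zdict`), using the existing block as the dictionary.

```python
import zlib


def delta_compress(new: bytes, reference: bytes) -> bytes:
    """Compress `new` against `reference` using the reference as a preset dictionary."""
    compressor = zlib.compressobj(zdict=reference)
    return compressor.compress(new) + compressor.flush()


def delta_decompress(delta: bytes, reference: bytes) -> bytes:
    """Rebuild the original block from its delta and the same reference block."""
    decompressor = zlib.decompressobj(zdict=reference)
    return decompressor.decompress(delta) + decompressor.flush()


def dedupe_pass(disk: dict, opportunity_table: dict) -> dict:
    """One periodic check over the opportunity table.

    disk: location -> block data
    opportunity_table: SFP -> list of locations in write order
    Returns location -> ("ref", base) or ("delta", base, payload).
    """
    result = {}
    for locations in opportunity_table.values():
        if len(locations) < 2:
            continue  # no identical SFP yet; wait for the next periodic check
        base = locations[0]  # the existing block
        for loc in locations[1:]:
            if disk[loc] == disk[base]:
                # Byte-by-byte identical: the new block is redundant.
                result[loc] = ("ref", base)
            else:
                # Similar but not identical: keep only the delta.
                result[loc] = ("delta", base, delta_compress(disk[loc], disk[base]))
            del disk[loc]  # reclaim the space of the new block
    return result
```

Using the existing block as the preset dictionary means a similar block is encoded mostly as back-references, so the stored delta is typically far smaller than the block itself.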
For example, LUN 1, LUN 2, and LUN 4 in the storage system have the same attributes. Table 1-2 lists the existing data blocks of LUN 1, LUN 2, and LUN 4 as well as the results of comparison between new data blocks J, K, and L on LUN 1 and the existing data blocks.
| LUN Name | Existing Data Block | Characteristic of New Data Block |
|---|---|---|
| LUN 1 | Data blocks A, B, and C | |
| LUN 2 | Data blocks D, E, and F | - |
| LUN 4 | Data blocks G, H, and I | - |
Figure 1-2 shows the similarity-based deduplication results when SmartDedupe is disabled and enabled respectively.
SmartCompression
OceanStor Dorado V6 series storage systems support inline compression. If the SmartCompression feature is enabled for a LUN when it is created, the storage system will compress all the data written to the LUN.
Figure 1-3 shows data compression results.
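The inline compression behavior described above can be sketched as follows. The `Lun` class and its methods are hypothetical, and zlib is only a stand-in, since the compression algorithm used by the storage system is not stated here. The point of the sketch is that compression happens on the write path, before data is stored, and is fixed when the LUN is created.

```python
import zlib


class Lun:
    """Hypothetical LUN whose compression setting is chosen at creation time."""

    def __init__(self, compression_enabled: bool):
        self.compression_enabled = compression_enabled
        self.blocks = []  # stored payloads, compressed or raw

    def write(self, data: bytes) -> int:
        """Store data, compressing inline if enabled. Returns bytes stored."""
        payload = zlib.compress(data) if self.compression_enabled else data
        self.blocks.append(payload)
        return len(payload)

    def read(self, index: int) -> bytes:
        """Return the original data, decompressing if needed."""
        payload = self.blocks[index]
        return zlib.decompress(payload) if self.compression_enabled else payload
```

With compression enabled, highly repetitive data occupies far fewer bytes on the LUN than its logical size, while reads still return the original content.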