OceanStor Dorado V6 Series 6.1.x and V700R001 SmartDedupe and SmartCompression Feature Guide for Block
Working Principles
The SmartDedupe and SmartCompression features deduplicate and compress data respectively to reduce redundant data and save storage capacity. In addition, the two features reduce the amount of data written into SSDs and the number of data writes, prolonging the service life of SSDs.
The storage system provides an adaptive deduplication policy that combines inline deduplication and similarity-based deduplication. The system adaptively performs deduplication and compression based on service data characteristics in different scenarios. Adaptive deduplication and compression maximize the data reduction ratio. Figure 1-1 shows the system processing flow.
When a user writes data, the adaptive deduplication algorithm identifies data suitable for inline or similarity-based deduplication based on data characteristics.
- If an inline deduplication policy is used, the system directly performs inline deduplication. Then the system compresses the user data and writes the compressed data to the storage pool.
- If a similarity-based deduplication policy is used, the system calculates similar fingerprint (SFP) information of the data and inserts SFPs into the similarity-based deduplication opportunity table. Then, the user data is compressed and written to the storage pool. At the same time, the backend reads the data corresponding to these SFPs from disks for similarity-based deduplication. After the deduplication is complete, the fingerprint table is updated.
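The two branches above can be sketched as a small Python model. This is only an illustration under stated assumptions, not the storage system's implementation: the `WritePath` class, the `choose_policy` and `similar_fingerprint` callables, the use of SHA-256 as the fingerprint, and zlib as the compressor are all hypothetical stand-ins, because the guide does not specify the actual algorithms.

```python
import hashlib
import zlib
from collections import deque


class WritePath:
    """Toy model of the adaptive write path (all names are hypothetical)."""

    def __init__(self, choose_policy, similar_fingerprint):
        # choose_policy: bytes -> "inline" or "similarity" (assumed heuristic)
        self.choose_policy = choose_policy
        # similar_fingerprint: bytes -> SFP value (assumed algorithm)
        self.similar_fingerprint = similar_fingerprint
        self.fingerprint_table = {}       # fingerprint -> pool location
        self.opportunity_table = deque()  # (SFP, location) awaiting background dedup
        self.pool = []                    # compressed blocks; list index = location

    def _compress_and_store(self, block):
        self.pool.append(zlib.compress(block))
        return len(self.pool) - 1

    def write(self, block):
        if self.choose_policy(block) == "inline":
            fp = hashlib.sha256(block).digest()
            loc = self.fingerprint_table.get(fp)
            # Verify the candidate byte by byte before treating it as a duplicate.
            if loc is not None and zlib.decompress(self.pool[loc]) == block:
                return loc                # duplicate: reference the existing copy
            loc = self._compress_and_store(block)
            self.fingerprint_table[fp] = loc
            return loc
        # Similarity-based policy: queue the SFP, then compress and store the data.
        loc = self._compress_and_store(block)
        self.opportunity_table.append((self.similar_fingerprint(block), loc))
        return loc
```

In this model, the similarity branch only queues work. A separate background task is expected to drain `opportunity_table`, which mirrors the backend processing described in the second bullet.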
Basic Concepts
The basic concepts of SmartDedupe and SmartCompression are as follows:
- Deduplication data block size: Specifies the granularity of data that will be deduplicated in a storage system.
- Compression data block size: Specifies the granularity of data that will be compressed in a storage system.
- Similarity-based deduplication: The system divides data into blocks of a fixed size and analyzes the similarity among the blocks. Then, the system deduplicates the identical data blocks and performs combining compression on the similar data blocks.
- Fingerprint: A fixed-length binary value that represents a data block. The fingerprint table stores all mappings between data block fingerprints and data storage locations in a storage system.
- Similar fingerprint (SFP): Describes the similarity among data. If two pieces of data have the same SFP, their contents are partially or completely the same.
- Gradient fingerprint (GFP): A piece of data may be similar to multiple pieces of other data, and therefore has multiple SFPs. To ensure that fingerprints with high similarities are preferentially processed during deduplication, the system also records the GFPs for describing data similarities when calculating SFPs.
  Only 6.1.2 and later versions support GFPs.
- Opportunity table: Saves data blocks' fingerprint and location information, and identifies hot data.
- Byte-by-byte comparison: When a storage system searches for duplicate data blocks, it will compare fingerprints of data blocks. If the fingerprints are the same, the system compares the data blocks byte by byte.
- Deduplication metadata: Saves information about deduplication. For example, the metadata saves the fingerprint information about data blocks and the storage locations of data after deduplication is executed.
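The relationship between fingerprints, the fingerprint table, and byte-by-byte comparison can be illustrated with a short Python sketch. It assumes SHA-256 as the fingerprint function and a plain dictionary as the fingerprint table. Both are hypothetical choices, as are the names `fingerprint`, `find_duplicate`, and `read_block`, because the guide does not name the real algorithm or data structures.

```python
import hashlib


def fingerprint(block: bytes, length: int = 32) -> bytes:
    """Fixed-length value representing a block (SHA-256 is an assumed stand-in)."""
    return hashlib.sha256(block).digest()[:length]


def find_duplicate(block, fingerprint_table, read_block, fp_len=32):
    """Return the location of an identical stored block, or None.

    fingerprint_table maps fingerprint -> storage location.
    read_block is a hypothetical callable that reads a block by location.
    """
    location = fingerprint_table.get(fingerprint(block, fp_len))
    if location is None:
        return None  # no block with this fingerprint is stored
    # Equal fingerprints do not prove equal data, so compare byte by byte.
    return location if read_block(location) == block else None
```

Shortening the fingerprint (for example `fp_len=1`) makes collisions frequent, which shows why the final comparison is needed: two different blocks can share a fingerprint, and only the byte-by-byte check keeps them from being treated as duplicates.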
SmartDedupe
Figure 1-2 shows the similarity-based deduplication process.
In 6.1.5 and later versions, you can run the `change disk_domain general disk_domain_id=? dedup_method=?` command on the CLI in developer mode to change the deduplication mode.
Step 1:
- The storage system divides newly written data into blocks. The block size is the Application Request Size set on the LUN.
- The storage system uses a similar fingerprint algorithm to calculate the SFPs and GFPs of the new data blocks.
Only 6.1.2 and later versions support GFPs. For 6.1.0, only SFPs of the newly written data blocks are calculated.
- The storage system writes the data blocks to disks and records data blocks' fingerprint and location information in the opportunity table.
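Step 1 can be sketched as follows. The sketch is a simplified illustration, not the product's code: `split_into_blocks`, `ingest`, the list used as the disk, and the caller-supplied `similar_fingerprint` function are hypothetical, and GFP calculation is omitted because the guide does not describe how GFPs are derived.

```python
def split_into_blocks(data: bytes, request_size: int):
    """Divide data into blocks of request_size bytes (the last block may be shorter)."""
    if request_size <= 0:
        raise ValueError("request_size must be positive")
    return [data[i:i + request_size] for i in range(0, len(data), request_size)]


def ingest(data, request_size, similar_fingerprint, disk, opportunity_table):
    """Write each block to disk and queue its SFP and location for later dedup.

    similar_fingerprint is an assumed callable (bytes -> SFP).
    disk is a list standing in for the storage pool; the list index is the location.
    """
    for block in split_into_blocks(data, request_size):
        disk.append(block)
        location = len(disk) - 1
        opportunity_table.append((similar_fingerprint(block), location))
```

For example, 20 KiB of data written to a LUN with an assumed 8 KiB application request size yields three blocks in this model: two of 8 KiB and one of 4 KiB.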
Step 2:
1. The storage system periodically checks whether SFPs exist in the opportunity table.
   - If yes, go to substep 2.
   - If no, continue the periodic check.
2. The storage system checks whether similar data blocks are the same based on byte-by-byte comparison.
   - If yes, the storage system considers the new data block redundant and deletes it. Then, the storage system points the fingerprint and storage location of the new data block to that of the existing one in the fingerprint table.
   - If no, the storage system performs combining compression on the new data block, records its fingerprint in the fingerprint table, updates the fingerprint to the metadata of the data block, and reclaims the storage space of the data block.
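The background pass in Step 2 can be modeled roughly as shown here. Everything in the sketch is an assumption made for illustration: the min-hash style `sfp` function stands in for the undisclosed similar fingerprint algorithm, "combining compression" is approximated by compressing the new block with the similar block as a zlib preset dictionary, and the "unique" outcome for a block with no similar neighbor is added so that the model is complete.

```python
import hashlib
import zlib

CHUNK = 64  # bytes per feature chunk (arbitrary value for this sketch)


def sfp(block: bytes) -> bytes:
    """Stand-in SFP: the smallest chunk hash, so blocks sharing most chunks tend to match.

    Assumes a non-empty block.
    """
    chunks = [block[i:i + CHUNK] for i in range(0, len(block), CHUNK)]
    return min(hashlib.sha256(c).digest()[:8] for c in chunks)


def background_pass(opportunity_table, sfp_index, blocks):
    """Drain the opportunity table; return {location: (action, reference, payload)}.

    opportunity_table: list of (SFP, location) entries queued at write time.
    sfp_index: dict SFP -> location of an already processed block.
    blocks: mapping location -> block data (stands in for reading from disk).
    """
    results = {}
    for fp, loc in opportunity_table:
        ref = sfp_index.get(fp)
        if ref is None:
            sfp_index[fp] = loc                      # first block with this SFP
            results[loc] = ("unique", None, None)
        elif blocks[ref] == blocks[loc]:             # byte-by-byte comparison
            results[loc] = ("dedup", ref, None)      # redundant: point to existing block
        else:
            # Similar but not identical: compress against the similar block.
            comp = zlib.compressobj(zdict=blocks[ref])
            payload = comp.compress(blocks[loc]) + comp.flush()
            results[loc] = ("combined", ref, payload)
    opportunity_table.clear()
    return results
```

A block stored as `("combined", ref, payload)` can be restored in this model with `zlib.decompressobj(zdict=blocks[ref])`, so the reference block must remain readable for as long as any block depends on it.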
For example, LUN 1, LUN 2, and LUN 4 in the storage system have the same application request size and the same SmartDedupe and SmartCompression enabling status. Table 1-2 lists the existing data blocks on LUN 1, LUN 2, and LUN 4, and the results of comparing new data blocks J, K, and L on LUN 1 with the existing data blocks.
LUN Name | Existing Data Block | Characteristic of New Data Block |
---|---|---|
LUN 1 | Data blocks A, B, and C | |
LUN 2 | Data blocks D, E, and F | - |
LUN 4 | Data blocks G, H, and I | - |
Figure 1-3 shows the data deduplication results when SmartDedupe is disabled and enabled respectively.
SmartCompression
The storage system supports inline compression. If SmartCompression is enabled for a LUN when the LUN is created, the storage system compresses all data written to the LUN.
Figure 1-4 shows data compression results.
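Inline compression on a per-LUN basis can be illustrated with the following minimal model. It is a sketch under stated assumptions: the `Lun` class and its methods are hypothetical, and zlib stands in for the storage system's compression algorithm, which the guide does not name.

```python
import zlib


class Lun:
    """Toy LUN whose compression setting is fixed at creation time (hypothetical)."""

    def __init__(self, compression_enabled: bool):
        self.compression_enabled = compression_enabled
        self._blocks = {}  # logical block address -> stored bytes

    def write(self, lba: int, data: bytes) -> None:
        # Inline: data is compressed before it is stored, not in a later pass.
        self._blocks[lba] = zlib.compress(data) if self.compression_enabled else data

    def read(self, lba: int) -> bytes:
        stored = self._blocks[lba]
        return zlib.decompress(stored) if self.compression_enabled else stored

    def stored_bytes(self) -> int:
        """Capacity consumed in this model after compression."""
        return sum(len(b) for b in self._blocks.values())
```

Writing the same highly compressible data to one LUN created with compression enabled and one created without it shows the difference in `stored_bytes()`, while `read()` returns identical data from both.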