Working Principle
SmartDedupe deletes duplicate data and SmartCompression compresses data, reducing data redundancy and saving storage space.
Basic Concepts
The basic concepts of SmartDedupe and SmartCompression are as follows:
- Deduplication data block size: This determines deduplication granularity and is identical to the block size of a thin LUN. To set the block size of a thin LUN, run the create lun command on the command-line interface (CLI) and set the grain_size parameter.
For details about how to use the create lun command, see the command reference of the corresponding product model.
- Compression data block size: This determines compression granularity. The storage system compresses data using the block size of the data newly written to LUNs.
- Fixed-length deduplication: The storage system deduplicates the data written into a LUN by the specified deduplication data block size.
- Hash algorithm: This verifies the consistency of data blocks. It computes the fingerprint of a data block, which is a fixed-length binary number that identifies the block's content. If the fingerprints of two data blocks are the same, the storage system considers them duplicates.
- Byte-by-byte comparison policy: This is complementary to the hash algorithm. If two data blocks are found to have the same fingerprint, the storage system compares the data blocks byte by byte, so that blocks with different content are never deduplicated because of a hash collision.
- Deduplication metadata: This stores deduplication information, such as the fingerprints and storage locations of deduplicated data.
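The concepts above can be illustrated with a minimal Python sketch. It is not the storage system's implementation: SHA-256 stands in for the unspecified hash algorithm, and GRAIN_SIZE, split_fixed, fingerprint, and is_duplicate are hypothetical names chosen for illustration.

```python
import hashlib

# Hypothetical block size in bytes; not a product default.
GRAIN_SIZE = 8 * 1024


def split_fixed(data: bytes, size: int = GRAIN_SIZE) -> list:
    """Fixed-length division: cut data into blocks of `size` bytes.

    The last block may be shorter if the data is not a multiple of `size`.
    """
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(block: bytes) -> bytes:
    """Return a fixed-length fingerprint (SHA-256 is an assumed stand-in)."""
    return hashlib.sha256(block).digest()


def is_duplicate(new_block: bytes, old_block: bytes, verify: bool = True) -> bool:
    """Compare fingerprints first, then optionally compare byte by byte."""
    if fingerprint(new_block) != fingerprint(old_block):
        return False
    # Byte-by-byte comparison guards against hash collisions.
    return new_block == old_block if verify else True
```

With verify set to False, the check relies on the fingerprint alone, which corresponds to using the hash algorithm without the byte-by-byte comparison policy.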
Deduplication
After SmartDedupe is enabled for a LUN, the OceanStor storage system uses the hash algorithm to calculate the fingerprint of each new data block. It then compares these fingerprints with those of existing data blocks in the LUN. If the fingerprint of a new data block is identical to that of an existing data block, the new data block is deleted and its storage location is registered as that of the existing data block. If the fingerprint is unique, the new data block is written to disks.
If you enable neither SmartDedupe nor SmartCompression when you create a LUN, you cannot enable them after the LUN is created.
Figure 1 (Deduplication process) shows the deduplication process.
- The storage system uses the hash algorithm to compute fingerprint information about new data blocks. In fixed-length deduplication, the division granularity of data blocks is the same as the allocation unit size of thin LUNs, and data blocks of the same size are deduplicated each time.
- The storage system checks if the fingerprint information of the new data blocks is the same as that of any existing data blocks.
  - If yes, the storage system considers the new data blocks duplicate, deletes them, and points the storage locations of the new data blocks to those of the existing data blocks.
  - If no, the storage system directly writes the new data blocks.
For example, an application server intends to write data blocks C and D. Table 1-2 shows the comparison between the two data blocks and existing data blocks. Figure 1-2 shows the expected effects when different deduplication policies are used.
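The write path described above can be sketched as a small in-memory model. This is an illustration under stated assumptions, not the product's implementation: DedupStore and its fields are hypothetical, SHA-256 is an assumed hash algorithm, and Python lists and dictionaries stand in for disks and deduplication metadata.

```python
import hashlib


class DedupStore:
    """Toy model of inline, fixed-length deduplication for one LUN."""

    def __init__(self):
        self.blocks = []    # physical blocks "on disk"
        self.index = {}     # metadata: fingerprint -> physical location
        self.lun_map = {}   # logical block address -> physical location

    def write(self, lba: int, block: bytes) -> bool:
        """Write one block; return True if it was deduplicated."""
        fp = hashlib.sha256(block).digest()
        loc = self.index.get(fp)
        # Same fingerprint and identical bytes: point at the existing block.
        if loc is not None and self.blocks[loc] == block:
            self.lun_map[lba] = loc
            return True
        # Otherwise store the new block and record its fingerprint.
        self.blocks.append(block)
        new_loc = len(self.blocks) - 1
        self.index.setdefault(fp, new_loc)
        self.lun_map[lba] = new_loc
        return False

    def read(self, lba: int) -> bytes:
        """Return the block that the logical address currently points to."""
        return self.blocks[self.lun_map[lba]]


store = DedupStore()
store.write(0, b"AAAA")   # new content, stored
store.write(1, b"BBBB")   # new content, stored
store.write(2, b"AAAA")   # duplicate of address 0, only the mapping is updated
```

After these three writes, the model holds two physical blocks, and logical addresses 0 and 2 share the same storage location.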
Data Compression
After SmartCompression has been enabled for LUNs, the OceanStor storage system performs inline compression.
The degree to which data is compressed depends on the specified compression policy. You can set the compression policy to either fast or deep:
- The fast option is the default compression method. It is optimized for speed rather than space efficiency.
- The deep option is optimized for space efficiency rather than speed.
Figure 3 (Effects of compression policies) shows the effects of the two compression policies.
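The trade-off between the two policies can be tried out with a general-purpose compressor. The sketch below uses Python's zlib module purely as an analogy: the compression algorithm used by the storage system is not stated here, and the mapping of fast and deep to zlib levels 1 and 9, as well as the names POLICY_LEVELS and compress_block, are assumptions made for illustration.

```python
import zlib

# Assumed mapping of policy names to zlib levels (1 = fastest, 9 = smallest).
POLICY_LEVELS = {"fast": 1, "deep": 9}


def compress_block(block: bytes, policy: str = "fast") -> bytes:
    """Compress one block with the level assumed for the given policy."""
    return zlib.compress(block, POLICY_LEVELS[policy])


sample = b"log entry: status=OK\n" * 500
fast_size = len(compress_block(sample, "fast"))
deep_size = len(compress_block(sample, "deep"))
print(f"original={len(sample)} fast={fast_size} deep={deep_size}")
```

Higher levels generally spend more CPU time searching for redundancy in exchange for smaller output; the actual gain depends on the data being compressed.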