NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide
In summary, this is how deduplication works. Newly saved data on the FAS system is stored in 4KB blocks
as usual by Data ONTAP. Each block of data has a digital fingerprint, which is compared to all other
fingerprints in the flexible volume. If two fingerprints are found to be the same, a byte-for-byte comparison is
done of all bytes in the block and, if there is an exact match between the new block and the existing block on
the flexible volume, the duplicate block is discarded and its disk space is reclaimed.
1.2 DEDUPLICATED VOLUMES
Despite the introduction of less expensive ATA disk drives, one of the biggest challenges for storage
systems today continues to be the storage cost. There is a desire to reduce storage consumption (and
therefore storage cost per MB) by eliminating duplicate data through sharing blocks across files.
The core NetApp technology to accomplish this goal is the deduplicated volume, a flexible volume that
contains shared data blocks. Data ONTAP supports shared blocks in order to optimize storage space
consumption. Within a single volume, multiple references can point to the same data block, as shown in
Figure 2.
Figure 2) Data structure in a deduplicated volume.
In Figure 2, the number of physical blocks used on the disk is 3 (instead of 5), and the number of blocks
saved by deduplication is 2 (5 minus 3). In the remainder of this document, these will be referred to as used
blocks and saved blocks.
Each data block has a reference count kept in the volume metadata. As additional indirect blocks
("IND" in Figure 2) point to the data, or existing ones stop pointing to it, this value is incremented or
decremented accordingly. When no indirect blocks point to a data block, it is released.
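The reference counting described above can be illustrated with a small sketch. This is a toy model, not Data ONTAP code: the class DedupVolume, its method names, and the block labels "A", "B", and "C" are hypothetical, and the arrangement of five pointers over three physical blocks is assumed only to match the counts given for Figure 2.

```python
from collections import Counter


class DedupVolume:
    """Toy model of a volume whose indirect pointers reference physical blocks."""

    def __init__(self):
        # physical block id -> number of indirect pointers that reference it
        self.refcount = Counter()

    def add_pointer(self, block_id):
        # A new indirect pointer references the block: increment its count.
        self.refcount[block_id] += 1

    def drop_pointer(self, block_id):
        # An indirect pointer stops referencing the block: decrement its count.
        self.refcount[block_id] -= 1
        if self.refcount[block_id] == 0:
            # No pointers remain, so the block is released.
            del self.refcount[block_id]

    def used_blocks(self):
        # Physical blocks still held on disk.
        return len(self.refcount)

    def saved_blocks(self):
        # Logical references minus physical blocks.
        return sum(self.refcount.values()) - len(self.refcount)


vol = DedupVolume()
for block_id in ["A", "A", "B", "B", "C"]:  # assumed layout: 5 pointers, 3 blocks
    vol.add_pointer(block_id)
print(vol.used_blocks(), vol.saved_blocks())  # 3 used blocks, 2 saved blocks
```

Dropping the only pointer to "C" in this model releases that block, leaving 2 used blocks.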
The NetApp deduplication technology allows duplicate 4KB blocks anywhere in the flexible volume to be
deleted, as described in the following sections.
The maximum sharing for a block is 255. This means, for example, that if there are 500 duplicate blocks,
deduplication would reduce that to only 2 blocks. Also note that this ability to share blocks is different from
the ability to keep 255 Snapshot copies for a volume.
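The arithmetic behind the 500-block example can be checked with a short sketch. The function name physical_blocks_needed is hypothetical, and the sketch assumes only what the text states: that one physical block can be shared at most 255 times.

```python
MAX_SHARING = 255  # maximum sharing for a single block, as stated in the text


def physical_blocks_needed(duplicate_count, max_sharing=MAX_SHARING):
    """Smallest number of physical blocks that can hold duplicate_count
    identical blocks when each physical block is shared at most max_sharing times."""
    # Integer ceiling division.
    return (duplicate_count + max_sharing - 1) // max_sharing


print(physical_blocks_needed(500))  # 2, matching the example in the text
```

Under the same assumption, 255 identical blocks would fit in 1 physical block, and 256 would need 2.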
1.3 DEDUPLICATION METADATA
The core enabling technology of deduplication is fingerprints. These are unique digital "signatures" for every
4KB data block in the flexible volume.
When deduplication runs for the first time on a flexible volume with existing data, it scans the blocks in the
flexible volume and creates a fingerprint database, which contains a sorted list of all fingerprints for used
blocks in the flexible volume.
After the fingerprint file is created, the fingerprints are checked for duplicates. When a duplicate is found,
a byte-by-byte comparison of the two blocks is done first to make sure that they are indeed identical. If they
are, the block's pointer is updated to the already existing data block and the new (duplicate) data block is
released.
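The pass described above (build a sorted fingerprint list, find matching fingerprints, verify byte by byte, then repoint and release) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Data ONTAP implementation: SHA-256 stands in for the fingerprint function, which the text does not specify, the names fingerprint and deduplicate are hypothetical, and the 255-reference sharing limit is ignored for brevity.

```python
import hashlib
from itertools import groupby

BLOCK_SIZE = 4096  # 4KB blocks, as in the text


def fingerprint(block):
    # Assumption: SHA-256 is a stand-in; the real fingerprint function is not given.
    return hashlib.sha256(block).digest()


def deduplicate(blocks):
    """Return (pointers, released).

    pointers[i] is the index of the block that logical block i refers to
    after the pass; released lists the indices of blocks that were freed.
    """
    # Fingerprint database: a sorted list of (fingerprint, block index) pairs.
    database = sorted((fingerprint(b), i) for i, b in enumerate(blocks))
    pointers = list(range(len(blocks)))
    released = []
    # Sorting places equal fingerprints next to each other.
    for _, group in groupby(database, key=lambda entry: entry[0]):
        kept = []  # distinct blocks retained for this fingerprint
        for _, i in group:
            for k in kept:
                if blocks[i] == blocks[k]:  # byte-by-byte comparison
                    pointers[i] = k         # repoint to the existing block
                    released.append(i)      # release the duplicate
                    break
            else:
                kept.append(i)  # first block seen with this content
    return pointers, released


a = b"a" * BLOCK_SIZE
b = b"b" * BLOCK_SIZE
pointers, released = deduplicate([a, b, a, a, b])
print(pointers)          # [0, 1, 0, 0, 1]
print(sorted(released))  # [2, 3, 4]
```

The byte-by-byte check matters because two different blocks could, in principle, produce the same fingerprint; in this sketch such blocks would both be kept.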