Specifications

NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide
Releasing a duplicate data block entails updating the indirect inode pointing to it, incrementing the block
reference count for the already existing data block, and freeing the duplicate data block.
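The three steps above can be sketched in a few lines of Python. This is only an illustrative in-memory model: the `Volume` class, its fields, and `release_duplicate` are hypothetical names introduced here, not Data ONTAP structures or APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    # Hypothetical in-memory model, not the on-disk layout.
    pointers: dict = field(default_factory=dict)    # (file, offset) -> block id
    refcount: dict = field(default_factory=dict)    # block id -> reference count
    free_blocks: set = field(default_factory=set)   # block ids returned to the free pool


def release_duplicate(vol, slot, existing_block):
    """Point `slot` at `existing_block` and release the duplicate it referenced."""
    duplicate = vol.pointers[slot]
    if duplicate == existing_block:
        return  # the slot already shares the existing block
    vol.pointers[slot] = existing_block   # 1. update the pointer to the block
    vol.refcount[existing_block] += 1     # 2. increment the existing block's count
    vol.refcount[duplicate] -= 1          # 3. drop the reference to the duplicate...
    if vol.refcount[duplicate] == 0:      #    ...and free it once nothing points to it
        del vol.refcount[duplicate]
        vol.free_blocks.add(duplicate)
```

For example, if two file slots reference blocks 10 and 11 with identical contents, calling `release_duplicate(vol, slot_b, 10)` leaves both slots pointing at block 10 with a reference count of 2, and block 11 in the free pool.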
In real time, as additional data is written to the deduplicated volume, a fingerprint is created for each new
block and written to a change log file. When deduplication is run subsequently, the change log is sorted and
its sorted fingerprints are merged with those in the fingerprint file, and then the deduplication processing
occurs.
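The sort-and-merge step can be illustrated with a short Python sketch. It assumes, purely for illustration, that each record is a `(fingerprint, block_id)` tuple and that the fingerprint file is already sorted; the record format and the function name `merge_fingerprints` are hypothetical. Matching fingerprints are reported only as candidates, because how the candidates are then processed is outside the scope of this sketch.

```python
import heapq
from itertools import groupby


def merge_fingerprints(fingerprint_file, change_log):
    """Merge a change log into a sorted fingerprint file.

    fingerprint_file: list of (fingerprint, block_id), already sorted.
    change_log: list of (fingerprint, block_id) in write order (unsorted).
    Returns (merged records, candidate duplicate groups).
    """
    sorted_log = sorted(change_log)                          # sort the change log
    merged = list(heapq.merge(fingerprint_file, sorted_log)) # merge two sorted runs
    candidates = []
    # After the merge, records with equal fingerprints are adjacent.
    for fingerprint, group in groupby(merged, key=lambda rec: rec[0]):
        blocks = [block_id for _, block_id in group]
        if len(blocks) > 1:
            candidates.append((fingerprint, blocks))
    return merged, candidates
```

Sorting first means the merge is a single sequential pass over both inputs, which is why equal fingerprints can be found by comparing neighbors rather than by searching.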
There are actually two change log files. While deduplication is running and merging the fingerprints
from one change log file into the fingerprint file, fingerprints for new data being written to the flexible
volume are recorded in the second change log file. The roles of the two
files are then reversed the next time that deduplication is run. (For those familiar with Data ONTAP usage of
NVRAM, this is analogous to when it switches from one half to the other to create a consistency point.)
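The role swap between the two change logs resembles a double-buffering pattern, sketched below in Python. The class `ChangeLogPair` and its methods are hypothetical names, and emptying the merged log at the end of a run is an assumption made for the sketch rather than something this guide states.

```python
class ChangeLogPair:
    """Hypothetical model of two change logs that swap roles on each run."""

    def __init__(self):
        self.logs = ([], [])
        self.active = 0  # index of the log currently receiving new fingerprints

    def record_write(self, fingerprint):
        # New writes always go to the active log, even during a dedup run.
        self.logs[self.active].append(fingerprint)

    def begin_dedup_run(self):
        """Swap roles and return the log that was being filled, ready to merge."""
        to_merge = self.logs[self.active]
        self.active = 1 - self.active
        return to_merge

    def finish_dedup_run(self, merged_log):
        # Assumption: the merged log is emptied so it can be reused next time.
        merged_log.clear()
```

Because writers only ever append to the active log, the run that is merging the other log never has to pause incoming writes.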
Note: When deduplication is run for the first time on an empty flexible volume, it still creates the fingerprint
file from the change log.
Here are some additional details about the deduplication metadata:
There is a fingerprint record for every 4KB data block, and the fingerprints for all the data blocks in the
volume are stored in the fingerprint database file.
Fingerprints are not deleted from the fingerprint file automatically when data blocks are freed, but when
a threshold of 20% new fingerprints is reached, the stale fingerprints are deleted. This can also be done
by a manual operation from the command line.
In Data ONTAP 7.2.X, all the deduplication metadata resides in the flexible volume.
Starting with Data ONTAP 7.3.0, part of the metadata resides in the volume and part of it resides in the
aggregate outside the volume. The fingerprint database and the change log files that are used in the
deduplication process are located outside of the volume in the aggregate and are therefore not captured
in Snapshot copies. This change enables deduplication to achieve higher space savings. However,
some other temporary metadata files created during the deduplication operation are still placed inside
the volume. These temporary metadata files are deleted once the deduplication operation is complete.
These temporary metadata files can get locked in Snapshot copies if the Snapshot copies are created
during a deduplication operation. The metadata files remain locked until the Snapshot copies are
deleted.
During an upgrade from Data ONTAP 7.2 to 7.3, the fingerprint and change log files will be moved from
the flexible volume to the aggregate level during the next deduplication process following the upgrade.
During the deduplication process where the fingerprint and change log files are being moved from the
volume to the aggregate, the "sis status" command will display the message "Fingerprint is being
upgraded."
In Data ONTAP 7.3 and later, the deduplication metadata for a volume is located outside the volume, in
the aggregate. When you revert from Data ONTAP 7.3 to a pre-7.3 release, the deduplication metadata
is lost during the revert process. To obtain optimal space savings, use the sis start -s
command to rebuild the deduplication metadata for all existing data. If this is not done, the existing data
in the volume will retain the space savings from deduplication runs performed before the revert; however,
any deduplication that occurs after the revert will apply only to data that was created after the
revert, and will not deduplicate against data that existed before it. The sis
start -s command can take a long time to complete, depending on the size of the logical data in the
volume, but during this time the system is available for all other operations. Before using the sis
start -s command, make sure that the volume has sufficient free space to accommodate the addition
of the deduplication metadata to the volume. The deduplication metadata uses 1% to 6% of the logical
data size in the volume.
For the size of the overhead associated with the deduplication metadata files, see the section
"Deduplication Metadata Overhead."
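As a rough planning aid, the 1% to 6% figure quoted above for rebuilding metadata inside the volume can be turned into a quick free-space check. The sketch below is plain arithmetic in Python; the function names are hypothetical, and it is not a substitute for the sizing rules in the overhead section.

```python
def metadata_space_range(logical_bytes):
    """Return (low, high) estimated metadata size: 1% and 6% of logical data."""
    low = logical_bytes * 1 // 100
    high = logical_bytes * 6 // 100
    return low, high


def has_room_for_metadata(logical_bytes, free_bytes):
    """Conservative check: require free space for the 6% upper bound."""
    _, high = metadata_space_range(logical_bytes)
    return free_bytes >= high
```

For a volume holding 100 GiB of logical data, this gives an estimate of between 1 GiB and 6 GiB, so the conservative check passes only if at least 6 GiB is free.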