NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide

Releasing a duplicate data block entails updating the indirect inode pointing to it, incrementing the block reference count for the already existing data block, and freeing the duplicate data block.
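The three steps above can be sketched as a toy model. This is an illustration only, not Data ONTAP code: the `Volume` class, the `release_duplicate` method, and the pointer-slot names are hypothetical, and a plain dictionary stands in for the indirect inode.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Volume:
    """Hypothetical toy model of block pointers and reference counts."""
    refcount: Dict[int, int] = field(default_factory=dict)  # block number -> references
    pointers: Dict[str, int] = field(default_factory=dict)  # pointer slot -> block number
    free_blocks: Set[int] = field(default_factory=set)

    def release_duplicate(self, slot: str, existing: int) -> None:
        """Redirect `slot` from its duplicate block to the `existing` block."""
        duplicate = self.pointers[slot]
        if duplicate == existing:
            return  # the slot already shares the existing block
        self.pointers[slot] = existing   # 1. update the pointer
        self.refcount[existing] += 1     # 2. increment the existing block's count
        self.refcount[duplicate] -= 1    # 3. free the duplicate once unreferenced
        if self.refcount[duplicate] == 0:
            del self.refcount[duplicate]
            self.free_blocks.add(duplicate)


# Blocks 100 and 200 hold identical data; file_b is redirected to block 100.
vol = Volume(refcount={100: 1, 200: 1},
             pointers={"file_a[0]": 100, "file_b[0]": 200})
vol.release_duplicate("file_b[0]", existing=100)
```

After the call, both slots point at block 100, its reference count is 2, and block 200 is in the free set.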

In real time, as additional data is written to the deduplicated volume, a fingerprint is created for each new block and written to a change log file. The next time deduplication runs, the change log is sorted, its sorted fingerprints are merged with those in the fingerprint file, and then the deduplication processing occurs.
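A minimal sketch of the sort-and-merge step follows. It assumes fingerprint records are `(fingerprint, block_number)` tuples and uses SHA-256 purely as a stand-in, since this section does not name the fingerprint function. The function names are hypothetical. Blocks with matching fingerprints are treated only as candidates here; the sketch does not show how they are confirmed or released.

```python
import hashlib
from heapq import merge
from itertools import groupby

BLOCK_SIZE = 4096  # one fingerprint per 4KB block


def fingerprint(block: bytes) -> str:
    # Stand-in hash (assumption); not the fingerprint Data ONTAP computes.
    return hashlib.sha256(block).hexdigest()


def merge_change_log(fingerprint_file, change_log):
    """Merge an unsorted change log into an already sorted fingerprint file.

    Returns the merged, sorted records and the groups of block numbers
    whose fingerprints match (duplicate candidates).
    """
    merged = list(merge(fingerprint_file, sorted(change_log)))
    groups = [
        [block_no for _, block_no in records]
        for _, records in groupby(merged, key=lambda rec: rec[0])
    ]
    return merged, [g for g in groups if len(g) > 1]


# Existing fingerprint file covers blocks 1 and 2; blocks 3 and 4 are new writes.
fingerprint_file = sorted([
    (fingerprint(b"A" * BLOCK_SIZE), 1),
    (fingerprint(b"B" * BLOCK_SIZE), 2),
])
change_log = [
    (fingerprint(b"C" * BLOCK_SIZE), 3),
    (fingerprint(b"A" * BLOCK_SIZE), 4),  # same content as block 1
]
merged, candidates = merge_change_log(fingerprint_file, change_log)
```

Sorting the change log first lets the merge run as a single sequential pass over both inputs, which is why matching fingerprints end up adjacent and can be grouped.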

Note that there are actually two change log files. While deduplication is running and merging the new blocks from one change log file into the fingerprint file, fingerprints for new data being written to the flexible volume are written to the second change log file. The roles of the two files are reversed the next time that deduplication is run. (For those familiar with Data ONTAP usage of NVRAM, this is analogous to when it switches from one half to the other to create a consistency point.)
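The role reversal of the two change logs resembles a double-buffering pattern, sketched below. The `ChangeLogs` class and its method names are hypothetical, and the sketch is single-threaded for clarity; it only illustrates that writes arriving during a deduplication run land in the other log.

```python
class ChangeLogs:
    """Hypothetical pair of change logs whose roles swap on each run."""

    def __init__(self):
        self._logs = [[], []]
        self._active = 0  # index of the log currently receiving new fingerprints

    def record(self, fp, block_no):
        """Append a fingerprint for a newly written block to the active log."""
        self._logs[self._active].append((fp, block_no))

    def swap_and_drain(self):
        """Start a deduplication run: swap roles and hand back the full log."""
        draining = self._active
        self._active = 1 - self._active  # new writes now go to the other log
        records = self._logs[draining]
        self._logs[draining] = []        # emptied once merged into the fingerprint file
        return records


logs = ChangeLogs()
logs.record("fp1", 10)
first_run = logs.swap_and_drain()   # run 1 processes the first log
logs.record("fp2", 11)              # written during or after run 1
second_run = logs.swap_and_drain()  # run 2 processes the second log
```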

Note: When deduplication is run for the first time on an empty flexible volume, it still creates the fingerprint file from the change log.

Here are some additional details about the deduplication metadata:

There is a fingerprint record for every 4KB data block, and the fingerprints for all the data blocks in the volume are stored in the fingerprint database file.

Fingerprints are not deleted from the fingerprint file automatically when data blocks are freed. Instead, the stale fingerprints are deleted when a threshold of 20% new fingerprints is reached. This cleanup can also be done by a manual operation from the command line.

In Data ONTAP 7.2.X, all the deduplication metadata resides in the flexible volume.

Starting with Data ONTAP 7.3.0, part of the metadata resides in the volume and part of it resides in the aggregate outside the volume. The fingerprint database and the change log files that are used in the deduplication process are located outside of the volume in the aggregate and are therefore not captured in Snapshot copies. This change enables deduplication to achieve higher space savings. However, some other temporary metadata files created during the deduplication operation are still placed inside the volume. These temporary metadata files are deleted once the deduplication operation is complete, but they can get locked in Snapshot copies if the Snapshot copies are created during a deduplication operation. The metadata files then remain locked until the Snapshot copies are deleted.

During an upgrade from Data ONTAP 7.2 to 7.3, the fingerprint and change log files will be moved from the flexible volume to the aggregate level during the next deduplication process following the upgrade. During the deduplication process where the fingerprint and change log files are being moved from the volume to the aggregate, the "sis status" command will display the message "Fingerprint is being upgraded."

In Data ONTAP 7.3 and later, the deduplication metadata for a volume is located outside the volume, in the aggregate. When you revert from Data ONTAP 7.3 to a pre-7.3 release, the deduplication metadata is lost during the revert process. To obtain optimal space savings, use the sis start -s command to rebuild the deduplication metadata for all existing data. If this is not done, the existing data in the volume will retain the space savings from deduplication run prior to the revert process; however, any deduplication that occurs after the revert process will only apply to data that was created after the revert process, and will not deduplicate against data that existed prior to the revert process. The sis start -s command can take a long time to complete, depending on the size of the logical data in the volume, but during this time the system is available for all other operations. Before using the sis start -s command, make sure that the volume has sufficient free space to accommodate the addition of the deduplication metadata to the volume. The deduplication metadata uses 1% to 6% of the logical data size in the volume.
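As a worked example of the 1% to 6% figure, the sketch below estimates the metadata range for a given logical data size and checks whether a volume has room for a rebuild. The function names are hypothetical, and checking against the 6% upper bound is a conservative assumption rather than a documented rule.

```python
GIB = 2 ** 30


def metadata_overhead_bytes(logical_bytes: int):
    """Return the (low, high) estimate: 1% and 6% of the logical data size."""
    low = logical_bytes * 1 // 100
    high = logical_bytes * 6 // 100
    return low, high


def has_room_for_rebuild(logical_bytes: int, free_bytes: int) -> bool:
    """Conservative check (assumption): require free space for the 6% bound."""
    _, high = metadata_overhead_bytes(logical_bytes)
    return free_bytes >= high


# A volume holding 1000 GiB of logical data needs roughly 10 GiB to 60 GiB.
low, high = metadata_overhead_bytes(1000 * GIB)
enough = has_room_for_rebuild(1000 * GIB, free_bytes=80 * GIB)
too_little = has_room_for_rebuild(1000 * GIB, free_bytes=40 * GIB)
```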

For the size of the overhead associated with the deduplication metadata files, see the section "Deduplication Metadata Overhead."