Specifications

NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide
The storage savings may continue to stay low.
When the last Snapshot copy that was created before deduplication was run is deleted, the storage
savings should increase noticeably.
The question thus becomes when to run deduplication again to achieve maximum capacity savings.
The answer is that deduplication should be run, and allowed to complete, before each Snapshot copy is
created; this provides the greatest storage savings. However, depending on the flexible
volume size and the possible performance impact on the system, this may not always be advisable.
DEDUPLICATION METADATA OVERHEAD
This section discusses storage overhead that deduplication introduces. While deduplication can provide
substantial storage savings in many environments, there is a small amount of storage overhead associated
with it. This should be considered when sizing the flexible volume.
The total storage used by the deduplication metadata files is approximately 1% to 6% of the total data in the
volume. Total data = used space + saved space, as reported by df -s (that is, the size of the
data before it is deduplicated). So for 1TB of total data, the metadata overhead would be approximately
10GB to 60GB. The breakdown of the overhead associated with the deduplication metadata is as follows:
- There is a fingerprint record for every 4KB data block, and the fingerprint records for all of the data
  blocks in the volume are stored in the fingerprint database file. There is an overhead of less than 2%
  associated with this database file.
- The size of the deduplication change log files depends on the rate of change of the data and on how
  frequently deduplication is run. This accounts for less than 2% overhead in the volume.
- Finally, when deduplication is running, it creates some temporary files that could account for up to 2% of
  the size of the volume. These temporary metadata files are deleted when the deduplication process has
  finished running.
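As a rough illustration of the arithmetic above, the following Python sketch computes the 1% to 6% overhead range from the used and saved figures that df -s reports. The function and parameter names are hypothetical and are not part of any NetApp tool; the values must be read from df -s and entered by hand.

```python
# Minimal sketch: estimate deduplication metadata overhead.
# The 1% and 6% bounds come from the text; all names here are hypothetical.

def metadata_overhead_range_gb(used_gb, saved_gb):
    """Return (low, high) estimated metadata overhead in GB."""
    # Total data is the size of the data before it is deduplicated.
    total_data_gb = used_gb + saved_gb
    low = total_data_gb * 1 / 100
    high = total_data_gb * 6 / 100
    return (low, high)


# Example: 600GB used + 400GB saved = 1TB (1000GB) of total data.
low, high = metadata_overhead_range_gb(used_gb=600.0, saved_gb=400.0)
print(f"Estimated metadata overhead: {low:.0f}GB to {high:.0f}GB")
```

For 1TB of total data this reproduces the 10GB to 60GB range quoted above.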
In Data ONTAP 7.2.X, all of the above deduplication metadata files reside in the volume, and this metadata
is therefore captured and locked in the Snapshot copies of the volume as well.
Starting with Data ONTAP 7.3, part of the metadata still resides in the volume, and part of it resides in the
aggregate outside of the volume. The fingerprint database and the change log files are located outside of
the volume in the aggregate and are therefore not captured in Snapshot copies. This change enables
deduplication to achieve higher space savings. The temporary metadata files created during
the deduplication operation, however, are still placed inside the volume and are deleted
when the operation completes. If Snapshot copies are created during a
deduplication operation, these temporary metadata files can get locked in Snapshot copies, and they remain
there until the Snapshot copies are deleted.
The guideline for the amount of extra space that should be left in the aggregate or volume for the
deduplication metadata overhead is as follows:
- If you’re running Data ONTAP 7.2.X, leave about 6% extra space inside the volume on which you plan
  to run deduplication.
- If you’re running Data ONTAP 7.3, leave about 2% extra space inside the volume on which you plan to
  run deduplication, and around 4% extra space outside the volume in the aggregate, for each volume
  running deduplication.
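The guideline above can be sketched as a small Python lookup. This is only an illustration: the helper name and the release keys are hypothetical, and the sketch assumes the percentages are applied to the total data in the volume (used space + saved space), which the guideline itself does not state explicitly.

```python
# Minimal sketch of the space-reservation guideline.
# Percentages come from the text; names and the choice of base
# (total data in the volume) are assumptions made for illustration.

RESERVE_PERCENT = {
    "7.2": {"volume": 6, "aggregate": 0},  # all metadata inside the volume
    "7.3": {"volume": 2, "aggregate": 4},  # split between volume and aggregate
}


def reserve_gb(release_family, total_data_gb):
    """Return extra space to leave, in GB, inside the volume and in the aggregate."""
    if release_family not in RESERVE_PERCENT:
        raise ValueError(f"no guideline known for release {release_family!r}")
    percent = RESERVE_PERCENT[release_family]
    return {where: total_data_gb * pct / 100 for where, pct in percent.items()}


# Example: one deduplicated volume holding 1000GB of total data.
for family in ("7.2", "7.3"):
    print(family, reserve_gb(family, 1000.0))
```

Because the Data ONTAP 7.3 aggregate reservation applies to each volume running deduplication, the aggregate figures for all such volumes in the same aggregate would be summed.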
3.4 SPACE SAVINGS ESTIMATION TOOL (SSET)
The actual amount of data space reduction depends on the type of data. For this reason, the SSET should
be used to analyze the actual data set and determine the effectiveness of deduplication on that particular
data set.
When executed, the SSET crawls through all the files in the specified path and estimates the space savings
that will be achieved by deduplication. Although actual deduplication space savings may deviate from what
the estimation tool predicts, use and testing so far indicate that in general, the actual results are within
+/-5% of the space savings that the tool predicts.