
NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide
The more concurrent deduplication processes you run, the more system resources are consumed.
Given the previous two items, the best option is to do one of the following:
- Use the auto mode so that deduplication runs only after significant additional data has been written to each particular flexible volume (this tends to naturally spread out when deduplication runs).
- Stagger the deduplication schedule for the flexible volumes so that it runs on alternating days.
- Run deduplication manually.
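As a sketch, the three options above map to the Data ONTAP 7-Mode `sis` commands roughly as follows. The volume names here are illustrative placeholders, and the exact schedule syntax should be confirmed against the command reference for your Data ONTAP release:

```shell
# Option 1: auto mode - deduplication starts when enough new data
# has accumulated in the volume (illustrative volume names).
sis config -s auto /vol/vol1

# Option 2: staggered schedules - run on alternating days so that
# volumes do not deduplicate concurrently.
sis config -s mon,wed,fri@23 /vol/vol2
sis config -s tue,thu,sat@23 /vol/vol3

# Option 3: no schedule; start deduplication manually when desired.
sis config -s - /vol/vol4
sis start /vol/vol4
```

These commands assume deduplication has already been enabled on each volume with `sis on`.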
If Snapshot copies are required, run deduplication before creating the Snapshot copy so that as little duplicate data as possible gets locked into the copies. (Make sure that deduplication has completed before creating the copy.) If a Snapshot copy is created on a flexible volume before deduplication has completed on that volume, the space savings are likely to be lower.
If Snapshot copies are to be used, the Snapshot reserve should be greater than 0. An exception could be a volume that contains LUNs, where the snap reserve might be set to zero for thin-provisioning reasons; in that case, additional free space should be available in the volume to hold the Snapshot copies.
For deduplication to run properly, you need to leave some free space for the deduplication metadata.
For information about how much extra space to leave in the volume and in the aggregate, see section
"Deduplication Metadata Overhead."
3.2 DEDUPLICATION PERFORMANCE
This section discusses the performance aspects of deduplication.
Since deduplication is a part of Data ONTAP, it is tightly integrated with the WAFL® file structure. Because of this, deduplication is performed with high efficiency: it is able to leverage the internal characteristics of Data ONTAP to create and compare digital fingerprints, redirect data pointers, and free up redundant data areas.
However, the following factors can affect the performance of the deduplication process and the I/O performance of deduplicated volumes:
- The application and the type of data set being used
- The data access pattern (for example, sequential versus random access; the size and pattern of the I/O)
- The amount of duplicate data, the amount of total data, and the average file size
- The nature of the data layout in the volume
- The amount of changed data between deduplication runs
- The number of concurrent deduplication sessions
- The hardware platform (the amount of CPU and memory in the system)
- The amount of load on the system
- The disk type (for example, ATA or FC) and the RPM of the disks
- The number of disk spindles in the aggregate
Because of these factors, NetApp recommends that the performance impact of deduplication be carefully measured in a test setup and factored into sizing considerations before deploying deduplication in performance-sensitive solutions.
THE PERFORMANCE OF THE DEDUPLICATION OPERATION
The performance of the deduplication operation itself varies widely depending on the factors listed above, and it determines how long this background process takes to finish.
On a FAS6080 with no other load on the system, we have seen deduplication throughput of up to 120 MB/sec (running a single deduplication session). If multiple deduplication streams are running, this total bandwidth is divided evenly among the streams.
To get an idea of how long it takes for a deduplication process to complete, suppose that deduplication is running on a flexible volume at 25 MB/sec. If 1TB of new data has been added to the volume since the last deduplication run, this deduplication operation takes about 10 to 12 hours to complete.
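The arithmetic behind these estimates can be sketched as follows; this is only a back-of-the-envelope model based on the two figures quoted above (the 120 MB/sec single-session maximum and the even split of bandwidth among concurrent streams):

```python
# Back-of-the-envelope estimate of deduplication run time,
# using the figures quoted in the text above.

TOTAL_BANDWIDTH_MB_S = 120  # observed single-session maximum on a FAS6080


def per_stream_rate(num_streams: int) -> float:
    """Total bandwidth is divided evenly among concurrent streams."""
    return TOTAL_BANDWIDTH_MB_S / num_streams


def hours_to_dedupe(new_data_tb: float, rate_mb_s: float) -> float:
    """Hours needed to process new_data_tb terabytes at rate_mb_s MB/sec."""
    megabytes = new_data_tb * 1024 * 1024  # 1 TB = 1,048,576 MB
    return megabytes / rate_mb_s / 3600


# 1 TB of new data at 25 MB/sec is roughly 11.7 hours, which is
# consistent with the "about 10 to 12 hours" figure in the text.
print(round(hours_to_dedupe(1, 25), 1))  # ~11.7

# With four concurrent streams, each stream gets 120 / 4 = 30 MB/sec.
print(per_stream_rate(4))  # 30.0
```

Note that the 25 MB/sec per-volume rate in the example is below the single-session maximum, reflecting a system that is also serving other load.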