
NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide
The more concurrent deduplication processes you run, the more system resources are consumed.
Given the previous two items, the best option is to do one of the following:
- Use the auto mode so that deduplication runs only after significant additional data has been written to each particular flexible volume (this tends to naturally spread out when deduplication runs).
- Stagger the deduplication schedule for the flexible volumes so that it runs on alternating days.
- Run deduplication manually.
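As a sketch, the three options above map to the Data ONTAP 7-Mode `sis` commands roughly as follows. The volume names here are illustrative placeholders, and the exact schedule syntax should be confirmed against the command reference for your Data ONTAP release:

```shell
# Option 1: auto mode - deduplication starts when enough new data
# has accumulated in the volume (illustrative volume names).
sis config -s auto /vol/vol1

# Option 2: staggered schedules - run on alternating days so that
# volumes do not deduplicate concurrently.
sis config -s mon,wed,fri@23 /vol/vol2
sis config -s tue,thu,sat@23 /vol/vol3

# Option 3: no schedule; start deduplication manually when desired.
sis config -s - /vol/vol4
sis start /vol/vol4
```

These commands assume deduplication has already been enabled on each volume with `sis on`.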
If Snapshot copies are required, run deduplication before creating the Snapshot copy so that as little duplicate data as possible gets locked into the copies. (Make sure that deduplication has completed before creating the copy.) If a Snapshot copy is created on a flexible volume before deduplication has completed on that volume, the space savings are likely to be lower.
If Snapshot copies are to be used, the Snapshot reserve should be greater than 0. An exception could be a volume that contains LUNs, where the snap reserve might be set to zero for thin-provisioning reasons; in that case, additional free space should be available in the volume to hold the Snapshot copies.
For deduplication to run properly, you need to leave some free space for the deduplication metadata.
For information about how much extra space to leave in the volume and in the aggregate, see section
"Deduplication Metadata Overhead."
3.2 DEDUPLICATION PERFORMANCE
This section discusses the performance aspects of deduplication.
Since deduplication is a part of Data ONTAP, it is tightly integrated with the WAFL® file structure. Because of this, deduplication is performed with high efficiency: it is able to leverage the internal characteristics of Data ONTAP to create and compare digital fingerprints, redirect data pointers, and free up redundant data areas.
However, the following factors can affect the performance of the deduplication process and the I/O performance of deduplicated volumes:
- The application and the type of data set being used
- The data access pattern (for example, sequential versus random access; the size and pattern of the I/O)
- The amount of duplicate data, the amount of total data, and the average file size
- The nature of the data layout in the volume
- The amount of changed data between deduplication runs
- The number of concurrent deduplication sessions
- The hardware platform (the amount of CPU and memory in the system)
- The amount of load on the system
- The disk type (for example, ATA or FC) and the RPM of the disks
- The number of disk spindles in the aggregate
Because of these factors, NetApp recommends that the performance impact of deduplication be carefully measured in a test setup and factored into sizing considerations before deploying deduplication in performance-sensitive solutions.
THE PERFORMANCE OF THE DEDUPLICATION OPERATION
The performance of the deduplication operation itself varies widely depending on the factors listed above, and it determines how long this background process takes to finish.
On a FAS6080 with no other load on the system, we have seen deduplication throughput of up to 120 MB/sec (running a single deduplication session). If multiple deduplication streams are running, this total bandwidth is divided evenly among the streams.
To get an idea of how long it takes for a deduplication process to complete, suppose that deduplication is running on a flexible volume at 25 MB/sec. If 1TB of new data has been added to the volume since the last deduplication run, this deduplication operation takes about 10 to 12 hours to complete.
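The arithmetic behind these estimates can be sketched as follows; this is only a back-of-the-envelope model based on the two figures quoted above (the 120 MB/sec single-session maximum and the even split of bandwidth among concurrent streams):

```python
# Back-of-the-envelope estimate of deduplication run time,
# using the figures quoted in the text above.

TOTAL_BANDWIDTH_MB_S = 120  # observed single-session maximum on a FAS6080


def per_stream_rate(num_streams: int) -> float:
    """Total bandwidth is divided evenly among concurrent streams."""
    return TOTAL_BANDWIDTH_MB_S / num_streams


def hours_to_dedupe(new_data_tb: float, rate_mb_s: float) -> float:
    """Hours needed to process new_data_tb terabytes at rate_mb_s MB/sec."""
    megabytes = new_data_tb * 1024 * 1024  # 1 TB = 1,048,576 MB
    return megabytes / rate_mb_s / 3600


# 1 TB of new data at 25 MB/sec is roughly 11.7 hours, which is
# consistent with the "about 10 to 12 hours" figure in the text.
print(round(hours_to_dedupe(1, 25), 1))  # ~11.7

# With four concurrent streams, each stream gets 120 / 4 = 30 MB/sec.
print(per_stream_rate(4))  # 30.0
```

Note that the 25 MB/sec per-volume rate in the example is below the single-session maximum, reflecting a system that is also serving other load.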