HP IAP Version 2.0 Administrator Guide (July 2008)

the converter is run, the master configuration file is copied to the PCC server and renamed to
YYYY-MM-DD-hh.mm.ss_FileName. It is also backed up to tape if application backup is enabled.
To restore this directory from the tape backup, run the following command on the PCC server:
/usr/local/tsmBackup/rotateMasterConfigBackup –restore
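To illustrate the renaming convention, the following minimal Python sketch builds a backup file name from a timestamp. The function name backup_name and the file name master.cfg are hypothetical. The sketch also assumes that hh is a 24-hour value, which the guide does not state.

```python
from datetime import datetime


def backup_name(file_name: str, when: datetime) -> str:
    """Build a name that follows the YYYY-MM-DD-hh.mm.ss_FileName pattern.

    Assumption: hh is a 24-hour clock value (%H).
    """
    return f"{when:%Y-%m-%d-%H.%M.%S}_{file_name}"


print(backup_name("master.cfg", datetime(2008, 7, 1, 14, 5, 9)))
# 2008-07-01-14.05.09_master.cfg
```

The fields are zero-padded and run from year down to second, so backups of the same file sort chronologically when sorted by name.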
Duplicate Manager
The Duplicate Manager view allows the administrator to schedule duplicate merge jobs and view the
status of duplicate merge jobs.
This tool is meant for customers who store email with Single Instancing enabled. Under certain
circumstances, the single instancing mechanism allows duplicate copies of the same email to be stored.
Storing duplicate emails wastes space on the smart cells and clutters search results by listing
the same email multiple times. After an email file has been stored, additional data, called metadata, is
attached to the file. Over the lifecycle of the email, the metadata changes as user access
permissions and folder information change. The purpose of the Duplicate Manager tool is to eliminate
duplicate emails while retaining all associated metadata. To that end, the Duplicate Manager takes
the union of all metadata, attaches it to a single copy of the email, and removes all redundant copies.
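The merge described above can be pictured with the following minimal Python sketch. It is not the IAP implementation. The class StoredEmail, its readers and folders fields (stand-ins for access permissions and folder information), and the choice of the first copy as the survivor are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StoredEmail:
    uri: str
    content_hash: str
    # Hypothetical metadata fields standing in for access permissions and folders.
    readers: set[str] = field(default_factory=set)
    folders: set[str] = field(default_factory=set)


def merge_duplicates(copies: list[StoredEmail]) -> tuple[StoredEmail, list[str]]:
    """Attach the union of all metadata to one copy; return it and the URIs to remove."""
    if not copies:
        raise ValueError("no copies to merge")
    if len({c.content_hash for c in copies}) != 1:
        raise ValueError("copies must share the same hash")
    # Assumption: the first copy is kept; the real selection rule is not documented here.
    survivor, redundant = copies[0], copies[1:]
    for dup in redundant:
        survivor.readers |= dup.readers
        survivor.folders |= dup.folders
    return survivor, [dup.uri for dup in redundant]
```

Because the metadata is combined by set union, no reader or folder recorded on any copy is lost when the redundant copies are removed.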
Single instancing is the ability to store a single copy of an email that exists in multiple user repositories.
Users access the email through a reference pointer. Emails are uniquely identified by a cryptographic
checksum called a hash. If a client attempts to store the same email twice, as determined by the hash,
only one copy is physically stored and the same reference URI is returned for both store requests. If the
IAP cannot determine whether an email has already been stored, it stores another copy. This
can lead to multiple copies of the same email being stored.
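The following toy Python store sketches single instancing as described above. It is an illustration only. The use of SHA-256, the iap:// URI scheme, and the names SingleInstanceStore and store_unverified are assumptions, since the guide does not name the hash algorithm or the URI format. store_unverified models the case in which the IAP cannot check the hash and stores another copy.

```python
import hashlib


class SingleInstanceStore:
    """Toy store: one physical copy per content hash, shared reference URI.

    Assumptions: SHA-256 as the hash and a hypothetical "iap://" URI scheme.
    """

    def __init__(self) -> None:
        self._uri_by_hash: dict[str, str] = {}
        self._blobs: dict[str, bytes] = {}

    def store(self, message: bytes) -> str:
        digest = hashlib.sha256(message).hexdigest()
        uri = self._uri_by_hash.get(digest)
        if uri is None:
            # First time this hash is seen: physically store one copy.
            uri = f"iap://email/{digest}"
            self._uri_by_hash[digest] = uri
            self._blobs[uri] = message
        # Repeat requests for the same content get the same reference URI.
        return uri

    def store_unverified(self, message: bytes) -> str:
        # Models the case where the hash lookup is unavailable: a new copy is always stored.
        uri = f"iap://email/unverified/{len(self._blobs)}"
        self._blobs[uri] = message
        return uri

    def physical_copies(self) -> int:
        return len(self._blobs)
```

Two store calls with identical content return one URI and keep one physical copy. A store_unverified call adds a second physical copy of the same content, and the Duplicate Manager is intended to clean up duplicates of this kind.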
If a user previously ran a successful query that returned duplicate emails and saved the query results,
only one copy of each duplicated email should remain accessible after this tool runs. If a user has
quarantined the files in the saved query result set, the single remaining copy of the email
remains quarantined after this tool runs.
Smart cells have a storage threshold that is lower than the total capacity of the hard disk, because
the services that run on the smart cells require a percentage of the disk space to operate. When
a smart cell reaches this threshold, it transitions into a closed state. Because duplicate emails may be
stored across smart cell groups, the merge job may be unable to process a set
of duplicates because the smart cells involved have exceeded the storage threshold and can no longer
accept additional metadata. These duplicates cannot be merged until space has been made
available on the closed smart cell, either by migrating some of the emails to a new group or by waiting for
retention to delete emails and free storage space.
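The effect of the storage threshold on a merge job can be sketched in Python as follows. The 90% threshold_ratio is a hypothetical value, because the guide does not give the actual threshold. The sketch also assumes that a duplicate set is deferred when the cell that would receive the merged metadata is closed, and that set is otherwise ready to merge.

```python
from dataclasses import dataclass


@dataclass
class SmartCell:
    name: str
    capacity_bytes: int
    used_bytes: int
    threshold_ratio: float = 0.9  # hypothetical; the real threshold is not documented here

    @property
    def closed(self) -> bool:
        # A cell closes once usage reaches the threshold, below full disk capacity.
        return self.used_bytes >= self.capacity_bytes * self.threshold_ratio


def plan_merges(dup_sets: dict[str, SmartCell]) -> tuple[list[str], list[str]]:
    """Split duplicate sets (keyed by hash) by whether the cell that would keep the
    surviving copy can still accept the merged metadata."""
    ready: list[str] = []
    deferred: list[str] = []
    for content_hash, target_cell in dup_sets.items():
        if target_cell.closed:
            deferred.append(content_hash)  # wait for migration or retention to free space
        else:
            ready.append(content_hash)
    return ready, deferred
```

Deferred sets are not lost. In this sketch they become ready again once used_bytes drops below the threshold.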
The time a merge job takes to complete depends on the total number of files stored, the
percentage of those files that are duplicates, and the average number of duplicates stored per hash. The
first merge job may take many days to complete; subsequent merge jobs may complete much faster.
If no errors occur during the duplicate merge, duplicates are significantly reduced in the system. However,
there is no guarantee that all duplicates are removed. As mentioned above, some emails may not have an
associated hash, and such emails cannot be merged into a single copy. The duplicate merge procedure
may also miss a few duplicate emails on the first pass, because it favors merge speed over
exhaustive duplicate elimination. Duplicate merge does not attempt to merge all emails in a single pass;
it processes the emails in manageable batches, according to a default or user-configurable batch
size. Because of this batching, unmerged duplicates tend to occur for emails at the beginning or
end of a batch. A subsequent duplicate merge pass removes these remaining duplicate emails.
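The following minimal Python sketch shows why batching leaves duplicates at batch boundaries. It assumes, hypothetically, that emails are ordered by hash so that copies are adjacent, and it uses bare hash strings as stand-ins for stored emails. Each pass removes duplicates only within a batch. Removing emails shifts the boundaries, so a later pass often catches what the previous pass missed. Like the real procedure, this sketch does not guarantee complete removal.

```python
def merge_pass(hashes: list[str], batch_size: int) -> list[str]:
    """One pass: dedupe within each fixed-size batch only.

    Assumes input is ordered by hash, so copies of one email are adjacent;
    a run of copies that straddles a batch boundary survives this pass.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    survivors: list[str] = []
    for start in range(0, len(hashes), batch_size):
        seen: set[str] = set()
        for h in hashes[start:start + batch_size]:
            if h not in seen:
                seen.add(h)
                survivors.append(h)
    return survivors


def merge_until_stable(hashes: list[str], batch_size: int,
                       max_passes: int = 10) -> tuple[list[str], int]:
    """Repeat passes until one removes nothing or max_passes is reached.

    Returns the remaining hashes and the number of passes run (including the
    final pass that removed nothing). Duplicates can still remain, e.g. when
    every batch boundary splits a pair.
    """
    passes = 0
    while passes < max_passes:
        passes += 1
        merged = merge_pass(hashes, batch_size)
        if len(merged) == len(hashes):
            break
        hashes = merged
    return hashes, passes
```

For example, with hashes ["a", "a", "b", "b", "c", "c"] and a batch size of 3, the first pass leaves ["a", "b", "b", "c"] because the two "b" copies straddle a boundary, and the second pass removes the remaining duplicate.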
Table 44 Link to Duplicate Manager view
Origin       Link
left menu    Data Management > Duplicate Manager
Data management