User's Manual

DAT 72/DDS-4 Product Manual

DCLZ Algorithm

Within the computer industry, algorithms developed by Abraham Lempel and Jacob

Ziv (enhanced later by Terry Welch) are popular, versatile and powerful compression

methods. These LZ algorithms are basically of two types—LZ1, a sliding window

method, and LZ2/LZW, a hashed directory method.

LZ2

and

LZW

(Lempel-Ziv-Welch) are algorithms based on the

hashed dictionary

method; these algorithms offer an acceptable compromise between speed and

compression ratio. This type of algorithm builds a symbol dictionary to represent

strings as the data is processed and then looks up matching patterns in the

dictionary. By monitoring the compression ratio in this type of algorithm, a new

dictionary can be started when the ratio drops, indicating a change in the data type.

This type of algorithm is responsive to changing data patterns while maintaining

acceptable speed.

Although dependent on the particular implementation, the LZ2/LZW type of algorithm

is generally faster than the LZ1 type because the dictionary structure promotes

efficient searching.

The DCLZ algorithm used in the DDS tape drive is based on the LZ2/LZW algorithm

type described earlier in this chapter. This algorithm has been approved by the US

ANSI standards group and the European ECMA standards group. Both the DDS

Manufacturers Group and QIC tape industry-standards committees accept DCLZ as

an approved standard. Within the DDS Manufacturers Group, DCLZ is the only

approved standard, ensuring complete interchange across all DDS drives and media.

Simplified Compression Operation

The following steps describe a simplified version of operation of the algorithm for

compressing data:

From the current position in the input data stream, the algorithm fetches bytes

(characters) until a string is formed that does not have a matching entry in the

dictionary.

The codeword for the longest string that has an entry in the dictionary (all bytes

except the last) is output.

A dictionary entry for the string formed in step 1 is created.

The current position is moved to the last byte of that string.

Steps 1 through 4 are repeated until the input data stream is completely

processed.