Managing the System Registry Hive on Windows Server 2003 and Windows Server 2008 Integrity Systems

Causes of Increasing System Hive Size
When Windows Server 2003 for Itanium-based Systems was first released, the System hive limit
was set to 32 MB by design. This was not changed with the release of Windows Server 2008 for
Itanium-based Systems (although it was increased in Windows Server 2008 R2 for Itanium-based
Systems). At the inception of Windows Server 2003, Microsoft also introduced a new technology
called Multipath I/O (MPIO). This was Microsoft’s strategy for disk multipathing, and included
software that vendors could plug into. As multipathing in storage area networks (SANs) became
more pervasive, the information to be managed (about each disk and path) grew in the System
hive. Up until March 2009, the MPIO framework allowed a maximum of 8 paths per disk. So if
there were 100 disks, then information about 800 different paths had to be stored in each
ControlSet. With the March 2009 release of the MPIO framework (Microsoft internal version 1.22,
which the major manufacturers have built their modules on), the maximum number of paths
per disk rose to 32, quadrupling the space that disk information can require in each ControlSet.
Compounding the problem, Symantec’s Veritas Storage Foundation for Windows (SFW) is a popular
application on Itanium-based scale-up systems. It is generally used when large numbers of disks
need to be managed. For each disk managed by SFW, an additional entry is added to the
ControlSet.
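The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the upper bound on entries per ControlSet, not a measurement of real hive usage. The function names are hypothetical, and the sketch assumes every disk uses the maximum number of paths and that SFW adds exactly one entry per managed disk, as described above.

```python
def path_entries(disks: int, max_paths_per_disk: int) -> int:
    """Upper bound on the path records one ControlSet must hold."""
    return disks * max_paths_per_disk


def controlset_disk_entries(disks: int, max_paths_per_disk: int,
                            sfw_managed_disks: int = 0) -> int:
    """Path records plus one extra entry per SFW-managed disk (assumed model)."""
    return path_entries(disks, max_paths_per_disk) + sfw_managed_disks


if __name__ == "__main__":
    disks = 100
    before = path_entries(disks, 8)    # MPIO limit before March 2009
    after = path_entries(disks, 32)    # MPIO limit from version 1.22
    print(before, after, after // before)
    print(controlset_disk_entries(disks, 32, sfw_managed_disks=disks))
```

Because the System hive normally holds two ControlSets, any growth estimated this way is carried in each of them.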
Another cause of increased System hive size is the presence of more than two ControlSets. This
is rare. It happens only when the primary ControlSet failed to load and a “LastKnownGood”
configuration had to be used. When it occurs, the Failed value under the System hive Select key
is something other than “0” (zero) (refer back to Figure 2). A failed ControlSet increases the size
of the System hive by approximately a third. This can be an advantage, however: the first two
contributing causes listed above tend to grow slowly over time, which makes the cause difficult
to determine, whereas a ControlSet failure is sudden and much easier to root-cause.
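The Select key check described above could be scripted along the following lines. This is a minimal sketch, assuming a Python interpreter is available on the server, which this paper does not state. The function names are hypothetical. The value names (Current, Default, Failed, LastKnownGood) are those stored under HKEY_LOCAL_MACHINE\SYSTEM\Select, and the ControlSetNNN naming follows the standard Windows convention.

```python
from typing import Mapping, Optional


def failed_controlset(select_values: Mapping[str, int]) -> Optional[str]:
    """Return the name of the failed ControlSet, or None if Failed is 0."""
    failed = select_values.get("Failed", 0)
    return f"ControlSet{failed:03d}" if failed else None


def read_select_values() -> dict:
    """Read the Select key values from the live registry (Windows only)."""
    import winreg  # imported here so the pure logic above runs anywhere

    names = ("Current", "Default", "Failed", "LastKnownGood")
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, r"SYSTEM\Select") as key:
        # QueryValueEx returns (data, type); only the data is needed
        return {name: winreg.QueryValueEx(key, name)[0] for name in names}
```

On a healthy system, `failed_controlset(read_select_values())` would be expected to return `None`. A returned name identifies the extra ControlSet that is enlarging the hive.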
There are other, secondary causes of increased System hive size, but they are only manifestations
of the primary causes listed above. A notable example is a SAN administrator inadvertently
presenting a number of disks to a system incorrectly, and then unpresenting them. In this
situation, Windows cannot determine whether the entries are stale or transient (for example,
shared cluster disks are transient), so it keeps a record of them in the ControlSet, where they
are never removed. A similar situation can occur when a SAN device’s firmware changes its
identity string. Typically the firmware revision, or a variant of it, is appended to the
Hardware ID subkey. If there are a large number of disks, a firmware update can be the catalyst
that pushes the System hive past the 32 MB limit.
Breaching the System Hive Limit
As already noted, the System hive limit is hard-coded to 32 MB. The hive cannot grow beyond
this limit, because the file is mapped into system memory with that restriction. This means that
whenever a large addition to the system registry takes the hive to the 32 MB limit, any
“overflow” is lost. The registry then reaches a point where nothing new can be added, so
applications that expect to add keys fail.
Because the limit is hard-coded, there is no way to bypass it without a radical redesign of the
operating system. The sections “System Recovery” (page 11) and “Proactive Avoidance” (page 20)
discuss how to recover if the server ever breaches the limit, and how to avoid reaching the
limit in the first place.
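As a rough illustration of what proactive monitoring might involve, the sketch below compares a measured System hive size against the 32 MB limit. It is not the procedure from the “Proactive Avoidance” section. The function names and the 80 percent warning threshold are hypothetical, and the sketch assumes the size passed in is a reasonable proxy for what counts against the limit.

```python
HIVE_LIMIT_BYTES = 32 * 1024 * 1024  # hard-coded 32 MB System hive limit


def hive_headroom(size_bytes: int, limit_bytes: int = HIVE_LIMIT_BYTES) -> int:
    """Bytes remaining before the limit is reached (never negative)."""
    return max(limit_bytes - size_bytes, 0)


def needs_attention(size_bytes: int, warn_fraction: float = 0.8,
                    limit_bytes: int = HIVE_LIMIT_BYTES) -> bool:
    """True once the hive has used warn_fraction of the limit.

    warn_fraction is an arbitrary example threshold, not a vendor value.
    """
    return size_bytes >= warn_fraction * limit_bytes


if __name__ == "__main__":
    size = 28 * 1024 * 1024  # example: a 28 MB hive
    print(hive_headroom(size), needs_attention(size))
```

A scheduled check of this kind would make the slow growth caused by added paths and disks visible before applications begin to fail.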
Differences in Windows Server 2003, Windows Server 2008, and Windows Server 2008 R2
Windows Server 2003 and Windows Server 2008 for Itanium-based Systems have the same 32
MB limitation for all Service Pack versions. Both operating systems typically collect the same
data in the System hive and are therefore subject to the same root causes of increased hive size.
Windows Server 2008 does have a specific hotfix to alleviate the symptoms, which is described