Managing Serviceguard A.11.20, March 2013
Minimizing Planned Downtime
Planned downtime (as opposed to unplanned downtime) is scheduled; examples include backups,
systems upgrades to new operating system revisions, or hardware replacements. For planned
downtime, application designers should consider:
• Reducing the time needed for application upgrades/patches.
Can an administrator install a new version of the application without scheduling downtime?
Can different revisions of an application operate within a system? Can different revisions of
a client and server operate within a system?
• Providing for online application reconfiguration.
Can the configuration information used by the application be changed without bringing down
the application?
• Documenting maintenance operations.
Does an operator know how to handle maintenance operations?
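As an illustration of online reconfiguration, the following minimal Python sketch re-reads an application's settings while the process keeps running. The class name ReloadableConfig, the JSON file format, and the use of SIGHUP as the reload trigger are assumptions made for this example; they are not part of Serviceguard.

```python
import json
import signal


class ReloadableConfig:
    """Holds application settings that can be replaced while the process runs."""

    def __init__(self, path):
        self._path = path
        self._settings = self._read()

    def _read(self):
        # Parse the whole file before using it, so a bad file is never half-applied.
        with open(self._path) as f:
            settings = json.load(f)
        if not isinstance(settings, dict):
            raise ValueError("configuration must be a JSON object")
        return settings

    def reload(self):
        """Re-read the file; keep the old settings if the new file is unusable."""
        try:
            new_settings = self._read()
        except (OSError, ValueError):
            return False
        # Swap in the new settings with a single assignment.
        self._settings = new_settings
        return True

    def get(self, key, default=None):
        return self._settings.get(key, default)


def install_sighup_reload(config):
    """Reload on SIGHUP (assumed trigger; a common UNIX convention)."""
    signal.signal(signal.SIGHUP, lambda signum, frame: config.reload())
```

With a design like this, an operator edits the configuration file and signals the process instead of restarting it. An invalid file leaves the running settings untouched.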
When discussing highly available systems, unplanned failures are often the main point of discussion.
However, if it takes two weeks to upgrade a system to a new revision of software, there are bound
to be a large number of complaints.
The following sections discuss ways of handling the different types of planned downtime.
Reducing Time Needed for Application Upgrades and Patches
Once a year or so, a new revision of an application is released. How long does it take for the
end-user to upgrade to this new revision? The answer is the amount of planned downtime a user
must schedule to upgrade their application. The following guidelines reduce this time.
Provide for Rolling Upgrades
Provide for a “rolling upgrade” in a client/server environment. For a system with many components,
the typical scenario is to bring down the entire system, upgrade every node to the new version of
the software, and then restart the application on all the affected nodes. For large systems, this
could result in a long downtime.
An alternative is to provide for a rolling upgrade. A rolling upgrade rolls out the new software in
a phased approach by upgrading only one component at a time. For example, the database server
is upgraded on Monday, causing 15 minutes of downtime. Then on Tuesday, the application server
on two of the nodes is upgraded, which leaves the application servers on the remaining nodes
online and causes no downtime. On Wednesday, two more application servers are upgraded,
and so on. With this approach, you avoid the problem where everything changes at once, and
you avoid long outages.
The trade-off is that the application must operate correctly while its components are at different
revisions. In the above example, the database server might be at revision 5.0 while some of the
application servers are at revision 4.0. The application must be designed to handle this type of situation.
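One way to handle the mixed-revision situation is for each component to announce the protocol revisions it supports and for the two sides to agree on the highest one they share. The following Python sketch illustrates the idea using the revision numbers from the example above. The function names and the lists of supported revisions are hypothetical.

```python
def parse_revision(text):
    """Turn a revision string such as "5.0" into a comparable tuple (5, 0)."""
    return tuple(int(part) for part in text.split("."))


def negotiate(local_supported, peer_supported):
    """Return the highest revision both sides support, or None if there is none."""
    common = set(local_supported) & set(peer_supported)
    if not common:
        return None
    # Compare numerically so that "10.0" ranks above "9.0".
    return max(common, key=parse_revision)


# Hypothetical example: the upgraded database server still speaks the 4.0
# protocol so that application servers not yet upgraded can keep working.
DATABASE_SERVER = ["4.0", "5.0"]
OLD_APP_SERVER = ["4.0"]
NEW_APP_SERVER = ["4.0", "5.0"]

if __name__ == "__main__":
    print(negotiate(DATABASE_SERVER, OLD_APP_SERVER))
    print(negotiate(DATABASE_SERVER, NEW_APP_SERVER))
```

The older revision can be dropped from the supported list in a later release, once every node has been upgraded.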
For more information about rolling upgrades, see “Software Upgrades” (page 363) and the
Release Notes for your version of Serviceguard at http://www.hp.com/go/hpux-serviceguard-docs.
Do Not Change the Data Layout Between Releases
Migration of the data to a new format can be very time-consuming. It also almost guarantees that
a rolling upgrade will not be possible. For example, if a database is running on the first node, ideally,
the second node could be upgraded to the new revision of the database. When that upgrade is
completed, a brief downtime could be scheduled to move the database server from the first node
to the newly upgraded second node. The database server would then be restarted, while the first
node is idle and ready to be upgraded itself. However, if the new database revision requires a
different database layout, the old data will not be readable by the newly updated database. The
downtime will be longer as the data is migrated to the new layout.
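One way to keep data readable across revisions is to make changes additive: a new revision may add optional fields, but it does not move or redefine existing ones. The following Python sketch assumes records stored as JSON lines. The field names, the defaults, and the "email" field added by the newer revision are all hypothetical.

```python
import json

# Hypothetical record fields; "email" is assumed to be added by the newer revision.
DEFAULTS = {"name": "", "balance": 0, "email": None}


def read_record(line, known_fields):
    """Decode one stored record, tolerating fields this revision does not know."""
    raw = json.loads(line)
    # Missing fields take a default; unknown fields are ignored on read.
    return {field: raw.get(field, DEFAULTS[field]) for field in known_fields}


def write_record(record, original_line=None):
    """Encode a record, preserving any stored fields this revision ignored."""
    merged = json.loads(original_line) if original_line else {}
    merged.update(record)
    return json.dumps(merged, sort_keys=True)
```

Under these assumptions, the old revision can read and rewrite data produced by the new revision without losing the added field, so the two revisions can share the data during a rolling upgrade and no migration step is needed.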
358 Designing Highly Available Cluster Applications










