Managing HP Serviceguard for Linux, Tenth Edition, September 2012

Another alternative is for the failure of one component to still allow the other

components to be brought down cleanly. If a database SQL server fails, it should still be

possible to bring the database down cleanly so that no database recovery is necessary.

The worst case is for a failure of one component to cause the entire system to fail. If one

component fails and all other components need to be restarted, the downtime will be

high.

Be Able to Monitor Applications

It should be possible to monitor the health of every component in a system, including

applications. A monitor might be as simple as a display command or as complicated as

a SQL query. There must be a way to ensure that the application is behaving correctly.

If the application fails and the failure is not detected automatically, it might take hours for a user

to determine the cause of the downtime and recover from it.
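
As an illustration of the SQL-query style of monitor described above, here is a minimal sketch in Python. It is not part of Serviceguard: the names sql_probe and monitor are hypothetical, an in-process SQLite connection stands in for the real database, and the failure threshold is an assumed policy chosen so that a single slow or dropped check does not trigger recovery.

```python
import sqlite3
import time
from typing import Callable


def sql_probe(conn: sqlite3.Connection) -> bool:
    """Return True if the database answers a trivial query correctly.

    Hypothetical probe: SQLite stands in for the monitored database.
    """
    try:
        row = conn.execute("SELECT 1").fetchone()
        return row is not None and row[0] == 1
    except sqlite3.Error:
        # Any database error counts as an unhealthy result.
        return False


def monitor(probe: Callable[[], bool], max_failures: int, checks: int,
            interval: float = 0.0) -> bool:
    """Run probe up to `checks` times, `interval` seconds apart.

    Returns False as soon as `max_failures` consecutive probes fail,
    otherwise True. A successful probe resets the failure count.
    """
    failures = 0
    for _ in range(checks):
        if probe():
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                return False
        if interval > 0:
            time.sleep(interval)
    return True
```

A caller might run monitor(lambda: sql_probe(conn), max_failures=3, checks=10, interval=5.0) and treat a False result as the signal to start recovery. The point of the sketch is only that the check is automatic and tests behavior (a query that must return the right answer), not merely whether a process exists.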

Minimizing Planned Downtime

Planned downtime (as opposed to unplanned downtime) is scheduled; examples include

backups, system upgrades to new operating system revisions, or hardware replacements.

For planned downtime, application designers should consider:

• Reducing the time needed for application upgrades/patches.

Can an administrator install a new version of the application without scheduling

downtime? Can different revisions of an application operate within a system? Can

different revisions of a client and server operate within a system?

• Providing for online application reconfiguration.

Can the configuration information used by the application be changed without

bringing down the application?

• Documenting maintenance operations.

Does an operator know how to handle maintenance operations?
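
The online reconfiguration question in the list above can be made concrete with a small sketch. This is one possible approach under stated assumptions, not a Serviceguard facility: the class name ReloadableConfig, the JSON file format, and the rule of keeping the old settings when the new file is invalid are all hypothetical choices made for illustration.

```python
import json
import threading
from pathlib import Path


class ReloadableConfig:
    """Configuration that can be re-read while the application keeps running."""

    def __init__(self, path):
        self._path = Path(path)
        self._lock = threading.Lock()
        # The initial load must succeed; errors here are raised to the caller.
        self._values = self._load()

    def _load(self) -> dict:
        """Read and validate the file; raise OSError or ValueError on problems."""
        data = json.loads(self._path.read_text())
        if not isinstance(data, dict):
            raise ValueError("configuration must be a JSON object")
        return data

    def reload(self) -> bool:
        """Swap in the new settings only if the file parses and validates.

        Returns True on success. On failure the previous settings stay in
        effect, so a bad edit does not bring the application down.
        """
        try:
            new_values = self._load()
        except (OSError, ValueError):
            return False
        with self._lock:
            self._values = new_values
        return True

    def get(self, key, default=None):
        """Return the current value for key, or default if it is not set."""
        with self._lock:
            return self._values.get(key, default)
```

An application could call reload() from an administrative command or a signal handler. Because the new file is validated before the swap, an operator can change settings without a restart and without risking an outage from a malformed file.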

When discussing highly available systems, unplanned failures are often the main point

of discussion. However, if it takes two weeks to upgrade a system to a new revision of

software, there are bound to be many complaints.

The following sections discuss ways of handling the different types of planned downtime.

Reducing Time Needed for Application Upgrades and Patches

Once a year or so, a new revision of an application is released. How long does it take

for the end user to upgrade to this new revision? The answer is the amount of planned

downtime a user must take to upgrade the application. The following guidelines reduce

this time.

Provide for Rolling Upgrades

Provide for a “rolling upgrade” in a client/server environment. For a system with many

components, the typical scenario is to bring down the entire system, upgrade every node

318 Designing Highly Available Cluster Applications