Managing HP Serviceguard for Linux, Eighth Edition, March 2008

Designing Highly Available Cluster Applications
Minimizing Planned Downtime
Appendix B
Do Not Change the Data Layout Between Releases
Migrating data to a new format can be very time-consuming. It also
almost guarantees that a rolling upgrade will not be possible. For example,
if a database is running on the first node, ideally, the second node could
be upgraded to the new revision of the database. When that upgrade is
completed, a brief downtime could be scheduled to move the database
server from the first node to the newly upgraded second node. The
database server would then be restarted, while the first node is idle and
ready to be upgraded itself. However, if the new database revision
requires a different database layout, the old data will not be readable by
the newly updated database. The downtime will be longer as the data is
migrated to the new layout.
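
The decision described above can be sketched in a few lines of Python. This is
only an illustration, assuming that the data on disk carries a layout version
number and that each database release declares which layout versions it can
read. The names Release, readable_layouts, rolling_upgrade_possible, and
upgrade_steps are hypothetical and are not part of Serviceguard or of any
database product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """A database release and the on-disk layout versions it can open (hypothetical)."""
    name: str
    readable_layouts: frozenset


def rolling_upgrade_possible(on_disk_layout, new_release):
    # A rolling upgrade works only if the upgraded node can read
    # the data exactly as the old release left it.
    return on_disk_layout in new_release.readable_layouts


def upgrade_steps(on_disk_layout, new_release):
    """Return the outline of the upgrade, following the two cases in the text."""
    if rolling_upgrade_possible(on_disk_layout, new_release):
        return [
            "upgrade the idle second node to " + new_release.name,
            "brief downtime: move the database server to the second node",
            "upgrade the first node while it is idle",
        ]
    # Layout changed: the data must be converted before the new release can run.
    return [
        "halt the database server",
        "migrate the data to the new layout (extended downtime)",
        "restart the database server on " + new_release.name,
    ]
```

With a release that still reads the existing layout, upgrade_steps returns the
short rolling sequence. With a release that reads only a new layout, it returns
the longer sequence that includes the migration step.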
Providing Online Application Reconfiguration
Most applications have some sort of configuration information that is
read when the application is started. If the application must be halted
and a new configuration file read in order to change the configuration,
downtime is incurred.
To avoid this downtime, use configuration tools that interact with a
running application and make dynamic changes online, with little or no
interruption to the end user. Such a tool must be able to do everything
online: expanding the size of the data, adding new users to the system,
adding new users to the application, and so on. Every task that an
administrator needs to perform on the application system can be made
available online.
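
One way an application can accept configuration changes without being halted
is to re-read and validate its settings on request, and to swap them in only
if they are valid. The following Python sketch shows the idea under stated
assumptions: the settings are kept in a JSON file, and max_users is the only
required key. The class name ReloadableConfig and the max_users setting are
hypothetical and chosen only for illustration.

```python
import json
import threading


class ReloadableConfig:
    """Settings that can be replaced while the application keeps running."""

    def __init__(self, path):
        self._path = path
        self._lock = threading.Lock()
        self._settings = self._read()  # the initial file must be valid

    def _read(self):
        # Parse and validate the file before anything is swapped in.
        with open(self._path) as f:
            settings = json.load(f)
        if not isinstance(settings, dict):
            raise ValueError("configuration must be a JSON object")
        max_users = settings.get("max_users")
        if not isinstance(max_users, int) or max_users < 1:
            raise ValueError("max_users must be a positive integer")
        return settings

    def reload(self):
        """Re-read the file. Keep the old settings if the new file is unusable."""
        try:
            new_settings = self._read()
        except (OSError, ValueError):
            return False  # the application continues with its current settings
        with self._lock:
            self._settings = new_settings
        return True

    def get(self, key):
        with self._lock:
            return self._settings[key]
```

An administration tool could trigger reload() through whatever channel the
application already exposes, such as a signal handler or a management command.
Because an invalid file is rejected and the previous settings stay in effect, a
mistake in the new configuration does not by itself cause downtime.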
Documenting Maintenance Operations
Standard procedures are important. An application designer should
make every effort to make tasks common for both the highly available
environment and the normal environment. If an administrator is
accustomed to bringing down the entire system after a failure, he or she
will continue to do so even if the application has been redesigned to
handle a single failure. It is important that application documentation
discuss alternatives with regard to high availability for typical
maintenance operations.