Managing HP Serviceguard for Linux, Tenth Edition, September 2012

to the new version of the software, and then restart the application on all the affected
nodes. For large systems, this could result in a long downtime. An alternative is to provide
for a rolling upgrade. A rolling upgrade rolls out the new software in a phased approach
by upgrading only one component at a time. For example, the database server is
upgraded on Monday, causing a 15-minute downtime. Then on Tuesday, the application
server on two of the nodes is upgraded, which leaves the application servers on the
remaining nodes online and causes no downtime. On Wednesday, two more application
servers are upgraded, and so on. With this approach, you avoid changing everything
at once, and you avoid long outages.
The trade-off is that the application software must operate with different revisions of the
software. In the above example, the database server might be at revision 5.0 while
some of the application servers are at revision 4.0. The application must be designed
to handle this type of situation.
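One way an application can tolerate mixed revisions is to check a peer's revision before talking to it and fall back to the older of the two revisions. The following is a minimal sketch only: the names `Revision`, `MIN_SUPPORTED`, `MAX_SUPPORTED`, `can_interoperate`, and `negotiate` are hypothetical, and the supported range of 4.0 through 5.0 is taken from the example above rather than from any real product.

```python
from typing import NamedTuple


class Revision(NamedTuple):
    """A two-part software revision such as 4.0 or 5.0."""
    major: int
    minor: int


# Hypothetical range of peer revisions this component agrees to work with.
MIN_SUPPORTED = Revision(4, 0)
MAX_SUPPORTED = Revision(5, 0)


def parse_revision(text: str) -> Revision:
    """Turn a string like '5.0' into a comparable Revision."""
    major, minor = text.split(".")
    return Revision(int(major), int(minor))


def can_interoperate(peer: str) -> bool:
    """Return True if the peer's revision falls inside the supported range."""
    return MIN_SUPPORTED <= parse_revision(peer) <= MAX_SUPPORTED


def negotiate(local: str, peer: str) -> Revision:
    """Pick the older revision so both sides use behavior they both know."""
    if not can_interoperate(peer):
        raise ValueError(f"peer revision {peer} is outside the supported range")
    return min(parse_revision(local), parse_revision(peer))
```

During the rolling upgrade described above, a database server at revision 5.0 calling `negotiate("5.0", "4.0")` would settle on revision 4.0 behavior until the remaining application servers are upgraded.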
Do Not Change the Data Layout Between Releases
Migration of the data to a new format can be very time intensive. It also almost guarantees
that a rolling upgrade will not be possible. For example, if a database is running on the
first node, ideally, the second node could be upgraded to the new revision of the
database. When that upgrade is completed, a brief downtime could be scheduled to
move the database server from the first node to the newly upgraded second node. The
database server would then be restarted, while the first node is idle and ready to be
upgraded itself. However, if the new database revision requires a different database
layout, the old data will not be readable by the newly updated database. The downtime
will be longer as the data is migrated to the new layout.
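When a new revision does need additional data, one possible way to keep the existing layout readable is to add optional fields instead of restructuring records. The sketch below assumes, purely for illustration, that records are stored as JSON lines; the field names, `FIELDS_V4`, `FIELDS_V5`, `DEFAULTS`, and `read_record` are all hypothetical.

```python
import json

# Hypothetical field lists for two revisions of the same application.
FIELDS_V4 = ("id", "name")
FIELDS_V5 = ("id", "name", "email")  # revision 5.0 adds one optional field

# Default values used when a record written by an older revision lacks a field.
DEFAULTS = {"email": None}


def read_record(line: str, known_fields) -> dict:
    """Read one record, keeping only the fields this revision knows about.

    Missing fields receive a default and unknown fields are ignored, so
    either revision can read records written by the other.
    """
    raw = json.loads(line)
    return {field: raw.get(field, DEFAULTS.get(field)) for field in known_fields}
```

With this kind of layout, a revision 5.0 reader sees `email` as `None` in older records, and a revision 4.0 reader simply skips the new field. A real design would also need to preserve unknown fields when an older revision rewrites a record.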
Providing Online Application Reconfiguration
Most applications have some sort of configuration information that is read when the
application is started. If changing the configuration requires halting the application
and reading a new configuration file, downtime is incurred.
To avoid this downtime, use a configuration tool that interacts with the running
application and makes dynamic changes online, with little or no interruption to the end-user.
This tool must be able to do everything online, such as expanding the size of the data,
adding new users to the system, adding new users to the application, etc. Every task that
an administrator needs to perform on the application system can be made available online.
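As an illustration of online reconfiguration, an application can re-read its configuration file on request instead of only at startup. The sketch below is one possible approach under stated assumptions, not a Serviceguard facility: the class `LiveConfig`, the helper `install_reload_handler`, the JSON file format, and the choice of SIGHUP as the reload trigger are all assumptions made for the example.

```python
import json
import signal


class LiveConfig:
    """Holds configuration values that can be replaced while running."""

    def __init__(self, path):
        self._path = path
        # Load once at startup; an unreadable file is a startup error.
        with open(self._path) as f:
            self._values = json.load(f)

    def reload(self, *_signal_args):
        """Re-read the file; keep the old values if the new file is bad.

        Extra positional arguments are accepted so this method can be
        used directly as a signal handler.
        """
        try:
            with open(self._path) as f:
                new_values = json.load(f)
        except (OSError, ValueError):
            return False
        # Rebinding the attribute swaps the whole dictionary in one step,
        # so readers see either the old values or the new ones.
        self._values = new_values
        return True

    def get(self, key, default=None):
        """Look up a value in the currently active configuration."""
        return self._values.get(key, default)


def install_reload_handler(config):
    """Reload the configuration whenever the process receives SIGHUP."""
    signal.signal(signal.SIGHUP, config.reload)
```

After `install_reload_handler(config)` has been called, an administrator could edit the file and send the process a SIGHUP (for example with `kill -HUP <pid>`) to apply the change without halting the application. A malformed file leaves the previous configuration in effect.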
Documenting Maintenance Operations
Standard procedures are important. An application designer should make every effort
to make tasks common for both the highly available environment and the normal
environment. If an administrator is accustomed to bringing down the entire system after
a failure, he or she will continue to do so even if the application has been redesigned
to handle a single failure. It is important that application documentation discuss alternatives
with regard to high availability for typical maintenance operations.