Managing HP Serviceguard for Linux Ninth Edition, April 2009

Provide for Rolling Upgrades
Provide for a “rolling upgrade” in a client/server environment. For a system with many
components, the typical scenario is to bring down the entire system, upgrade every
node to the new version of the software, and then restart the application on all the
affected nodes. For large systems, this could result in a long downtime. An alternative
is to provide for a rolling upgrade. A rolling upgrade rolls out the new software in a
phased approach by upgrading only one component at a time. For example, the database
server is upgraded on Monday, causing a 15 minute downtime. Then on Tuesday, the
application server on two of the nodes is upgraded, which leaves the application servers
on the remaining nodes online and causes no downtime. On Wednesday, two more
application servers are upgraded, and so on. With this approach, you avoid the problem
of everything changing at once, and you avoid long outages.
The trade-off is that the application software must operate with different revisions of
the software. In the above example, the database server might be at revision 5.0 while
some of the application servers are at revision 4.0. The application must be designed
to handle this type of situation.
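One way to handle mixed revisions is for each component to advertise the range of revisions it can speak and to agree on the highest common one when a connection is made. The following is a minimal sketch of that idea, not a Serviceguard feature; the `Component` class, the `negotiate` function, and the node names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """A client or server that speaks a contiguous range of revisions."""
    name: str
    min_rev: int  # oldest revision still supported
    max_rev: int  # newest revision supported


def negotiate(a: Component, b: Component) -> int:
    """Return the highest revision both sides support, or raise if none."""
    low = max(a.min_rev, b.min_rev)
    high = min(a.max_rev, b.max_rev)
    if low > high:
        raise ValueError(f"{a.name} and {b.name} share no common revision")
    return high


# Hypothetical state midway through a rolling upgrade.
db_server = Component("db-server", min_rev=4, max_rev=5)  # already upgraded
old_app = Component("app-node1", min_rev=4, max_rev=4)    # not yet upgraded
new_app = Component("app-node2", min_rev=4, max_rev=5)    # upgraded

rev_old = negotiate(db_server, old_app)  # falls back to the older revision
rev_new = negotiate(db_server, new_app)  # both sides can use the newer one
```

In this sketch the upgraded database server keeps supporting revision 4 until every application server has been upgraded, which is what allows the upgrade to proceed one component at a time.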
Do Not Change the Data Layout Between Releases
Migration of the data to a new format can be very time-consuming. It also almost
guarantees that a rolling upgrade will not be possible. For example, if a database is
running on the first node, ideally, the second node could be upgraded to the new
revision of the database. When that upgrade is completed, a brief downtime could be
scheduled to move the database server from the first node to the newly upgraded
second node. The database server would then be restarted, while the first node is idle
and ready to be upgraded itself. However, if the new database revision requires a
different database layout, the old data will not be readable by the newly updated
database. The downtime will be longer as the data is migrated to the new layout.
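The sequence described above can be sketched as a small simulation. This is an illustration only, assuming a two-node cluster; the `Node` class, the `rolling_db_upgrade` function, and the `layout_changed` flag are hypothetical and do not correspond to Serviceguard commands.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    revision: str
    runs_database: bool = False


def rolling_db_upgrade(active: Node, standby: Node, new_rev: str,
                       layout_changed: bool) -> list[str]:
    """Return the ordered steps for upgrading a two-node database service."""
    steps = []
    # 1. Upgrade the idle node while the database keeps running elsewhere.
    standby.revision = new_rev
    steps.append(f"upgrade {standby.name} to {new_rev} (no downtime)")
    # 2. Brief scheduled downtime: halt the database on the active node.
    active.runs_database = False
    steps.append(f"halt database on {active.name} (downtime begins)")
    # 3. Needed only when the new revision cannot read the old layout.
    if layout_changed:
        steps.append("migrate data to new layout (downtime extended)")
    # 4. Restart on the upgraded node; the first node is now idle.
    standby.runs_database = True
    steps.append(f"start database on {standby.name} (downtime ends)")
    # 5. Upgrade the now-idle first node.
    active.revision = new_rev
    steps.append(f"upgrade {active.name} to {new_rev} (no downtime)")
    return steps


node1 = Node("node1", "4.0", runs_database=True)
node2 = Node("node2", "4.0")
same_layout = rolling_db_upgrade(node1, node2, "5.0", layout_changed=False)
```

When `layout_changed` is true, the migration step falls inside the downtime window, which is why a layout change lengthens the outage.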
Providing Online Application Reconfiguration
Most applications have some sort of configuration information that is read when the
application is started. If changing the configuration requires halting the application
and rereading the configuration file, downtime is incurred.
To avoid this downtime, use a configuration tool that interacts with the running
application and makes changes dynamically. Changes are made online with little or no
interruption to the end user. This tool must be able to do everything online, such as
expanding the size of the data, adding new users to the system, and adding new users
to the application. Every task that an administrator needs to perform on the
application system can be made available online.
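As a minimal sketch of online reconfiguration, the following keeps the settings in an object that can be reloaded while the application continues to run. The `LiveConfig` class, the `write_settings` helper, the JSON file format, and the `max_users` setting are all assumptions made for illustration; they are not part of any particular product.

```python
import json
import os
import tempfile
import threading


class LiveConfig:
    """Holds settings that can be replaced while the application runs."""

    def __init__(self, path: str):
        self._path = path
        self._lock = threading.Lock()
        self._settings = {}
        self.reload()

    def reload(self) -> None:
        """Re-read the file and swap in the new settings in one step."""
        with open(self._path) as f:
            new_settings = json.load(f)  # parse fully before swapping
        with self._lock:
            self._settings = new_settings

    def get(self, key, default=None):
        with self._lock:
            return self._settings.get(key, default)


def write_settings(path: str, settings: dict) -> None:
    """Stand-in for an administrator's configuration tool."""
    with open(path, "w") as f:
        json.dump(settings, f)


# Demonstration with a temporary file.
fd, cfg_path = tempfile.mkstemp(suffix=".json")
os.close(fd)
write_settings(cfg_path, {"max_users": 100})
config = LiveConfig(cfg_path)
before = config.get("max_users")

write_settings(cfg_path, {"max_users": 250})  # change made while "running"
config.reload()                               # no restart required
after = config.get("max_users")
os.remove(cfg_path)
```

Because the new file is parsed completely before the swap, a malformed file raises an error and leaves the previous settings in effect. In a real application, the reload would typically be triggered by an administrative command or a signal rather than called directly.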
Documenting Maintenance Operations
Standard procedures are important. An application designer should make every effort
to make tasks common for both the highly available environment and the normal
environment. If an administrator is accustomed to bringing down the entire system
302 Designing Highly Available Cluster Applications