Understanding and Designing Serviceguard Disaster Recovery Architectures

1 Disaster Recovery in a Serviceguard Cluster
Evaluating the Need for Disaster Recovery
Disaster Recovery is the ability to restore applications and data within a reasonable period of time
after a disaster. Most people think of fire, flood, and earthquake as disasters, but a disaster can be any
event that unexpectedly interrupts service or corrupts data in an entire data center: the backhoe
that digs too deep and severs a network connection, or an act of sabotage.
Disaster recovery architectures protect against unplanned downtime due to disasters. These
architectures geographically distribute the nodes in a cluster so that a disaster at one location does
not disable the entire cluster. A node can be a host system or server that is configured to be a
member of a Serviceguard cluster. To evaluate your need for a disaster recovery solution, you
need to weigh the following:
Risk of disaster — Areas prone to tornadoes, floods, or earthquakes may require a disaster
recovery solution. Some industries need to consider risks other than natural disasters or
accidents, such as terrorist activity or sabotage.
The type of disaster to which your business is prone, whether it is due to geographical location
or the nature of the business, will determine the type of disaster recovery you choose. For
example, if you live in a region prone to big earthquakes, you are not likely to put your
alternate or backup nodes in the same city as your primary nodes, because that sort of disaster
affects a large area.
The frequency of the disaster also plays an important role in determining the need to invest
in a rapid disaster recovery solution. For example, it is more important to protect against hurricanes
that happen every season than against a dormant volcano.
Vulnerability of the business — How long can your business afford to be down? Some parts
of a business may be able to endure a recovery time of one or two days, while others need
to recover within a few minutes. Some parts of a business only need local protection from single
outages, such as node failure. Other parts of a business might need both local protection and
protection in case of data center failure.
It is important to consider the role of the data servers in your business. For example, you might
target the assembly line production servers as most in need of quick recovery. But if the most
likely disaster in your area is an earthquake, it might render both the assembly line and the computers
inoperable. In this case disaster recovery is a matter of concern, and local failover is
probably not the appropriate level of protection.
On the other hand, you may have an order processing center that is prone to floods in winter.
The business loses thousands of dollars a minute when the order processing servers are down.
A disaster recovery architecture is an appropriate protection in this situation.
The decision to implement a disaster recovery solution depends on the balance between the risk of
disaster and the vulnerability of your business if a disaster occurs. The following sections
give a high-level view of a variety of disaster recovery solutions and sketch the general guidelines
that you must follow in developing a disaster recovery computing environment.
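The balance between risk and vulnerability described above can be made concrete with a rough expected-loss estimate. The following is a minimal sketch, not a Serviceguard tool: the Scenario class, the expected_annual_loss function, and every number in it are hypothetical planning inputs chosen only to echo the order processing example (a flood-prone site that loses about two thousand dollars a minute, with recovery taking either two days or a few minutes).

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """One disaster scenario; every field is a hypothetical planning input."""
    name: str
    events_per_year: float   # how often the disaster is expected to occur
    outage_minutes: float    # downtime per event with the given protection
    cost_per_minute: float   # business loss per minute of downtime


def expected_annual_loss(s: Scenario) -> float:
    """Rough yearly loss: frequency x outage duration x cost rate."""
    return s.events_per_year * s.outage_minutes * s.cost_per_minute


# Assumed: a flood every second winter, servers losing 2000 per minute.
local_only = Scenario("local failover only", 0.5, 2 * 24 * 60, 2000.0)
with_dr = Scenario("disaster recovery site", 0.5, 10, 2000.0)

# The difference is the yearly loss that a disaster recovery site avoids.
saving = expected_annual_loss(local_only) - expected_annual_loss(with_dr)

for s in (local_only, with_dr):
    print(f"{s.name}: expected annual loss {expected_annual_loss(s):,.0f}")
print(f"loss avoided per year: {saving:,.0f}")
```

Comparing the avoided loss with the yearly cost of a second site gives a first indication of whether the investment is justified. A real assessment would also account for data loss, reputation, and regulatory requirements, which this sketch ignores.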
What is a Disaster Recovery Architecture?
In a Serviceguard cluster configuration, you can achieve high availability by using redundant
hardware to eliminate single points of failure. This protects the cluster against hardware faults,
such as the node failure in Figure 1.