6 Disconnects
Causes of Disconnects
Servers are normally in constant communication with each other. When a story is saved, the
server tries to mirror that change across to the other server’s database. If a server cannot
contact the other server for a period of 30 seconds, it assumes the worst: that the other
server has died and is unavailable, and that, as the surviving server, it must take
responsibility for the entire system.
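The timeout rule described above can be illustrated with a minimal sketch. This is not the server's actual implementation; the class name PeerMonitor, its methods, and the injectable clock are hypothetical and exist only to show how a 30-second loss of contact leads one server to presume the other is dead.

```python
import time

# The 30-second window comes from the text above; everything else is illustrative.
DISCONNECT_TIMEOUT_SECONDS = 30


class PeerMonitor:
    """Tracks the last successful contact with the other server (hypothetical sketch)."""

    def __init__(self, timeout=DISCONNECT_TIMEOUT_SECONDS, clock=time.monotonic):
        self._timeout = timeout
        self._clock = clock              # injectable so the logic can be tested
        self._last_contact = clock()

    def record_contact(self):
        """Call whenever a mirroring request to the other server succeeds."""
        self._last_contact = self._clock()

    def peer_presumed_dead(self):
        """True once the peer has been unreachable for the full timeout."""
        return self._clock() - self._last_contact >= self._timeout
```

Note that the check cannot tell a dead server from an unreachable one. If only the network fails, each server may reach the same conclusion about the other, which is why both can end up running as if they were the survivor.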
Given this design, a network outage will cause a disconnect, as will the loss of power by
one server.
A “dirty” network leading to numerous network errors (for example, receive errors, which
the netstat -i command reports in the RX-ERR column) can cause a disconnect, particularly
if the error counts are rapidly climbing.
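One way to spot rapidly climbing error counts is to take two samples of netstat -i output and compare them. The following is a sketch under stated assumptions: it assumes the whitespace-separated column layout printed by the Linux net-tools version of netstat, with a header row that names the RX-ERR column. The function names are hypothetical, and the layout differs on other platforms.

```python
import subprocess


def parse_error_counts(netstat_output, column="RX-ERR"):
    """Return {interface: count} for one column of `netstat -i` output."""
    counts = {}
    col_index = None
    for line in netstat_output.splitlines():
        fields = line.split()
        if col_index is None:
            # Skip everything until the header row that names the column.
            if column in fields:
                col_index = fields.index(column)
            continue
        if len(fields) > col_index and fields[col_index].isdigit():
            counts[fields[0]] = int(fields[col_index])
    return counts


def climbing_interfaces(before, after):
    """Return {interface: increase} for counts that grew between two samples."""
    return {name: after[name] - before[name]
            for name in after
            if name in before and after[name] > before[name]}


def sample_netstat():
    """Run `netstat -i` and return its text output (netstat must be on PATH)."""
    result = subprocess.run(["netstat", "-i"], capture_output=True,
                            text=True, check=True)
    return result.stdout
```

To use it, call sample_netstat() twice a short interval apart, pass each result through parse_error_counts(), and give the two dictionaries to climbing_interfaces(). Any interface it returns had a growing error count during that interval.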
A software error that puts a server into a looping condition, making it too busy to respond
to a mirroring request, could also theoretically lead to a disconnect.
Hardware failures, such as the failure of a network card or hard drive, may also lead to
disconnects.
Disconnect Recovery
This section provides an overview of recovering your system from a disconnect, recovery
procedures, and a quick reference worksheet you can use should a disconnect occur.
Overview
After a system has disconnected, one server must be selected to continue as the master
computer. This server is referred to as the survivor. The other server is referred to as
the failed server.
Before the failed server can be reconnected to the survivor, it must be rebooted and its
database wiped clean. After the database on the failed server has been cleared, the server can
be reconnected to the survivor and the master database copied back across from the survivor.
Because one server’s database will be selected as the master database and the other’s
database erased, discovering a disconnect as soon as possible minimizes the possibility of
data loss.