Managing HP Serviceguard A.11.20.20 for Linux, May 2013

8.7.2.1 Sample System Log Entries ................................................................................255
8.7.3 Reviewing Configuration Files ...................................................................................256
8.7.4 Reviewing the Package Control Script ........................................................................256
8.7.5 Using the cmquerycl and cmcheckconf Commands......................................................256
8.7.6 Reviewing the LAN Configuration .............................................................................257
8.8 Solving Problems ...........................................................................................................257
8.8.1 Name Resolution Problems.......................................................................................257
8.8.1.1 Networking and Security Configuration Errors......................................................257
8.8.2 Halting a Detached Package....................................................................................257
8.8.3 Cluster Re-formations Caused by Temporary Conditions...............................................258
8.8.4 Cluster Re-formations Caused by MEMBER_TIMEOUT Being Set too Low........................258
8.8.5 System Administration Errors ....................................................................................259
8.8.5.1 Package Control Script Hangs or Failures ...........................................................259
8.8.6 Package Movement Errors (Legacy Packages)..............................................................260
8.8.7 Node and Network Failures ....................................................................................261
8.8.8 Troubleshooting the Quorum Server...........................................................................261
8.8.8.1 Authorization File Problems...............................................................................261
8.8.8.2 Timeout Problems............................................................................................261
8.8.8.3 Messages.......................................................................................................262
8.8.9 Lock LUN Messages................................................................................................262
8.9 Troubleshooting serviceguard-xdc package........................................................................262
8.10 Troubleshooting Serviceguard Manager...........................................................................263
A Designing Highly Available Cluster Applications .......................................265
A.1 Automating Application Operation ...................................................................................265
A.1.1 Insulate Users from Outages .....................................................................................265
A.1.2 Define Application Startup and Shutdown ..................................................................266
A.2 Controlling the Speed of Application Failover ....................................................................266
A.2.1 Replicate Non-Data File Systems ...............................................................................266
A.2.2 Evaluate the Use of a Journaled Filesystem (JFS)..........................................................267
A.2.3 Minimize Data Loss ................................................................................................267
A.2.3.1 Minimize the Use and Amount of Memory-Based Data .........................................267
A.2.3.2 Keep Logs Small .............................................................................................267
A.2.3.3 Eliminate Need for Local Data .........................................................................267
A.2.4 Use Restartable Transactions ....................................................................................267
A.2.5 Use Checkpoints ....................................................................................................268
A.2.5.1 Balance Checkpoint Frequency with Performance ................................................268
A.2.6 Design for Multiple Servers .....................................................................................268
A.2.7 Design for Replicated Data Sites ..............................................................................269
A.3 Designing Applications to Run on Multiple Systems ............................................................269
A.3.1 Avoid Node Specific Information ..............................................................................269
A.3.1.1 Obtain Enough IP Addresses .............................................................................270
A.3.1.2 Allow Multiple Instances on Same System ...........................................................270
A.3.2 Avoid Using SPU IDs or MAC Addresses ...................................................................270
A.3.3 Assign Unique Names to Applications ......................................................................270
A.3.3.1 Use DNS .......................................................................................................270
A.3.4 Use uname(2) With Care ........................................................................................271
A.3.5 Bind to a Fixed Port ................................................................................................271
A.3.6 Bind to Relocatable IP Addresses .............................................................................271
A.3.6.1 Call bind() before connect() ..............................................................................272
A.3.7 Give Each Application its Own Volume Group ...........................................................272
A.3.8 Use Multiple Destinations for SNA Applications .........................................................272
A.3.9 Avoid File Locking ..................................................................................................272
A.4 Restoring Client Connections ...........................................................................................272