Managing HP Serviceguard A.11.20.10 for Linux, December 2012

8.8.2 Halting a Detached Package
8.8.3 Cluster Re-formations Caused by Temporary Conditions
8.8.4 Cluster Re-formations Caused by MEMBER_TIMEOUT Being Set too Low
8.8.5 System Administration Errors
8.8.5.1 Package Control Script Hangs or Failures
8.8.6 Package Movement Errors (Legacy Packages)
8.8.7 Node and Network Failures
8.8.8 Troubleshooting the Quorum Server
8.8.8.1 Authorization File Problems
8.8.8.2 Timeout Problems
8.8.8.3 Messages
8.8.9 Lock LUN Messages
8.9 Troubleshooting serviceguard-xdc package
8.10 Troubleshooting Serviceguard Manager
A Designing Highly Available Cluster Applications
A.1 Automating Application Operation
A.1.1 Insulate Users from Outages
A.1.2 Define Application Startup and Shutdown
A.2 Controlling the Speed of Application Failover
A.2.1 Replicate Non-Data File Systems
A.2.2 Evaluate the Use of a Journaled Filesystem (JFS)
A.2.3 Minimize Data Loss
A.2.3.1 Minimize the Use and Amount of Memory-Based Data
A.2.3.2 Keep Logs Small
A.2.3.3 Eliminate Need for Local Data
A.2.4 Use Restartable Transactions
A.2.5 Use Checkpoints
A.2.5.1 Balance Checkpoint Frequency with Performance
A.2.6 Design for Multiple Servers
A.2.7 Design for Replicated Data Sites
A.3 Designing Applications to Run on Multiple Systems
A.3.1 Avoid Node Specific Information
A.3.1.1 Obtain Enough IP Addresses
A.3.1.2 Allow Multiple Instances on Same System
A.3.2 Avoid Using SPU IDs or MAC Addresses
A.3.3 Assign Unique Names to Applications
A.3.3.1 Use DNS
A.3.4 Use uname(2) With Care
A.3.5 Bind to a Fixed Port
A.3.6 Bind to Relocatable IP Addresses
A.3.6.1 Call bind() before connect()
A.3.7 Give Each Application its Own Volume Group
A.3.8 Use Multiple Destinations for SNA Applications
A.3.9 Avoid File Locking
A.4 Restoring Client Connections
A.5 Handling Application Failures
A.5.1 Create Applications to be Failure Tolerant
A.5.2 Be Able to Monitor Applications
A.6 Minimizing Planned Downtime
A.6.1 Reducing Time Needed for Application Upgrades and Patches
A.6.1.1 Provide for Rolling Upgrades
A.6.1.2 Do Not Change the Data Layout Between Releases
A.6.2 Providing Online Application Reconfiguration