Parallel Programming Guide for HP-UX Systems

Troubleshooting
Triangular loops
Chapter 9 199
You can replace divides with shift instructions because they involve
powers of two.
If this application were to be run on an V2250 single-node machine,
it would be appropriate to choose a chunk size of 8 for 4-byte data.