Parallel Programming Guide for HP-UX Systems Eighth Edition Manufacturing Part Number: B3909-90031 September 2007
Print History
Eighth Edition: B3909-90031, September 2007
Seventh Edition: B3909-90019, released December 2004; document updates
Sixth Edition: B3909-90015, released September 2003; document updates
Fifth Edition: B3909-90011, released June 2003; document updates
Fourth Edition: B3909-90008, released September 2001; document updates
Third Edition: B3909-90006, released June 2001; document updates
Second Edition: B3909-90003, released March 2000; document updates
First Edition: B3909-90001, released October 1998
Legal Notices
Copyright 2007 Hewlett-Packard Development Company, L.P. All Rights Reserved. Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license. The information contained in this document is subject to change without notice.
Contents

1. Introduction to parallel environments
   Overview  3
   Non-parallel components  3
   Parallel components  3
   UPC
   ...
   Types of applications
   Runtime environment variables
   Runtime utility commands
   HyperFabric/HyperMessaging Protocol (HMP)
   Communicating using daemons
   ...
   VECLIB  79
   LAPACK  80
   ScaLAPACK  81
4. OpenMP
   HP's implementation of OpenMP
   ...
   Debugging  104
   Tuning  105
   Profiling  106
6. Data privatization
   Directives and pragmas for data privatization
   ...
   Synchronization functions
   Allocation functions
   Deallocation functions
   Locking functions
   Unlocking functions
   ...
   Invalid subscripts
   Misused directives and pragmas
   Loop-carried dependences
   Reductions

Tables

Table 1  xvi
Table 2-1. Default compilers for HP-UX  17
Table 2-2. Default compilers for Linux Itanium2  18
Table 2-3. Default compilers for Linux IA-32  18
Table 2-4.
Preface The final version of this document will be available on www.docs.hp.com when the product ships. This guide describes efficient methods for shared-memory programming using the following HP-UX compilers: HP Fortran, HP aC++ (ANSI C++), and HP C.
This guide is intended for experienced Fortran, C, and C++ programmers. It describes the enhanced features of the HP-UX 11i compilers on single-node multiprocessor HP technical servers. You need not be familiar with the HP parallel architecture, programming models, or optimization concepts to understand the concepts introduced in this book.
Scope
This guide covers parallel programming methods for the HP Fortran, aC++, and C compilers on machines running:
• HP-UX 11i v1 and higher on HP 9000 systems
• HP-UX 11i v2 and higher on Integrity systems
The HP compilers support an extensive shared-memory programming model. HP-UX 11i v1 and higher includes the required assembler, linker, and libraries. This guide describes how to produce programs that efficiently exploit the HP parallel architecture and the HP compiler set.
Notational conventions
This section describes the notational conventions used in this book.

Table 1

bold monospace     In command examples, bold monospace identifies input that must be typed exactly as shown.
monospace          In paragraph text, monospace identifies command names, system calls, and data structures and types. In command examples, monospace identifies command output, including error messages.
italic             In paragraph text, italic identifies titles of documents.
Vertical ellipses  Vertical ellipses show that lines of code have been left out of an example.
Keycap             Keycap indicates the keyboard keys you must press to execute the command example.

The directives and pragmas described in this book can be used with the HP Fortran and C compilers, unless otherwise noted. The aC++ compiler does not support the pragmas, but does support the memory classes.
Associated documents The following documents are listed as additional resources to help you use the compilers and associated tools: • HP Caliper User’s Guide • HP Fortran Programmer’s Reference — Provides language reference for HP Fortran and describes the language features and requirements. • HP Fortran Programmer’s Guide—Provides extensive usage information (including how to compile and link), suggestions and tools for migrating to HP Fortran, and how to call C and HP-UX routines for HP Fortran 90.
1 Introduction to parallel environments HP compilers generate efficient parallel code with little user intervention. However, you can increase this efficiency by using the techniques discussed in this book.
This chapter contains the following sections:
• Overview
• Individual and clustered workstations/servers
• HP SMP architectures
• Parallel programming model
Overview
Parallel software programs are designed, created, and modified using one of the HP language compilers: Fortran, C/C++, or UPC (Unified Parallel C).

Non-parallel components
Serial optimizations of the non-parallel components of a program are performed using Fortran, C/C++, or UPC compiler switches. Additional optimization is available through HP MLIB.
Introduction to parallel environments Overview HP WDB Debugger The HP WDB debugger is an HP-supported implementation of the GDB debugger that supports debugging serial and Pthreaded programs. It supports source-level debugging of object files written in HP C, HP aC++, Fortran 90, and Fortran 77. HP’s implementation includes many enhancements to GDB such as enabling/disabling threads, debugging memory problems, and heap analysis. For more information, visit http://www.hp.
Individual and clustered workstations/servers
Both individual and clustered workstations/servers are suitable for running high-performance technical computing programs. These workstations or servers can have one or more CPUs. In general, workstations are not configured with more than two CPUs, whereas servers are often configured with up to 64 CPUs.
Programming Methods
High-performance programming methods address systems of one or more processors that can be distributed within an SMP system or over the nodes of a cluster. These methods include standard serial optimizations and library calls, auto-parallelization offered by some compilers, OpenMP directives, calls to the POSIX threads library, and calls to the Message Passing Interface (MPI).
Introduction to parallel environments Parallel programming model Parallel programming model Parallel programming models provide perspectives from which you can write—or adapt—code to run on a high-end HP system. You can perform both shared-memory programming and message-passing programming on an SMP. The shared-memory paradigm In the shared-memory paradigm, compilers handle optimizations, and, if requested, parallelization.
Introduction to parallel environments Parallel programming model In message-passing, a parallel application consists of a number of processes that run concurrently. Each process has its own local memory. It communicates with other processes by sending and receiving messages. When data is passed in a message, both processes must work to transfer the data from the local memory of one to the local memory of the other.
HP SMP architectures
HP offers single-processor and symmetric multiprocessor (SMP) systems. SMP systems, which use various bus configurations for memory access, are discussed below.

Bus-based systems
The K-Class servers are midrange servers with a bus-based architecture. Each contains one set of processors and physical memory. Memory is shared among all the processors, with a bus serving as the interconnect.
Introduction to parallel environments MPI MPI The message passing model Programming models are generally categorized by how memory is used. In the shared memory model each process accesses a shared address space, while in the message passing model an application runs as a collection of autonomous processes, each with its own local memory. In the message passing model processes communicate with other processes by sending and receiving messages.
Introduction to parallel environments MLIB MLIB High-performance software programs typically need to apply multiple mathematical algorithms to perform scientific computations. Many of these algorithms are used over and over again and should, therefore, be efficient. The need for developers to not have to implement these algorithms every time they write a new program led to the creation of collections of precompiled and optimized libraries of mathematical algorithms.
OpenMP
OpenMP is a portable, scalable model that gives shared-memory parallel programmers a simple and flexible interface for developing parallel applications on platforms ranging from the desktop to the supercomputer. The OpenMP Application Program Interface (API) supports multi-platform shared-memory parallel programming in Fortran and C/C++, on platforms including UNIX and Windows NT. HP C/C++ and Fortran support OpenMP.
HP UPC
HP UPC is a fully conforming implementation of the UPC language, with some extensions, primarily for compatibility with the HP C and HP C++ products. UPC is compliant with the ANSI C99 specification, except that the complex data type is not completely implemented. Note also that the C run-time library used by UPC might not provide a complete implementation of the C99 run-time components.
2 MPI This chapter describes key components of the HP MPI (version 2.0) implementation of the Message Passing Interface (MPI) standard, helping you use HP MPI to develop and run parallel applications.
MPI HP MPI 2.0 is supported on workstations, midrange servers, and high-end servers. HP MPI 2.0 for HP-UX is supported on HP-UX 11i v1 or later operating systems on PA-RISC 2.0; and HP-UX 11i v2 or later operating systems on Itanium-based platforms. HP MPI 2.0 for Linux is supported on Red Hat Linux V7.2 operating systems on Intel IA-32 and Itanium2 platforms. HP MPI 2.0 for Tru64UNIX is supported on AlphaServers.
MPI Compiling and Linking Compiling and Linking Compiling applications The compiler you use to build HP MPI applications depends upon which programming language you use. The HP MPI compiler utilities are shell scripts that invoke the appropriate native compiler. You can pass the pathname of the MPI header files using the -I option and link an MPI library (for example, the diagnostic or thread-compliant library) using the -Wl, -L or -l option.
MPI Compiling and Linking If aCC is not available, mpiCC uses CC as the default C++ compiler.
If you want to use a compiler other than the default one assigned to each utility, set the corresponding environment variable shown in Table 2-5.

Table 2-5  Compilation environment variables

Utility    Environment variable
mpicc      MPI_CC
mpiCC      MPI_CXX
mpif77     MPI_F77
mpif90     MPI_F90

Autodouble functionality HP MPI 2.
MPI Compiling and Linking Same as -r8. For Tru64UNIX: • -r8 Defines REAL declarations, constants, functions, and intrinsics as DOUBLE PRECISION (REAL*8), and defines COMPLEX declarations, constants, functions, and intrinsics as DOUBLE COMPLEX (COMPLEX*16). This option is the same as the -real_size 64 option. • -r16 Defines REAL and DOUBLE PRECISION declarations, constants, functions, and intrinsics as REAL*16.
The user-defined callback passed to these functions should accept normal-sized arguments. These functions are called internally by the library, where normally-sized data types will be passed to them.

64-bit support
HP-UX 11i and higher is available as a 32- and 64-bit operating system. You must run 64-bit executables on a 64-bit system (though you can build 64-bit executables on a 32-bit system). HP MPI supports a 64-bit version of the MPI library on platforms running HP-UX 11.
NOTE: In HP Fortran 3.2 and higher versions, the +Oparallel option is not supported on Integrity systems. You must use +Oautopar instead of +Oparallel for Fortran applications on Integrity systems.

Building Applications
This example shows how to build hello_world.c prior to running.

Step 1. Change to a writable directory.

Step 2. Compile the hello_world executable. For shared libraries:

% $MPI_ROOT/bin/mpicc -o hello_world $MPI_ROOT/help/hello_world.c
MPI Running Running Running applications This section introduces the methods to run your HP MPI application. Using one of the mpirun methods is required. The examples below demonstrate two basic methods. Refer to “mpirun (mpirun.all)” on page 36 for all the mpirun command line options. There are three methods you can use to start your application: • Use mpirun with the -np # option and the name of your program.
Running on multiple hosts using remote shell
This example shows how to run the hello_world.c application that you built in Building Applications (above), using two hosts to achieve four-way parallelism. For this example, the local host is named jawbone and a remote host is named wizard. To run hello_world.c on two hosts, use the following procedure, replacing jawbone and wizard with the names of your machines:

Step 1. Edit the .rhosts file on jawbone and wizard. Add an entry for wizard in the .
HP MPI prints the output from running the hello_world executable in non-deterministic order. The following is an example of the output:

Hello world! I'm 2 of 4 on wizard
Hello world! I'm 0 of 4 on jawbone
Hello world! I'm 3 of 4 on wizard
Hello world! I'm 1 of 4 on jawbone

Notice that processes 0 and 1 run on jawbone, the local host, while processes 2 and 3 run on wizard.
MPI Running where # is the number of processors and program is the name of your application. Suppose you want to build a C application called poisson and run it using five processes to do the computation. To do this, use the following command sequence: % $MPI_ROOT/bin/mpicc -o poisson poisson.c % $MPI_ROOT/bin/mpirun -np 5 poisson prun also supports running applications with SPMD. Please refer to the prun documentation at http://www.quadrics.com.
MPI Running Runtime environment variables Environment variables are used to alter the way HP MPI executes an application. The variable settings determine how an application behaves and how an application allocates internal resources at runtime. Many applications run without setting any environment variables. However, applications that use a large number of nonblocking messaging requests, require debugging support, or need to control process placement may need a more customized configuration.
MPI Running • MPI_TMPDIR • MPI_WORKDIR • TOTALVIEW MPI_COMMD MPI_COMMD routes all off-host communication through daemons rather than between processes. The MPI_COMMD syntax is as follows: out_frags,in_frags where out_frags Specifies the number of 16Kbyte fragments available in shared memory for outbound messages. Outbound messages are sent from processes on a given host to processes on other hosts using the communication daemon. The default value for out_frags is 64.
MPI Running nmsg Disables detection of multiple buffer writes during receive operations and detection of send buffer corruptions. nwarn Disables the warning messages that the diagnostic library generates by default when it identifies a receive that expected more bytes than were sent. dump:prefix Dumps (unformatted) all sent and received messages to prefix.msgs.rank where rank is the rank of a specific process. dumpf:prefix Dumps (formatted) all sent and received messages to prefix.msgs.
MPI Running allocated to these objects before you call MPI_Finalize. In C, this is analogous to making calls to malloc() and free() for each object created during program execution. Setting the l option may decrease application performance. f Forces MPI errors to be fatal. Using the f option sets the MPI_ERRORS_ARE_FATAL error handler, ignoring the programmer’s choice of error handlers. This option can help you detect nondeterministic error problems in your code.
MPI Running turning them off. This is accomplished by setting the time period of the s option in the MPI_FLAGS environment variable (for example: s600). Time is in seconds. You can use the s[a][p]# option with the thread-compliant library as well as the standard non thread-compliant library. Setting s[a][p]# for the thread-compliant library has the same effect as setting MPI_MT_FLAGS=ct when you use a value greater than 0 for #. The default value for the thread-compliant library is sp0.
+E2  Sets -1 as the value of .TRUE. and 0 as the value of .FALSE. when returning logical values from HP MPI routines called within Fortran 77 applications.

D  Dumps shared memory configuration information. Use this option to get shared memory values that are useful when you want to set the MPI_SHMCNTL flag.

E[on|off]  Function parameter error checking is turned off by default. It can be turned on by setting MPI_FLAGS=Eon.

T  Prints the user and system times for each MPI rank.
MPI Running amount where amount specifies the total amount of shared memory in bytes for all processes. The default is 2 Mbytes for up to 64-way applications and 4 Mbytes for larger applications. Be sure that the value specified for MPI_GLOBMEMSIZE is less than the amount of global shared memory allocated for the host. Otherwise, swapping overhead will degrade application performance. MPI_INSTR MPI_INSTR enables counter instrumentation for profiling HP MPI applications.
MPI Running MPI_LOCALIP MPI_LOCALIP specifies the host IP address that is assigned throughout a session. Ordinarily, mpirun determines the IP address of the host it is running on by calling gethostbyaddr. However, when a host uses a SLIP or PPP protocol, the host’s IP address is dynamically assigned only when the network connection is established. In this case, gethostbyaddr may not return the correct IP address. The MPI_LOCALIP syntax is as follows: xxx.xxx.xxx.xxx where xxx.xxx.xxx.
• SIGILL
• SIGBUS
• SIGSEGV
• SIGSYS

In the event one of these signals is not caught by a user signal handler, HP MPI will display a brief stack trace that can be used to locate the signal in the code.

Signal 10: bus error
PROCEDURE TRACEBACK:
(0) 0x0000489c  bar + 0xc  [././a.out]
(1) 0x000048c4  foo + 0x1c  [././a.out]
(2) 0x000049d4  main + 0xa4  [././a.out]
(3) 0xc013750c  _start + 0xa8  [/usr/lib/libc.2]
(4) 0x0003b50   $START$ + 0x1a0  [././a.out]
MPI Running MPI_TMPDIR By default, HP MPI uses the /tmp directory to store temporary files needed for its operations. MPI_TMPDIR is used to point to a different temporary directory. The MPI_TMPDIR syntax is directory where directory specifies an existing directory used to store temporary files. MPI_WORKDIR By default, HP MPI applications execute in the directory where they are started. MPI_WORKDIR changes the execution directory.
We recommend using the mpirun launch utility. However, for users that are unable to install MPI on all hosts, HP MPI provides a self-contained launch utility, mpirun.all. The restrictions for mpirun.all include:

• Applications must be linked statically
• Start-up may be slower
• TotalView is unavailable to executables launched with mpirun.all
MPI Running The -np option is not allowed with -prun. The following mpirun options are allowed with -prun: mpirun [-help] [-version] [-jv] [-i ] [-universe_size=#] [-sp ] [-T] [-prot] [-spawn] [-1sided] [-e var[=val]] -prun [] • To invoke LSF for applications where all processes execute the same program on the same host: bsub [lsf_options] pam -mpi mpirun [mpirun_options] program [args] In this case, LSF assigns a host to the MPI job.
MPI Running % bsub pam -mpi $MPI_ROOT/bin/mpirun -f my_appfile runs an appfile named my_appfile and requests host assignments for all remote and local hosts specified in my_appfile. If my_appfile contains the following items: -h voyager -np 10 send_receive -h enterprise -np 8 compute_pi Host assignments are returned for the two symbolic links voyager and enterprise. When requesting a host from LSF, you must ensure that the path to your executable file is accessible by all machines in the resource pool.
MPI Running Eliminates a teardown when ranks exit abnormally. Further communications involved with ranks that went away return error class MPI_ERR_EXITED, but do not force the application to teardown, as long as the MPI_Errhandler is set to MPI_ERRORS_RETURN. Some restrictions apply: • Cannot be used with HyperFabric • Communication is done via TCP/IP (Does not use shared memory for intranode communication.) • Cannot be used with the diagnostic library.
MPI Running Prints the communication protocol between each host (i.e. TCP/IP, HyperFabric, or shared memory). -prun Enables start-up with Elan usage. Only supported when linking with shared libraries. Some features like mpirun -stdio processing are unavailable. The -np option is not allowed with -prun.
MPI Running Specifies extra arguments to be applied to the programs listed in the appfile—A space separated list of arguments. Use this option at the end of your command line to append extra arguments to each line of your appfile. Refer to the example in “Adding program arguments to your appfile” on page 43 for details. program Specifies the name of the executable file to run. IMPI_options Specifies this mpirun is an IMPI client.
MPI Running -e var=val Sets the environment variable var for the program and gives it the value val. The default is not to set environment variables. When you use -e with the -h option, the environment variable is set to val on the remote host. -l user Specifies the user name on the target host. The default is the current user name. -sp paths Sets the target shell PATH environment variable to paths. Search paths are separated by a colon. Both -sp path and -e PATH=path do the same thing.
MPI Running send_receive arg1 arg2 arg3 -arg4 arg5 • The compute_pi command line for machine enterprise becomes: compute_pi arg3 -arg4 arg5 When you use the -- extra_args_for_appfile option, it must be specified at the end of the mpirun command line. Setting remote environment variables To set environment variables on remote hosts use the -e option in the appfile.
However, this places processes 0 and 1 on hosta and processes 2 and 3 on hostb, resulting in interhost communication between the ranks identified as having slow communication (0 with 2, and 1 with 3). A more optimal appfile for this example would be:

-h hosta -np 1 program1
-h hostb -np 1 program2
-h hosta -np 1 program1
-h hostb -np 1 program2

This places ranks 0 and 2 on hosta and ranks 1 and 3 on hostb.
MPI Running NOTE Because HP MPI sets up one daemon per host (or appfile entry) for communication, when you invoke your application with -np x, HP MPI generates x+1 processes. Generating multihost instrumentation profiles To generate tracing output files for multihost applications, you must invoke mpirun on a host where at least one MPI process is running. HP MPI writes the trace file (prefix.tr) to the working directory on the host where mpirun runs.
mpiexec
The MPI-2 standard defines mpiexec as a simple method to start MPI applications. It supports fewer features than mpirun, but it is portable. mpiexec syntax has three formats:

• mpiexec offers arguments similar to a MPI_Spawn call, with arguments as shown in the following form:

mpiexec [-n maxprocs][-soft ranges][-host host][-arch arch][-wdir dir][-path dirs][-file file] command-args

For example:

% $MPI_ROOT/bin/mpiexec -n 8 ./myprog.
MPI Running -file file Ignored in HP MPI. This last option is used separately from the options above. -configfile file Specify a file of lines containing the above options. mpiexec does not support prun startup. mpijob mpijob lists the HP MPI jobs running on the system. Invoke mpijob on the same host as you initiated mpirun. mpijob syntax is shown below: mpijob [-help] [-a] [-u] [-j id] [id id ...]] where -help Prints usage information for the utility. -a Lists jobs for all users.
MPI Running LIVE Indicates whether the process is running (an x is used) or has been terminated. PROGNAME Program names used in the HP MPI application. mpijob does not support prun startup. mpiclean mpiclean kills processes in an HP MPI application. Invoke mpiclean on the host on which you initiated mpirun. The MPI library checks for abnormal termination of processes while your application is running. In some cases, application bugs can cause processes to deadlock and linger in the system.
The HMP functionality shipped with HP MPI 2.0 is turned off by default (MPI_HMP=off). There are four possible values for MPI_HMP: on, off, ON, and OFF. The file /etc/mpi.conf can be created and set to define the system-wide default for HMP functionality. Setting MPI_HMP within the file to on or off is advisory only, and can be overridden by the user with the environment variable. Setting MPI_HMP within the file to ON or OFF is forced, and will override the user environment variable.
MPI Running You can also use an indirect approach and specify that all off-host communication occur between daemons, by specifying the -commd option to the mpirun command. In this case, the processes on a host use shared memory to send messages to and receive messages from the daemon. The daemon, in turn, uses a socket connection to communicate with daemons on other hosts. Figure 2-1 shows the structure for daemon communication.
MPI Running IMPI The Interoperable MPI protocol (IMPI) extends the power of MPI by allowing applications to run on heterogeneous clusters of machines with various architectures and operating systems, while allowing the program to use a different implementation of MPI on each machine. This is accomplished without requiring any modifications to the existing MPI specification. That is, IMPI does not add, remove, or modify the semantics of any of the existing MPI routines.
MPI Debugging Debugging This chapter describes debugging and troubleshooting HP MPI applications.
MPI Debugging Using Visual MPI Visual MPI is an MPI analysis tool focused on error detection and visualization, with automatic correlation to application source code. While Visual MPI includes a range of features, there are several highlights: ease of use (near-zero initial learning curve), automated analysis capabilities, and reporting of a range of programming errors. For more information about Visual MPI, refer to the documents available at http://www.hp.com/go/mpi and in the Visual MPI online help.
Step 5. Set the global variable MPI_DEBUG_CONT to 1 using each session's command line interface or graphical user interface. The syntax for setting the global variable depends upon which debugger you use:

(adb)     mpi_debug_cont/w 1
(dde)     set mpi_debug_cont = 1
(xdb)     print *MPI_DEBUG_CONT = 1
(wdb)     set MPI_DEBUG_CONT = 1
(gdb)     set MPI_DEBUG_CONT = 1
(ladebug) set MPI_DEBUG_CONT = 1

NOTE: For the ladebug debugger, /usr/bin/X11 may need to be added to the command search path.

Step 6.
MPI Debugging For example, % $MPI_ROOT/bin/mpicc myprogram.c -g % $MPI_ROOT/bin/mpirun -tv -np 2 a.out In this example, myprogram.c is compiled using the HP MPI compiler utility for C programs. The executable file is compiled with source line information and then mpirun runs the a.out MPI program: -g Specifies that the compiler generate the additional information needed by the symbolic debugger. -np 2 Specifies the number of processes to run (2, in this case).
MPI Debugging To improve performance, HP MPI supports a process-to-process, one-copy messaging approach. This means that one process can directly copy a message into the address space of another process. Because of this process-to-process bcopy (p2p_bcopy) implementation, a kernel thread is created for each process that has p2p_bcopy enabled. This thread deals with page and protection faults associated with the one-copy operation.
MPI Debugging • Message signature analysis—Detects type mismatches in MPI calls. For example, in the two calls below, the send operation sends an integer, but the matching receive operation receives a floating-point number.
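As an illustration of the mismatch described above, a reconstruction (our sketch, not the original listing) of the kind of call pair the diagnostic library flags might look like this:

```c
#include <mpi.h>

/* Reconstruction of a type-signature mismatch: rank 0 sends an
 * integer, but rank 1 receives it as a float. Sketch only; a real
 * run would use two processes (mpirun -np 2) with the diagnostic
 * library linked in to report the mismatch. */
int main(int argc, char *argv[])
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int i = 42;
        MPI_Send(&i, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* sends an integer */
    } else if (rank == 1) {
        float f;
        MPI_Recv(&f, 1, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,  /* expects a float */
                 MPI_STATUS_IGNORE);
    }
    MPI_Finalize();
    return 0;
}
```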
MPI Debugging Backtrace functionality HP MPI 2.0 handles several common termination signals differently than earlier versions of HP MPI. If any of the following signals are generated by an MPI application, a stack trace is printed prior to termination: • SIGBUS - bus error • SIGSEGV - segmentation violation • SIGILL - illegal instruction • SIGSYS - illegal argument to system call The backtrace is helpful in determining where the signal was generated and the call stack at the time of the error.
$MPI_ROOT/bin/mpicc: HP MPI 02.00.00.00 (dd/mm/yyyy) B6060BA - HP-UX 11.i

This command returns the HP MPI version number, the date this version was released, HP MPI product numbers, and the operating system version.

Building
You can solve most build-time problems by referring to the documentation for the compiler you are using. If you use your own build script, specify all necessary input libraries.
Running

Run-time problems originate from many sources and may include:

• Shared memory
• Message buffering
• Propagation of environment variables
• Interoperability
• Fortran 90 programming features
• UNIX open file descriptors
• External input and output

Shared memory

When an MPI application starts, each MPI process attempts to allocate a section of shared memory.
example, a sequence of operations (labeled "Deadlock") as illustrated in Table 2-6 would result in such a deadlock. Table 2-6 also illustrates the sequence of operations that would avoid code deadlock.

Table 2-6 Non-buffered messages and deadlock

              Deadlock                          No Deadlock
  Process 1        Process 2        Process 1        Process 2
  MPI_Send(2,...)  MPI_Send(1,...)  MPI_Send(2,...)  MPI_Recv(1,...)
  MPI_Recv(2,...)  MPI_Recv(1,...)  MPI_Recv(2,...)  MPI_Send(1,...)
Fortran 90 programming features

The MPI 1.1 standard defines bindings for Fortran 77 but not Fortran 90. Although most Fortran 90 MPI applications work using the Fortran 77 MPI bindings, some Fortran 90 features can cause unexpected behavior when used with HP MPI. In Fortran 90, an array is not always stored in contiguous memory.
bnone [#>0]   The same as b[#] except that the buffer is flushed both when it is full and when it is found to contain any data. Essentially provides no buffering from the user’s perspective.

bline [#>0]   Displays the output of a process after a line feed is encountered, or the # byte buffer is full.

The default value of # in all cases is 10 Kbytes.

The following option is available for prepending:

p   Enables prepending.
MPI Tuning

Tuning

This section provides information about tuning HP MPI applications to improve performance. The topics covered are:

• MPI_FLAGS options
• Message latency and bandwidth
• Multiple network interfaces
• Processor subscription
• MPI routine selection
• Multilevel parallelism
• Coding considerations

The tuning information in this chapter improves application performance in most but not all cases.
Message bandwidth is the reciprocal of the time needed to transfer a byte. Bandwidth is normally expressed in megabytes per second. Bandwidth becomes important when message sizes are large. To improve latency or bandwidth or both:

• Reduce the number of process communications by designing coarse-grained applications.
• Use derived, contiguous data types for dense data structures to eliminate unnecessary byte-copy operations in certain cases.
In this case, all iterations through MPI_Recv_init are progressed just once when MPI_Startall is called. This approach avoids the additional progression overhead when using MPI_Irecv and can reduce application latency.

Multiple network interfaces

You can use multiple network interfaces for interhost communication while still having intrahost exchanges. In this case, the intrahost exchanges use shared memory between processes mapped to different same-host IP addresses.
Figure 2-2 Multiple network interfaces

[Figure: host0 holds ranks 0-15 and 16-31, host1 holds ranks 32-47 and 48-63; shmem connects the ranks within each host, while the ethernet0 and ethernet1 interfaces link host0 to host1.]

Host0 processes with rank 0 - 15 communicate with processes with rank 16 - 31 through shared memory (shmem). Host0 processes also communicate through the host0-ethernet0 and the host0-ethernet1 network interfaces with host1 processes.
Table 2-7 Subscription types (Continued)

Subscription type    Description
Over subscribed      More active processes than processors. When a host is over subscribed, application performance decreases because of increased context switching. Context switching can degrade application performance by slowing the computation phase, increasing message latency, and lowering message bandwidth.
Coding considerations

The following are suggestions and items to consider when coding your MPI applications to improve performance:

• Use HP MPI collective routines instead of coding your own with point-to-point routines, because HP MPI’s collective routines are optimized to use shared memory where possible for performance.

• Use commutative MPI reduction operations.
  — Use the MPI predefined reduction operations whenever possible because they are optimized.
MPI Profiling

Profiling

The following provides information about utilities you can use to analyze HP MPI applications. The topics covered are:

• Using counter instrumentation
  — Creating an instrumentation profile
  — Viewing ASCII instrumentation data
• Using the profiling interface

Using counter instrumentation

Counter instrumentation is a lightweight method for generating cumulative runtime statistics for your MPI applications. When you create an instrumentation profile, HP MPI creates an ASCII-format file.
Specifications you make using mpirun -i override any specifications you make using the MPI_INSTR environment variable.

MPIHP_Trace_on and MPIHP_Trace_off

By default, the entire application is profiled from MPI_Init to MPI_Finalize. However, HP MPI provides the nonstandard MPIHP_Trace_on and MPIHP_Trace_off routines to collect profile information for selected code sections only. To use this functionality:

1.
• Communication hot spots—The processes in your application between which the largest amount of time is spent in communication.

• Message bin—The range of message sizes in bytes. The instrumentation profile reports the number of messages according to message length.

NOTE You do not get message size information for MPI_Alltoallv instrumentation.

Figure 2-3 displays the contents of the example report compute_pi.instr.

Figure 2-3 ASCII instrumentation profile

Version: HP MPI 01.08.00.
1       0.126355    0.008260(  6.54%)    0.118095( 93.46%)
-----------------------------------------------------------------
Rank    Proc MPI Time    Overhead             Blocking
-----------------------------------------------------------------
0       0.118003    0.118003(100.00%)    0.000000(  0.00%)
1       0.118095    0.118095(100.00%)    0.000000(  0.
1    0    1    (8, 8)    8    1    [0..64]    8
-----------------------------------------------------------------

Using the profiling interface

The MPI profiling interface provides a mechanism by which implementors of profiling tools can collect performance information without access to the underlying MPI implementation source code. Because HP MPI provides several options for profiling your applications, you may not need the profiling interface to write your own routines.
int MPI_Send(void *buf, int count, MPI_Datatype type,
             int to, int tag, MPI_Comm comm)
{
    printf("Calling C MPI_Send to %d\n", to);
    return PMPI_Send(buf, count, type, to, tag, comm);
}

#pragma _HP_SECONDARY_DEF mpi_send mpi_send_

void mpi_send(void *buf, int *count, int *type, int *to,
              int *tag, int *comm, int *ierr)
{
    printf("Calling Fortran MPI_Send to %d\n", *to);
    pmpi_send(buf, count, type, to, tag, comm, ierr);
}
3 MLIB

HP’s high-performance math libraries (HP MLIB) help you speed development of applications and shorten execution time of long-running technical applications.
HP MLIB is a collection of subprograms optimized for use on HP servers and workstations, providing mathematical software and computational kernels for engineering and scientific applications. HP MLIB can be used on HP-UX systems ranging from single-processor workstations to multiprocessor high-end servers. HP MLIB is optimized for HP PA-RISC 2.0 processors and the Itanium Processor Family (IPF). HP MLIB has three components: VECLIB, LAPACK, and ScaLAPACK.
VECLIB

HP VECLIB contains robust callable subprograms. Together with a subset of the BLAS Standard subroutines, HP MLIB supports the legacy BLAS; a collection of routines for the solution of sparse symmetric systems of equations; a collection of commonly used Fast Fourier Transforms (FFTs); and convolutions.
LAPACK

HP Linear Algebra Package (LAPACK) is a collection of subprograms that provide mathematical software for applications involving linear equations, least squares, eigenvalue problems, and the singular value decomposition. For more information, please reference the latest edition of the LAPACK Users’ Guide at the Netlib repository: http://www.netlib.org/lapack/lug/index..
ScaLAPACK

ScaLAPACK is a library of high-performance linear algebra routines capable of solving systems of linear equations, linear least squares problems, eigenvalue problems, and singular value problems. ScaLAPACK can also handle many associated computations such as matrix factorizations or estimating condition numbers. ScaLAPACK is public-domain software that was developed by Oak Ridge National Laboratory.
4 OpenMP

OpenMP is a portable, scalable model that gives shared-memory parallel programmers a simple and flexible interface for developing parallel applications on platforms ranging from the desktop to the supercomputer.
It is supported on a variety of architectures, including UNIX and Windows NT.
HP’s implementation of OpenMP

This section discusses HP’s implementation of OpenMP.

Command-line option

HP OpenMP directives are only accepted if the command-line option +Oopenmp is given.

NOTE +Oopenmp implies +Onodynsel, +Oparallel, and +Onoautopar.

Default

The default command-line option is +Onoopenmp. If +Oopenmp is not given, all OpenMP directives (c$omp) are ignored.

Optimization levels and parallelism

+Oopenmp is accepted at all optimization levels.
• Parallel and work-shared directives (including the clauses for these directives) are processed but not parallelized. While they return correct answers, you will not achieve parallel code; each thread runs a serial version of the code.

Optimization levels +O3 through +O4

When using optimization levels +O3 and +O4:

• All sync and run-time library directives are processed and honored.
Portable timing routines

There are two portable timing routines:

  DOUBLE PRECISION OMP_GET_WTIME()
  DOUBLE PRECISION OMP_GET_WTICK()

Nested lock routines

Nested lock routines are as follows:

  SUBROUTINE OMP_INIT_NEST_LOCK (NLOCK)
  SUBROUTINE OMP_DESTROY_NEST_LOCK (NLOCK)
  SUBROUTINE OMP_SET_NEST_LOCK (NLOCK)
  SUBROUTINE OMP_UNSET_NEST_LOCK (NLOCK)
  INTEGER FUNCTION OMP_TEST_NEST_LOCK (NLOCK)

Additional features

• Copyin now allows non-threadprivate objects in a parallel region
New library

The OpenMP APIs are defined in the library libomp. This library is delivered in patches PHSS_25028 (HP-UX 11.00) and PHSS_25029 (HP-UX 11.11).
Implementation-defined behavior

The following summarizes the behaviors that are described as implementation dependent in this API. Each behavior is cross-referenced back to its description in the OpenMP v2.0 main specification. HP, in conformance with the OpenMP v2.0 API, defines and documents the following behavior.

1. SCHEDULE(GUIDED,chunk): chunk specifies the size of the smallest piece, except possibly the last.
11. If the dynamic threads mechanism is enabled on entering a parallel region, an allocatable array that is not affected by a COPYIN clause appearing on the region has an initial allocation status of not currently allocated (Section 2.6.1, page 32).

12. Due to resource constraints, it is not possible for an implementation to document the maximum number of threads that can be created successfully during a program's execution.
From HP Programming Model to OpenMP

This section discusses migration from the HP Programming Model (HPPM) to the OpenMP parallel programming model.

Syntax

The OpenMP parallel programming model is very similar to the HP Programming Model (HPPM). The general thread model is the same, and the spawn (fork) mechanisms behave in a similar fashion. However, the specific syntax used to specify the underlying semantics differs significantly.
Table 4-2 OpenMP and HPPM Directives/Clauses (Continued)

HPPM                               OpenMP
!$dir begin_tasks                  !$OMP parallel sections
!$dir critical_section[(name)]     !$OMP critical[(name)]
!$dir wait_barrier                 !$OMP barrier
!$dir ordered_section              !$OMP ordered
                                   !$OMP end parallel
!$dir end_tasks                    !$OMP end sections
!$dir end_tasks                    !$OMP end parallel sections
                                   !$OMP end parallel do
HP Programming Model directives

This section describes how the HP Programming Model (HPPM) directives are affected by the implementation of OpenMP.

Not Accepted with +Oopenmp

These HPPM directives will not be accepted when +Oopenmp is given.
• node_private_pointer
• near_shared
• far_shared
• block_shared
• near_shared_pointer
• far_shared_pointer

NOTE If +Oopenmp is given, the directives above are ignored.

Accepted with +Oopenmp

These HPPM directives will continue to be accepted when +Oopenmp is given.
More information on OpenMP

For more information on OpenMP, see www.openmp.org.
5 UPC

Unified Parallel C (UPC), a parallel extension of the C programming language, is designed to support both of the common types of multiprocessor systems: those with a common global address space (such as SMP) and those with distributed memory.
UPC provides a simple shared memory model for parallel programming, allowing data to be shared or distributed among a number of communicating processors. Constructs are provided in the language to permit simple declaration of shared data, distribute shared data across threads, and synchronize access to shared data across threads. This model promises significantly easier coding of parallel applications and maximum performance across shared memory, distributed memory, and hybrid systems.
Compiling

The upc command compiles UPC language source into machine-readable instructions. The desired output is specified with an option on the command line, and can be object files, translated C language source files, or symbolic assembly language. The compiler produces one object file for each file compiled. If the linker is called, and only one source file is specified on the command line, the single object file is deleted after the linking operation.
Linking

Unless the -c compiler option is used, the upc command automatically links the compiled object modules into an executable UPC program file. This file is named a.out unless the -o option is used to change the file name. The upc command also includes the UPC Run-Time System (UPCRTS) in the link step automatically. The program must not be linked non-shared. On Tru64 UNIX, the UPCRTS also references the Elan and RMS libraries provided as a part of the AlphaServer SC system software.
Running

You can run UPC programs in one of three ways:

• Single-threaded execution
• Multithreaded execution on an AlphaServer SC system using the prun command
• Multithreaded execution on an SMP system (or on a single-CPU system) using the UPC Run-Time Environment

Running programs in single thread mode

If a UPC program has been compiled with -fthreads set to 1, or without having specified a value for -fthreads, then it may be run in single thread mode by simply executing the program image
prun command.) If this happens, you need to modify one or more modules to correct the consistency problem, recompile one or more modules with a consistent value for -fthreads, and reissue the prun command with an -n value consistent with the value used for -fthreads. If all modules compile successfully without specifying -fthreads, then any value for the -n option of prun may be used.
Example 5-3 below illustrates a method to partition output files into separate streams, one for each thread, under program control. A given file name is appended with the thread number, and a separate output file is opened for each thread. A similar approach could also be used for input files. Note that the fopen function calls occur in parallel, one on each thread.

Example 5-3 Partitioning Output Streams

#include
#include
#include
Debugging

In Tru64 UNIX clusters, you can use the TotalView debugger from Etnus, Inc. to debug UPC programs. TotalView is a full-featured, GUI-based debugger specifically designed to meet the needs of parallel applications running on many processors. The TotalView documentation set is available directly from Etnus, Inc. However, TotalView is not included with the HP UPC software and is not supported. If you install and use TotalView and have problems with it, contact Etnus, Inc.
Tuning

-tune option [Tru64 UNIX Only]

Note that -tune ev[x] does not imply -arch ev[x]. Unlike -arch, -tune does not cause instruction emulation or illegal instructions on any Alpha architecture. A program compiled with any of the -tune options runs on any Alpha processor. Beginning with Version 4.0 of the operating system and continuing with subsequent versions, the operating system kernel includes an instruction emulator.
Profiling

-p [Tru64 UNIX Only]

Perform profiling by periodically sampling the value of the program counter. This option affects only linking. When linking is done, this option replaces the standard run-time startup routine with the profiling run-time startup routine (mcrt0.o) and searches the level 1 profiling library (libprof1.a). When profiling is completed, the startup routine calls the monstartup routine and produces a mon.out file.
6 Data privatization

Once HP shared memory classes are assigned, they apply throughout your entire program. Very efficient programs can be written using these memory classes.
manual intervention. Any loops that manipulate variables that are explicitly assigned to a memory class must be manually parallelized. Once a variable is assigned a class, its class cannot change.
Directives and pragmas for data privatization

This section describes the various directives and pragmas that are implemented to achieve data privatization. These directives and pragmas are discussed in Table 6-1.

Table 6-1 Data Privatization Directives and Pragmas

Directive / Pragma         Description                                     Level of parallelism
loop_private (namelist)    Declares a list of variables and/or arrays     loop
                           private to the following loop.
In some cases, data declared loop_private, task_private, or parallel_private is stored on the stacks of the spawned threads. Spawned thread stacks default to 80 Mbytes in size.
Privatizing loop variables

This section describes the following directives and pragmas associated with privatizing loop variables:

• loop_private
• save_last

loop_private

The loop_private directive and pragma declares a list of variables and/or arrays private to the immediately following Fortran DO or C for loop. loop_private array dimensions must be identifiable at compile-time.
Example 6-1 loop_private

The following is a Fortran example of loop_private:

C$DIR LOOP_PRIVATE(S)
      DO I = 1, N
C       S IS ONLY CORRECTLY PRIVATE IF AT LEAST
C       ONE IF TEST PASSES ON EACH ITERATION:
        IF(A(I) .GT. 0) S = A(I)
        IF(U(I) .LT. V(I)) S = V(I)
        IF(X(I) .LE. Y(I)) S = Z(I)
        B(I) = S * C(I) + D(I)
      ENDDO

A potential loop-carried dependence on S exists in this example.
Here, the LOOP_PARALLEL directive is required to parallelize the I loop because of the call to MFY. The X and Y arrays are in shared memory by default. X and Z are not written to, and the portions of Y written to in the J loop’s IF statement are disjoint, so these shared arrays require no special attention. The local array XMFIED, however, is written to. But because XMFIED carries no values into or out of the I loop, it is privatized using LOOP_PRIVATE.
ivar is required in all loop_parallel C loops. Its use is shown in the following example:

#pragma _CNX loop_parallel(ivar=i)
for(i=0; i
Example 6-6 Secondary induction variables

In C, secondary induction variables are sometimes included in for statements, as shown in the following example:

/* warning: unparallelizable code follows */
#pragma _CNX loop_parallel(ivar=i)
for(i=j=0; i
The save_last directive and pragma allows you to save the final value of loop_private data objects assigned in the last iteration of the immediately following loop.

• If list (the optional, comma-separated list of loop_private data objects) is specified, only the final values of those data objects in list are saved.
• If list is not specified, the final values of all loop_private data objects assigned in the last loop iteration are saved.
    ...
    if(atemp > amax) {
    ...

In this example, the loop_private variable atemp is conditionally assigned in the loop. In order for atemp to be truly private, you must be sure that at least one of the conditions is met so that atemp is assigned on every iteration. When the loop terminates, the save_last pragma ensures that atemp and X contain the values they are assigned on the last iteration. These values can then be used later in the program.
Privatizing task variables

Task privatization is manually specified using the task_private directive and pragma. task_private declares a list of variables and/or arrays private to the immediately following tasks. It serves the same purpose for parallel tasks that loop_private serves for loops and parallel_private serves for regions.
      REAL*8 A(1000), B(1000), WRK(1000)
      .
      .
      .
C$DIR BEGIN_TASKS, TASK_PRIVATE(WRK)
      DO I = 1, N
        WRK(I) = A(I)
      ENDDO
      DO I = 1, N
        A(I) = WRK(N+1-I)
        .
        .
        .
      ENDDO
C$DIR NEXT_TASK
      DO J = 1, M
        WRK(J) = B(J)
      ENDDO
      DO J = 1, M
        B(J) = WRK(M+1-J)
        .
        .
        .
      ENDDO
C$DIR END_TASKS

In this example, the WRK array is used in the first task to temporarily hold the A array so that its order is reversed. It serves the same purpose for the B array in the second task.
Privatizing region variables

Regional privatization is manually specified using the parallel_private directive or pragma. parallel_private is provided to declare a list of variables and/or arrays private to the immediately following parallel region. It serves the same purpose for parallel regions as task_private does for tasks, and loop_private does for loops.
parallel_private privatizes regions:

      REAL A(1000,8), B(1000,8), C(1000,8), AWORK(1000), SUM(8)
      INTEGER MYTID
      .
      .
      .
C$DIR PARALLEL(MAX_THREADS = 8)
C$DIR PARALLEL_PRIVATE(I,J,K,L,M,AWORK,MYTID)
      IF(NUM_THREADS() .LT.
In the previous example, in the J loop, after AWORK is initialized, AWORK is effectively used in a reduction on A; at this point its contents are identical to the MYTID dimension of A. After A is modified and used in the K and L loops, each thread restores a dimension of A’s original values from its private copy of AWORK. This carries the appropriate dimension through the region unaltered.
7 Memory classes

The V-Class server implements only one partition of hypernode-local memory. This is accessed using the thread_private and node_private virtual memory classes.
• Private versus shared memory
• Memory class assignments

The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.
Memory classes • Private versus shared memory • Memory class assignments The information in this chapter is provided for programmers who want to manually optimize their shared-memory programs on a single-node server. This is ultimately achieved by using compiler directives or pragmas to partition memory and otherwise control compiler optimizations. It can also be achieved using storage class specifiers in C and C++.