

This function, which has already been considered by several authors for hypercube target topologies [11, 21, 25], has several interesting properties: it is easy to compute, allows incremental updates performed by iterative algorithms, and its minimization favors the mapping of intensively intercommunicating processes onto nearby processors. Regardless of the type of routing implemented on the target machine (store-and-forward or cut-through), it models the traffic on the interconnection network and thus the risk of congestion.

The strong positive correlation between values of this function and effective execution times has been experimentally verified by Hammond [21] on the CM-2, and by Hendrickson and Leland [26] on the nCUBE 2.
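The incremental-update property can be illustrated with a minimal sketch. It assumes that the cost function is the sum, over all edges of the source graph, of the edge weight multiplied by the edge dilation, and that the target is a hypercube whose dilation is the Hamming distance between node labels. The names `hamming`, `comm_cost` and `move_delta` are hypothetical and are not part of Scotch.

```python
def hamming(a: int, b: int) -> int:
    """Distance between two hypercube nodes labelled by integers."""
    return bin(a ^ b).count("1")


def comm_cost(edges, mapping, dist=hamming):
    """Assumed cost: sum over source edges of weight * dilation."""
    return sum(w * dist(mapping[u], mapping[v]) for u, v, w in edges)


def move_delta(edges_of, mapping, proc, new_node, dist=hamming):
    """Change in cost if process `proc` moves to `new_node`.

    Only the edges incident to `proc` are visited (no self-loops assumed),
    so the full sum never has to be recomputed.
    """
    old_node = mapping[proc]
    return sum(
        w * (dist(new_node, mapping[nbr]) - dist(old_node, mapping[nbr]))
        for nbr, w in edges_of[proc]
    )


# Toy source graph: a triangle with one heavy edge (a, b).
edges = [("a", "b", 5), ("b", "c", 1), ("a", "c", 2)]

# Adjacency lists, used by the incremental update.
edges_of = {}
for u, v, w in edges:
    edges_of.setdefault(u, []).append((v, w))
    edges_of.setdefault(v, []).append((u, w))

# Processes mapped onto nodes of a 2-dimensional hypercube.
mapping = {"a": 0b00, "b": 0b11, "c": 0b01}

before = comm_cost(edges, mapping)
delta = move_delta(edges_of, mapping, "b", 0b10)
mapping["b"] = 0b10
after = comm_cost(edges, mapping)
```

Moving `b` next to its heavy neighbour `a` lowers the cost, and `before + delta` equals the fully recomputed `after`. An iterative refinement algorithm can therefore evaluate a candidate move in time proportional to the degree of the moved process.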

The quality of mappings is evaluated with respect to the two criteria that we have chosen: the balance of the computation load across processors, and the minimization of the interprocessor communication cost modeled by function $f_C$. These criteria lead to the definition of several parameters, which are described below.

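For load balance, one can define $\mu_{map}$, the average load per computational power unit (which does not depend on the mapping), and $\delta_{map}$, the load imbalance ratio, as

\[ \mu_{map} \stackrel{\mathrm{def}}{=} \frac{\sum_{v_S \in V(S)} w_S(v_S)}{\sum_{v_T \in V(T)} w_T(v_T)} \]

and

\[ \delta_{map} \stackrel{\mathrm{def}}{=} \frac{\sum_{v_T \in V(T)} \left| \left( \frac{1}{w_T(v_T)} \sum_{v_S \in V(S),\ \tau_{S,T}(v_S) = v_T} w_S(v_S) \right) - \mu_{map} \right|}{\sum_{v_S \in V(S)} w_S(v_S)} . \]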
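As an illustration, the two load balance parameters can be computed as in the sketch below. It assumes that $\mu_{map}$ is the total process load divided by the total processor power, and that $\delta_{map}$ is the sum over processors of the absolute deviation of the load per power unit from $\mu_{map}$, divided by the total process load. The function `load_balance` and the dictionary-based data layout are hypothetical and are not part of Scotch.

```python
def load_balance(w_s, w_t, tau):
    """Return (mu_map, delta_map) for a mapping `tau` of processes to processors.

    w_s: process -> load, w_t: processor -> computational power,
    tau: process -> processor.
    """
    total_load = sum(w_s.values())
    total_power = sum(w_t.values())
    mu_map = total_load / total_power

    # Load assigned to each processor by the mapping.
    load = {p: 0 for p in w_t}
    for proc, p in tau.items():
        load[p] += w_s[proc]

    # Deviation of each processor's load per power unit from the average.
    deviation = sum(abs(load[p] / w_t[p] - mu_map) for p in w_t)
    return mu_map, deviation / total_load


w_s = {"a": 4, "b": 2, "c": 2}  # process loads
w_t = {"p0": 1, "p1": 1}        # processor powers

balanced = load_balance(w_s, w_t, {"a": "p0", "b": "p1", "c": "p1"})
skewed = load_balance(w_s, w_t, {"a": "p0", "b": "p0", "c": "p1"})
```

Both mappings give the same average load, since it does not depend on the mapping. Only the imbalance ratio differs: it is zero for the balanced mapping and 0.5 for the skewed one.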

However, since the maximum load imbalance ratio is supplied by the user as an input to the mapping, these parameters are of little interest: what matters is the minimization of the communication cost function under this load balance constraint.

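For communication, the straightforward parameter to consider is $f_C$. It can be normalized as $\mu_{exp}$, the average edge expansion, which can be compared to $\mu_{dil}$, the average edge dilation; these are defined as

\[ \mu_{exp} \stackrel{\mathrm{def}}{=} \frac{f_C}{\sum_{e_S \in E(S)} w_S(e_S)} \quad \text{and} \quad \mu_{dil} \stackrel{\mathrm{def}}{=} \frac{\sum_{e_S \in E(S)} |\rho_{S,T}(e_S)|}{|E(S)|} . \]

The ratio $\delta_{exp} \stackrel{\mathrm{def}}{=} \mu_{exp} / \mu_{dil}$ is smaller than 1 when the mapper succeeds in putting heavily intercommunicating processes closer to each other than it does for lightly communicating processes; $\mu_{exp}$ and $\mu_{dil}$ are equal if all edges have the same weight.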
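The behavior of these ratios can be checked on a small example. The sketch below assumes that the communication cost is the sum over source edges of weight times dilation, that the average edge expansion is this cost divided by the total edge weight, and that the average edge dilation is the unweighted mean of the dilations. The function `comm_metrics` and its data layout are hypothetical and are not part of Scotch.

```python
def comm_metrics(edges, dilation):
    """Return (f_c, mu_exp, mu_dil, delta_exp).

    edges: source edge -> weight.
    dilation: source edge -> length of the target path it is mapped onto.
    Assumes at least one edge has a nonzero dilation, so mu_dil > 0.
    """
    f_c = sum(w * dilation[e] for e, w in edges.items())
    mu_exp = f_c / sum(edges.values())
    mu_dil = sum(dilation[e] for e in edges) / len(edges)
    return f_c, mu_exp, mu_dil, mu_exp / mu_dil


# One heavy edge (a, b) and two light edges.
edges = {("a", "b"): 6, ("b", "c"): 1, ("a", "c"): 1}

# The heavy edge gets the shortest path ...
good = comm_metrics(edges, {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 3})
# ... or the longest one.
poor = comm_metrics(edges, {("a", "b"): 3, ("b", "c"): 2, ("a", "c"): 1})
```

Both mappings have the same average dilation, but only the one that keeps the heavy edge short yields a ratio below 1 (0.6875 against 1.3125 here).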

3.1.3 The Dual Recursive Bipartitioning algorithm

Our mapping algorithm uses a divide-and-conquer approach to recursively allocate subsets of processes to subsets of processors [41]. It starts by considering a set of processors, also called a domain, containing all the processors of the target machine, and with which is associated the set of all the processes to map. At each step, the algorithm bipartitions a yet unprocessed domain into two disjoint subdomains, and