Optimizing with MACRO-32
Example 3–8 Matrix Multiply—Basic
msync R0 ;synchronize with scalar
LOOP:
vldl A(I,K),#4,V0 ;col of A is loaded into V0
vsmulf B(K,J),V0,V0 ;V0 gets the product of V0
;and the scalar value B(K,J)
vldl C(I,J),#4,V1 ;col of C gets loaded into V1
vvaddf V0,V1,V1 ;V1 gets V0 summed with V1
vstl V1,C(I,J),#4 ;V1 is stored back into C
INC K ;increment K by one
IF (K < N) GOTO LOOP ;Loop for all values of K
INC J, RESET K ;increment J by one
IF (J < N) GOTO LOOP ;Loop for all values of J
INC I, RESET J ;increment I by vector length
IF (I < N) GOTO LOOP ;Loop for all values of I
msync R0 ;synchronize with scalar
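The loop structure of Example 3–8 can be modeled in a high-level language to make its memory traffic visible. The following is a minimal sketch, not the manual's code: the function name matmul_basic, the counter mem_ops, and the strip length vl are illustrative assumptions, and Python lists stand in for vector registers V0 and V1. The sketch assumes the vector runs along I, as the column loads suggest, so I advances by one strip while J and K advance by one.

```python
def matmul_basic(A, B, C, n, vl=4):
    """Model of the basic loop nest: C is reloaded and restored on every K step.

    A, B, C are n-by-n lists of lists; vl is a hypothetical strip length.
    Returns the number of vector memory operations (loads plus stores).
    """
    mem_ops = 0
    for i in range(0, n, vl):            # I advances by one vector strip
        hi = min(i + vl, n)
        for j in range(n):               # J advances by one column
            for k in range(n):           # K advances by one
                v0 = [A[r][k] for r in range(i, hi)]    # vldl  A(I,K) -> V0
                mem_ops += 1
                v0 = [B[k][j] * x for x in v0]          # vsmulf B(K,J),V0,V0
                v1 = [C[r][j] for r in range(i, hi)]    # vldl  C(I,J) -> V1
                mem_ops += 1
                v1 = [x + y for x, y in zip(v0, v1)]    # vvaddf V0,V1,V1
                for r, val in zip(range(i, hi), v1):    # vstl  V1 -> C(I,J)
                    C[r][j] = val
                mem_ops += 1
    return mem_ops
```

In this model every K iteration costs three vector memory operations (two loads and one store), even though only the load of A brings in new data.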
Example 3–9 Matrix Multiply—Improved
msync R0 ;synchronize with scalar
IJLOOP:
vldl C(I,J),#4,V1 ;col of C gets loaded into V1
KLOOP:
vldl A(I,K),#4,V0 ;col of A is loaded into V0
vsmulf B(K,J),V0,V0 ;V0 gets the product of V0
;and the scalar value B(K,J)
vvaddf V0,V1,V1 ;V1 gets V0 summed with V1
INC K ;increment K by one
IF (K < N) GOTO KLOOP ;Loop for all values of K
vstl V1,C(I,J),#4 ;V1 is stored back into C
INC J, RESET K ;increment J by one
IF (J < N) GOTO IJLOOP ;Loop for all values of J
INC I, RESET J ;increment I by vector length
IF (I < N) GOTO IJLOOP ;Loop for all values of I
msync R0 ;synchronize with scalar
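The gain in Example 3–9 comes from keeping the running sum for C(I,J) in a vector register across the whole K loop, so that C is loaded once before the loop and stored once after it. The following is a minimal sketch of that structure, not the manual's code: the function name matmul_improved, the counter mem_ops, and the strip length vl are illustrative assumptions, and Python lists stand in for vector registers V0 and V1. The sketch assumes the vector runs along I, so I advances by one strip while J and K advance by one.

```python
def matmul_improved(A, B, C, n, vl=4):
    """Model of the improved loop nest: C stays in a register during the K loop.

    A, B, C are n-by-n lists of lists; vl is a hypothetical strip length.
    Returns the number of vector memory operations (loads plus stores).
    """
    mem_ops = 0
    for i in range(0, n, vl):            # I advances by one vector strip
        hi = min(i + vl, n)
        for j in range(n):               # J advances by one column
            v1 = [C[r][j] for r in range(i, hi)]        # vldl  C(I,J) -> V1, once
            mem_ops += 1
            for k in range(n):           # K advances by one
                v0 = [A[r][k] for r in range(i, hi)]    # vldl  A(I,K) -> V0
                mem_ops += 1
                v0 = [B[k][j] * x for x in v0]          # vsmulf B(K,J),V0,V0
                v1 = [x + y for x, y in zip(v0, v1)]    # vvaddf V0,V1,V1
            for r, val in zip(range(i, hi), v1):        # vstl  V1 -> C(I,J), once
                C[r][j] = val
            mem_ops += 1
    return mem_ops
```

In this model each (I,J) pair costs N + 2 vector memory operations, where the structure that reloads and stores C inside the K loop would cost 3N. The arithmetic performed is the same.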