Installation guide

Intel® Parallel Studio XE 2015 Composer Edition for C++ Linux*

Installation Guide and Release Notes

o Improved performance of Level 3 BLAS functions for 64-bit processors supporting Intel AVX2

o Improved ?GEMM performance on small matrices for all processors when MKL_DIRECT_CALL or MKL_DIRECT_CALL_SEQ is defined during compilation (see the Intel® Math Kernel Library User’s Guide for more details)

o Improved performance of DGER and DGEMM for the beta=1, k=1 case for 64-bit processors supporting Intel SSE4.2, Intel® Advanced Vector Extensions (Intel® AVX), and Intel AVX2 instruction sets

o Optimized (D/Z)AXPY for the Intel AVX-512 instruction set

o Optimized ?COPY for the Intel AVX2 and Intel AVX-512 instruction sets

o Optimized DGEMV for the Intel AVX-512 instruction set

o Improved performance of SSYR2K for 64-bit processors supporting Intel AVX and Intel AVX2

o Improved threaded performance of ?AXPBY for all Intel processors

o Improved DTRMM performance for the side=R, uplo={U,L}, transa=N, diag={N,U} cases for Intel AVX-512
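The DGER and DGEMM item above groups the two routines because, for k=1 and beta=1, the matrix product C := alpha*A*B + beta*C reduces to the rank-1 update A := alpha*x*y^T + A that DGER performs. The following is a minimal sketch of that equivalence in plain Python. The helper names gemm_ref and ger_ref are hypothetical and only mirror the reference BLAS semantics; they are not Intel MKL calls and say nothing about how the library implements the optimization.

```python
def gemm_ref(alpha, a, b, beta, c):
    """Reference semantics of C := alpha*A*B + beta*C (lists of rows)."""
    m, k, n = len(a), len(b), len(b[0])
    return [
        [alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
         for j in range(n)]
        for i in range(m)
    ]


def ger_ref(alpha, x, y, a):
    """Reference semantics of the rank-1 update A := alpha*x*y^T + A."""
    return [[alpha * x[i] * y[j] + a[i][j] for j in range(len(y))]
            for i in range(len(x))]


# Illustrative data (not taken from the release notes).
x = [1.0, 2.0, 3.0]
y = [4.0, 5.0]
c = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
alpha = 0.5

a_col = [[xi] for xi in x]  # x viewed as a 3x1 matrix, so k = 1
b_row = [y]                 # y viewed as a 1x2 matrix

via_gemm = gemm_ref(alpha, a_col, b_row, 1.0, c)  # beta = 1
via_ger = ger_ref(alpha, x, y, c)
```

Both paths produce the same 3x2 result for this input, which is why a single optimized kernel can plausibly serve both routines in this special case.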

• LINPACK:

o Improved performance of matrix generation in the heterogeneous Intel® Optimized MP LINPACK Benchmark for Clusters

o The Intel MIC Architecture offload option of the Intel Optimized MP LINPACK Benchmark for Clusters package now supports Intel AVX2 hosts

o Improved performance of the Intel Optimized MP LINPACK Benchmark for Clusters package for 64-bit processors supporting Intel AVX2

• LAPACK:

o Improved performance of ?(SY/HE)RDB

o Improved performance of ?(SY/HE)EV when eigenvectors are needed

o Improved performance of ?(SY/HE)(EV/EVR/EVD) when eigenvectors are not needed

o Improved performance of ?GELQF, ?GELS and ?GELSS for the underdetermined case (M less than N)

o Improved performance of ?GEHRD, ?GEEV and ?GEES

o Improved performance of NaN checkers in LAPACKE interfaces

o Improved performance of ?GELSX, ?GGSVP

o Improved performance of ?GETRF

o Improved performance of (S/D)GE(SVD/SDD) when M>=N and singular vectors are not needed

o Improved performance of ?POTRF with UPLO=U in Automatic Offload mode on Intel MIC Architecture

o Added Automatic Offload for ?SYRDB on Intel MIC Architecture, which speeds up ?SY(EV/EVD/EVR) when eigenvectors are not needed
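For the underdetermined case (M less than N) mentioned in the ?GELS item above, the system A*x = b has infinitely many solutions when A has full row rank, and ?GELS returns the one with minimum 2-norm. The sketch below illustrates what that solution is, using the textbook formula x = A^T * (A*A^T)^(-1) * b and a small Gaussian elimination. This is a swapped-in technique chosen for brevity: LAPACK itself works from an LQ factorization of A, and the function names here are hypothetical.

```python
def solve_linear(mat, rhs):
    """Solve mat*z = rhs by Gaussian elimination with partial pivoting."""
    n = len(mat)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(mat)]
    for col in range(n):
        # Swap in the row with the largest pivot candidate.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def min_norm_solution(a, b):
    """Minimum-norm x with a*x = b, assuming a (M x N, M < N) has full row rank."""
    m, n = len(a), len(a[0])
    # Gram matrix A*A^T (M x M).
    gram = [[sum(a[i][k] * a[j][k] for k in range(n)) for j in range(m)]
            for i in range(m)]
    z = solve_linear(gram, b)
    # x = A^T * z
    return [sum(a[i][k] * z[i] for i in range(m)) for k in range(n)]


# Illustrative 2x3 system (M = 2 equations, N = 3 unknowns).
a = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
b = [2.0, 3.0]
x = min_norm_solution(a, b)
residual = [sum(a[i][k] * x[k] for k in range(3)) - b[i] for i in range(2)]
```

Forming A*A^T squares the condition number, so this form is suitable only for small, well-conditioned illustrations; a factorization-based routine such as ?GELS is the appropriate choice in practice.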

• PBLAS and ScaLAPACK: