Installation guide

Intel® Parallel Studio XE 2015 Composer Edition for C++ Linux*
Installation Guide and Release Notes
o Improved performance of Level 3 BLAS functions for 64-bit processors
supporting Intel AVX2
o Improved ?GEMM performance on small matrices for all processors when
MKL_DIRECT_CALL or MKL_DIRECT_CALL_SEQ is defined during compilation
(see the Intel® Math Kernel Library User’s Guide for more details)
o Improved performance of DGER and DGEMM for the beta=1, k=1 case for 64-bit
processors supporting Intel SSE4.2, Intel® Advanced Vector Extensions (Intel®
AVX), and Intel AVX2 instruction sets
o Optimized (D/Z)AXPY for the Intel AVX-512 instruction set
o Optimized ?COPY for the Intel AVX2 and Intel AVX-512 instruction sets
o Optimized DGEMV for the Intel AVX-512 instruction set
o Improved performance of SSYR2K for 64-bit processors supporting Intel AVX
and Intel AVX2
o Improved threaded performance of ?AXPBY for all Intel processors
o Improved DTRMM performance for the side=R, uplo={U,L}, transa=N, diag={N,U}
cases for Intel AVX-512
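The DGER and DGEMM item above groups the two routines because, with k=1 and beta=1, the DGEMM update C := alpha*A*B + beta*C reduces to the rank-1 update C := alpha*x*y^T + C that DGER performs. The following is a minimal sketch of that equivalence using plain reference loops, not Intel MKL calls. The names Matrix, ref_dgemm, ref_dger and rank1_update_matches are hypothetical helpers introduced only for illustration, and column-major storage with unit strides is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: a column-major matrix stored in a flat vector.
using Matrix = std::vector<double>;

// Reference DGEMM for non-transposed operands:
// C := alpha * A * B + beta * C, with A m-by-k, B k-by-n, C m-by-n.
void ref_dgemm(std::size_t m, std::size_t n, std::size_t k, double alpha,
               const Matrix& a, const Matrix& b, double beta, Matrix& c) {
    assert(a.size() == m * k && b.size() == k * n && c.size() == m * n);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            double sum = 0.0;
            for (std::size_t p = 0; p < k; ++p) {
                sum += a[i + p * m] * b[p + j * k];
            }
            c[i + j * m] = alpha * sum + beta * c[i + j * m];
        }
    }
}

// Reference DGER with unit strides: A := alpha * x * y^T + A, A m-by-n.
void ref_dger(std::size_t m, std::size_t n, double alpha,
              const std::vector<double>& x, const std::vector<double>& y,
              Matrix& a) {
    assert(x.size() == m && y.size() == n && a.size() == m * n);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            a[i + j * m] += alpha * x[i] * y[j];
        }
    }
}

// Compares both routines on a small example with exactly representable values.
bool rank1_update_matches() {
    const std::size_t m = 3, n = 2;
    const std::vector<double> x = {1.0, 2.0, 3.0};  // A as an m-by-1 matrix
    const std::vector<double> y = {4.0, 5.0};       // B as a 1-by-n matrix
    Matrix c_gemm(m * n, 1.0);
    Matrix c_ger = c_gemm;
    ref_dgemm(m, n, 1, 2.0, x, y, 1.0, c_gemm);     // k = 1, beta = 1
    ref_dger(m, n, 2.0, x, y, c_ger);
    return c_gemm == c_ger;
}
```

Calling rank1_update_matches() exercises both paths on the same data. An optimized library is free to use a different code path for this case, so the sketch only shows why the two operations coincide mathematically.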
LINPACK:
o Improved performance of matrix generation in the heterogeneous Intel®
Optimized MP LINPACK Benchmark for Clusters
o Intel MIC Architecture offload option of the Intel Optimized MP LINPACK
Benchmark for Clusters package now supports Intel AVX2 hosts
o Improved performance of the Intel Optimized MP LINPACK for Clusters package
for 64-bit processors supporting Intel AVX2
LAPACK:
o Improved performance of ?(SY/HE)RDB
o Improved performance of ?(SY/HE)EV when eigenvectors are needed
o Improved performance of ?(SY/HE)(EV/EVR/EVD) when eigenvectors are not
needed
o Improved performance of ?GELQF, ?GELS and ?GELSS for the underdetermined
case (M less than N)
o Improved performance of ?GEHRD, ?GEEV and ?GEES
o Improved performance of NaN checkers in LAPACKE interfaces
o Improved performance of ?GELSX, ?GGSVP
o Improved performance of ?GETRF
o Improved performance of (S/D)GE(SVD/SDD) when M>=N and singular vectors
are not needed
o Improved performance of ?POTRF with UPLO=U in Automatic Offload mode on
Intel MIC Architecture
o Added Automatic Offload for ?SYRDB on Intel MIC Architecture, which speeds
up ?SY(EV/EVD/EVR) when eigenvectors are not needed
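For the underdetermined case (M less than N) mentioned in the list above, the system A*x = b has infinitely many solutions when A has full row rank, and a driver such as ?GELS returns the one with minimum 2-norm. The sketch below illustrates that solution for a 2-by-3 system using the closed form x = A^T * (A*A^T)^(-1) * b. This is an assumption made for brevity and is not how LAPACK computes it (LAPACK works from an LQ factorization). Row, dot3 and min_norm_solution are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical helper type: one row of a 2-by-3 matrix.
using Row = std::array<double, 3>;

// Dot product of two length-3 rows.
double dot3(const Row& u, const Row& v) {
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2];
}

// Minimum-norm solution of the 2-by-3 system [a0; a1] * x = [b0; b1],
// assuming the two rows are linearly independent (full row rank).
std::array<double, 3> min_norm_solution(const Row& a0, const Row& a1,
                                        double b0, double b1) {
    // G = A * A^T is a symmetric 2-by-2 matrix.
    const double g00 = dot3(a0, a0);
    const double g01 = dot3(a0, a1);
    const double g11 = dot3(a1, a1);
    const double det = g00 * g11 - g01 * g01;
    assert(det != 0.0);  // rows must be linearly independent

    // Solve G * y = b with the explicit 2-by-2 inverse.
    const double y0 = (g11 * b0 - g01 * b1) / det;
    const double y1 = (g00 * b1 - g01 * b0) / det;

    // x = A^T * y lies in the row space of A, which makes it minimum-norm.
    return {a0[0] * y0 + a1[0] * y1,
            a0[1] * y0 + a1[1] * y1,
            a0[2] * y0 + a1[2] * y1};
}
```

For rows (1, 0, 0) and (0, 1, 1) with right-hand side (2, 4), the constraint x2 + x3 = 4 is satisfied with the smallest norm by splitting the value evenly, which gives x = (2, 2, 2). Forming A*A^T explicitly squares the condition number, so this form suits only small, well-conditioned illustrations.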
PBLAS and ScaLAPACK: