Build TensorFlow from source with MKL, AVX2 and FMA enabled
Dr. George Jen
Advanced Vector Extensions (AVX, also known as Sandy Bridge New Extensions) are extensions to the x86 instruction set architecture for microprocessors from Intel and AMD, proposed by Intel in March 2008 and first supported by Intel with the Sandy Bridge processor shipping in Q1 2011, and later by AMD with the Bulldozer processor shipping in Q3 2011. AVX provides new features, new instructions and a new coding scheme.
AVX2 expands most integer commands to 256 bits and introduces fused multiply-accumulate (FMA) operations. AVX-512 expands AVX to 512-bit support using a new EVEX prefix encoding, proposed by Intel in July 2013 and first supported by Intel with the Knights Landing processor, which shipped in 2016.
Benefiting applications:
· Suitable for floating point-intensive calculations in multimedia, scientific and financial applications (AVX2 adds support for integer operations).
· Increases parallelism and throughput in floating point SIMD calculations.
· Reduces register load due to the non-destructive instructions.
· Improves Linux software RAID performance (requires AVX2).
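Before building, it is worth confirming that the target CPU actually advertises the avx2 and fma feature flags. The sketch below word-matches a cpuinfo-style flags line; the sample line is illustrative only, and on a real Linux machine you would use the output of grep -m1 '^flags' /proc/cpuinfo instead:

```shell
#!/bin/sh
# Sketch: check a cpuinfo-style "flags" line for AVX2 and FMA support.
# On a real machine, replace the sample below with:
#   flags=$(grep -m1 '^flags' /proc/cpuinfo)
flags="flags : fpu vme de pse sse sse2 avx avx2 fma aes"

has_flag() {
    # Word-match a single feature name in the flags line.
    echo "$1" | grep -qw "$2"
}

if has_flag "$flags" avx2 && has_flag "$flags" fma; then
    echo "AVX2+FMA supported"
else
    echo "AVX2+FMA not supported"
fi
```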
Currently, the stock TensorFlow packages installable with conda or pip do not come with AVX2/FMA enabled. If you need AVX2/FMA enabled in TensorFlow, you will need to compile it from source. Below is how I built TensorFlow with AVX2/FMA enabled in a VirtualBox 6.0 CentOS 7.0 VM guest environment.
1. Install Anaconda Python 3 (at the time of this writing, Python 3.7):
https://www.anaconda.com/download/#linux
2. Install Bazel build tool, which is used to build tensorflow:
sudo yum install bazel
You can check the installed version with:
bazel version
3. Clone current tensorflow master branch source
git clone https://github.com/tensorflow/tensorflow.git
4. Change to tensorflow root folder and check all branches:
cd tensorflow
git branch -a
5. In this case, release branch r1.13 appears to be the latest (you would want to look for whatever the latest release is); check it out:
git checkout -b r1.13
6. Verify locally
git branch
7. Pull the release branch r1.13:
git pull
git branch --set-upstream-to=origin/r1.13 r1.13
8. Install high-performance portable MPI (MPICH) support for TensorFlow:
mkdir nfs
cd nfs
wget http://www.mpich.org/static/downloads/3.1.4/mpich-3.1.4.tar.gz
You might already have the packages below installed; run the command again just to be sure:
sudo yum install gcc gcc-c++ gcc-fortran kernel-devel -y
tar -xvf mpich-3.1.4.tar.gz
mkdir ~/nfs/mpich3
cd mpich-3.1.4
./configure --prefix=/opt/hadoop/nfs/mpich3
make && make install
vi ~/.bashrc
#Append below
export PATH=/opt/hadoop/nfs/mpich3/bin:$PATH
export LD_LIBRARY_PATH="/opt/hadoop/nfs/mpich3/lib:$LD_LIBRARY_PATH"
Save and exit, then reload the profile:
source ~/.bashrc
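The two export lines above simply prepend the MPICH install tree to the binary and library search paths; a minimal sketch of their effect (using the /opt/hadoop/nfs/mpich3 prefix chosen at configure time):

```shell
#!/bin/sh
# Sketch: prepend the MPICH prefix to PATH and LD_LIBRARY_PATH,
# as the ~/.bashrc lines above do.
MPICH_PREFIX=/opt/hadoop/nfs/mpich3
PATH="$MPICH_PREFIX/bin:$PATH"
LD_LIBRARY_PATH="$MPICH_PREFIX/lib:${LD_LIBRARY_PATH:-}"
export PATH LD_LIBRARY_PATH

# The mpich3 bin directory now wins any lookup for mpicc/mpirun:
echo "$PATH" | cut -d: -f1    # prints /opt/hadoop/nfs/mpich3/bin
```

After sourcing ~/.bashrc, which mpicc should resolve to /opt/hadoop/nfs/mpich3/bin/mpicc.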
9. Install OpenMP runtime dev package, search and install:
sudo yum search libomp-dev
sudo yum install centos-release-scl-rh
sudo yum install llvm-toolset-7-libomp-devel
10. In tensorflow root folder, run configure
./configure
Following are my answers to the configure questions:
-bash-4.2$ ./configure
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
INFO: Invocation ID: fc1a812a-c01a-4620-a9d9-dca8621b7f1a
You have bazel 0.21.0- (@non-git) installed.
Please specify the location of python. [Default is /opt/hadoop/anaconda3/bin/python]:
Found possible Python library paths:
/opt/hadoop/anaconda3/lib/python3.7/site-packages
Please input the desired Python library path to use. Default is [/opt/hadoop/anaconda3/lib/python3.7/site-packages]
Do you wish to build TensorFlow with XLA JIT support? [Y/n]: n
No XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with ROCm support? [y/N]: n
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: n
No CUDA support will be enabled for TensorFlow.
Do you wish to download a fresh release of clang? (Experimental) [y/N]: y
Clang will be downloaded and used to compile tensorflow.
Do you wish to build TensorFlow with MPI support? [y/N]: y
MPI support will be enabled for TensorFlow.
Please specify the MPI toolkit folder. [Default is /opt/hadoop/nfs/mpich3]:
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
--config=gdr # Build with GDR support.
--config=verbs # Build with libverbs support.
--config=ngraph # Build with Intel nGraph support.
--config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
Preconfigured Bazel build configs to DISABLE default on features:
--config=noaws # Disable AWS S3 filesystem support.
--config=nogcp # Disable GCP support.
--config=nohdfs # Disable HDFS support.
--config=noignite # Disable Apache Ignite support.
--config=nokafka # Disable Apache Kafka support.
--config=nonccl # Disable NVIDIA NCCL support.
Configuration finished
The configure script generates a file called .tf_configure.bazelrc:
-bash-4.2$ cat .tf_configure.bazelrc
build --action_env PYTHON_BIN_PATH="/opt/hadoop/anaconda3/bin/python"
build --action_env PYTHON_LIB_PATH="/opt/hadoop/anaconda3/lib/python3.7/site-packages"
build --python_path="/opt/hadoop/anaconda3/bin/python"
build:xla --define with_xla_support=true
build --action_env TF_NEED_OPENCL_SYCL="0"
build --action_env TF_NEED_ROCM="0"
build --action_env TF_NEED_CUDA="0"
build --action_env TF_DOWNLOAD_CLANG="1"
build --config=download_clang
test --config=download_clang
build:None --define with_mpi_support=true
build --config=None
build:opt --copt=-march=native
build:opt --copt=-Wno-sign-compare
build:opt --host_copt=-march=native
build:opt --define with_default_optimizations=true
build:v2 --define=tf_api_version=2
However, I manually modified it as follows:
-bash-4.2$ cat .tf_configure.bazelrc
build --action_env PYTHON_BIN_PATH="/opt/hadoop/anaconda3/bin/python"
build --action_env PYTHON_LIB_PATH="/opt/hadoop/anaconda3/lib/python3.7/site-packages"
build --python_path="/opt/hadoop/anaconda3/bin/python"
build --define with_jemalloc=true
build:gcp --define with_gcp_support=true
build:hdfs --define with_hdfs_support=false
build:s3 --define with_s3_support=false
build:xla --define with_xla_support=false
build:gdr --define with_gdr_support=false
build:verbs --define with_verbs_support=false
build --action_env TF_NEED_OPENCL="0"
build --action_env GCC_HOST_COMPILER_PATH="/usr/bin/gcc"
build:opt --cxxopt=-mavx --copt=-mavx --host_cxxopt=-march=native --host_copt=-march=native
build:opt --cxxopt=-mavx2 --copt=-mavx2 --host_cxxopt=-march=native --host_copt=-march=native
build:opt --cxxopt=-mfma --copt=-mfma --host_cxxopt=-march=native --host_copt=-march=native
build:opt --cxxopt=-mfpmath=both --copt=-mfpmath=both --host_cxxopt=-march=native --host_copt=-march=native
build:mkl --define using_mkl=true
build:mkl -c opt
build:mkl --copt="-DEIGEN_USE_VML"
build:monolithic --define framework_shared_object=false
build --define framework_shared_object=true
build:android --crosstool_top=//external:android/crosstool
build:android --host_crosstool_top=@bazel_tools//tools/cpp:toolchain
build:android_arm --config=android
build:android_arm --cpu=armeabi-v7a
build:android_arm64 --config=android
build:android_arm64 --cpu=arm64-v8a
11. Before running the bazel build command, install and/or upgrade the required Python libraries, or your build will fail:
conda install -c conda-forge keras-applications
conda install -c conda-forge keras-preprocessing
pip install scipy --upgrade
pip install cython --upgrade
pip install pandas --upgrade
pip install h5py --upgrade
12. Then start the bazel build:
nohup bazel build -c opt --jobs 1 --local_resources 2048,0.5,1.0 --verbose_failures --define=mkl_dnn_no_ares=true --config=mkl --config=opt //tensorflow/tools/pip_package:build_pip_package&
On my VM with 2 CPUs/12 GB memory, it took about 3 hours to complete the build.
13. Create a TensorFlow wheel file:
bazel-bin/tensorflow/tools/pip_package/build_pip_package ../tensorflow_pkg
This creates a folder parallel to the tensorflow root folder, called tensorflow_pkg, which contains:
tensorflow-1.12.0-cp37-cp37m-linux_x86_64.whl
14. scp the .whl file to a physical machine with the same version of Python (3.7 here; in general it must be the same Python 3 version the wheel was built with), because the VM does not support the AVX2/FMA instruction set and importing this build of TensorFlow in the VM will core dump.
15. On a physical computer with an Intel CPU (my i7 machine), install this TensorFlow wheel:
pip install tensorflow-1.12.0-cp37-cp37m-linux_x86_64.whl
16. Now you have TensorFlow compiled with the Intel AVX2/FMA instruction set, which is optimized for many mathematical and linear algebra computations. A word of caution: it works poorly or not at all on a VM unless the hypervisor passes FMA through correctly, which is outside the scope of this write-up; you can check with "grep fma /proc/cpuinfo". Otherwise, it must run on a bare metal machine.
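The "grep fma /proc/cpuinfo" check can be wrapped into a small guard to run before installing the wheel on a new machine (a sketch; the wheel filename is the one produced in step 13):

```shell
#!/bin/sh
# Guard: only install the AVX2/FMA build if the CPU (or hypervisor
# pass-through) actually exposes the fma flag; otherwise importing
# this tensorflow build will crash with an illegal instruction.
if grep -qw fma /proc/cpuinfo 2>/dev/null; then
    echo "FMA available, safe to install"
    # pip install tensorflow-1.12.0-cp37-cp37m-linux_x86_64.whl
else
    echo "FMA not available, do not install this build here"
fi
```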