Build TensorFlow from source with MKL, AVX2 and FMA enabled
Dr. George Jen
Advanced Vector Extensions (AVX, also known as Sandy Bridge New Extensions) are extensions to the x86 instruction set architecture for microprocessors from Intel and AMD, proposed by Intel in March 2008 and first supported by Intel with the Sandy Bridge processor shipping in Q1 2011, and later by AMD with the Bulldozer processor shipping in Q3 2011. AVX provides new features, new instructions and a new coding scheme.
AVX2 expands most integer commands to 256 bits and introduces fused multiply-accumulate (FMA) operations. AVX-512 expands AVX to 512-bit support using a new EVEX prefix encoding, proposed by Intel in July 2013 and first supported by Intel with the Knights Landing processor, which shipped in 2016.
Benefiting applications:
· Suitable for floating point-intensive calculations in multimedia, scientific and financial applications (AVX2 adds support for integer operations).
· Increases parallelism and throughput in floating point SIMD calculations.
· Reduces register load due to the non-destructive instructions.
· Improves Linux software RAID performance (requires AVX2).
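Before building, it is worth confirming that the target CPU actually advertises the avx2 and fma feature flags. The sketch below word-matches a cpuinfo-style flags line; the sample line is illustrative only, and on a real Linux machine you would use the output of grep -m1 '^flags' /proc/cpuinfo instead:

```shell
#!/bin/sh
# Sketch: check a cpuinfo-style "flags" line for AVX2 and FMA support.
# On a real machine, replace the sample below with:
#   flags=$(grep -m1 '^flags' /proc/cpuinfo)
flags="flags : fpu vme de pse sse sse2 avx avx2 fma aes"

has_flag() {
    # Word-match a single feature name in the flags line.
    echo "$1" | grep -qw "$2"
}

if has_flag "$flags" avx2 && has_flag "$flags" fma; then
    echo "AVX2+FMA supported"
else
    echo "AVX2+FMA not supported"
fi
```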
Currently, the stock TensorFlow packages installable with conda or pip do not come with AVX2/FMA enabled. If you need AVX2/FMA enabled in TensorFlow, you will need to compile it from source. Below is how I built TensorFlow with AVX2/FMA enabled in a VirtualBox 6.0 CentOS 7.0 VM guest environment.
1. Install Anaconda Python 3 (at the time of this writing, Python 3.7):
https://www.anaconda.com/download/#linux
2. Install Bazel build tool, which is used to build tensorflow:
sudo yum install bazel
You can check the installed version with:
bazel version
3. Clone current tensorflow master branch source
git clone https://github.com/tensorflow/tensorflow.git
4. Change to tensorflow root folder and check all branches:
cd tensorflow
git branch -a
5. In this case, release branch r1.13 appears to be the latest (you would want to look for whatever the latest release is); check it out:
git checkout -b r1.13
6. Verify locally
git branch
7. Pull the release branch r1.13:
git pull
git branch --set-upstream-to=origin/r1.13 r1.13
8. Install high-performance portable MPI (MPICH) support for TensorFlow:
mkdir nfs
cd nfs
wget http://www.mpich.org/static/downloads/3.1.4/mpich-3.1.4.tar.gz
You might already have the packages below installed; run the command again just to be sure:
sudo yum install gcc gcc-c++ gcc-fortran kernel-devel -y
tar -xvf mpich-3.1.4.tar.gz
mkdir ~/nfs/mpich3
cd mpich-3.1.4
./configure --prefix=/opt/hadoop/nfs/mpich3
make && make install
vi ~/.bashrc
#Append below
export PATH=/opt/hadoop/nfs/mpich3/bin:$PATH
export LD_LIBRARY_PATH="/opt/hadoop/nfs/mpich3/lib:$LD_LIBRARY_PATH"
Save and exit, then reload the profile:
source ~/.bashrc
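The two export lines above simply prepend the MPICH install tree to the binary and library search paths; a minimal sketch of their effect (using the /opt/hadoop/nfs/mpich3 prefix chosen at configure time):

```shell
#!/bin/sh
# Sketch: prepend the MPICH prefix to PATH and LD_LIBRARY_PATH,
# as the ~/.bashrc lines above do.
MPICH_PREFIX=/opt/hadoop/nfs/mpich3
PATH="$MPICH_PREFIX/bin:$PATH"
LD_LIBRARY_PATH="$MPICH_PREFIX/lib:${LD_LIBRARY_PATH:-}"
export PATH LD_LIBRARY_PATH

# The mpich3 bin directory now wins any lookup for mpicc/mpirun:
echo "$PATH" | cut -d: -f1    # prints /opt/hadoop/nfs/mpich3/bin
```

After sourcing ~/.bashrc, which mpicc should resolve to /opt/hadoop/nfs/mpich3/bin/mpicc.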
9. Install OpenMP runtime dev package, search and install:
sudo yum search libomp-dev
sudo yum install centos-release-scl-rh
sudo yum install llvm-toolset-7-libomp-devel
10. In tensorflow root folder, run configure
./configure
Following are my answers to the configure questions:
-bash-4.2$ ./configure
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
INFO: Invocation ID: fc1a812a-c01a-4620-a9d9-dca8621b7f1a
You have bazel 0.21.0- (@non-git) installed.
Please specify the location of python. [Default is /opt/hadoop/anaconda3/bin/python]:
Found possible Python library paths:
/opt/hadoop/anaconda3/lib/python3.7/site-packages
Please input the desired Python library path to use. Default is [/opt/hadoop/anaconda3/lib/python3.7/site-packages]
Do you wish to build TensorFlow with XLA JIT support? [Y/n]: n
No XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with ROCm support? [y/N]: n
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: n
No CUDA support will be enabled for TensorFlow.
Do you wish to download a fresh release of clang? (Experimental) [y/N]: y
Clang will be downloaded and used to compile tensorflow.
Do you wish to build TensorFlow with MPI support? [y/N]: y
MPI support will be enabled for TensorFlow.
Please specify the MPI toolkit folder. [Default is /opt/hadoop/nfs/mpich3]:
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
--config=gdr # Build with GDR support.
--config=verbs # Build with libverbs support.
--config=ngraph # Build with Intel nGraph support.
--config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
Preconfigured Bazel build configs to DISABLE default on features:
--config=noaws # Disable AWS S3 filesystem support.
--config=nogcp # Disable GCP support.
--config=nohdfs # Disable HDFS support.
--config=noignite # Disable Apache Ignite support.
--config=nokafka # Disable Apache Kafka support.
--config=nonccl # Disable NVIDIA NCCL support.
Configuration finished
The configure script generates a file called .tf_configure.bazelrc:
-bash-4.2$ cat .tf_configure.bazelrc
build --action_env PYTHON_BIN_PATH="/opt/hadoop/anaconda3/bin/python"
build --action_env PYTHON_LIB_PATH="/opt/hadoop/anaconda3/lib/python3.7/site-packages"
build --python_path="/opt/hadoop/anaconda3/bin/python"
build:xla --define with_xla_support=true
build --action_env TF_NEED_OPENCL_SYCL="0"
build --action_env TF_NEED_ROCM="0"
build --action_env TF_NEED_CUDA="0"
build --action_env TF_DOWNLOAD_CLANG="1"
build --config=download_clang
test --config=download_clang
build:None --define with_mpi_support=true
build --config=None
build:opt --copt=-march=native
build:opt --copt=-Wno-sign-compare
build:opt --host_copt=-march=native
build:opt --define with_default_optimizations=true
build:v2 --define=tf_api_version=2
However, I manually modified it as follows:
-bash-4.2$ cat .tf_configure.bazelrc
build --action_env PYTHON_BIN_PATH="/opt/hadoop/anaconda3/bin/python"
build --action_env PYTHON_LIB_PATH="/opt/hadoop/anaconda3/lib/python3.7/site-packages"
build --python_path="/opt/hadoop/anaconda3/bin/python"
build --define with_jemalloc=true
build:gcp --define with_gcp_support=true
build:hdfs --define with_hdfs_support=false
build:s3 --define with_s3_support=false
build:xla --define with_xla_support=false
build:gdr --define with_gdr_support=false
build:verbs --define with_verbs_support=false
build --action_env TF_NEED_OPENCL="0"
build --action_env GCC_HOST_COMPILER_PATH="/usr/bin/gcc"
build:opt --cxxopt=-mavx --copt=-mavx --host_cxxopt=-march=native --host_copt=-march=native
build:opt --cxxopt=-mavx2 --copt=-mavx2 --host_cxxopt=-march=native --host_copt=-march=native
build:opt --cxxopt=-mfma --copt=-mfma --host_cxxopt=-march=native --host_copt=-march=native
build:opt --cxxopt=-mfpmath=both --copt=-mfpmath=both --host_cxxopt=-march=native --host_copt=-march=native
build:mkl --define using_mkl=true
build:mkl -c opt
build:mkl --copt="-DEIGEN_USE_VML"
build:monolithic --define framework_shared_object=false
build --define framework_shared_object=true
build:android --crosstool_top=//external:android/crosstool
build:android --host_crosstool_top=@bazel_tools//tools/cpp:toolchain
build:android_arm --config=android
build:android_arm --cpu=armeabi-v7a
build:android_arm64 --config=android
build:android_arm64 --cpu=arm64-v8a
11. Before running the bazel build command, install and/or upgrade the required Python libraries, or your build will fail:
conda install -c conda-forge keras-applications
conda install -c conda-forge keras-preprocessing
pip install scipy --upgrade
pip install cython --upgrade
pip install pandas --upgrade
pip install h5py --upgrade
12. Then start the bazel build:
nohup bazel build -c opt --jobs 1 --local_resources 2048,0.5,1.0 --verbose_failures --define=mkl_dnn_no_ares=true --config=mkl --config=opt //tensorflow/tools/pip_package:build_pip_package&
On my VM with 2 CPUs/12 GB memory, it took about 3 hours to complete the build.
13. Create a TensorFlow wheel file:
bazel-bin/tensorflow/tools/pip_package/build_pip_package ../tensorflow_pkg
This creates a folder parallel to the tensorflow root folder, called tensorflow_pkg, which contains:
tensorflow-1.12.0-cp37-cp37m-linux_x86_64.whl
14. scp the .whl file to a physical machine with the same version of Python (3.7 here; in general it must be the same Python 3 version the wheel was built with), because the VM does not support the AVX2/FMA instruction set and importing this build of TensorFlow in the VM will core dump.
15. On a physical computer with an Intel CPU (my i7 machine), install this TensorFlow wheel:
pip install tensorflow-1.12.0-cp37-cp37m-linux_x86_64.whl
16. Now you have TensorFlow compiled with the Intel AVX2/FMA instruction set, which is optimized for many mathematical and linear algebra computations. A word of caution: it works poorly or not at all on a VM unless the hypervisor passes FMA through correctly, which is outside the scope of this write-up; you can check with "grep fma /proc/cpuinfo". Otherwise, it must run on a bare metal machine.
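The "grep fma /proc/cpuinfo" check can be wrapped into a small guard to run before installing the wheel on a new machine (a sketch; the wheel filename is the one produced in step 13):

```shell
#!/bin/sh
# Guard: only install the AVX2/FMA build if the CPU (or hypervisor
# pass-through) actually exposes the fma flag; otherwise importing
# this tensorflow build will crash with an illegal instruction.
if grep -qw fma /proc/cpuinfo 2>/dev/null; then
    echo "FMA available, safe to install"
    # pip install tensorflow-1.12.0-cp37-cp37m-linux_x86_64.whl
else
    echo "FMA not available, do not install this build here"
fi
```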