Machine Learning Corner

        Build Tensorflow from source, MKL, AVX2, FMA enabled

        Dr. George Jen


        Advanced Vector Extensions (AVX, also known as Sandy Bridge New Extensions) are extensions to the x86 instruction set architecture for microprocessors from Intel and AMD proposed by Intel in March 2008 and first supported by Intel with the Sandy Bridge processor shipping in Q1 2011 and later on by AMD with the Bulldozer processor shipping in Q3 2011. AVX provides new features, new instructions and a new coding scheme.

        AVX2 expands most integer commands to 256 bits and introduces fused multiply-accumulate (FMA) operations. AVX-512 expands AVX to 512-bit support using a new EVEX prefix encoding proposed by Intel in July 2013 and first supported by Intel with the Knights Landing processor, which shipped in 2016.

        Benefiting applications:

        ·        Suitable for floating point-intensive calculations in multimedia, scientific and financial applications (AVX2 adds support for integer operations).

        ·        Increases parallelism and throughput in floating point SIMD calculations.

        ·        Reduces register load due to the non-destructive instructions.

        ·        Improves Linux RAID software performance (requires AVX2).

         

        Currently, the stock Tensorflow installable by conda or pip does not come with AVX2/FMA enabled. If you need AVX2/FMA enabled in Tensorflow, you will need to compile it from source. Below is how I built Tensorflow with AVX2/FMA enabled in a VirtualBox 6.0 CentOS 7.0 VM guest environment.
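
        Before starting, you can confirm that the machine where you intend to run the finished build actually exposes these instruction sets; the CPU flags reported by the kernel should include both avx2 and fma:

        grep -o 'avx2\|fma' /proc/cpuinfo | sort -u

        On such a CPU, the stock binaries typically log a startup warning that the Tensorflow binary was not compiled to use AVX2 and FMA, which is exactly what this build removes.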

        1. Install Anaconda Python 3 (at the time of this writing, it is Python 3.7):

        https://www.anaconda.com/download/#linux
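
        For example, you can download and run the Linux installer from the command line (the exact installer filename below is just the release current at the time of writing; adjust it to whatever the download page offers):

        wget https://repo.anaconda.com/archive/Anaconda3-2019.03-Linux-x86_64.sh

        bash Anaconda3-2019.03-Linux-x86_64.sh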

        2. Install Bazel build tool, which is used to build tensorflow:

        sudo yum install bazel

        You can check the installed version with:

        bazel version

        3. Clone current tensorflow master branch source

         git clone https://github.com/tensorflow/tensorflow.git

        4. Change to tensorflow root folder and check all branches:

        cd tensorflow

        git branch -a

        5. In this case, release branch r1.13 appears to be the latest one (you would want to look for whatever the latest release is); check it out:

        git checkout -b r1.13 origin/r1.13


        6. Verify locally

        git branch

        7. Pull the release r1.13:

        git pull

        git branch --set-upstream-to=origin/r1.13 r1.13
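
        To confirm the working tree really tracks the release branch before building, you can check the branch and latest commit:

        git status -sb

        git log -1 --oneline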

        8. Install MPICH, a high-performance portable MPI implementation, to provide MPI support for Tensorflow:

        mkdir nfs

        cd nfs

        wget http://www.mpich.org/static/downloads/3.1.4/mpich-3.1.4.tar.gz

        You might already have the packages below installed; run the command again just to be sure:

        sudo yum install gcc gcc-c++ gcc-gfortran kernel-devel -y

        tar -xvf mpich-3.1.4.tar.gz

        mkdir ~/nfs/mpich3

        cd mpich-3.1.4

        ./configure --prefix=/opt/hadoop/nfs/mpich3

        make && make install

        vi ~/.bashrc

        #Append below

        export PATH=/opt/hadoop/nfs/mpich3/bin:$PATH

        export LD_LIBRARY_PATH="/opt/hadoop/nfs/mpich3/lib:$LD_LIBRARY_PATH"

        Save and exit, then reload your environment:

        source ~/.bashrc
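
        As a quick check that MPICH is now on the PATH and the runtime works, query the wrapper compiler and launcher:

        which mpicc

        mpichversion

        mpiexec --version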

         

        9. Install the OpenMP runtime development package; search for it and install:

        sudo yum search libomp-dev

        sudo yum install centos-release-scl-rh

        sudo yum install llvm-toolset-7-libomp-devel
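
        You can verify where the OpenMP headers and libraries were installed by listing the package contents (on my system I would expect them under the /opt/rh/llvm-toolset-7 software collection tree):

        rpm -ql llvm-toolset-7-libomp-devel | grep -i omp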

         


         

        10. In the tensorflow root folder, run configure:

        ./configure

        Following are my answers to the configure questions:

         

        -bash-4.2$ ./configure

        WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".

        INFO: Invocation ID: fc1a812a-c01a-4620-a9d9-dca8621b7f1a

        You have bazel 0.21.0- (@non-git) installed.

        Please specify the location of python. [Default is /opt/hadoop/anaconda3/bin/python]:

        Found possible Python library paths:

          /opt/hadoop/anaconda3/lib/python3.7/site-packages

        Please input the desired Python library path to use.  Default is [/opt/hadoop/anaconda3/lib/python3.7/site-packages]

         

        Do you wish to build TensorFlow with XLA JIT support? [Y/n]: n

        No XLA JIT support will be enabled for TensorFlow.

         

        Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n

        No OpenCL SYCL support will be enabled for TensorFlow.

         

        Do you wish to build TensorFlow with ROCm support? [y/N]: n

        No ROCm support will be enabled for TensorFlow.

         

        Do you wish to build TensorFlow with CUDA support? [y/N]: n

        No CUDA support will be enabled for TensorFlow.

         

        Do you wish to download a fresh release of clang? (Experimental) [y/N]: y

        Clang will be downloaded and used to compile tensorflow.

         

        Do you wish to build TensorFlow with MPI support? [y/N]: y

        MPI support will be enabled for TensorFlow.

         

        Please specify the MPI toolkit folder. [Default is /opt/hadoop/nfs/mpich3]:

        Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:

        Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n

        Not configuring the WORKSPACE for Android builds.

        Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.

                --config=mkl            # Build with MKL support.

                --config=monolithic     # Config for mostly static monolithic build.

                --config=gdr            # Build with GDR support.

                --config=verbs          # Build with libverbs support.

                --config=ngraph         # Build with Intel nGraph support.

                --config=dynamic_kernels        # (Experimental) Build kernels into separate shared objects.

        Preconfigured Bazel build configs to DISABLE default on features:

                --config=noaws          # Disable AWS S3 filesystem support.

                --config=nogcp          # Disable GCP support.

                --config=nohdfs         # Disable HDFS support.

        --config=noignite       # Disable Apache Ignite support.

                --config=nokafka        # Disable Apache Kafka support.

                --config=nonccl         # Disable NVIDIA NCCL support.

        Configuration finished

         


         

        The configure script generates a file called .tf_configure.bazelrc:

         

        -bash-4.2$ cat .tf_configure.bazelrc

        build --action_env PYTHON_BIN_PATH="/opt/hadoop/anaconda3/bin/python"

        build --action_env PYTHON_LIB_PATH="/opt/hadoop/anaconda3/lib/python3.7/site-packages"

        build --python_path="/opt/hadoop/anaconda3/bin/python"

        build:xla --define with_xla_support=true

        build --action_env TF_NEED_OPENCL_SYCL="0"

        build --action_env TF_NEED_ROCM="0"

        build --action_env TF_NEED_CUDA="0"

        build --action_env TF_DOWNLOAD_CLANG="1"

        build --config=download_clang

        test --config=download_clang

        build:None --define with_mpi_support=true

        build --config=None

        build:opt --copt=-march=native

        build:opt --copt=-Wno-sign-compare

        build:opt --host_copt=-march=native

        build:opt --define with_default_optimizations=true

        build:v2 --define=tf_api_version=2

         


         

        However, I manually modified it:

        -bash-4.2$ cat .tf_configure.bazelrc

        build --action_env PYTHON_BIN_PATH="/opt/hadoop/anaconda3/bin/python"

        build --action_env PYTHON_LIB_PATH="/opt/hadoop/anaconda3/lib/python3.7/site-packages"

        build --python_path="/opt/hadoop/anaconda3/bin/python"

        build --define with_jemalloc=true

        build:gcp --define with_gcp_support=true

        build:hdfs --define with_hdfs_support=false

        build:s3 --define with_s3_support=false

        build:xla --define with_xla_support=false

        build:gdr --define with_gdr_support=false

        build:verbs --define with_verbs_support=false

        build --action_env TF_NEED_OPENCL="0"

        build --action_env GCC_HOST_COMPILER_PATH="/usr/bin/gcc"

        build:opt --cxxopt=-mavx --copt=-mavx --host_cxxopt=-march=native --host_copt=-march=native

        build:opt --cxxopt=-mavx2 --copt=-mavx2 --host_cxxopt=-march=native --host_copt=-march=native

        build:opt --cxxopt=-mfma --copt=-mfma --host_cxxopt=-march=native --host_copt=-march=native

        build:opt --cxxopt=-mfpmath=both --copt=-mfpmath=both --host_cxxopt=-march=native --host_copt=-march=native

        build:mkl --define using_mkl=true

        build:mkl -c opt

        build:mkl --copt="-DEIGEN_USE_VML"

        build:monolithic --define framework_shared_object=false

        build --define framework_shared_object=true

        build:android --crosstool_top=//external:android/crosstool

        build:android --host_crosstool_top=@bazel_tools//tools/cpp:toolchain

        build:android_arm --config=android

        build:android_arm --cpu=armeabi-v7a

        build:android_arm64 --config=android

        build:android_arm64 --cpu=arm64-v8a
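
        A quick way to see which of these vector extensions the build host's GCC would already enable on its own under -march=native (and therefore whether the explicit -mavx2/-mfma flags go beyond what the build host supports) is:

        gcc -march=native -Q --help=target | grep -E 'm(avx2|fma) '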

         


         

        11. Before running the bazel build command line, install and/or upgrade the required Python libraries, or your build will fail:

        conda install -c conda-forge keras-applications

        conda install -c conda-forge keras-preprocessing

        pip install scipy --upgrade

        pip install cython --upgrade

        pip install pandas --upgrade

        pip install h5py --upgrade
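
        As a quick sanity check that these build-time dependencies are importable from the same Python environment bazel will use:

        python -c "import keras_applications, keras_preprocessing, h5py, scipy, pandas, numpy; print('build deps ok')"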

         

        12. Then start the bazel build:

        nohup bazel build -c opt --jobs 1 --local_resources 2048,0.5,1.0 --verbose_failures --define=mkl_dnn_no_ares=true --config=mkl --config=opt //tensorflow/tools/pip_package:build_pip_package&
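
        Since the build runs in the background under nohup, you can follow its progress and confirm it is still running with:

        tail -f nohup.out

        jobs -l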

         

        On my VM with 2 CPUs/12 GB of memory, it took about 3 hours to complete the build.

         

        13. Create a tensorflow wheel file:

        bazel-bin/tensorflow/tools/pip_package/build_pip_package ../tensorflow_pkg

        This will create a folder parallel to the tensorflow root folder, called tensorflow_pkg, that contains:

        tensorflow-1.12.0-cp37-cp37m-linux_x86_64.whl

        14. scp the .whl file to a physical machine with the same version of Python (it must be 3.7, i.e. the same Python 3 version this wheel was built against), because the VM does not expose the AVX2/FMA instruction set and importing this build of tensorflow there will core dump.

        15. On a physical computer with an Intel CPU (my i7 machine), install this tensorflow wheel:

        pip install tensorflow-1.12.0-cp37-cp37m-linux_x86_64.whl

        16. Now you have tensorflow compiled with the Intel AVX2/FMA instruction set, which is optimized for many mathematical and linear algebra computations. A word of caution: it does not work well, or at all, inside a VM unless the hypervisor passes FMA through correctly (which is outside the scope of this write-up); you can check with "grep fma /proc/cpuinfo". It must run on a bare-metal machine.
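
        On the bare-metal machine, a minimal smoke test (assuming the Anaconda Python 3.7 environment is active) is to import the freshly installed wheel and run a trivial session; it should load without the AVX2/FMA warning that the stock binaries print:

        python -c "import tensorflow as tf; print(tf.__version__); print(tf.Session().run(tf.constant('hello')))"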

