We received a DGX Station a few weeks ago and it is a powerhouse. We will not dive into the specs of the machine, since they were covered in a previous post where we summarized the time saved by using a DGX for various training tasks. In this post we show how to add Python 3 compatibility to a DGX optimized container.
When you purchase a DGX-1 or DGX Station, NVIDIA provides access to the DGX cloud, which serves optimized Docker containers for immediate use in machine learning work on the DGX. This reduces the effort required to get your projects up and running. To date, Python 3 is not supported in the provided DGX optimized containers. Python 2.7 is no longer under active development (it receives only security updates), its end-of-life is coming in 2020, and most widely used packages are Python 3 compatible (95% of the top 360 most-used Python packages). Starting a new project on Python 2.7 may therefore not be desirable.
Looking through the logs and history of the docker images, many flags and settings are configured specifically for the DGX. These are meant to optimize performance. According to NVIDIA, containers on the DGX cloud "benefit from continuous NVIDIA development, ensuring each deep learning framework is tuned for the fastest training possible on the latest NVIDIA GPUs. NVIDIA engineers continually optimize libraries, drivers, and containers, delivering monthly updates to ensure that users’ deep learning investments reap greater returns over time". If you are considering creating a docker image from scratch, you may lose the performance benefits received from using a DGX optimized container.
We decided to add Python 3 compatibility to NVIDIA's optimized containers without changing any of the flags or source used in the original build, except for PYTHON_BIN_PATH, which determines which version of Python to use.
Most of the custom networks created at KickView are written using TensorFlow, so we focused on updating the TensorFlow Docker image first. If you'd like to update your own Docker image to support Python 3, feel free to follow these steps:
- Create a directory to hold the Dockerfile
- Create a Dockerfile that will hold the list of instructions used to build the new tensorflow3 image.
- Paste the following code into the Dockerfile
FROM nvcr.io/nvidia/tensorflow:17.10
MAINTAINER Kyle Muchmore <email@example.com>

ENV PYTHON_BIN_PATH=/usr/bin/python3
ENV PYTHON_LIB_PATH=/usr/local/lib/python3.5/dist-packages

RUN apt-get update && apt-get install -y --no-install-recommends \
        pkg-config \
        python3 \
        python3-dev && \
    rm -rf /var/lib/apt/lists/*

RUN cd /tmp/ && \
    curl -O https://bootstrap.pypa.io/get-pip.py && \
    python3 get-pip.py && \
    rm get-pip.py

RUN pip3 install --upgrade --no-cache-dir numpy==1.11.0 pexpect psutil

RUN cd /opt/tensorflow/ && \
    yes "" | ./configure && \
    bazel build -c opt --config=cuda tensorflow/tools/pip_package:build_pip_package && \
    bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/pip3 && \
    pip3 install --upgrade /tmp/pip3/tensorflow-*.whl && \
    rm -rf /tmp/pip3/tensorflow-*.whl && \
    bazel clean --expunge
In a nutshell, this creates a new image based on the optimized image provided by NVIDIA, sets the environment variables that determine which Python version is used to build TensorFlow so that they point to Python 3, installs the dependencies TensorFlow needs to build, and then runs the same build command NVIDIA used to compile TensorFlow in the base image. It also uses the same source code as the base image.
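The two ENV lines in the Dockerfile hard-code the interpreter path and the package directory for Python 3.5. If your base image ships a different Python 3 minor version, those values will likely differ. As a minimal sketch (the helper name suggest_build_paths is our own and not part of TensorFlow or the NVIDIA image), you could run something like the following with python3 inside the base container to print candidate values before editing the Dockerfile:

```python
import site
import sys
import sysconfig


def suggest_build_paths():
    """Return candidate values for PYTHON_BIN_PATH and PYTHON_LIB_PATH.

    Hypothetical helper for illustration only: it inspects the
    interpreter that runs it and knows nothing about TensorFlow.
    """
    bin_path = sys.executable
    try:
        # Global package directories; on Debian/Ubuntu these usually
        # include a dist-packages directory.
        candidates = site.getsitepackages()
    except AttributeError:
        # Some older virtualenv setups lack getsitepackages(); fall back
        # to the directory sysconfig reports for pure-Python packages.
        candidates = [sysconfig.get_paths()["purelib"]]
    return {"PYTHON_BIN_PATH": bin_path, "PYTHON_LIB_PATH": candidates}


if __name__ == "__main__":
    paths = suggest_build_paths()
    print("PYTHON_BIN_PATH candidate:", paths["PYTHON_BIN_PATH"])
    for lib in paths["PYTHON_LIB_PATH"]:
        print("PYTHON_LIB_PATH candidate:", lib)
```

The script only lists candidates. Which directory to use for PYTHON_LIB_PATH is still your call, and we have only tried the values shown in the Dockerfile.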
To build this new image just issue the following build command from the directory containing the Dockerfile:
docker build -t nvcr.io/<company_name>/tensorflow3:<tag> .
After a few minutes you will have an image with TensorFlow available for both Python 3 and Python 2.7. Start the container running either Python 2.7 or Python 3, and the correct version of TensorFlow will be used when calling
import tensorflow as tf
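To confirm that each interpreter picks up its own TensorFlow install, a quick check along these lines may help. This is a sketch under our own assumptions: describe_tensorflow is a hypothetical helper name, and the script is written to run unchanged under both Python 2.7 and Python 3.

```python
import sys


def describe_tensorflow():
    """Return (python_version, tensorflow_location_or_None).

    Hypothetical helper for illustration; the location is None when
    tensorflow cannot be imported by this interpreter.
    """
    version = "{}.{}".format(sys.version_info[0], sys.version_info[1])
    try:
        import tensorflow as tf
    except ImportError:
        return version, None
    # __file__ points into the site/dist-packages directory of the
    # interpreter that performed the import.
    return version, getattr(tf, "__file__", None)


if __name__ == "__main__":
    version, location = describe_tensorflow()
    if location is None:
        print("Python {}: tensorflow is not importable".format(version))
    else:
        print("Python {}: tensorflow loaded from {}".format(version, location))
```

Run it once with python and once with python3 inside the container. The two reported locations should point at different package directories.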
We ended up taking this effort further and built a few more images on top of our new Python 3 compatible tensorflow3 image, including ones for Keras and OpenCV.
NVIDIA Enterprise support has told us that they will be adding Python 3 support, but they haven't provided an exact timeframe. In the meantime, this should get you up and running on DGX!