Using Python on the Pi Supercomputer

Depending on the combination of Python version and backend C compiler, the following Python programming platforms are available:

  1. Anaconda 2
  2. Anaconda 3
  3. Python 2.7 compiled against GCC
  4. Python 2.7 compiled against ICC
  5. Python 3.4 compiled against GCC
  6. Python 3.4 compiled against ICC

This document shows you how to use virtualenv to set up a customized Python environment in your $HOME.

By default, GCC-built Python uses OpenBLAS as its BLAS library, while ICC-built Python uses MKL. Consult the environment modules for other BLAS implementations. Python job samples are available in /lustre/utility/pi-sample-code/python.

To download packages via pip or conda, you may need to set the HTTP proxy first:

export http_proxy=http://proxy.pi.sjtu.edu.cn:3004
export https_proxy=http://proxy.pi.sjtu.edu.cn:3004
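Before running pip or conda, it can help to confirm that the proxy variables are actually visible to the processes you start. The following is a minimal sketch using only the standard library; the helper name proxy_report is made up for illustration.

```python
import os


def proxy_report(env=None):
    """Return the values of http_proxy and https_proxy found in env.

    env defaults to os.environ; a missing variable is reported as None.
    """
    env = os.environ if env is None else env
    return {name: env.get(name) for name in ("http_proxy", "https_proxy")}


if __name__ == "__main__":
    # Print one line per variable so an unset proxy is easy to spot.
    for name, value in sorted(proxy_report().items()):
        print("%s = %s" % (name, value or "(not set)"))
```

If either line prints "(not set)", the export commands were probably run in a different shell session.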

This tutorial targets SLURM. However, it should also work with LSF after slight modification.

Anaconda 2

Anaconda 2 is a pre-compiled Python 2 distribution targeted at scientific and engineering computing. It can be loaded via the anaconda/2 module:

$ module load anaconda/2

Anaconda provides a tool named conda to manage separate Python environments. We can install multiple Python packages when creating an environment.

$ conda create --name mypython numpy scipy matplotlib

Activate a python environment before proceeding.

$ source activate mypython
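Once the environment is activated, we can check from Python that the packages requested at creation time are importable. This is a sketch only; missing_packages is a hypothetical helper name, and the package list simply mirrors the conda create command above.

```python
import importlib


def missing_packages(names):
    """Return the entries of names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # An empty list means every package was found in the active environment.
    print(missing_packages(["numpy", "scipy", "matplotlib"]))
```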

Anaconda 3

Anaconda 3 is a pre-compiled Python 3 distribution targeted at scientific and engineering computing. It can be loaded via the anaconda/3 module:

$ module load anaconda/3

Anaconda provides a tool named conda to manage separate Python environments. We can install multiple Python packages when creating an environment.

$ conda create --name mypython numpy scipy matplotlib

Activate a python environment before proceeding.

$ source activate mypython

Python 2.7 compiled against GCC

Download virtualenv

$ wget --no-check-certificate "https://github.com/pypa/virtualenv/archive/15.0.2.zip" -O virtualenv.zip
$ unzip virtualenv.zip
$ cd virtualenv-*
$ module purge; module load gcc python/2.7
$ rm -rf ~/.pip ~/python27-gcc
$ export http_proxy=http://proxy.pi.sjtu.edu.cn:3004/; export https_proxy=http://proxy.pi.sjtu.edu.cn:3004/
$ python virtualenv.py ~/python27-gcc

Activate your python environment:

$ source ~/python27-gcc/bin/activate

pip is ready now.

$ which pip
~/python27-gcc/bin/pip
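The same check can be made from inside Python, which is handy in job scripts where the output of which is easy to miss. A minimal sketch, assuming the interpreter was started after activation; in_virtualenv is an illustrative name, not part of virtualenv itself.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.base_prefix differ from sys.prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base_prefix != sys.prefix


if __name__ == "__main__":
    print("prefix:", sys.prefix)
    print("inside a virtual environment:", in_virtualenv())
```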

To use your own Python environment, the following line is required in your job script:

$ source ~/python27-gcc/bin/activate
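To show where the activation line sits in a job script, the sketch below renders a small SLURM batch script as text. The function name, the job name, the task count and the example command my_script.py are all assumptions for illustration; adjust the #SBATCH lines to your site's requirements or start from the samples in /lustre/utility/pi-sample-code/python.

```python
def render_job_script(env_dir, command, modules="gcc python/2.7",
                      job_name="pyjob", ntasks=1):
    """Return the text of a SLURM batch script that runs command inside env_dir."""
    lines = [
        "#!/bin/bash",
        "#SBATCH --job-name=" + job_name,
        "#SBATCH --ntasks=" + str(ntasks),
        "#SBATCH --output=%j.out",  # %j is replaced by the SLURM job ID
        "",
        # Load the same modules that were used to build the environment.
        "module purge; module load " + modules,
        # Activate the environment before the payload command runs.
        "source " + env_dir + "/bin/activate",
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_job_script("~/python27-gcc", "python my_script.py"))
```

The printed text can be saved to a file and submitted with sbatch.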

Build numpy backed by OpenBLAS

Load Python and OpenBLAS modules:

$ module purge; module load gcc python/2.7 openblas
$ source ~/python27-gcc/bin/activate

Download and extract the numpy source code:

$ cd ~/python27-gcc && mkdir build; cd build
$ pip download numpy
$ tar xzpf numpy-*.tar.gz
$ cd numpy-*

Add a configuration file site.cfg at the top of the extracted numpy source tree (the directory entered above, next to setup.py). We can create it with a here document:

$ cat << EOF > site.cfg
[openblas]
include_dirs = $BLASROOT/include
library_dirs = $BLASROOT/lib
openblas_libs = openblas
lapack_libs = openblas
EOF
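Because EOF is unquoted, the shell expands $BLASROOT while writing the file. If the openblas module was not loaded, the variable is empty and the paths silently become /include and /lib. The sketch below, which assumes Python 3 for configparser, reads a site.cfg back and flags such entries; both function names are hypothetical.

```python
import configparser


def read_site_cfg_dirs(text, section):
    """Return {option: value} for the *_dirs options of one site.cfg section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {key: value for key, value in parser.items(section)
            if key.endswith("_dirs")}


def empty_root_options(dirs):
    """Return the option names whose path suggests the root variable was empty."""
    suspicious = ("/include", "/lib", "/lib/intel64")
    return sorted(key for key, value in dirs.items() if value in suspicious)


if __name__ == "__main__":
    with open("site.cfg") as handle:
        dirs = read_site_cfg_dirs(handle.read(), "openblas")
    print(dirs)
    print("suspicious options:", empty_root_options(dirs))
```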

Build and install numpy:

$ pip install .

Check the configuration:

$ grep -i openblas ~/python27-gcc/lib/python2.7/site-packages/numpy/__config__.py
lapack_opt_info={'libraries': ['openblas'], 'library_dirs': ['/lustre/usr/openblas/0.2-gcc49/lib'], 'language': 'c', 'define_macros': [('HAVE_CBLAS', None)]}
openblas_lapack_info={'libraries': ['openblas'], 'library_dirs': ['/lustre/usr/openblas/0.2-gcc49/lib'], 'language': 'c', 'define_macros': [('HAVE_CBLAS', None)]}
openblas_info={'libraries': ['openblas'], 'library_dirs': ['/lustre/usr/openblas/0.2-gcc49/lib'], 'language': 'c', 'define_macros': [('HAVE_CBLAS', None)]}
blas_opt_info={'libraries': ['openblas'], 'library_dirs': ['/lustre/usr/openblas/0.2-gcc49/lib'], 'language': 'c', 'define_macros': [('HAVE_CBLAS', None)]}

Verify the installation in ipython:

$ cd && ipython
In [1]: import numpy
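Beyond a bare import, we can ask numpy which BLAS it recorded at build time. The sketch below assumes that numpy.__config__ exposes a blas_opt_info dictionary, as the grep output above suggests for this numpy generation; other numpy releases may store this differently, in which case the helper (a made-up name) just returns an empty list.

```python
def blas_libraries(config_module):
    """Return the 'libraries' list from blas_opt_info, or [] if it is absent."""
    info = getattr(config_module, "blas_opt_info", None) or {}
    return list(info.get("libraries", []))


if __name__ == "__main__":
    try:
        import numpy
    except ImportError:
        print("numpy is not importable in this environment")
    else:
        print(numpy.__version__, blas_libraries(numpy.__config__))
```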

Then other packages that depend on numpy can be installed.

$ pip install scipy qiime

The mxnet environment module is required if you plan to try the mxnet library.

$ module load mxnet
$ cd ~/python27-gcc/build
$ git clone --recursive https://github.com/dmlc/mxnet && cd mxnet && git checkout 20160309 && git submodule update
$ python example/image-classification/train_mnist.py
$ cd python && pip install .

To use the OpenBLAS-backed Python environment, the openblas environment module must be loaded:

$ module purge; module load gcc python/2.7 openblas mxnet

Python 2.7 compiled against ICC

Download virtualenv

$ wget --no-check-certificate "https://github.com/pypa/virtualenv/archive/15.0.2.zip" -O virtualenv.zip
$ unzip virtualenv.zip
$ cd virtualenv-*
$ module purge; module load icc python/2.7
$ rm -rf ~/.pip ~/python27-icc
$ export http_proxy=http://proxy.pi.sjtu.edu.cn:3004/; export https_proxy=http://proxy.pi.sjtu.edu.cn:3004/
$ python virtualenv.py ~/python27-icc

Activate your python environment:

$ source ~/python27-icc/bin/activate

pip is ready now.

$ which pip
~/python27-icc/bin/pip

To use your own Python environment, the following line is required in your job script:

$ source ~/python27-icc/bin/activate

Build numpy backed by MKL

Load Python and MKL modules:

$ module purge; module load icc mkl python/2.7
$ source ~/python27-icc/bin/activate

Download and extract the numpy source code:

$ cd ~/python27-icc && mkdir build; cd build
$ pip download numpy
$ tar xzpf numpy-*.tar.gz
$ cd numpy-*

Add a configuration file site.cfg at the top of the extracted numpy source tree (the directory entered above, next to setup.py) via a here document:

$ cat << EOF > site.cfg
[mkl]
include_dirs = $MKLROOT/include
library_dirs = $MKLROOT/lib/intel64
mkl_libs = mkl_intel_lp64,mkl_intel_thread,mkl_core,mkl_rt
lapack_libs = mkl_intel_lp64,mkl_intel_thread,mkl_core,mkl_rt
EOF

Update compiler flags for ICC:

$ sed -i s/"icc -m64 -fPIC"/"icc -O3 -openmp -xhost -fPIC -m64 -shared -mkl -fp-model strict -fomit-frame-pointer"/g ./numpy/distutils/intelccompiler.py
$ sed -i s/"-xhost -openmp -fp-model strict"/"-xhost -openmp -fp-model strict -fPIC -mkl"/g ./numpy/distutils/fcompiler/intel.py

Build and install numpy:

$ python setup.py config --compiler=intel build_clib --compiler=intel build_ext --compiler=intel install

Check the configuration:

$ grep -i mkl ~/python27-icc/lib/python2.7/site-packages/numpy/__config__.py

Verify the installation:

$ ldd ~/python27-icc/lib/python2.7/site-packages/numpy/core/multiarray.so | grep -i mkl
libmkl_intel_lp64.so => /lustre/usr/intel_cluster_studio/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_intel_lp64.so (0x00002b3bc185f000)
libmkl_intel_thread.so => /lustre/usr/intel_cluster_studio/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_intel_thread.so (0x00002b3bc2173000)
libmkl_core.so => /lustre/usr/intel_cluster_studio/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_core.so (0x00002b3bc3594000)
libmkl_rt.so => /lustre/usr/intel_cluster_studio/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_rt.so (0x00002b3bc50f2000)
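When this check runs inside a job, it may be more convenient to parse the ldd output than to read it by eye. The following sketch is one way to do that; linked_libraries is a hypothetical helper, and it only understands the usual "name => path (address)" lines. A library that ldd reports as "not found", for example because the mkl module was not loaded, is mapped to None.

```python
def linked_libraries(ldd_output, keyword):
    """Map library name to resolved path for ldd lines that mention keyword."""
    found = {}
    for line in ldd_output.splitlines():
        if "=>" not in line or keyword.lower() not in line.lower():
            continue
        name, _, rest = line.partition("=>")
        fields = rest.split()
        if not fields or rest.strip().startswith("not found"):
            found[name.strip()] = None  # unresolved at run time
        else:
            found[name.strip()] = fields[0]
    return found


if __name__ == "__main__":
    import sys
    # Usage: ldd path/to/multiarray.so | python this_script.py
    for lib, path in sorted(linked_libraries(sys.stdin.read(), "mkl").items()):
        print(lib, "->", path or "NOT FOUND")
```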

Then other packages that depend on numpy can be installed.

$ pip install scipy qiime

To use the MKL-backed Python environment, the mkl environment module must be loaded:

$ module purge; module load icc python/2.7 mkl

Python 3.4 compiled against GCC

Download virtualenv

$ wget --no-check-certificate "https://github.com/pypa/virtualenv/archive/15.0.2.zip" -O virtualenv.zip
$ unzip virtualenv.zip
$ cd virtualenv-*
$ module purge; module load gcc python/3.4
$ rm -rf ~/.pip ~/python34-gcc
$ export http_proxy=http://proxy.pi.sjtu.edu.cn:3004/; export https_proxy=http://proxy.pi.sjtu.edu.cn:3004/
$ python3 virtualenv.py ~/python34-gcc

Activate your python environment:

$ source ~/python34-gcc/bin/activate

pip3 is ready now.

$ which pip3
~/python34-gcc/bin/pip3

To use your own Python environment, the following line is required in your job script:

$ source ~/python34-gcc/bin/activate

Build numpy backed by OpenBLAS

Load Python and OpenBLAS modules:

$ module purge; module load gcc python/3.4 openblas
$ source ~/python34-gcc/bin/activate

Download and extract the numpy source code:

$ cd ~/python34-gcc && mkdir build; cd build
$ pip3 download numpy
$ tar xzpf numpy-*.tar.gz
$ cd numpy-*

Add a configuration file site.cfg at the top of the extracted numpy source tree (the directory entered above, next to setup.py). We can create it with a here document:

$ cat << EOF > site.cfg
[openblas]
include_dirs = $BLASROOT/include
library_dirs = $BLASROOT/lib
openblas_libs = openblas
lapack_libs = openblas
EOF

Build and install numpy:

$ pip3 install .

Confirm that numpy libs have been built against OpenBLAS:

$ ldd ~/python34-gcc/lib/python3.4/site-packages/numpy/core/multiarray.cpython-34m.so | grep -i openblas
libopenblas.so.0 => /lustre/usr/openblas/0.2-gcc49/lib/libopenblas.so.0 (0x00002b4a15db6000)

Then other packages that depend on numpy can be installed. Warning: qiime is not compatible with Python 3 yet.

$ pip3 install scipy

To use the OpenBLAS-backed Python environment, the openblas environment module must be loaded:

$ module purge; module load gcc python/3.4 openblas

Python 3.4 compiled against ICC

Download virtualenv

$ wget --no-check-certificate "https://github.com/pypa/virtualenv/archive/15.0.2.zip" -O virtualenv.zip
$ unzip virtualenv.zip
$ cd virtualenv-*
$ module purge; module load icc python/3.4
$ rm -rf ~/.pip ~/python34-icc
$ export http_proxy=http://proxy.pi.sjtu.edu.cn:3004/; export https_proxy=http://proxy.pi.sjtu.edu.cn:3004/
$ python3 virtualenv.py ~/python34-icc

Activate your python environment:

$ source ~/python34-icc/bin/activate

pip3 is ready now.

$ which pip3
~/python34-icc/bin/pip3

To use your own Python environment, the following line is required in your job script:

$ source ~/python34-icc/bin/activate

Build numpy backed by MKL

Load Python and MKL modules:

$ module purge; module load icc mkl python/3.4
$ source ~/python34-icc/bin/activate

Download and extract the numpy source code:

$ cd ~/python34-icc && mkdir build; cd build
$ pip3 download numpy
$ tar xzpf numpy-*.tar.gz
$ cd numpy-*

Add a configuration file site.cfg at the top of the extracted numpy source tree (the directory entered above, next to setup.py) via a here document:

$ cat << EOF > site.cfg
[mkl]
include_dirs = $MKLROOT/include
library_dirs = $MKLROOT/lib/intel64
mkl_libs = mkl_intel_lp64,mkl_intel_thread,mkl_core,mkl_rt
lapack_libs = mkl_intel_lp64,mkl_intel_thread,mkl_core,mkl_rt
EOF

Update compiler flags for ICC:

$ sed -i s/"icc -m64 -fPIC"/"icc -O3 -openmp -xhost -fPIC -m64 -shared -mkl -fp-model strict -fomit-frame-pointer"/g ./numpy/distutils/intelccompiler.py
$ sed -i s/"-xhost -openmp -fp-model strict"/"-xhost -openmp -fp-model strict -fPIC -mkl"/g ./numpy/distutils/fcompiler/intel.py

Build and install numpy:

$ python3 setup.py config --compiler=intel build_clib --compiler=intel build_ext --compiler=intel install

Check the configuration:

$ grep -i mkl ~/python34-icc/lib/python3.4/site-packages/numpy/__config__.py

Verify the installation:

$ ldd ~/python34-icc/lib/python3.4/site-packages/numpy/core/multiarray.cpython-34m.so | grep -i mkl
libmkl_intel_lp64.so => /lustre/usr/intel_cluster_studio/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_intel_lp64.so (0x00002ac808e47000)
libmkl_intel_thread.so => /lustre/usr/intel_cluster_studio/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_intel_thread.so (0x00002ac80975b000)
libmkl_core.so => /lustre/usr/intel_cluster_studio/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_core.so (0x00002ac80ab7c000)
libmkl_rt.so => /lustre/usr/intel_cluster_studio/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_rt.so (0x00002ac80c6da000)

Then other packages that depend on numpy can be installed.

$ pip3 install scipy

To use the MKL-backed Python environment, the mkl environment module must be loaded:

$ module purge; module load icc python/3.4 mkl

An IPython Trick

To use tensorflow or other modules installed via pip in IPython or Jupyter, you may need to add their location to the module search path first. Replace HOME and ENVNAME with your home directory and Anaconda environment name, respectively.

$ ipython
In [1]: import sys, pprint
In [2]: pprint.pprint(sys.path)
['',
...libraries from pip are missing...
]
In [3]: sys.path.append('/HOME/.conda/envs/ENVNAME/lib/python3.6/site-packages')
In [4]: import tensorflow
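The hard-coded python3.6 in the path only matches one interpreter version. As a sketch, the path can instead be built from the running interpreter; the helper names below are made up, and the ~/.conda/envs layout is assumed from the path shown above.

```python
import os
import sys


def env_site_packages(home, envname, version_info=None):
    """Build the site-packages path of a conda environment under home/.conda/envs."""
    version = sys.version_info if version_info is None else version_info
    python_dir = "python%d.%d" % (version[0], version[1])
    return os.path.join(home, ".conda", "envs", envname,
                        "lib", python_dir, "site-packages")


def add_env_to_path(home, envname):
    """Append the environment's site-packages to sys.path if the directory exists."""
    path = env_site_packages(home, envname)
    if os.path.isdir(path) and path not in sys.path:
        sys.path.append(path)
    return path
```

For example, add_env_to_path(os.path.expanduser("~"), "ENVNAME") can be run in the first notebook cell before importing tensorflow. The standard library function site.addsitedir is an alternative that also processes .pth files.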
