Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
ThorstenHa
Advisor

Introduction


As a data scientist or data engineer you might not have much hands-on experience with Docker. At least that is how I started. I knew about the appealing concept of containerising applications, but when developing pipelines or operators on SAP Data Intelligence I was always happy to find an existing Docker image that I could use. Over time, requests came in that forced me to leave my comfort zone and learn more about Docker. Eventually I realised that working with Docker directly is not as hard as expected, and that the learning curve is short and steep rather than painstakingly long.

In this blog I give a short introduction to Docker from an SAP Data Intelligence angle. I then show first how to add Python packages with pip and second what needs to be done if another package manager is required. Finally I delve into the challenge of adding more elaborate installation tasks to a Dockerfile. For the sake of your nerves and fingernails, this should be done and tested interactively before building an image on an SAP Data Intelligence instance.

Docker on SAP Data Intelligence


In general you can run any Docker image on DI. You only have to ensure that it is correctly tagged so that the pipeline scheduler can select the Docker container that provides the libraries required by the operators.

You might run into the challenge of using operators whose tags are not all covered by any single existing Docker image, e.g. 'flowagent' and 'python36'. Then you either

  • group parts of the pipeline for running them in different docker containers with the caveat of the data volume restriction or

  • enhance one of the images with the necessary packages


For performance reasons you might prefer to run a pipeline in one container rather than spread it across several.

Enhancing Existing Docker Images with pip


SAP has an enterprise support agreement with SUSE and uses SLES as the basis for most of the operators. If, for example, you would like to add Python packages such as 'pandas', you can select the base image with the reference character '$'
FROM $com.sap.sles.base

or directly pull the image from the repository with the reference character '§'
FROM  §/com.sap.datahub.linuxx86_64/sles:15.0-sap-007

The latter might miss some enhancements that have been added to the Dockerfile of com.sap.sles.base. With this method you can also inherit from non-standard images that have been built and pushed to the local Docker registry from outside SAP Data Hub / SAP Data Intelligence (on premise). This is often required when only trusted images, hardened according to company policy, may be used. The syntax is as follows: FROM §/<image-name-in-repo>:<version>.

With SAP Data Intelligence 3.0 you are required to run containers as a user other than 'root'. That means you have to add a group and a user in each Dockerfile:
RUN groupadd -g 1972 cmddata && useradd -g 1972 -u 1972 -m cmddata
USER 1972:1972
WORKDIR "/home/cmddata"
ENV HOME=/home/cmddata
ENV PATH="${PATH}:${HOME}/.local/bin"

In addition I recommend setting some environment variables accordingly, in particular adding the user's 'bin/' directory to the PATH in case binaries are installed there as well.
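To see why the user's 'bin/' directory belongs on the PATH, it can help to ask Python where '--user' installs end up. The following is only a minimal sketch, assuming a 'python3' interpreter is available on the Linux machine where you try it. Inside the image the interpreter is python3.6 and HOME is /home/cmddata, so the directory printed there will differ from your local one.

```shell
# Ask Python for the base directory that "pip install --user" writes to.
# Assumption: "python3" is on the PATH of the machine you run this on.
user_base="$(python3 -m site --user-base)"

echo "pip --user installs below: ${user_base}"
# On Linux, command-line tools shipped with a package go to the bin/
# subdirectory, which is why the Dockerfile appends ${HOME}/.local/bin to PATH.
echo "scripts are placed in:     ${user_base}/bin"
```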

Finally your new Dockerfile might look like:
FROM $com.sap.sles.base

RUN groupadd -g 1972 cmddata && useradd -g 1972 -u 1972 -m cmddata
USER 1972:1972
WORKDIR "/home/cmddata"
ENV HOME=/home/cmddata
ENV PATH="${PATH}:${HOME}/.local/bin"

RUN python3.6 -m pip --no-cache-dir install 'pandas' --user
RUN python3.6 -m pip --no-cache-dir install 'scikit-learn' --user

Do not forget to add the option '--user' to the pip command so that the package is installed for the user only, without requiring root authority.

It is very important that you tag the new Docker image not only with the newly added packages but also with the tags of the base image. There is currently (SAP DI 2.6) no inheritance process in place. In our particular case the tags would look like this:

  • default

  • sles

  • python36

  • tornado - 5.0.2

  • pandas

  • scikit-learn
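Because the base tags are not inherited, a forgotten tag is easy to overlook. The check can be sketched in a few lines of shell. This is only an illustration: the helper name 'missing_tags' is hypothetical, the base tag list is my reading of the list above (everything except the two newly added packages), and versions are left out for brevity.

```shell
# Tags of the base image and of the new image, one per line.
# Assumption: the base tags are the list above minus pandas and scikit-learn.
base_tags="default
sles
python36
tornado"
new_tags="default
sles
python36
tornado
pandas
scikit-learn"

# Hypothetical helper: print every tag of list $1 that is absent from list $2.
missing_tags() {
    printf '%s\n' "$1" | while IFS= read -r tag; do
        printf '%s\n' "$2" | grep -qxF -- "$tag" || printf '%s\n' "$tag"
    done
}

missing="$(missing_tags "$base_tags" "$new_tags")"
if [ -z "$missing" ]; then
    echo "all base tags are repeated"
else
    echo "base tags missing from the new image: $missing"
fi
```

With the two lists as written, the script reports that all base tags are repeated. Removing a line such as 'python36' from new_tags makes it name the missing tag instead.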


Enhancing Existing Docker Images with Other Package Managers


Enhancing the images provided and maintained by SAP has its limitations because you can only use 'pip' for installing Python packages. If other package managers such as 'apt-get' on Ubuntu or 'zypper' on SUSE are necessary, you have to fall back on openly available images.

Fortunately there is already an image that contains the basic packages and can be enhanced as you like. It can be found in the Modeler ->repository/dockerfiles folder with the path:
$com.sap.opensuse.golang.zypper

and the definition:
FROM opensuse/leap:15.0

RUN groupadd -g 1972 cmddata && useradd -g 1972 -u 1972 -m cmddata
USER 1972:1972
WORKDIR "/home/cmddata"
ENV HOME=/home/cmddata
ENV PATH="${PATH}:${HOME}/.local/bin"

ARG GOPATH=/gopath
ARG GOROOT=/goroot

ENV GOROOT=${GOROOT}
ENV GOPATH=${GOPATH}
ENV PATH=${GOROOT}/bin:${GOPATH}/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

RUN zypper --non-interactive update && \
# Install tar, gzip, python, python3, pip, pip3, gcc and libgthread
zypper --non-interactive install --no-recommends --force-resolution \
tar \
gzip \
python3 \
python3-pip \
gcc=7 \
gcc-c++=7 \
libgthread-2_0-0=2.54.3 && \
# Install tornado
python3 -m pip --no-cache install tornado==5.0.2 --user

COPY sapgolang.tar.gz /tmp/sapgolang.tar.gz

RUN mkdir -p $GOROOT && \
tar -xzf /tmp/sapgolang.tar.gz --strip-components=1 -C ${GOROOT}

and the tags

  • opensuse

  • python36

  • tornado - 5.0.2

  • sapgolang - 1.12.1-bin

  • zypper


This base image enables you to run the package manager "zypper" for installing further packages to the image e.g.:
RUN zypper in gcc-fortran

Interactively Creating Dockerfiles


If you need to build Dockerfiles that are more complex than adding a couple of simple packages with pip and zypper, you are strongly advised to do so locally first before adding lines to the Dockerfile on an SAP Data Intelligence instance, unless you are an exceptional OS admin and Docker guru. If you belong to the more ordinary kind of developing data scientist or data engineer, a fast trial-and-error approach might be more appropriate. This means you first need to install Docker locally and maybe read about the limited number of commands you are going to use in Dockerfiles.

In my opinion a Dockerfile is just an installation batch script that processes the commands listed in it. In the vastness of the internet you will find hosts of good introductory pages on Docker.

In the following I take up a request from a customer in the meteorology business to use special libraries needed to write operators in Python. My first attempt was simply to add the necessary lines to my favourite Docker image ($com.sap.sles.base)
RUN zypper addrepo https://download.opensuse.org/repositories/home:SStepke/openSUSE_Leap_15.0/home:SStepke.repo
RUN zypper refresh
RUN zypper install eccodes

and fell flat on my face. The succinct error message just told me that the build had failed.

So I started my search for enlightenment locally with the base image opensuse/leap:15.0 and the basic extension from the Dockerfile '$com.sap.opensuse.golang.zypper'.

Preparation


I created a directory that contains the Dockerfile '$com.sap.opensuse.golang.zypper.Dockerfile' and 'sapgolang.tar.gz' because the latter is needed as well.



Then I opened a terminal, went to the above folder and started a build process with
docker build --tag eccodes .

and after some time I got a list of my images with the command
docker images

$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
eccodes latest 44b88839c5b3 44 seconds ago 661MB
opensuse/leap 15.0 7b6c420ec38e 9 days ago 104MB

with
docker images --all

I could see that it was a stacked building process where a lot of child images had been produced.
$ docker images --all
REPOSITORY TAG IMAGE ID CREATED SIZE
eccodes latest 44b88839c5b3 3 minutes ago 661MB
<none> <none> 4dbeed3246d5 3 minutes ago 533MB
<none> <none> ad60d5a15d70 3 minutes ago 530MB
<none> <none> 773c4c187f90 3 minutes ago 526MB
<none> <none> 4744b754f3a7 3 minutes ago 517MB
<none> <none> fbfaf8d1d6c0 11 minutes ago 104MB
<none> <none> 55a13db79639 11 minutes ago 104MB
<none> <none> 7cd45134c515 11 minutes ago 104MB
<none> <none> ab159e9ee696 11 minutes ago 104MB
<none> <none> c7eeb77d5357 11 minutes ago 104MB
opensuse/leap 15.0 7b6c420ec38e 9 days ago 104MB

If the aforementioned lines for adding the additional repository and installing the eccodes package are appended, the image build is much faster, because the earlier layers are reused from the cache, but it still fails in the end.

But now that I had the image locally, I could run the Docker container interactively with a shell and test all commands step by step.

Step-by-Step Installation of a New Docker Image


For the step-by-step installation I first needed to run the container interactively
eccodes-di d051079$ docker run -it eccodes bash (or  eccodes-di d051079$ docker run -it eccodes sh)

With this I am in the container and can enter the commands needed for the new Docker image.

1. Command


9b07363dfa92:/ # zypper addrepo https://download.opensuse.org/repositories/home:SStepke/openSUSE_Leap_15.0/home:SStepke.repo

-> No issue

2. Command


9b07363dfa92:/ # zypper refresh
Retrieving repository 'SStepke's Home Project (openSUSE_Leap_15.0)' metadata ---------------------------------------------------------------[\]

New repository or package signing key received:

Repository: SStepke's Home Project (openSUSE_Leap_15.0)
Key Name: home:SStepke OBS Project <home:SStepke@build.opensuse.org>
Key Fingerprint: 02C16E40 E54FD96B 57CBFA85 B1A9061F 7E4A4A2F
Key Created: Tue Nov 6 15:33:51 2018
Key Expires: Thu Jan 14 15:33:51 2021
Rpm Name: gpg-pubkey-7e4a4a2f-5be1b45f
Do you want to reject the key, trust temporarily, or trust always? [r/t/a/?] (r):

This is an interactive prompt, and the default answer (reject) did not help at all. Some internet research gave me the answer: add the option --gpg-auto-import-keys.

3. Command


9b07363dfa92:/ # zypper --non-interactive install eccodes 

ran once the option --non-interactive had been added.
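Both problems above come from commands that wait for input, which a Docker build cannot provide. A rough pre-check along the following lines can flag such commands before a build is started. It is a sketch only: the function name 'check_zypper_prompts' is hypothetical, the demo Dockerfile just replays the first attempt from above, and the heuristic knows only the two flags used in this walkthrough.

```shell
# Hypothetical helper: list RUN lines whose zypper call might stop at a prompt.
# Heuristic: a line passes if it carries --non-interactive or
# --gpg-auto-import-keys; "zypper addrepo" is skipped because it ran without issue.
check_zypper_prompts() {
    grep -n '^RUN .*zypper' "$1" \
        | grep -v -e '--non-interactive' -e '--gpg-auto-import-keys' \
        | grep -v -E 'zypper (addrepo|ar) ' \
        || true
}

# Demo input: the three lines of the first, failing attempt.
demo_dir="$(mktemp -d)"
cat > "${demo_dir}/Dockerfile" <<'EOF'
FROM opensuse/leap:15.0
RUN zypper addrepo https://download.opensuse.org/repositories/home:SStepke/openSUSE_Leap_15.0/home:SStepke.repo
RUN zypper refresh
RUN zypper install eccodes
EOF

# Prints the "zypper refresh" and "zypper install eccodes" lines with line numbers.
check_zypper_prompts "${demo_dir}/Dockerfile"
```

A line that is not reported can still fail for other reasons, so this complements the interactive test in the container and does not replace it.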

Summary


Here we go. Now I had tested all the commands, and the Dockerfile built without complaints once the following 3 lines were added
RUN zypper addrepo https://download.opensuse.org/repositories/home:SStepke/openSUSE_Leap_15.0/home:SStepke.repo
RUN zypper refresh --gpg-auto-import-keys
RUN zypper --non-interactive install eccodes
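Before copying these lines to the SAP Data Intelligence instance, the whole sequence can be replayed once more in a fresh, unattended container. The following is a minimal sketch under a few assumptions: the local image is still tagged 'eccodes' as in the build step above, the variable names are made up for this example, and '--user root' is added only because zypper needs root rights.

```shell
# The three tested commands, chained so that the first failure stops the sequence.
replay_cmds='zypper addrepo https://download.opensuse.org/repositories/home:SStepke/openSUSE_Leap_15.0/home:SStepke.repo &&
zypper refresh --gpg-auto-import-keys &&
zypper --non-interactive install eccodes'

image="eccodes"   # tag given to "docker build --tag eccodes ." earlier

if command -v docker >/dev/null 2>&1 && docker image inspect "$image" >/dev/null 2>&1; then
    # --rm discards the container afterwards. No -it is passed, so nobody can
    # answer a prompt, which is closer to the situation during an image build.
    # --user root is an assumption that only matters if the image sets a non-root USER.
    docker run --rm --user root "$image" sh -c "$replay_cmds"
else
    echo "docker or image '$image' not available, nothing replayed"
fi
```

If the container exits with status 0, the same three commands are good candidates for RUN lines in the Dockerfile.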

Conclusion


With these learnings I am prepared to tackle many of the challenges that come up when enhancing Dockerfiles with the pip and zypper package managers. Now I do not shy away when asked for more sophisticated tasks like adding binaries, setting system variables etc.

Reference


SAP DI Help - Create Docker

SAP DI Help - Docker Inheritance