Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
With the rise of Large Language Models (LLMs), the potential to incorporate this cutting-edge technology into SAP-managed processes has become increasingly prominent. This blog post is part of a comprehensive series aimed at providing a detailed, hands-on tutorial on leveraging LLMs within SAP Business Technology Platform (BTP).


As developers of a custom AI solution, our objective is to deploy a mid-sized Language Model to AI Core for efficient inference in our processes. This blog will specifically delve into the technical fundamentals of constructing a Dockerfile to serve the model, with a particular emphasis on creating GPU-enabled images to enhance performance and acceleration.

Scenario overview

How content is deployed to AI Core:

AI Core serves as a container-based engine specifically designed for executing training and serving workloads. As a developer, you are responsible for creating two primary artifacts. Firstly, a template is crafted to define how AI Core should deploy the content. This template outlines the necessary resources, specifies the input artifacts to be loaded, and identifies the configuration parameters that require consideration.

Secondly, the actual content itself takes the form of a Docker container, which is bundled with all the essential dependencies, including libraries like Pytorch, Tensorflow, and more. Additionally, this container incorporates the custom implementation of logic, often in the form of Python scripts, tailored to meet the specific requirements of the AI solution. This comprehensive packaging ensures that the AI model and its supporting elements are efficiently encapsulated, facilitating seamless deployment and execution within the AI Core environment.
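To make the first artifact more concrete: on AI Core such a template is a YAML document. The following is a heavily simplified, illustrative sketch of a serving template — every name and value here is a placeholder, and the exact schema should be taken from the SAP AI Core documentation:

```yaml
apiVersion: ai.sap.com/v1alpha1
kind: ServingTemplate
metadata:
  name: llm-serving                        # placeholder executable name
  annotations:
    scenarios.ai.sap.com/name: "llm-scenario"   # placeholder scenario
  labels:
    ai.sap.com/version: "1.0"
spec:
  inputs:
    parameters:
      - name: modelName                    # configuration parameter resolved at deployment time
        type: string
  template:
    # KServe-style spec referencing the Docker image described below
    spec:
      predictor:
        containers:
          - name: kserve-container
            image: docker.io/<your-repo>/llm-serving:latest
```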

Choosing a base image:

Choosing a suitable base image for Docker is a critical decision, especially within the machine learning domain, where Linux operating systems are commonly employed. A key concept to consider is the layering of Docker images, which allows us to build upon existing images rather than installing every dependency from scratch. This inheritance mechanism significantly streamlines the entire process.

In practice, we aim to select a base image that already includes a compatible installation of CUDA, if applicable, and ideally encompasses the core Python libraries that are essential for our AI solution. By starting with a base image that meets these criteria, we can avoid redundant setup tasks and focus on customizing the container with the specific components and logic required for our AI application. This approach not only simplifies the Docker image construction but also ensures a seamless integration with AI Core, leading to more efficient and effective deployment and execution of our model.

Typically, an nvidia/cuda base image is used. For this example we additionally need PyTorch and Transformers installed, which is why we opt for huggingface/transformers-gpu.

Understanding CUDA on AI Core:

CUDA, NVIDIA's platform for harnessing GPU capabilities in ML workloads, plays a pivotal role in most cutting-edge libraries that leverage GPU acceleration. To gain insights into the foundational aspects to be considered, refer to the component overview from NVIDIA.

CUDA Components

When selecting the base image, we opt for one equipped with the CUDA toolkit. However, it is essential to note that CUDA comprises two primary components: drivers and the toolkit. Typically, the drivers are installed directly on the host machine, and the same holds for AI Core. To enable the upper layers to locate these host-installed drivers, we must set the following environment variables:

# Use ENV rather than RUN export: each RUN executes in its own shell,
# so exported variables would not persist into later layers or at runtime.
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/usr/local/cuda-10.0/targets/x86_64-linux/lib:/usr/local/cuda-10.2/targets/x86_64-linux/lib:/usr/local/cuda-11/targets/x86_64-linux/lib:/usr/local/cuda-11.6/targets/x86_64-linux/lib/stubs:/usr/local/cuda-11.6/compat:/usr/local/cuda-11.6/targets/x86_64-linux/lib
ENV PATH=$PATH:/usr/local/cuda-11/bin

This snippet configures the environment variables LD_LIBRARY_PATH and PATH. LD_LIBRARY_PATH tells the dynamic linker where to search for shared libraries, while PATH does the same for executables; the CUDA toolkit relies on them to locate the driver libraries installed on the host. To confirm GPU accessibility at the Python layer, execute the following code:
import torch

print(torch.cuda.is_available())  # True when a GPU and its drivers are reachable

File System Access:

Certain libraries may require write access to the disk, such as when downloading models or performing data transformations. In the context of AI Core, it is essential for the executing script to comply with the host system's policies. A common approach involves granting permissions to write to a specific directory, following standard Linux conventions, as exemplified below:
RUN chgrp -R nogroup /serving && \
chmod -R 777 /serving
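To verify at runtime that the serving directory is actually writable before a library tries to download into it, a small stdlib-only check can help. The /serving path mirrors the Dockerfile above; the helper name is my own:

```python
import tempfile


def ensure_writable(path):
    """Return True if the current user can create and write a file under path."""
    try:
        # Attempt a real write: permission bits alone can be misleading
        # (e.g. a read-only mount can still report mode 777).
        with tempfile.NamedTemporaryFile(dir=path, prefix=".writecheck-") as f:
            f.write(b"ok")
        return True
    except OSError:
        return False


# In the container this would be called as ensure_writable("/serving")
```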

Multi-stage image:

To optimize Docker usage effectively, a recommended approach is to employ multi-stage images for debugging. This involves segregating the application logic, which can change frequently during debugging, from the dependency installation, which tends to change less frequently. This can be achieved using a single Dockerfile as demonstrated below or by utilizing multiple Dockerfiles and referencing the previously built ones. In the example we build a base image with the requirements and copy the application logic on top.

Make sure to specify the stage you want to build via the --target flag of the docker build command (e.g. docker build --target base .).
FROM python AS base

RUN python3 -m pip install --no-cache-dir --upgrade pip

WORKDIR /serving
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt

FROM base AS final

COPY /application-logic /application-logic

Full Example:

Here is the full example used in my next blog post, Deploying Language Models on AI Core, where I use Hugging Face to deploy an LLM for inference. It uses a Hugging Face base image with all the dependencies for the transformers library installed. On top of it, I install some libraries used for serving.
FROM huggingface/transformers-gpu AS base

RUN python3 -m pip install --no-cache-dir --upgrade pip

WORKDIR /serving
COPY requirements_gpu.txt requirements_gpu.txt
RUN pip3 install -r requirements_gpu.txt

FROM base AS final


# make the host-installed NVIDIA drivers visible; ENV (not RUN export)
# so the variables persist into the running container
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/usr/local/cuda-10.0/targets/x86_64-linux/lib:/usr/local/cuda-10.2/targets/x86_64-linux/lib:/usr/local/cuda-11/targets/x86_64-linux/lib
ENV PATH=$PATH:/usr/local/cuda-11/bin

# file system: copy the application first, then grant write access
# so the copied files are covered as well
COPY /serving /serving
RUN chgrp -R nogroup /serving && \
chmod -R 777 /serving

ENV TRANSFORMERS_CACHE=/serving/transformerscache

ENV MODEL_NAME="EleutherAI/gpt-j-6B"

CMD ["uvicorn", "app:api", "--host", "0.0.0.0", "--port", "8080"]
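The CMD above assumes a Python module app exposing an ASGI object named api (e.g. a FastAPI instance). A minimal sketch of how such a script can pick up the configuration baked into the image via ENV — the helper name is illustrative, and the defaults mirror the Dockerfile:

```python
import os


def serving_config():
    """Read the serving configuration from environment variables set in the Dockerfile."""
    return {
        # Model to load; default mirrors the ENV MODEL_NAME instruction.
        "model_name": os.environ.get("MODEL_NAME", "EleutherAI/gpt-j-6B"),
        # Writable cache directory prepared via chgrp/chmod earlier.
        "cache_dir": os.environ.get("TRANSFORMERS_CACHE", "/serving/transformerscache"),
    }
```

Reading these values through os.environ (rather than hard-coding them) lets the same image serve different models simply by overriding MODEL_NAME at deployment time.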

I hope this blog gave you an understanding of what to think about when building Docker images for AI Core. Feel free to leave a comment.