felixbartler
Product and Topic Expert

In this brief blog post, I want to share an issue I've encountered multiple times with SAP AI Core. While packaging my Docker container using a standard Dockerfile, Python scripts, and execution commands, everything seemed fine locally. On AI Core, however, I encountered the following error: OSError: [Errno 13] Permission denied: '/nonexistent'.


The first time, I was confused: why do I see a difference in the execution of the Docker container compared to my local environment? Especially with Docker, a technology specifically targeted at portability of code?

TL;DR: AI Core runs our containers as the nobody user, which is restricted from writing to most locations on the container's file system. To fix the error, make sure to point all file I/O to a path with proper permissions.

Scenario:

Now let's look at the two examples where I encountered this. The first was with Hugging Face's transformers library. When loading models from the hub, they are typically cached on the file system. As the user of the library, you are free to control this path by setting the respective environment variable - BUT by default, it tries to write to the user's HOME directory. This is common practice in libraries and would result in something like a file write to /home/username/.cache/huggingface/hub. Per se this is totally unproblematic - but it is not something you will see in the error log from the headline. Writing to the user's default HOME directory is typically realized with a path like ~/.cache, using the tilde character.
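
For illustration, here is a minimal sketch (not from the original error scenario) of how this cache can be redirected: the transformers / huggingface_hub libraries honor the HF_HOME environment variable, and /app/data/huggingface is just an example path that needs to be prepared with proper permissions (see the Dockerfile in option 1 below).

import os

# Redirect the Hugging Face cache to a directory we control.
# /app/data/huggingface is an example path - it must exist and be writable for the executing user.
os.environ["HF_HOME"] = "/app/data/huggingface"

# import transformers only after setting the variable, so the new cache path is picked up
from transformers import pipeline
classifier = pipeline("sentiment-analysis")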

The second example I stumbled upon is from the AI Core tutorial. In it, we use scikit-learn to build a sample ML pipeline.

 

# Load Datasets
from sklearn import datasets
data_house = datasets.fetch_california_housing()

 

During the first steps, the Python script downloads sample data from an online source. This data is cached locally in a file under the user's home directory. The script executed perfectly fine in a local Docker container. The funny thing is that the tutorial even shows how to avoid this issue: by passing a dedicated path like /app/data to the download function - a directory that has been set up with additional read and write permissions (see option 1 below).

Issue Root Cause:

But why do I get this error now? And how is the AI Core environment different from my local one?

First of all, to the meaning of the error: Permission denied on a certain location in the file system is a common issue indicating that the executing user does not have the proper privileges to write to that location. But where does the /nonexistent path come from?

Turns out, there are a number of security settings in place for these containers. Locally, our container runs by Docker default as the root user - a user with a wide range of permissions, basically able to read and write all files, and with its home directory set to an existing folder, /root. We can validate this easily by jumping into our container's shell and typing:

 

how to run a shell in the container
docker run -it python:3.7 /bin/sh

commands to validate (the # is the shell prompt, the line below each command is its output)
# whoami
root
# echo ~
/root

 

On AI Core this is slightly different. If we check which user is running our Python Script there, we will see something like this:

 

# whoami
nobody
# echo ~
/nonexistent

 

And this explains the original Permission denied error. The nobody user is a standard, predefined, non-privileged user on Unix-like systems. It typically has very limited permissions and is used for running unprivileged processes. Its purpose is to ensure that processes have minimal access to the file system and system resources, adhering to the principle of least privilege. This reduces the risk of privilege escalation attacks: if a container is compromised, the attacker has very limited permissions, making it harder to affect the host system or other containers.
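
Since we usually cannot open an interactive shell into the running workload on AI Core, the easiest way to see this is to log the same information from within the Python script. A minimal sketch (not part of the tutorial code):

import getpass
import os

# Log the executing user and its home directory from inside the workload.
# On AI Core this typically prints "nobody" and "/nonexistent",
# while locally (running as root) it prints "root" and "/root".
print("user:", getpass.getuser())
print("home:", os.path.expanduser("~"))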

In fact, we do not have any file I/O permissions by default. This means that we have to create folders ourselves and grant permissions on them to the executing user. Additionally, the nobody user does not have a home directory: the /nonexistent path is just a placeholder, not an actual directory.
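
The following illustrative snippet (again just a sketch) reproduces the restriction from within Python: the placeholder home directory does not exist, and trying to create it fails with exactly the error from the headline.

import os

home = os.path.expanduser("~")   # resolves to /nonexistent on AI Core
print(os.path.isdir(home))       # False - the placeholder directory does not exist
os.makedirs(home)                # raises PermissionError: [Errno 13] Permission denied: '/nonexistent'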

Manage the File Permissions:

Now what is the solution? What can we do to handle file I/O in our AI Core Docker containers in a clean and secure way?

There are three options:

1. Create Directories with permissions:

The first and preferred solution is to always use dedicated folders that we create and manage for file I/O. Typically, libraries give us the option to specify where to cache data or models.
In the example of scikit-learn, we can modify the code to:

 

# Load Datasets
from sklearn import datasets
data_house = datasets.fetch_california_housing(data_home="/app/data")

 

The additional data_home argument makes sure the data is written to /app/data. Subsequently, we need to make sure this directory exists and has the proper permissions, for example like this in our Dockerfile:

 

# create directories
RUN mkdir -p /app/src
RUN mkdir -p /app/data

# enable permission
# (make the nobody group - GID 65534 - the owner and give the owner and the group permissions to read, write and execute)
RUN chgrp -R 65534 /app
RUN chmod -R 770 /app 

 

2. Set the HOME directory manually:

The second solution is to explicitly set the HOME directory of our current context to a location with proper permissions. This is an option if we are not able to control where a library wants to cache things. It can be done in Python or at OS level by setting the environment variable "HOME" to a folder of our choice.

 

import os
os.environ["HOME"] = "/app/home"

 

And of course we need the permissions in Docker as well:

 

# create directories
RUN mkdir -p /app/src
RUN mkdir -p /app/home

# enable permission
# (make the nobody group - GID 65534 - the owner and give the owner and the group permissions to read, write and execute)
RUN chgrp -R 65534 /app
RUN chmod -R 770 /app 

 

3. Manage /nonexistent permissions:

The third solution is more of a workaround: if we still want to use the default home directory, we at least need to give the executing user the privileges to write there, by creating the /nonexistent folder and granting the nobody group read and write permissions on it. But this is not very clean.

 

# create directory
RUN mkdir -p /nonexistent

# enable permission
RUN chgrp -R 65534 /nonexistent
RUN chmod -R 770 /nonexistent 

 

Note: My original idea to solve this issue was to specify a user myself in the Dockerfile and have the container run as that user. But this in fact does not work with AI Core: the nobody user is used regardless of my USER setting.

# create a new user called 'ml'
RUN useradd -m ml
# set the user ml as executing user
USER ml

This only works locally, where the files would be written into the home directory of that new user, /home/ml.

Debugging locally:

How can I test my containers locally as they would run in AI Core?

Of course, to enable a fast workflow, we want to mimic the behavior of the target platform in our local environment. At least for the aspects above, this is quite easy to accomplish. Using Docker's run command with the --user flag, we can explicitly tell the daemon to execute our workload as a specific user. To mimic AI Core, we would run:

 

# run container with the nobody user
docker run --user nobody containername

 

Alternatively, we can set the USER in the Dockerfile to the AI Core default nobody (similar to the USER instruction shown above). This makes sure that we encounter the same Permission denied: '/nonexistent' errors locally 🙂

Hope this deep dive into the file system permissions on AI Core helped you, feel free to leave a comment or reach out 🙂
