Authors: @YatseaLi, @cesarecalabria93, @amagnani37, @jacobahtan 

In my previous blog post, Bring Open-Source or Open-Weight LLMs into SAP AI Core, we went through an overview of deploying and running open-source LLMs in SAP AI Core with the BYOM (Bring Your Own Model) approach, the use cases of open-source LLMs, the sample application byom-oss-llm-ai-core and its solution architecture, and the various options for leveraging open-source LLM inference servers to serve open-source LLMs within SAP AI Core, such as Ollama, LocalAI, llama.cpp and vLLM. (In the rest of this article, I will use "open-source LLMs" to refer to both open-source and open-weight LLMs for simplicity.)

Here is the blog post series:

Blog post series of Bring Open-Source LLMs into SAP AI Core
Part 1 – Bring Open-Source LLMs into SAP AI Core: Overview
Part 2 – Bring Open-Source LLMs into SAP AI Core with Ollama
Part 3 – Bring Open-Source LLMs into SAP AI Core with Custom Transformer Server (this blog post)
Part 4 – Bring Open-Source Text Embedding Models into SAP AI Core with Infinity
Part 5 – Bring Open-Source LLMs into SAP AI Core with LocalAI (To be published)
Part 6 – Bring Open-Source LLMs into SAP AI Core with llama.cpp (To be published)
Part 7 – Bring Open-Source LLMs into SAP AI Core with vLLM (To be published)

Note: You can try out the sample AI Core app byom-oss-llm-ai-core by following its manual here, which covers all the technical details. The follow-up blog posts just wrap up the technical details of each option.

In this blog post, we'll take an end-to-end technical deep dive into the option of bringing open-source LLMs into SAP AI Core through a custom inference server built with the Hugging Face Transformers Library, which was presented in our recent webinar Unleash Open-Source LLMs in your enterprise with SAP AI Core. A session replay is available here. We'll cover:

  • Enhancing the Citizen Reporting use case with vision capability
  • Building a custom inference server for Microsoft's Phi-3-vision-128k-instruct with the Hugging Face Transformers Library, referred to as the "Custom Transformer Server"
  • Deploying and running the Custom Transformer Server in SAP AI Core
  • Inferencing Microsoft's Phi-3-vision-128k-instruct served with the Custom Transformer Server in SAP AI Core through
    • Direct API call
    • SAP Generative AI Hub SDK and Langchain

Enhancing the use case of Citizen Reporting with vision capability

A quick recap

In February 2024, we presented a proof of concept using Generative AI Hub for managing and tracking public maintenance issues in a fictitious city. The city aimed to improve handling of citizen-reported issues by analyzing social media posts with generative AI. More details about citizen reporting use case are available in this blog post.

Here's a recap of the Citizen Reporting application's workflow:

  1. A citizen reports an incident on the city's social media page.
  2. The app receives the post and notifies the public administration office.
  3. A Large Language Model via Generative AI Hub processes the post, extracting key points, summarizing the issue, identifying the issue type and urgency, determining the location, and analyzing sentiment.
  4. The Maintenance Manager reviews the details and decides to approve or reject the incident.
  5. Approved incidents generate a maintenance notification in SAP S/4HANA Cloud.

The initial implementation used GPT-4 through Generative AI Hub. The solution could only process text posts; images in the posts were skipped. The efficiency of the issue-reporting process therefore depended entirely on the quality of the issue text written by the citizens.

How can we improve the user experience and process efficiency of issue reporting?

Enhancing the citizen reporting with handling images

To enhance the citizen reporting app, we can integrate a foundation model with vision capabilities. This model can analyze images to identify public facility issues, even when posts contain little text.

Large Multimodal Models (LMMs) can process multiple data types, including text and images, simultaneously. Integrating such a model will allow our solution to accept images with instructions and provide insights through automatic image recognition and scene understanding. As of 20 June 2024, Azure OpenAI GPT-4o (an LMM) isn't available in SAP Generative AI Hub. As a result, we'll use an open-source LMM (Microsoft's Phi3-vision model) in SAP AI Core through BYOM (Bring Your Own Model) for this purpose. Once GPT-4o is listed in Generative AI Hub, you will have the option to move to GPT-4o.

This enhancement simplifies the user experience, enabling users to report issues by easily uploading photos through the app.

YatseaLi_7-1719582920257.png

Solution Architecture of the enhancement with image processing with LMM

Let's zoom into the generative AI components. As of 20 June 2024, phi3-vision isn't supported in Ollama, hence we need to build a custom inference server to serve it in SAP AI Core by bringing your own code with the Hugging Face Transformers Library and Model Hub. Of course, once phi3-vision is supported by Ollama, you can simply replace the custom transformer server with Ollama. More details are available in the blog post Bring Open-Source LLMs into SAP AI Core with Ollama.

YatseaLi_0-1719583826323.png

An end-to-end demo of Citizen Reporting App enhanced with Vision capability

Let's have a look at the final end-to-end demo of this enhanced citizen reporting use case.

 

Technical Implementation Deep-dive

Next, we'll take a deep dive into enhancing the Citizen Reporting app with the vision capability of Microsoft's Phi3-vision model, served in SAP AI Core through the BYOM (Bring Your Own Model) approach.

Important Note on Hands-on

  • All the sample code (byom-oss-llm-ai-core) shown in this section can be found here. To try out this custom transformer server, you are recommended to follow its manual.
  • The steps described below mainly aim to explain the process. They are automated in the Jupyter notebooks 00-init-config.ipynb and /transformer/01-deployment.ipynb, which you can run through the whole implementation; please pay attention to their prerequisites. Alternatively, you can perform steps 4-6 manually through SAP AI Launchpad.

Before that, let's have a quick look at Microsoft's Phi3-vision model.

A quick glance at Microsoft's Phi-3-vision-128k-instruct model

Microsoft's Phi-3-vision model is a lightweight yet top-ranking open-source multimodal model, with only 4.2 billion parameters and about 8 GB in size. Its primary use cases are image, chart and table understanding, OCR, etc., especially in environments with limited computing resources. In our case, we use its general image understanding capability to automatically detect public facility issues from the images uploaded to the citizen reporting app by users.

Its model card on Hugging Face also provides sample inference code (below) using the Hugging Face Transformers Library, a generic library from Hugging Face for downloading, training and running inference with the pretrained models published in the Hugging Face Model Hub. Later on, we'll need to convert this code snippet into a custom inference server.

 

from PIL import Image 
import requests 
from transformers import AutoModelForCausalLM 
from transformers import AutoProcessor 

model_id = "microsoft/Phi-3-vision-128k-instruct" 

model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cuda", trust_remote_code=True, torch_dtype="auto", _attn_implementation='flash_attention_2') # use _attn_implementation='eager' to disable flash attention

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True) 

messages = [ 
    {"role": "user", "content": "<|image_1|>\nWhat is shown in this image?"}, 
    {"role": "assistant", "content": "The chart displays the percentage of respondents who agree with various statements about their preparedness for meetings. It shows five categories: 'Having clear and pre-defined goals for meetings', 'Knowing where to find the information I need for a meeting', 'Understanding my exact role and responsibilities when I'm invited', 'Having tools to manage admin tasks like note-taking or summarization', and 'Having more focus time to sufficiently prepare for meetings'. Each category has an associated bar indicating the level of agreement, measured on a scale from 0% to 100%."}, 
    {"role": "user", "content": "Provide insightful questions to spark discussion."} 
] 

url = "https://assets-c4akfrf5b4d3f4b7.z01.azurefd.net/assets/2024/04/BMDataViz_661fb89f3845e.png" 
image = Image.open(requests.get(url, stream=True).raw) 

prompt = processor.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = processor(prompt, [image], return_tensors="pt").to("cuda:0") 

generation_args = { 
    "max_new_tokens": 500, 
    "temperature": 0.0, 
    "do_sample": False, 
} 

generate_ids = model.generate(**inputs, eos_token_id=processor.tokenizer.eos_token_id, **generation_args) 

# remove input tokens 
generate_ids = generate_ids[:, inputs['input_ids'].shape[1]:]
response = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0] 

print(response)

 

Outline of Implementation Process

YatseaLi_0-1719802334166.png

As visually illustrated in the component diagram, let's quickly walk through the end-to-end implementation process. 

  1. Build a custom inference server (server.py) for Microsoft's Phi-3-vision with the Hugging Face Transformers Library
  2. Build a docker image (Dockerfile) for the Custom Transformer Server and push it to Docker Hub
  3. Prepare and host a serving template (transformer-template.yaml) in a GitHub repository for the Custom Transformer Server
  4. Onboard the GitHub repository to SAP AI Core
  5. Create an application and synchronize the serving templates from the associated GitHub repository
  6. Create a configuration and start a deployment
  7. Inference the Phi3-vision model in the Vision Assistant of the Citizen Reporting App through
    • Direct API Call
    • SAP Generative AI Hub SDK

At design time, only the following files are needed:

  • A server.py and its dependencies in requirements.txt: the custom transformer server that serves the Phi3-vision model for API inference.
  • A Dockerfile to wrap the custom transformer server into a docker image for SAP AI Core.
  • A serving template YAML file (transformer-template.yaml) describing which docker image to run on what kind of infrastructure spec in SAP AI Core, with configurable input parameters etc.

At runtime, we'll deploy and run a custom transformer server in SAP AI Core based on its serving template.

With that, let's start the end-to-end journey of technical implementation.

Step 1: Build a Custom Inference Server for Microsoft's Phi-3-vision with the Hugging Face Transformers Library

In this step, we'll wrap phi3-vision's sample inference code into a custom inference server for serving inference API.

Design the APIs of the Custom Transformer Server

Before jumping into the implementation of the custom transformer server in SAP AI Core, let's have a look at the design options for its inference API. Such an inference API can be implemented in many ways, depending mainly on how it will be consumed.

Common requirements for the Inference APIs in SAP AI Core in run-time

There are some common requirements for the inference APIs in SAP AI Core, regardless of API schema:

  • The endpoint must start with a version prefix, like /v1/xxx or /v2/yyy. Let us simply use /v1/generate as the endpoint.
  • The target resource group in AI Core must be explicitly set in the HTTP header if the deployment isn't in the default resource group.
  • Endpoint access is protected with an AI Core access token for security.
Option 1 - A simple custom API, not compatible with SAP Generative AI Hub SDK

The first option is to have a simple custom API. Here is just an example for illustration.

 

curl http://<YOUR_AI_CORE_DEPLOYMENT_URL>/v1/generate \
    -H "Content-Type: application/json" \
    -H "ai-resource-group: <YOUR_RESOURCE_GROUP>" \
    -H "Authorization: <YOUR_AI_CORE_ACCESS_TOKEN>" \
    -d '{
    "prompt": "What is shown in this image?",
    "image": "<image-binary-encoded-data> or <image-url>",
    "model": "microsoft/Phi-3-vision-128k-instruct",
    "max_new_tokens": 500,
    "temperature": 0.7
    }'

Response:
{ "response": "A large tree has fallen across the sreet, blocking the way through..."}​

 

We can design a minimal request body; in our case, the request needs at least a prompt, an image and a model name, and it only returns a text response in JSON. This looks pretty simple and straightforward.

But what if you need to switch to another proprietary model in SAP Generative AI Hub for better quality one day in the future, for instance GPT-4o? Then you will need to rewrite the LLM inference part of your application, for it is a different API.

Option 2 - An OpenAI-like API, compatible with SAP Generative AI Hub SDK

Alternatively, we can implement the API as an OpenAI-like API for chat completion, which can be compatible with SAP Generative AI Hub SDK.

 

curl http://<YOUR_AI_CORE_DEPLOYMENT_URL>/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "ai-resource-group: <YOUR_RESOURCE_GROUP>" \
    -H "Authorization: <YOUR_AI_CORE_ACCESS_TOKEN>" \
    -d '{
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant"
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is shown in this image?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://xxx/test-imgae.png"
                    }
                }
            ]
        }
    ],
    "model": "microsoft/Phi-3-vision-128k-instruct",
    "max_new_tokens": 500,
    "temperature": 0.7,
    "do_sample": "True",
    "response_format": {"type": "json_object"} 
}'

#Response:
{
  "id": "1718385804",
  "object": "chat.completion",
  "created": 1718385804,
  "model": "microsoft/Phi-3-vision-128k-instruct",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The image shows a large tree has fallen across a street, blocking the way through..."
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 1,
    "total_tokens": 1
  }
}​

 

  • An OpenAI-like endpoint /v1/chat/completions, with the same header requirements as option 1.
  • The request body is made up of a list of messages with different roles. For image understanding, it requires one text prompt and an associated image as image_url. The rest are the usual generation configurations, such as the target model, temperature, etc.
  • The response returns a list of choices with the completion results, and some usage data, such as completion_tokens, total_tokens, etc.
Best Practice of API Design

As a best practice, option 2 (an OpenAI-like API) is recommended for smooth portability to other foundation models in SAP Generative AI Hub with minimal code change. In addition, you will be able to use the SAP Generative AI Hub SDK to consume the open-source LLMs hosted in SAP AI Core for application development, which simplifies access to them. For the rest of this blog post, we go with option 2.

Custom Inference Server with OpenAI-like API for Phi3-vision

Here is the sample code of the custom transformer server (server.py) in our sample GitHub repository, which implements the inference API as an OpenAI-like API, and here is the demo recording that walks you through the technical details. A condensed sketch of the server follows below.
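To make the conversion from the model-card snippet concrete, here is a condensed, illustrative sketch of the core idea behind server.py: a FastAPI app that loads the model configured through the MODEL_NAME environment variable (see the serving template in step 3) and wraps the inference code from above in an OpenAI-like /v1/chat/completions endpoint. It is simplified for readability (no error handling, no temperature/do_sample handling, no token usage accounting); please refer to the actual server.py in the repository for the complete implementation.

 

import os
import time
from io import BytesIO

import requests
from PIL import Image
from fastapi import FastAPI
from transformers import AutoModelForCausalLM, AutoProcessor

app = FastAPI()

# SAP AI Core injects MODEL_NAME from the serving template's modelName parameter
model_id = os.environ.get("MODEL_NAME", "microsoft/Phi-3-vision-128k-instruct")
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="cuda", trust_remote_code=True, torch_dtype="auto")
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

@app.post("/v1/chat/completions")
def chat_completions(body: dict):
    # Flatten the OpenAI-style messages: collect the text parts into the prompt,
    # download the referenced images, and insert the <|image_n|> placeholders
    # expected by Phi-3-vision's chat template.
    messages, images = [], []
    for msg in body["messages"]:
        content = msg["content"]
        if isinstance(content, str):
            messages.append({"role": msg["role"], "content": content})
            continue
        parts = []
        for part in content:
            if part["type"] == "text":
                parts.append(part["text"])
            elif part["type"] == "image_url":
                images.append(Image.open(BytesIO(
                    requests.get(part["image_url"]["url"]).content)))
                parts.append(f"<|image_{len(images)}|>")
        messages.append({"role": msg["role"], "content": "\n".join(parts)})

    # Same inference flow as the model-card sample code above
    prompt = processor.tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(prompt, images or None, return_tensors="pt").to("cuda:0")
    generate_ids = model.generate(
        **inputs,
        eos_token_id=processor.tokenizer.eos_token_id,
        max_new_tokens=body.get("max_new_tokens", 500))
    generate_ids = generate_ids[:, inputs["input_ids"].shape[1]:]
    answer = processor.batch_decode(generate_ids, skip_special_tokens=True)[0]

    # Wrap the completion into an OpenAI-like response envelope
    now = int(time.time())
    return {
        "id": str(now),
        "object": "chat.completion",
        "created": now,
        "model": model_id,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": answer},
            "finish_reason": "stop",
        }],
    }

 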

Step 2: Build a docker image and Push it to Docker Hub

This step has been automated with /transformer/01-deployment.ipynb for the sample byom-oss-llm-ai-core. In this step, we will wrap the custom inference server (server.py) into a docker image and push it to Docker Hub.

I have prepared a Dockerfile of the custom inference server adapted for SAP AI Core; let's walk through it.

 

ARG BASE_IMAGE=pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime
FROM ${BASE_IMAGE} AS runtime

WORKDIR /app
COPY app/* ./

ENV CUDA_HOME=/usr/local/cuda

RUN python3 -m pip install -r requirements.txt && \
    rm -rf /root/.cache/pip

# Adaptation for SAP AI Core
RUN mkdir -p /nonexistent && \
    mkdir -p /hf-home && \ 
    chown -R nobody:nogroup /nonexistent /hf-home /app && \
    chmod -R 770 /nonexistent /hf-home /app

ENV HF_HOME=/hf-home

# Start the server with uvicorn
CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "8080"]

 

The code is largely self-explanatory. Here are some important adaptations for SAP AI Core highlighted:

  1. Choose a proper base image
    As we would like a PyTorch runtime with CUDA and cuDNN to leverage the GPU in SAP AI Core, instead of building the image from scratch, we can pick a standard PyTorch base image from Docker Hub that meets our needs.

  2. The Hugging Face Transformers library requires a directory for downloading models from the Model Hub at runtime.
    We create an environment variable (HF_HOME) and directories with the appropriate permissions, required for running the custom transformer server as the nobody user in SAP AI Core.

  3. Start the server when the docker container starts.
    The Custom Transformer Server is started with uvicorn in SAP AI Core.

With dockerfile in place, next we can build the docker image and push it to docker hub with commands below:

 

# 0.Login to docker hub
docker login -u <YOUR_DOCKER_USER> -p <YOUR_DOCKER_ACCESS_TOKEN>

# 1.Build the docker image
docker build --platform=linux/amd64 -t docker.io/<YOUR_DOCKER_USER>/transformer:ai-core .

# 2.Push the docker image to docker hub to be used by deployment in SAP AI Core
docker push docker.io/<YOUR_DOCKER_USER>/transformer:ai-core 

 

As a result, the docker image is built and pushed to Docker Hub. Please note down your own docker image path, which will be used in the next step.

YatseaLi_0-1719810534253.png

Step 3: Prepare a Serving Template for the Custom Transformer Server and host it in a GitHub Repository

I have prepared a sample serving template for the custom transformer server (transformer-template.yaml). Let's walk through it.

 

apiVersion: ai.sap.com/v1alpha1
kind: ServingTemplate
metadata:
  name: transformer
  annotations:
    scenarios.ai.sap.com/description: "Run a custom transformer server on SAP AI Core"
    scenarios.ai.sap.com/name: "transformer"
    executables.ai.sap.com/description: "Run a custom transformer server on SAP AI Core"
    executables.ai.sap.com/name: "transformer"
  labels:
    scenarios.ai.sap.com/id: "transformer"
    ai.sap.com/version: "0.0.1"
spec:
  inputs:
    parameters:
      - name: modelName # placeholder name
        default: "microsoft/Phi-3-vision-128k-instruct" 
        type: string # required for every parameter
        description: "Model Name to be used in SAP Generative AI Hub SDK. Ensure an identical value as alias parameter"
      - name: resourcePlan
        type: "string"
        default: "infer.s"
        description: "Resource Plan of SAP AI Core. Supported: infer.s, infer.m, infer.l, train.l"
  template:
    apiVersion: "serving.kserve.io/v1beta1"
    metadata:
      annotations: |
        autoscaling.knative.dev/metric: concurrency
        autoscaling.knative.dev/target: 1
        autoscaling.knative.dev/targetBurstCapacity: 0
      labels: |
        ai.sap.com/resourcePlan: "{{inputs.parameters.resourcePlan}}"
    spec: |
      predictor:
        imagePullSecrets:
        - name: <YOUR_DOCKER_SECRET>
        minReplicas: 1
        maxReplicas: 1
        containers:
        - name: kserve-container
          image: docker.io/<YOUR_DOCKER_USER>/transformer:ai-core
          ports:
            - containerPort: 8080
              protocol: TCP
          command: ["/bin/sh", "-c"]
          args:
            - >
              set -e && echo "-------------Starting transformer Server--------------" 
              && uvicorn server:app --host 0.0.0.0 --port 8080
          env:
            - name: MODEL_NAME
              value: "{{inputs.parameters.modelName}}" 

 

Two input parameters:

  • modelName: the target model to be downloaded from the Hugging Face Model Hub when the server starts; it is also used for inference with the Generative AI Hub SDK.
  • resourcePlan: the target resource plan to be used in the deployment. Defaults to infer.s; more details here.

Step 4: Onboard the GitHub repository to SAP AI Core

As mentioned in step 3, the serving template needs to be hosted in a GitHub repository for SAP AI Core. This step has been automated for the sample byom-oss-llm-ai-core in 00-init-config.ipynb (a minimal sketch of the equivalent SDK calls is shown below). Alternatively, you can onboard a GitHub repository manually through SAP AI Launchpad. As a result, the associated GitHub repository is onboarded into SAP AI Core.
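For illustration only, here is a minimal sketch of what the notebook automates, using the SAP AI Core SDK (AICoreV2Client); the credential and repository values are placeholders for your own service key and repository:

 

from ai_core_sdk.ai_core_v2_client import AICoreV2Client

# Credentials come from your SAP AI Core service key (placeholders here)
ai_core_client = AICoreV2Client(
    base_url="<YOUR_AI_API_URL>/v2",
    auth_url="<YOUR_AUTH_URL>/oauth/token",
    client_id="<YOUR_CLIENT_ID>",
    client_secret="<YOUR_CLIENT_SECRET>"
)

# Onboard the GitHub repository that hosts the serving template
ai_core_client.repositories.create(
    name="byom-oss-llm-ai-core",
    url="https://github.com/<YOUR_GITHUB_USER>/<YOUR_REPO>",
    username="<YOUR_GITHUB_USER>",
    password="<YOUR_GITHUB_ACCESS_TOKEN>"
)

 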

YatseaLi_2-1719579743036.jpeg

Step 5: Create an application in SAP AI Core and sync it with the GitHub repository

This step has been automated for the sample byom-oss-llm-ai-core in 00-init-config.ipynb (a minimal sketch of the equivalent SDK call is shown below). Alternatively, you can create your own application and sync it with the associated GitHub repository onboarded in step 4, manually through SAP AI Launchpad. As a result, an application is created, and a transformer scenario appears after synchronization.
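Again for illustration, a minimal sketch of the equivalent SDK call, reusing the ai_core_client from step 4; the application name and path are placeholders for your own values:

 

# Create the application pointing at the repository folder that
# contains transformer-template.yaml
ai_core_client.applications.create(
    application_name="byom-oss-llm-ai-core",
    repository_url="https://github.com/<YOUR_GITHUB_USER>/<YOUR_REPO>",
    path="<PATH_TO_TEMPLATE_FOLDER_IN_REPO>",
    revision="HEAD"
)

 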

YatseaLi_0-1719669277732.png

If we have a look at the transformer scenario, it has the two input parameters modelName and resourcePlan defined in its serving template (transformer-template.yaml).

YatseaLi_1-1719810942531.png

Step 6: Create a configuration and start a deployment

This step has been automated with 01-deployment.ipynb for the sample byom-oss-llm-ai-core. Alternatively, you can create a configuration and start a deployment with SAP AI Launchpad; here is the demo recording. A minimal sketch of the equivalent SDK calls is shown below.
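The following sketch reuses the ai_core_client from step 4 and uses placeholder values; the notebook remains the authoritative implementation:

 

from ai_api_client_sdk.models.parameter_binding import ParameterBinding

resource_group = "<YOUR_RESOURCE_GROUP>"  # e.g. "default"

# Create a configuration binding the two input parameters of the serving template
config = ai_core_client.configuration.create(
    name="transformer-phi3-vision",
    scenario_id="transformer",
    executable_id="transformer",
    parameter_bindings=[
        ParameterBinding(key="modelName", value="microsoft/Phi-3-vision-128k-instruct"),
        ParameterBinding(key="resourcePlan", value="infer.s")
    ],
    resource_group=resource_group
)

# Start a deployment from the configuration
deployment = ai_core_client.deployment.create(
    configuration_id=config.id,
    resource_group=resource_group
)
print(deployment.id, deployment.status)

 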

As a result, the custom transformer server is up and running in SAP AI Core with Microsoft's Phi3-vision model ready for inference through API.

Step 7: Inference Microsoft's Phi3-vision served with the Custom Transformer Server in SAP AI Core

Next, we'll enhance our sample btp-generative-ai-hub-use-cases/01-social-media-citizen-reporting-genai-hub by analyzing images reported by citizens through a mobile app to identify issues related to public facilities with Microsoft's Phi3-vision model served in SAP AI Core.
Here are the tasks:

  1. Analyze images reported by citizens through a mobile app to identify issues related to public facilities. If no issue is identified, go to step 5; otherwise continue with the next steps.
  2. Extract the photographic date and location information from images for accurate documentation.
  3. Categorize identified issues based on predefined categories (e.g., infrastructure damage, cleanliness, safety hazards).
  4. Assess the severity and priority of identified issues to determine appropriate action plans.
  5. Output the result with the JSON schema defined in the prompt below.

Given the test image of a littered street below, uploaded to the citizen reporting app, we'll inference Microsoft's Phi3-vision model to identify the public facility issue from it, with the two options below.

  • Option 1-Inference with direct API Call
  • Option 2-Inference with SAP Generative AI Hub SDK and Langchain

A littered street

Option 1-Inference with direct API call

You can inference the model with direct HTTP calls, which is applicable to any programming language that supports HTTP calls to a remote HTTP server, such as JavaScript (CAP), Java (CAP), ABAP, Python, etc. Here is a Python code snippet for illustration; please check out the full sample Jupyter notebook /transformer/02-transformer-direct-api-call.ipynb.

 

import requests

# First, log in to AI Core with an AI API client (ai_api_client)...
# then get its token for API calls to SAP AI Core
token = ai_api_client.rest_client.get_token()
headers = {
        "Authorization": token,
        'ai-resource-group': resource_group,
        "Content-Type": "application/json"}

model = "microsoft/Phi-3-vision-128k-instruct"
deployment = ai_api_client.deployment.get(deployment_id)

inference_base_url = f"{deployment.deployment_url}"
openai_chat_api_endpoint = f"{inference_base_url}/v1/chat/completions"

# Test image of a littered street
image_url = "https://raw.githubusercontent.com/SAP-samples/btp-generative-ai-hub-use-cases/main/10-byom-oss-llm-ai-core/resources/11-dirty-street.jpg"

# Prepare the prompt
user_msg = '''
You are a helpful Assistant of Public Facilities Issue Spotter for city council.
Responsible for analyzing images reported by citizens through a mobile app to identify issues related to public facilities. 
Here are your tasks:
1.Analyze images reported by citizens through a mobile app to identify issues related to public facilities. 
If no issue identified, go to step 5, otherwise continue with next steps 
2.Extract photographic date and location information from images for accurate documentation. 
3.Categorize identified issues based on predefined categories (e.g., infrastructure damage, cleanliness, safety hazards).
4.Assess the severity and priority of identified issues to determine appropriate action plans. 
5.Only Output JSON result with schema as below:
{ "issue_identified": "{{true or false}}", 
#below section only output when there is an issue identified
"title": "{{A short title about the issue}}", 
"description": "{{A detail description about the issue}}", 
"photo_date": "{{Extracted photographic date from its metadata in yyyy-mm-dd:hh:mm:ss format. Leave it blank if no metadata found in it.}}", 
"longitude": "{{Extracted longitude of photographic location from its metadata. Do not make up any number. Output -1 if fails to extract location info from image}}",
"latitude": "{{Extracted latitude of photographic location from its metadata. Do not make up any number. Output -1 if fails to extract location info from image}}",
"category": "{{Identified category: 01-Infrastructure Damage, 02-Cleanliness, 03-Safety Hazards, 04-Duplicated}}",
"priority": "{{Suggested Priority: 01-Very High, 02-High, 03-Medium, 04-Low}}",
"suggested_action": "{{01-Immediate Attendance, 02-Schedule Inspection, 03-Schedule Service, 04-Refer to similar issue }}"
} 
'''

json_data = {
    "model": model,
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": user_msg},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": image_url
                    },
                },
            ]
        }
    ],
    "max_new_tokens": 500,
    "temperature": 0.7,
    "do_sample": "True" 
}

response = requests.post(url=openai_chat_api_endpoint, headers=headers, json=json_data)
print("Result:", response.text)

 

The output result in JSON:

 

{
  "id": "1718386052",
  "object": "chat.completion",
  "created": 1718386052,
  "model": "microsoft/Phi-3-vision-128k-instruct",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{  
                        \"issue_identified\": true,
                        \"title\": \"Litter on Roadway\",
                        \"description\": \"A street with litter on the ground and along the side of the road.\",
                        \"photo_date\": \"2023-04-01 13:47:35\",
                        \"longitude\": \"123.456789\",
                        \"latitude\": \"45.678910\",
                        \"category\": \"01-Infrastructure Damage\",
                        \"priority\": \"01-Very High\",
                        \"suggested_action\": \"01-Immediate Attendance\"}"
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 1,
    "total_tokens": 1
  }
}

 

Option 2-Inference with SAP Generative AI Hub SDK and Langchain

You can also inference the model with the SAP Generative AI Hub SDK, which simplifies access to SAP Generative AI Hub for application development and integration. Please check the home page of the Python package for details. As of 20 June 2024, the SDK is only available as a Python package, hence only for Python application development. Here are code snippets for illustration; please check out the full Jupyter notebook /transformer/03-transformer-sap-genai-hub-sdk.ipynb for more detail. There are two approaches to inferencing with the SAP Generative AI Hub SDK:

  1. Proxy with OpenAI-like interface
  2. Proxy with Langchain-like interface
i. Install SAP Generative AI Hub SDK and Langchain

 

pip install generative-ai-hub-sdk[langchain]

 

ii. Register the scenario as a foundation model scenario

 

from gen_ai_hub.proxy.gen_ai_hub_proxy import GenAIHubProxyClient

# Register the custom "transformer" scenario as a foundation model scenario,
# so that the SDK routes chat completion calls to our deployment's endpoint
GenAIHubProxyClient.add_foundation_model_scenario(
    scenario_id="transformer",
    config_names="transformer*",
    prediction_url_suffix="/v1/chat/completions",
)
proxy_client = GenAIHubProxyClient(ai_core_client=ai_core_client)

 

iii. Inference with SAP Generative API Hub SDK

 

# Option 1: Proxy with OpenAI-like interface
# Assumes sys_msg, user_msg, deployment_id, model and response_format are defined
# earlier in the notebook, and that the openai proxy is imported from the SDK,
# e.g. from gen_ai_hub.proxy.native import openai
messages = [
    {
        "role": "system",
        "content": sys_msg
    },
    {
        "role": "user",
        "content": user_msg
    }
]
result = openai.chat.completions.create(
    deployment_id=deployment_id,
    model=model,
    response_format=response_format,
    messages=messages
)

print("Option 1: Proxy with OpenAI-like interface\n", result.choices[0].message.content)

# Option 2: Proxy with Langchain-like interface
from gen_ai_hub.proxy.langchain.openai import ChatOpenAI
from langchain.schema.messages import HumanMessage, SystemMessage
#JSON Mode
response_format={"type": "json_object"} 

messages = [SystemMessage(content=sys_msg),
            HumanMessage(content=user_msg)]
llm = ChatOpenAI(
    proxy_client=proxy_client,
    deployment_id=deployment_id,
    model_name=model
).bind(
 response_format=response_format
)

completion = llm.invoke(messages)
print("Option 2: Proxy with Langchain-like interface\n", completion.content)

 

As a result, citizens can report a public facility issue effortlessly by taking and uploading a photo to the citizen reporting app, without having to describe the issue manually in text. This significantly improves the user experience of the citizen reporting app and the efficiency of identifying the issue from the image.

Try it out

  • Please refer to this manual to try out deploying and running Microsoft's Phi3-vision model with the custom transformer server using the Hugging Face Transformers Library in SAP AI Core. The source code of this sample is released under the Apache 2.0 license. You are accountable for your own choice of commercially viable open-source LLMs/LMMs.
  • In this blog post, we have taken Microsoft's Phi3-vision model as an example; however, the same steps are applicable to bringing any other open-source model supported by the Hugging Face Transformers Library into SAP AI Core with a custom transformer server.

 

Summary

For SAP developers who need to leverage generative AI in their solutions, SAP provides Generative AI Hub as easy access to a wide range of leading LLMs, both proprietary and open-source, which should cover most of your use cases.

For particular business cases where you need a different open-source LLM that is not yet available in Generative AI Hub, a custom inference server can be built with the Hugging Face Transformers Library and deployed to SAP AI Core to serve it in a manner of your choice and under your responsibility, as long as the model is supported by the Hugging Face Transformers Library.

Disclaimer: SAP notes that posts about potential uses of generative AI and large language models are merely the individual poster's ideas and opinions, and do not represent SAP's official position or future development roadmap. SAP has no legal obligation or other commitment to pursue any course of business, or develop or release any functionality, mentioned in any post or related content on this website.