Authors: @YatseaLi, @cesarecalabria93, @amagnani37, @jacobahtan
In my previous blog post about Bring Open-Source or or Open-Weight LLMs into SAP AI Core, we have gone through an overview introduction of deploying and running open-source LLMs in SAP AI Core with BYOM approach, the use cases of open-source LLMs, and the sample application byom-oss-llm-ai-core and its solution architecture, and various options of leveraging open-source LLM Inference Servers to serve the open-source LLMs within SAP AI Core, such as Ollama, LocalAI, llama.cpp and vLLM. (In the rest of this article, I will use Open-Source LLMs representing both Open-Source and Open-Weight LLMs for simplification)
Here is the blog post series of Bring Open-Source LLMs into SAP AI Core:
- Part 1 – Bring Open-Source LLMs into SAP AI Core: Overview
- Part 2 – Bring Open-Source LLMs into SAP AI Core with Ollama
- Part 3 – Bring Open-Source LLMs into SAP AI Core with Custom Transformer Server (this blog post)
- Part 4 – Bring Open-Source Text Embedding Models into SAP AI Core with Infinity
- Part 5 – Bring Open-Source LLMs into SAP AI Core with LocalAI (to be published)
- Part 6 – Bring Open-Source LLMs into SAP AI Core with llama.cpp (to be published)
- Part 7 – Bring Open-Source LLMs into SAP AI Core with vLLM (to be published)

Note: You can try out the sample AI Core app byom-oss-llm-ai-core by following its manual here, which covers all the technical details. The follow-up blog posts simply wrap up the technical details of each option.
In this blog post, we'll take an end-to-end technical deep dive into the option of bringing open-source LLMs into SAP AI Core through a custom inference server built with the Hugging Face Transformers library, which was presented in our recent webinar, Unleash Open-Source LLMs in your enterprise with SAP AI Core. A session replay is available here.
In February 2024, we presented a proof of concept using Generative AI Hub for managing and tracking public maintenance issues in a fictitious city. The city aimed to improve handling of citizen-reported issues by analyzing social media posts with generative AI. More details about citizen reporting use case are available in this blog post.
Here's a recap of the Citizen Reporting application's workflow:
The initial implementation used GPT-4 through Generative AI Hub. The solution could only process text posts; images in the posts were skipped. As a result, the efficiency of the issue-reporting process relied entirely on the quality of the issue text written by citizens.
How can we improve the user experience and process efficiency of issue reporting?
To enhance the citizen reporting app, we can integrate a foundation model with vision capabilities. This model can analyze images to identify public facility issues, even when posts contain little text.
Large Multimodal Models (LMMs) can process multiple data types, such as text and images, simultaneously. Integrating such a model allows our solution to accept images with instructions and provide insights through automatic image recognition and scene understanding. As of 20 June 2024, Azure OpenAI GPT-4o (an LMM) isn't available in SAP Generative AI Hub. Therefore, we'll use an open-source LMM (Microsoft's Phi-3-vision model) in SAP AI Core through BYOM (Bring Your Own Model) for this purpose. Once GPT-4o is listed in Generative AI Hub, you will have the option to move to it.
This enhancement simplifies the user experience, enabling users to report issues by easily uploading photos through the app.
Let's zoom into the generative AI components. As of 20 June 2024, Phi-3-vision isn't supported in Ollama, so we need to build a custom inference server to serve it in SAP AI Core by bringing our own code with the Hugging Face Transformers library and Model Hub. Of course, once Phi-3-vision is supported by Ollama, you can simply replace the custom transformer server with Ollama. More details are available in the blog post Bring Open-Source LLMs into SAP AI Core with Ollama.
Let's have a look at the final end-to-end demo of this enhanced citizen reporting use case.
Next, we'll dive into enhancing the Citizen Reporting app with the vision capability of Microsoft's Phi-3-vision model, served in SAP AI Core through the BYOM (Bring Your Own Model) approach.
Important Note on Hands-on
Before that, let's have a quick look at Microsoft's Phi-3-vision model.
Microsoft's Phi-3-vision model is a lightweight yet top-ranking open-source multimodal model, with only 4.2 billion parameters and around 8 GB in size. Its primary use cases are image, chart and table understanding, OCR and the like, especially in environments with limited computing resources. In our case, we use its general image-understanding capability to automatically detect public facility issues in the images uploaded to the citizen reporting app by users.
Its model card on Hugging Face also provides sample inference code, shown below, based on the Hugging Face Transformers library, a generic library from Hugging Face for downloading, training and running inference on the pretrained models published in the Hugging Face Model Hub. Later on, we'll convert this code snippet into a custom inference server.
from PIL import Image
import requests
from transformers import AutoModelForCausalLM
from transformers import AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"

# Load the model and its processor from the Hugging Face Model Hub
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cuda", trust_remote_code=True, torch_dtype="auto", _attn_implementation='flash_attention_2') # use _attn_implementation='eager' to disable flash attention
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# A multi-turn chat; <|image_1|> is Phi-3-vision's placeholder for the first image
messages = [
    {"role": "user", "content": "<|image_1|>\nWhat is shown in this image?"},
    {"role": "assistant", "content": "The chart displays the percentage of respondents who agree with various statements about their preparedness for meetings. It shows five categories: 'Having clear and pre-defined goals for meetings', 'Knowing where to find the information I need for a meeting', 'Understanding my exact role and responsibilities when I'm invited', 'Having tools to manage admin tasks like note-taking or summarization', and 'Having more focus time to sufficiently prepare for meetings'. Each category has an associated bar indicating the level of agreement, measured on a scale from 0% to 100%."},
    {"role": "user", "content": "Provide insightful questions to spark discussion."}
]

url = "https://assets-c4akfrf5b4d3f4b7.z01.azurefd.net/assets/2024/04/BMDataViz_661fb89f3845e.png"
image = Image.open(requests.get(url, stream=True).raw)

prompt = processor.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(prompt, [image], return_tensors="pt").to("cuda:0")

generation_args = {
    "max_new_tokens": 500,
    "temperature": 0.0,
    "do_sample": False,
}

generate_ids = model.generate(**inputs, eos_token_id=processor.tokenizer.eos_token_id, **generation_args)

# remove input tokens
generate_ids = generate_ids[:, inputs['input_ids'].shape[1]:]
response = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(response)
As visually illustrated in the component diagram, let's quickly walk through the end-to-end implementation process.
At design time, only the following files are needed:
At runtime, we'll deploy and run a custom transformer server in SAP AI Core based on its serving template.
With that, let's start the end-to-end journey of technical implementation.
In this step, we'll wrap Phi-3-vision's sample inference code into a custom inference server exposing an inference API.
Before jumping into the implementation of the custom transformer server in SAP AI Core, let's look at the design options for its inference API. Such an inference API could be implemented in many ways, depending mainly on how it will be consumed.
There are some common requirements for inference APIs in SAP AI Core, regardless of the API schema:
- The server must listen on the port exposed in its serving template (8080 in our case).
- Requests are routed through the SAP AI Core deployment URL, with an AI Core access token in the Authorization header.
- Every request must carry the ai-resource-group header identifying the target resource group.
The first option is to have a simple custom API. Here is just an example for illustration.
curl http://<YOUR_AI_CORE_DEPLOYMENT_URL>/v1/generate \
  -H "Content-Type: application/json" \
  -H "ai-resource-group: <YOUR_RESOURCE_GROUP>" \
  -H "Authorization: <YOUR_AI_CORE_ACCESS_TOKEN>" \
  -d '{
    "prompt": "What is shown in this image?",
    "image": "<image-binary-encoded-data> or <image-url>",
    "model": "microsoft/Phi-3-vision-128k-instruct",
    "max_new_tokens": 500,
    "temperature": 0.7
  }'
Response:
{ "response": "A large tree has fallen across the sreet, blocking the way through..."}
We can design a minimal request body; in our case, the request needs at least a prompt, an image and a model name, and it returns only a text response in JSON. This looks simple and straightforward.
But what if one day you need to switch to a proprietary model in SAP Generative AI Hub for better quality, for instance GPT-4o? You would then have to rewrite the LLM inference part of your application, since it is a different API.
Alternatively, we can implement the API as an OpenAI-like API for chat completion, which can be compatible with SAP Generative AI Hub SDK.
curl http://<YOUR_AI_CORE_DEPLOYMENT_URL>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "ai-resource-group: <YOUR_RESOURCE_GROUP>" \
  -H "Authorization: <YOUR_AI_CORE_ACCESS_TOKEN>" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant"
      },
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What is shown in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://xxx/test-image.png"
            }
          }
        ]
      }
    ],
    "model": "microsoft/Phi-3-vision-128k-instruct",
    "max_new_tokens": 500,
    "temperature": 0.7,
    "do_sample": true,
    "response_format": {"type": "json_object"}
  }'
#Response:
{
  "id": "1718385804",
  "object": "chat.completion",
  "created": 1718385804,
  "model": "microsoft/Phi-3-vision-128k-instruct",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The image shows a large tree has fallen across a street, blocking the way through..."
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 1,
    "total_tokens": 1
  }
}
As a best practice, option 2, an OpenAI-like API, is recommended for smooth portability to other foundation models in SAP Generative AI Hub with minimal code change. In addition, you will be able to use the SAP Generative AI Hub SDK to consume the open-source LLMs hosted in SAP AI Core in your application development, which simplifies access to them. For the rest of this blog post, we go with option 2.
The full implementation of the custom transformer server is available as server.py in our sample GitHub repository; it implements the inference API as an OpenAI-like API. A demo recording that walks you through the technical details is available as well.
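The snippet below is a condensed, illustrative sketch of that server's core logic rather than a verbatim copy of server.py, assuming FastAPI as the web framework (the server is started with uvicorn, as the Dockerfile in the next step shows): load Phi-3-vision once at startup, translate OpenAI-style messages into Phi-3-vision's prompt format, and wrap the generated text in an OpenAI-like chat-completion envelope.

import time

import requests
from fastapi import FastAPI
from PIL import Image
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "microsoft/Phi-3-vision-128k-instruct"

app = FastAPI()
# Load the model and processor once at server startup
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="cuda", trust_remote_code=True, torch_dtype="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)


class ChatRequest(BaseModel):
    model: str
    messages: list
    max_new_tokens: int = 500
    temperature: float = 0.0


@app.post("/v1/chat/completions")
def chat_completions(req: ChatRequest):
    # Flatten OpenAI-style messages into Phi-3-vision's prompt format:
    # image_url parts become <|image_N|> placeholders, text parts stay as-is
    images, phi_messages = [], []
    for msg in req.messages:
        content = msg["content"]
        if isinstance(content, list):
            parts = []
            for part in content:
                if part["type"] == "image_url":
                    url = part["image_url"]["url"]
                    images.append(Image.open(requests.get(url, stream=True).raw))
                    parts.append(f"<|image_{len(images)}|>")
                else:
                    parts.append(part["text"])
            content = "\n".join(parts)
        phi_messages.append({"role": msg["role"], "content": content})

    prompt = processor.tokenizer.apply_chat_template(
        phi_messages, tokenize=False, add_generation_prompt=True
    )
    inputs = processor(prompt, images if images else None, return_tensors="pt").to("cuda:0")

    do_sample = req.temperature > 0
    gen_kwargs = {"max_new_tokens": req.max_new_tokens, "do_sample": do_sample}
    if do_sample:
        gen_kwargs["temperature"] = req.temperature
    generate_ids = model.generate(
        **inputs, eos_token_id=processor.tokenizer.eos_token_id, **gen_kwargs
    )
    # Strip the prompt tokens, keep only the newly generated ones
    generate_ids = generate_ids[:, inputs["input_ids"].shape[1]:]
    text = processor.batch_decode(generate_ids, skip_special_tokens=True)[0]

    # Wrap the generated text in an OpenAI-like chat-completion envelope
    now = int(time.time())
    return {
        "id": str(now),
        "object": "chat.completion",
        "created": now,
        "model": req.model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": text},
            "finish_reason": "stop",
        }],
    }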
This step has been automated with /transformer/01-deployment.ipynb for the sample byom-oss-llm-ai-core. In this step, we will wrap the custom inference server (server.py) into a Docker image and push it to Docker Hub.
I have prepared a Dockerfile for the custom inference server, adapted for SAP AI Core; let's walk through it.
ARG BASE_IMAGE=pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime
FROM ${BASE_IMAGE} AS runtime

WORKDIR /app
COPY app/* ./

ENV CUDA_HOME=/usr/local/cuda
RUN python3 -m pip install -r requirements.txt && \
    rm -rf /root/.cache/pip

# Adaptation for SAP AI Core: the container runs as a non-root user,
# so pre-create writable home and cache directories
RUN mkdir -p /nonexistent && \
    mkdir -p /hf-home && \
    chown -R nobody:nogroup /nonexistent /hf-home /app && \
    chmod -R 770 /nonexistent /hf-home /app
ENV HF_HOME=/hf-home

# Start the server with uvicorn
CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "8080"]
The code is largely self-explanatory; here are the important adaptations for SAP AI Core:
- SAP AI Core runs the container as a non-root user (nobody:nogroup), so the directories the server writes to (/nonexistent as the fallback home directory, /hf-home and /app) are created, re-owned and made writable up front.
- HF_HOME is redirected to the writable /hf-home directory so the Hugging Face libraries can download and cache the model there at startup.
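For completeness, the image installs the server's Python dependencies from app/requirements.txt, which is copied next to server.py. The exact pinned versions live in the sample repository; a plausible minimal set for this server (PyTorch itself already ships with the base image) would look like:

# requirements.txt (illustrative; see the sample repository for the exact pins)
fastapi
uvicorn
transformers
accelerate   # required for device_map-based model loading
pillow       # image decoding
requests     # fetching images from URLs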
With the Dockerfile in place, we can build the Docker image and push it to Docker Hub with the commands below:
# 0.Login to docker hub
docker login -u <YOUR_DOCKER_USER> -p <YOUR_DOCKER_ACCESS_TOKEN>
# 1.Build the docker image
docker build --platform=linux/amd64 -t docker.io/<YOUR_DOCKER_USER>/transformer:ai-core .
# 2.Push the docker image to docker hub to be used by deployment in SAP AI Core
docker push docker.io/<YOUR_DOCKER_USER>/transformer:ai-core
As a result, the Docker image is built and pushed to Docker Hub. Please note down your own Docker image path, which will be used in the next step.
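One prerequisite worth calling out: the serving template in the next step pulls this image with an image-pull secret (<YOUR_DOCKER_SECRET>), which must already exist in SAP AI Core. The sample's 00-init-config.ipynb takes care of this; if you are setting it up yourself, it can be created with the SAP AI Core SDK along these lines (secret name and credentials are placeholders):

import json

# Register a Docker registry secret in SAP AI Core so that deployments can
# pull the image from your (private) Docker Hub repository.
# ai_core_client is an authenticated AICoreV2Client instance.
response = ai_core_client.docker_registry_secrets.create(
    name="<YOUR_DOCKER_SECRET>",
    data={
        ".dockerconfigjson": json.dumps({
            "auths": {
                "docker.io": {
                    "username": "<YOUR_DOCKER_USER>",
                    "password": "<YOUR_DOCKER_ACCESS_TOKEN>",
                }
            }
        })
    },
)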
I have prepared a sample serving template for the custom transformer server (transformer-template.yaml). Let's walk through it.
apiVersion: ai.sap.com/v1alpha1
kind: ServingTemplate
metadata:
  name: transformer
  annotations:
    scenarios.ai.sap.com/description: "Run a custom transformer server on SAP AI Core"
    scenarios.ai.sap.com/name: "transformer"
    executables.ai.sap.com/description: "Run a custom transformer server on SAP AI Core"
    executables.ai.sap.com/name: "transformer"
  labels:
    scenarios.ai.sap.com/id: "transformer"
    ai.sap.com/version: "0.0.1"
spec:
  inputs:
    parameters:
      - name: modelName # placeholder name
        default: "microsoft/Phi-3-vision-128k-instruct"
        type: string # required for every parameter
        description: "Model name to be used in SAP Generative AI Hub SDK. Ensure an identical value as the alias parameter"
      - name: resourcePlan
        type: "string"
        default: "infer.s"
        description: "Resource plan of SAP AI Core. Supported: infer.s, infer.m, infer.l, train.l"
  template:
    apiVersion: "serving.kserve.io/v1beta1"
    metadata:
      annotations: |
        autoscaling.knative.dev/metric: concurrency
        autoscaling.knative.dev/target: 1
        autoscaling.knative.dev/targetBurstCapacity: 0
      labels: |
        ai.sap.com/resourcePlan: "{{inputs.parameters.resourcePlan}}"
    spec: |
      predictor:
        imagePullSecrets:
          - name: <YOUR_DOCKER_SECRET>
        minReplicas: 1
        maxReplicas: 1
        containers:
          - name: kserve-container
            image: docker.io/<YOUR_DOCKER_USER>/transformer:ai-core
            ports:
              - containerPort: 8080
                protocol: TCP
            command: ["/bin/sh", "-c"]
            args:
              - >
                set -e && echo "-------------Starting transformer Server--------------"
                && uvicorn server:app --host 0.0.0.0 --port 8080
            env:
              - name: MODEL_NAME
                value: "{{inputs.parameters.modelName}}"
The serving template defines two input parameters:
- modelName: the model name to be used with the SAP Generative AI Hub SDK (default: microsoft/Phi-3-vision-128k-instruct), injected into the container as the MODEL_NAME environment variable.
- resourcePlan: the SAP AI Core resource plan the server runs on (default: infer.s; supported: infer.s, infer.m, infer.l, train.l).
As mentioned in Step 3, the serving template needs to be hosted in a GitHub repository for SAP AI Core. This step has been automated for the sample byom-oss-llm-ai-core in 00-init-config.ipynb. Alternatively, you can onboard a GitHub repository manually through SAP AI Launchpad. As a result, the associated GitHub repository is onboarded into SAP AI Core.
This step has been automated for the sample byom-oss-llm-ai-core in 00-init-config.ipynb. Alternatively, you can create your own application manually through SAP AI Launchpad and sync it with the GitHub repository onboarded in Step 4. As a result, an application is created, and a scenario for transformer appears after synchronization.
If we have a look at the transformer scenario, it has the two input parameters modelName and resourcePlan defined in its serving template (transformer-template.yaml).
This step has been automated with 01-deployment.ipynb for the sample byom-oss-llm-ai-core. Alternatively, you can create a configuration and start a deployment with SAP AI Launchpad; here you have a look at the demo recording.
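For reference, what the notebook automates essentially boils down to two calls with the SAP AI Core SDK. Here is a sketch, assuming ai_core_client is an authenticated AICoreV2Client and using illustrative names:

from ai_api_client_sdk.models.parameter_binding import ParameterBinding

# Create a configuration binding the transformer scenario/executable
# to concrete values for the serving template's input parameters
config = ai_core_client.configuration.create(
    name="transformer-config",  # illustrative configuration name
    scenario_id="transformer",
    executable_id="transformer",
    parameter_bindings=[
        ParameterBinding(key="modelName", value="microsoft/Phi-3-vision-128k-instruct"),
        ParameterBinding(key="resourcePlan", value="infer.s"),
    ],
    resource_group=resource_group,
)

# Start a deployment from that configuration; SAP AI Core pulls the Docker
# image and starts the custom transformer server
deployment = ai_core_client.deployment.create(
    configuration_id=config.id,
    resource_group=resource_group,
)
print(deployment.id, deployment.status)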
As a result, the custom transformer server is up and running in SAP AI Core, with Microsoft's Phi-3-vision model ready for inference through its API.
Next, we'll enhance our sample btp-generative-ai-hub-use-cases/01-social-media-citizen-reporting-genai-hub to analyze images reported by citizens through a mobile app and identify issues related to public facilities, using Microsoft's Phi-3-vision model served in SAP AI Core.
Here are the tasks:
1. Analyze images reported by citizens through the mobile app to identify issues related to public facilities.
2. Extract the photographic date and location from the image metadata for accurate documentation.
3. Categorize identified issues based on predefined categories (e.g., infrastructure damage, cleanliness, safety hazards).
4. Assess the severity and priority of identified issues to determine appropriate action plans.
5. Output the result as JSON.
Given a test image of a littered street uploaded to the citizen reporting app, as below, we'll run inference against Microsoft's Phi-3-vision model to identify the public facility issue from it, using the two options below.
You can run inference on the model with plain HTTP calls, which works for any programming language that can call a remote HTTP server, such as JavaScript (CAP), Java (CAP), ABAP, Python, etc. Here is a code snippet in Python for illustration. Please check out the full sample Jupyter notebook at /transformer/02-transformer-direct-api-call.ipynb.
import requests

# First, log in to SAP AI Core with an AI API client (ai_api_client),
# then fetch a bearer token for the API calls to SAP AI Core
token = ai_api_client.rest_client.get_token()
headers = {
    "Authorization": token,
    "ai-resource-group": resource_group,
    "Content-Type": "application/json",
}

model = "microsoft/Phi-3-vision-128k-instruct"
deployment = ai_api_client.deployment.get(deployment_id)
inference_base_url = f"{deployment.deployment_url}"
openai_chat_api_endpoint = f"{inference_base_url}/v1/chat/completions"
# Test image of a littered street
image_url = "https://raw.githubusercontent.com/SAP-samples/btp-generative-ai-hub-use-cases/main/10-byom-oss-llm-ai-core/resources/11-dirty-street.jpg"
# Prepare the prompt
user_msg = '''
You are a helpful Assistant of Public Facilities Issue Spotter for city council.
Responsible for analyzing images reported by citizens through a mobile app to identify issues related to public facilities.
Here are your tasks:
1.Analyze images reported by citizens through a mobile app to identify issues related to public facilities.
If no issue identified, go to step 5, otherwise continue with next steps
2.Extract photographic date and location information from images for accurate documentation.
3.Categorize identified issues based on predefined categories (e.g., infrastructure damage, cleanliness, safety hazards).
4.Assess the severity and priority of identified issues to determine appropriate action plans.
5.Only Output JSON result with schema as below:
{ "issue_identified": "{{true or false}}",
#below section only output when there is an issue identified
"title": "{{A short title about the issue}}",
"description": "{{A detail description about the issue}}",
"photo_date": "{{Extracted photographic date from its metadata in yyyy-mm-dd:hh:mm:ss format. Leave it blank if no metadata found in it.}}",
"longitude": "{{Extracted longitude of photographic location from its metadata. Do not make up any number. Output -1 if fails to extract location info from image}}",
"latitude": "{{Extracted latitude of photographic location from its metadata. Do not make up any number. Output -1 if fails to extract location info from image}}",
"category": "{{Identified category: 01-Infrastructure Damage, 02-Cleanliness, 03-Safety Hazards, 04-Duplicated}}",
"priority": "{{Suggested Priority: 01-Very High, 02-High, 03-Medium, 04-Low}}",
"suggested_action": "{{01-Immediate Attendance, 02-Schedule Inspection, 03-Schedule Service, 04-Refer to similar issue }}"
}
'''
json_data = {
    "model": model,
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": user_msg},
                {
                    "type": "image_url",
                    "image_url": {"url": image_url},
                },
            ],
        }
    ],
    "max_new_tokens": 500,
    "temperature": 0.7,
    "do_sample": True,
}
response = requests.post(url=openai_chat_api_endpoint, headers=headers, json=json_data)
print("Result:", response.text)
The output result in JSON:
{
  "id": "1718386052",
  "object": "chat.completion",
  "created": 1718386052,
  "model": "microsoft/Phi-3-vision-128k-instruct",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{
          \"issue_identified\": true,
          \"title\": \"Litter on Roadway\",
          \"description\": \"A street with litter on the ground and along the side of the road.\",
          \"photo_date\": \"2023-04-01 13:47:35\",
          \"longitude\": \"123.456789\",
          \"latitude\": \"45.678910\",
          \"category\": \"01-Infrastructure Damage\",
          \"priority\": \"01-Very High\",
          \"suggested_action\": \"01-Immediate Attendance\"}"
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 1,
    "total_tokens": 1
  }
}
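Since the prompt instructs the model to reply in JSON, the application still has to parse the assistant message's content string before persisting the issue. A minimal sketch, continuing from the response object above:

import json

result = response.json()
# The assistant's reply is itself a JSON document following the prompt's schema,
# so parse the content string into a Python dict for further processing
issue = json.loads(result["choices"][0]["message"]["content"])
if issue["issue_identified"]:
    print(issue["title"], "->", issue["suggested_action"])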
You can also run inference on the model with the SAP Generative AI Hub SDK, which simplifies access to SAP Generative AI Hub for application development or integration; please check the home page of its Python package for details. As of 20 June 2024, the SDK is only available as a Python package, hence only for Python application development. Here is a code snippet for illustration; please check out the full Jupyter notebook /transformer/03-transformer-sap-genai-hub-sdk.ipynb for more detail. There are two approaches to inferencing with the SAP Generative AI Hub SDK:
pip install generative-ai-hub-sdk[langchain]
from gen_ai_hub.proxy.gen_ai_hub_proxy import GenAIHubProxyClient
# Assumed import path for the SDK's proxied OpenAI module, which mirrors the openai package
from gen_ai_hub.proxy.native import openai

# Register our custom "transformer" scenario as a foundation-model scenario,
# so the SDK can route chat-completion calls to its deployments
GenAIHubProxyClient.add_foundation_model_scenario(
    scenario_id="transformer",
    config_names="transformer*",
    prediction_url_suffix="/v1/chat/completions",
)
# ai_core_client is the authenticated AI Core client created earlier in the notebook
proxy_client = GenAIHubProxyClient(ai_core_client=ai_core_client)

# JSON mode: ask the server to return a JSON object
response_format = {"type": "json_object"}

# Option 1: Proxy with OpenAI-like interface
# sys_msg and user_msg are the prompts prepared earlier in the notebook
messages = [
    {"role": "system", "content": sys_msg},
    {"role": "user", "content": user_msg},
]
result = openai.chat.completions.create(
    deployment_id=deployment_id,
    model=model,
    response_format=response_format,
    messages=messages,
)
print("Option 1: Proxy with OpenAI-like interface\n", result.choices[0].message.content)

# Option 2: Proxy with LangChain-like interface
from gen_ai_hub.proxy.langchain.openai import ChatOpenAI
from langchain.schema.messages import HumanMessage, SystemMessage

messages = [SystemMessage(content=sys_msg),
            HumanMessage(content=user_msg)]
llm = ChatOpenAI(
    proxy_client=proxy_client,
    deployment_id=deployment_id,
    model_name=model,
).bind(
    response_format=response_format,
)
completion = llm.invoke(messages)
print("Option 2: Proxy with LangChain-like interface\n", completion.content)
As a result, citizens can report a public facility issue simply by taking and uploading a photo to the citizen reporting app, without having to describe the issue in text manually. This significantly improves the user experience of the citizen reporting app and the efficiency of identifying issues from images.
For SAP developers who need to leverage generative AI in their solutions, SAP provides Generative AI Hub as easy access to a wide range of leading LLMs, both proprietary and open-source, which should cover most of your use cases.
For the particular business cases where you need an open-source LLM that is not yet available in Generative AI Hub, a custom inference server can be built with the Hugging Face Transformers library and deployed to SAP AI Core to serve it in a manner of your choice and under your own responsibility, as long as the model is supported by the Hugging Face Transformers library.
Disclaimer: SAP notes that posts about potential uses of generative AI and large language models are merely the individual poster's ideas and opinions, and do not represent SAP's official position or future development roadmap. SAP has no legal obligation or other commitment to pursue any course of business, or develop or release any functionality, mentioned in any post or related content on this website.