Authors: @YatseaLi, @cesarecalabria, @amagnani, @jacobahtan

In the previous blog post, Bring Open-Source or Open-Weight LLMs into SAP AI Core, we went through an overview of deploying and running open-source LLMs in SAP AI Core with the BYOM approach, the use cases of open-source LLMs, the sample application byom-oss-llm-ai-core and its solution architecture, and the various options for leveraging open-source LLM inference servers to serve open-source LLMs within SAP AI Core, such as Ollama, LocalAI, llama.cpp and vLLM. (In the rest of this article, I will use "open-source LLMs" to refer to both open-source and open-weight LLMs for simplicity.)

Here you have the blog post series.

Blog post series of Bring Open-Source LLMs into SAP AI Core
Part 1 – Bring Open-Source LLMs into SAP AI Core: Overview
Part 2 – Bring Open-Source LLMs into SAP AI Core with Ollama
Part 3 – Bring Open-Source LLMs into SAP AI Core with Custom Transformer Server
Part 4 – Bring Open-Source Text Embedding Models into SAP AI Core with Infinity (this blog post)
Part 5 – Bring Open-Source LLMs into SAP AI Core with LocalAI (to be published)
Part 6 – Bring Open-Source LLMs into SAP AI Core with llama.cpp (to be published)
Part 7 – Bring Open-Source LLMs into SAP AI Core with vLLM (to be published)

Note: You can try out the sample AI Core app byom-oss-llm-ai-core by following its manual here, which covers all the technical details. The follow-up blog posts will wrap up the technical details of each option.

In this blog post, we'll take an end-to-end technical deep dive into bringing open-source text embedding models into SAP AI Core through Infinity.

A quick glance at Infinity

Infinity is a high-performance REST API for serving vector embeddings generated by various text-embedding models. It's open-source under the MIT license, making it freely available for integration into SAP AI Core.

Key features of Infinity include:

  • Model Agnostic: Supports a wide range of models from the SentenceTransformers library.
  • Fast Inference: Leverages backends like PyTorch, ONNX, TensorRT, and CTranslate2 for optimized performance across different hardware (CPU, GPU).
  • Dynamic Batching: Maximizes throughput by processing requests efficiently while the GPU is busy.
  • Accurate Embeddings: Delivers embeddings consistent with SentenceTransformers (up to numerical precision).
  • Easy to Use: Employs FastAPI for a user-friendly API with Swagger documentation.
  • OpenAI Compatibility: API aligns with OpenAI's Embedding specifications, simplifying integration with other tools.
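
Since the API is OpenAI-compatible, the official openai Python client can talk to a locally running Infinity server directly. Here is a minimal sketch, assuming the server from the Install > Start steps below (port 7998, --url-prefix /v1); the api_key value is a dummy placeholder, as no key is required locally:

from openai import OpenAI

# Point the OpenAI client at the local Infinity server instead of api.openai.com
client = OpenAI(base_url="http://0.0.0.0:7998/v1", api_key="dummy")

resp = client.embeddings.create(
    model="nreimers/MiniLM-L6-H384-uncased",
    input=["a sentence to encode"],
)
print(len(resp.data[0].embedding))  # 384 dimensions for this model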

Here you have a short demo about Infinity.

With Infinity, running it locally on your own computer is straightforward, with three steps: Install > Start > Inference

Infinity is freely available for download from its website:

pip install infinity-emb[all]

After your pip install, with your venv active, you can run the CLI directly:

infinity_emb --url-prefix "/v1" --model-name-or-path "nreimers/MiniLM-L6-H384-uncased" --port "7998"

If you would like to understand the various options and parameters:

infinity_emb --help

For our case, we used one of the models from the Massive Text Embedding Benchmark (MTEB): nreimers/MiniLM-L6-H384-uncased.

Once it is up and running, we can run inference against the text embedding model. First, let's check whether the model is defined correctly.

curl -X 'GET' \
  'http://0.0.0.0:7998/v1/models' \
  -H 'accept: application/json'

Response:
{
   "data":[
      {
         "id":"nreimers/MiniLM-L6-H384-uncased",
         "stats":{
            "queue_fraction":0.0,
            "queue_absolute":0,
            "results_pending":0,
            "batch_size":32
         },
         "object":"model",
         "owned_by":"infinity",
         "created":1719902031,
         "backend":"torch"
      }
   ],
   "object":"list"
}

Next, let's run inference against the text embedding model to generate vector embeddings for the sample text: "a sentence to encode"

curl -X 'POST' \
  'http://0.0.0.0:7998/v1/embeddings' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "input": [
    "a sentence to encode."
  ]
}'

Response:
{
   "object":"embedding",
   "data":[
      {
         "object":"embedding",
         "embedding":[
            -0.07419787347316742,
            0.05378536880016327,
            "...",
            "...",
            0.008502810262143612,
            0.010130912065505981
         ],
         "index":0
      }
   ],
   "model":"nreimers/MiniLM-L6-H384-uncased",
   "usage":{
      "prompt_tokens":21,
      "total_tokens":21
   },
   "id":"infinity-c522de74-90d2-415d-9455-d3ea480785a0",
   "created":1719903053
}
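
As a quick usage illustration, here is a small sketch (assuming the same local server as above, plus numpy installed) that requests two embeddings in one call and compares them with cosine similarity:

import numpy as np
import requests

# Request embeddings for two texts in a single call
resp = requests.post(
    "http://0.0.0.0:7998/v1/embeddings",
    json={"input": ["a sentence to encode.", "a phrase to embed."]},
)
vecs = [np.array(item["embedding"]) for item in resp.json()["data"]]

# Cosine similarity = dot product divided by the product of the L2 norms
cos = vecs[0] @ vecs[1] / (np.linalg.norm(vecs[0]) * np.linalg.norm(vecs[1]))
print(f"Cosine similarity: {cos:.4f}")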

That looks promising. You may wonder whether you can deploy and run Infinity within SAP AI Core to serve open-source text embedding models. The answer is YES.

Deploy and Run Infinity within SAP AI Core

In this section, we will go through the technical details of bringing Infinity into SAP AI Core through the BYOM (Bring Your Own Model) approach. All the sample code shown in this section can be found here.

To bring a custom inference server into SAP AI Core, there are some common requirements:

  • A docker image to run the custom inference server. The endpoints of its inference API at runtime must start with a versioning prefix, for example /v1/xxx, /v2/yyy etc.
  • A serving template of SAP AI Core with the metadata of the custom inference server
  • A GitHub repository for hosting the serving templates to be onboarded into SAP AI Core
  • An application in SAP AI Core associated with the GitHub repository above, whose serving templates are created as scenarios in the application.
  • ...

We won't go through all the steps in this blog post. For the prerequisites and initial configuration (onboarding a GitHub repository and creating an application in SAP AI Core), please refer to the prerequisites section of the sample app byom-oss-llm-ai-core.

Instead, we'll take Infinity as a sample custom inference server to deploy and run in SAP AI Core. It is already a ready-to-use inference server for serving open-source text embedding models, so no extra inference-server code is needed. That said, apart from bringing your own inference code, we can also bring a ready-to-use inference server into SAP AI Core, as long as it is compliant with the requirements mentioned above.

Here is the conceptual components diagram of Deploying and Running Infinity within SAP AI Core.

At design time, only two files are needed:

  • A dockerfile to wrap infinity into a docker image for SAP AI Core.
  • A serving template yaml file (infinity-template.yaml) to describe what docker image to be run on what kind of infrastructure spec in SAP AI Core with configurable input parameters etc.

At runtime, we'll deploy and run an Infinity server in SAP AI Core based on its serving template.

Important Note: The steps described below mainly aim to explain the process; they are automated in the Jupyter notebooks 00-init-config.ipynb and /infinity-emb/01-deployment.ipynb, which you can run through. Please pay attention to their prerequisites. Alternatively, you can perform steps 4 and 5 manually through SAP AI Launchpad.

Step 1: Prepare a Dockerfile to wrap Infinity into a docker image for SAP AI Core

I have prepared a dockerfile of Infinity adapted for SAP AI Core; let's walk through it.

Dockerfile:

FROM pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime AS runtime

WORKDIR /usr/src

# Update and install dependencies
RUN apt-get update && \
    apt-get install -y \
    ca-certificates \
    nginx \
    curl && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

RUN python3 -m pip install --upgrade pip==23.2.1 && \
    python3 -m pip install "infinity-emb[all]" && \
    rm -rf /root/.cache/pip

EXPOSE 7997

# Adaptation for SAP AI Core
COPY run.sh /usr/src/run.sh

RUN mkdir -p /nonexistent/ && \
    mkdir -p /hf-home/ && \
    chown -R nobody:nogroup /nonexistent /hf-home/ && \
    chmod -R 770 /nonexistent/ /hf-home/ && \
    chmod +x /usr/src/run.sh

ENV HF_HOME=/hf-home
    # Note: Uncomment this ENV with MODEL_NAME & URL_PREFIX if you're running Docker locally. Don't forget about the backslash \
    # MODEL_NAME="nreimers/MiniLM-L6-H384-uncased"
    # URL_PREFIX="/v1"

ENTRYPOINT [ "/usr/src/run.sh" ]

run.sh:

#!/bin/bash

# Set default host to 0.0.0.0 if not already set
HOST="${HOST:-0.0.0.0}"
OPT+=" --host ${HOST}"

# Add port to options if PORT is set and --port is not already in ARG
if [ ! -z "${PORT}" ] && [[ ! "${ARG}" =~ --port ]]; then
	OPT+=" --port ${PORT}"
fi

# Echo the model name and URL prefix to be used
echo ${MODEL_NAME}
echo ${URL_PREFIX}

# Use set -x to print commands and their arguments as they are executed.
set -x

# Run the service with the model and the prepared host/port options
infinity_emb --url-prefix "${URL_PREFIX}" --model-name-or-path "${MODEL_NAME}" ${OPT}

The code is largely self-explanatory. Here are some important adaptations of Infinity for SAP AI Core:

  1. Adapt Infinity's API endpoints with the /v1 prefix
    As mentioned in the requirements of SAP AI Core, the endpoints of the custom inference server must start with a versioning prefix (/v1/xxx, /v2/yyy etc.).
    Thankfully, Infinity provides an option to configure a URL prefix for the endpoints, --url-prefix "URL_PREFIX", whose value is defined through a configuration in SAP AI Core.

  2. Ensure the embedding dimension matches the vector column type of your vector database (e.g. SAP HANA Vector Database Engine)
    When storing text embeddings in a vector database, the embedding dimension (number of features) is crucial for ensuring data integrity. Make sure the text embedding model you're using has the right embedding dimension. For our case, the model nreimers/MiniLM-L6-H384-uncased uses an embedding dimension of 384, which can be verified with the sketch below.
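
Here is a small sanity-check sketch (assuming the local Infinity server from the earlier steps) to verify the dimension before sizing the vector column:

import requests

# Probe the model once and count the returned features
resp = requests.post(
    "http://0.0.0.0:7998/v1/embeddings",
    json={"input": ["dimension probe"]},
)
dim = len(resp.json()["data"][0]["embedding"])
assert dim == 384, f"expected 384 dimensions, got {dim}"
print(f"Embedding dimension: {dim}")  # must match REAL_VECTOR(384) in SAP HANA Cloud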

Step 2: Build a Docker image for Infinity and Push it to Docker Hub

This step has been automated with /infinity-emb/01-deployment.ipynb for the sample byom-oss-llm-ai-core. Once the dockerfile is in place, we can build the docker image and push it to Docker Hub with the commands below:

# 0.Login to docker hub
docker login -u <YOUR_DOCKER_USER> -p <YOUR_DOCKER_ACCESS_TOKEN>

# 1.Build the docker image
docker build --platform=linux/amd64 -t docker.io/<YOUR_DOCKER_USER>/infinity:ai-core .

# 2.Push the docker image to docker hub to be used by deployment in SAP AI Core
docker push docker.io/<YOUR_DOCKER_USER>/infinity:ai-core 

Step 3: Prepare a Serving Template for Infinity and host it in a Github Repository

I have prepared a sample serving template for Infinity (infinity-template.yaml). Let's walk through it.

apiVersion: ai.sap.com/v1alpha1
kind: ServingTemplate
metadata:
  name: infinity
  annotations:
    scenarios.ai.sap.com/description: "Run an Infinity embedding inference server on SAP AI Core"
    scenarios.ai.sap.com/name: "infinity"
    executables.ai.sap.com/description: "Run an Infinity embedding inference server on SAP AI Core"
    executables.ai.sap.com/name: "infinity"
  labels:
    scenarios.ai.sap.com/id: "infinity"
    ai.sap.com/version: "0.0.1"
spec:
  inputs:
    parameters:
    - name: image
      type: "string"
      default: "docker.io/<YOUR_DOCKER_USER>/infinity:ai-core"
      description: "Define the location of the Docker image of which you have built for Infinity following the steps."
    - name: modelName
      type: "string"
      default: "nreimers/MiniLM-L6-H384-uncased"
      description: "Define the Sentence Transformer model (MTEB) you would like to use that Infinity supports. More info: https://michaelfeil.eu/infinity/latest/"
    - name: urlPrefix
      type: "string"
      default: "/v1"
      description: "It is required for SAP AI Core to base the root of the inference server to start with /v1."
    - name: portNumber
      type: "string"
      default: "7997"
      description: "When you run a container, if you want to access the application in the container via a port number."
    - name: resourcePlan
      type: "string"
      default: "infer.s"
      description: "Resource plans are used to select resources in workflow and serving templates."
    - name: minReplicas
      type: "string"
      default: "1"
      description: "The lower limit for the number of replicas to which the autoscaler can scale down."
    - name: maxReplicas
      type: "string"
      default: "1"
      description: "The upper limit for the number of replicas to which the autoscaler can scale down."
  template:
    apiVersion: "serving.kserve.io/v1beta1"
    metadata:
      annotations: |
        autoscaling.knative.dev/metric: concurrency
        autoscaling.knative.dev/target: 1
        autoscaling.knative.dev/targetBurstCapacity: -1
        autoscaling.knative.dev/window: "10m"
        autoscaling.knative.dev/scaleToZeroPodRetentionPeriod: "10m"
      labels: |
        ai.sap.com/resourcePlan: "{{inputs.parameters.resourcePlan}}"
    spec: |
      predictor:
        imagePullSecrets:
        - name: <YOUR_DOCKER_SECRET>
        minReplicas: {{inputs.parameters.minReplicas}}
        maxReplicas: {{inputs.parameters.maxReplicas}}
        containers:
        - name: kserve-container
          image: "{{inputs.parameters.image}}"
          ports:
          - containerPort: {{inputs.parameters.portNumber}}
            protocol: TCP
          env:
          - name: MODEL_NAME
            value: "{{inputs.parameters.modelName}}"
          - name: URL_PREFIX
            value: "{{inputs.parameters.urlPrefix}}"

There are 7 input parameters, which are fairly self-explanatory; for more information, refer to each parameter's description.

Step 4: Onboard the GitHub repository to SAP AI Core

As mentioned in Step 3, the serving template needs to be hosted in a GitHub repository for SAP AI Core. This step has been automated for the sample byom-oss-llm-ai-core in 00-init-config.ipynb; a sketch of the underlying API call is shown below. Alternatively, you can onboard a GitHub repository manually through SAP AI Launchpad. As a result, the associated GitHub repository has been onboarded into SAP AI Core.
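
For illustration, here is a sketch of what 00-init-config.ipynb automates with the SAP AI Core SDK; the service-key values, repository URL and credentials are placeholders:

from ai_core_sdk.ai_core_v2_client import AICoreV2Client

# Authenticate against your AI Core instance (values from its service key)
ai_core_client = AICoreV2Client(
    base_url="<AI_API_URL>/v2",
    auth_url="<AUTH_URL>/oauth/token",
    client_id="<CLIENT_ID>",
    client_secret="<CLIENT_SECRET>",
)

# Onboard the GitHub repository hosting the serving template
response = ai_core_client.repositories.create(
    name="byom-oss-llm-ai-core",
    url="https://github.com/<YOUR_GITHUB_USER>/<YOUR_REPO>",
    username="<YOUR_GITHUB_USER>",
    password="<YOUR_GITHUB_ACCESS_TOKEN>",
)
print(response.message)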

Step 5: Create an application in SAP AI Core and sync with the GitHub repository

This step has been automated for the sample byom-oss-llm-ai-core in 00-init-config.ipynb; see the sketch below. Alternatively, you can create your own app and sync it with the associated GitHub repository onboarded in step 4, manually through SAP AI Launchpad. As a result, an application has been created, and a scenario for infinity is created after synchronization.
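
A sketch of the corresponding SDK call (assuming the ai_core_client from step 4; the application name and path are placeholders):

# Create an AI Core application that syncs with the onboarded repository;
# the path points at the folder containing infinity-template.yaml
response = ai_core_client.applications.create(
    application_name="byom-oss-llm-app",
    repository_url="https://github.com/<YOUR_GITHUB_USER>/<YOUR_REPO>",
    path="<PATH_TO_SERVING_TEMPLATES>",
    revision="HEAD",
)
print(response.message)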

If we have a look at the infinity scenario, it has 7 input parameters as defined in its serving template (infinity-template.yaml).

Step 6: Create a configuration and start a deployment

This step has been automated with 01-deployment.ipynb for the sample byom-oss-llm-ai-core; a sketch of the flow is shown below. Alternatively, you can create a configuration and start a deployment with SAP AI Launchpad. Here you have the demo recording:
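
In code, the flow that 01-deployment.ipynb automates looks roughly like this (a sketch, assuming an authenticated AI API client ai_api_client and the infinity scenario from step 5; the parameter values mirror the serving template defaults):

from ai_api_client_sdk.models.parameter_binding import ParameterBinding

# Create a configuration binding the input parameters of the serving template
config = ai_api_client.configuration.create(
    name="infinity-config",
    scenario_id="infinity",
    executable_id="infinity",
    parameter_bindings=[
        ParameterBinding(key="image", value="docker.io/<YOUR_DOCKER_USER>/infinity:ai-core"),
        ParameterBinding(key="modelName", value="nreimers/MiniLM-L6-H384-uncased"),
        ParameterBinding(key="urlPrefix", value="/v1"),
        ParameterBinding(key="portNumber", value="7997"),
        ParameterBinding(key="resourcePlan", value="infer.s"),
    ],
)

# Start the deployment from the configuration
deployment = ai_api_client.deployment.create(configuration_id=config.id)
print(deployment.id)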

Step 7: Inference the open-source Text Embedding model (nreimers/MiniLM-L6-H384-uncased) served with Infinity in SAP AI Core

In our sample btp-generative-ai-hub-use-cases/01-social-media-citizen-reporting-genai-hub, the use case is about a fictitious city called "Sagenai City" facing challenges in managing and tracking maintenance in public areas. The city wants to improve the way it handles issues reported by citizens: analyzing social media posts to make informed decisions and effectively track and manage issues in public spaces. With plain prompting, the LLM extracts the following fields as output in a JSON schema:

  • Category
  • Priority
  • Summary
  • Description
  • Address
  • Sentiment

Previously, in one of the use cases, to deduplicate citizen-reported issues, we used one of SAP AI Core's generative AI text embedding models from OpenAI, Text Embedding ADA 002, to generate vector embeddings and store them in the SAP HANA Vector Engine. You may read more about it here.

Option 1 - Inference with direct API call

You can inference the model in Infinity with HTTP calls, which is applicable for any programming language that supports HTTP calls to a remote server, such as JavaScript (CAP), Java (CAP), ABAP, Python etc. Here is a code snippet in Python for illustration. Please check out the full sample Jupyter notebook in /infinity-emb/02-embedding.ipynb.

import json
import requests

# Retrieve the deployment URL of the running Infinity server in SAP AI Core
deployment = ai_api_client.deployment.get(deployment_id)
inference_base_url = f"{deployment.deployment_url}"
endpoint = f"{inference_base_url}/v1/embeddings"

# headers must carry the AI Core OAuth bearer token and resource group, e.g.
# {"Authorization": f"Bearer {token}", "AI-Resource-Group": "<RESOURCE_GROUP>"}
json_data = {
  "input": [
    "A sentence to encode."
  ]
}
response = requests.post(endpoint, headers=headers, json=json_data)
x = json.loads(response.content)
print(x['data'][0]['embedding'])

Results:
[-0.0741010531783104, 0.05380121245980263,...,..., -0.050218887627124786, -0.024147523567080498]

Option 2 - Inference with SAP Generative AI Hub SDK and LangChain

You can also inference the model in Infinity with the SAP Generative AI Hub SDK, which simplifies access to SAP Generative AI Hub for application development or integration; please check the home page of the Python package for details. As of 20 Jun 2024, the SDK is only available as a Python package, hence it is only available for Python application development. Here is a code snippet for illustration; please check out the full Jupyter notebook infinity-emb/02-embedding-sap-genai-hub-sdk.ipynb for more detail.
The high-level flow is as follows:

  1. Load configurations info
  2. Connect to SAP AI Core via SDK
  3. Check the status and logs of the deployment
  4. Inference the model through SAP Generative AI Hub SDK using OpenAI Embeddings
  5. Using Infinity to generate embeddings to store in SAP HANA Cloud Vector Engine
  6. Perform Similarity Search using Infinity > bge-small-en embedding model with SAP HANA Cloud Vector Engine
i. Install SAP Generative AI Hub SDK and LangChain

pip install generative-ai-hub-sdk[langchain]

ii. Register the scenario as a foundation model scenario

from gen_ai_hub.proxy.gen_ai_hub_proxy import GenAIHubProxyClient

GenAIHubProxyClient.add_foundation_model_scenario(
    scenario_id="byom-infinity-server",
    config_names="infinity*",
    prediction_url_suffix="/v1/embeddings",
)
proxy_client = GenAIHubProxyClient(ai_core_client = ai_core_client)

iii. Inference with SAP Generative AI Hub SDK

from gen_ai_hub.proxy.native.openai import embeddings

response = embeddings.create(
    input="Every decoding is another encoding.",
    model_name="nreimers/MiniLM-L6-H384-uncased",
    encoding_format='base64'
)
print(response.data[0].embedding)
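
Since the heading also mentions LangChain, here is a hedged sketch of the same inference through the SDK's LangChain embeddings wrapper (assuming the proxy_client registered in step ii; passing the BYOM model via proxy_model_name is an assumption based on how the wrapper is used with standard models):

from gen_ai_hub.proxy.langchain.openai import OpenAIEmbeddings

# LangChain-compatible embeddings object routed through SAP Generative AI Hub
embedding_model = OpenAIEmbeddings(
    proxy_model_name="nreimers/MiniLM-L6-H384-uncased",  # assumption for the BYOM scenario
    proxy_client=proxy_client,
)
vector = embedding_model.embed_query("Every decoding is another encoding.")
print(len(vector))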

(Bonus) Integration with SAP HANA Cloud, Vector Engine: Generate embeddings and store them in the vector database

In the following code snippet, you will find some examples of how to integrate with SAP HANA Cloud, Vector Engine.

# 1. Connect to SAP HANA Cloud Vector Engine with hana_ml library
from hana_ml import ConnectionContext
cc= ConnectionContext(
    address='<HANA_CLOUD_IP>.hna0.prod-eu10.hanacloud.ondemand.com', 
    port='<PORT>', 
    user='<USER>', 
    password='<PASSWORD>', 
    encrypt=True
    )

# 2. Define the embedding method - using the open-source embedding model
from gen_ai_hub.proxy.native.openai import embeddings
def get_embedding(input, model="nreimers/MiniLM-L6-H384-uncased") -> list:
    response = embeddings.create(
        model_name=model,
        input=input
    )
    return response.data[0].embedding

# 3. Create a table
cursor = cc.connection.cursor()
sql_command = '''CREATE TABLE SOCIAL_CITIZEN_GENAI_PROCESSEDISSUES(ID INTEGER, PROCESSOR NVARCHAR(5000), PROCESSDATE NVARCHAR(5000), PROCESSTIME NVARCHAR(5000), REPORTEDBY NVARCHAR(5000), DECISION NVARCHAR(5000), REDDITPOSTID NVARCHAR(5000), MAINTENANCENOTIFICATIONID NVARCHAR(5000), ADDRESS NVARCHAR(5000), LOCATION NVARCHAR(5000), LAT NVARCHAR(5000), LONG NVARCHAR(5000), GENAISUMMARY NVARCHAR(5000), GENAIDESCRIPTION NVARCHAR(5000), PRIORITY NVARCHAR(5000), PRIORITYDESC NVARCHAR(5000), SENTIMENT NVARCHAR(5000), CATEGORY NVARCHAR(5000), DATE DATE, TIME NVARCHAR(5000), TEXT NVARCHAR(5000), EMBEDDING NCLOB);'''
cursor.execute(sql_command)
cursor.close()

# 4. Add REAL_VECTOR column (Take note here that we define the Embedding Dimensions of 384, which matches with the Text Embedding model)
cursor = cc.connection.cursor()
sql_command = '''ALTER TABLE SOCIAL_CITIZEN_GENAI_PROCESSEDISSUES ADD (VECTOR REAL_VECTOR(384));'''
cursor.execute(sql_command)
cursor.close()

# 5. Generate embeddings from the text
import math
rows = []
for index, row in df.iterrows():
    data_to_insert = row.to_dict()
    text=row.TEXT
    x=row.MAINTENANCENOTIFICATIONID

    # check on maintenance notification id as some values are NaN
    if math.isnan(x):
        maintenanceNotID=0
    else:
        maintenanceNotID=row.MAINTENANCENOTIFICATIONID
    
    text_vector = get_embedding(input=text)
    
    myrow = (row['ID'], row['PROCESSOR'], row['PROCESSDATE'], row['PROCESSTIME'], row['REPORTEDBY'], row['DECISION'], 
             row['REDDITPOSTID'], maintenanceNotID, row['ADDRESS'], row['LOCATION'], row['LAT'], 
             row['LONG'], row['GENAISUMMARY'], row['GENAIDESCRIPTION'], row['PRIORITY'], row['PRIORITYDESC'], row['SENTIMENT'], 
             row['CATEGORY'], row['DATE'], row['TIME'], row['TEXT'], str(text_vector), str(text_vector))
    
    rows.append(myrow)

# Bulk insert of 23 fields parameterised in rows variable
cc.connection.setautocommit(False)
cursor = cc.connection.cursor()
sql = '''INSERT INTO SOCIAL_CITIZEN_GENAI_PROCESSEDISSUES 
VALUES(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,TO_REAL_VECTOR(?));'''
try:
    cursor.executemany(sql, rows)
except Exception as e:
    cc.connection.rollback()
    print("An error occurred:", e)
try:
    cc.connection.commit()
finally:
    cursor.close()
cc.connection.setautocommit(True)

(Bonus) Integration with SAP HANA Cloud, Vector Engine: Perform Similarity Search

# 1. Define a run vector search method
def run_vector_search(query: str, metric="COSINE_SIMILARITY", k=4):
    if metric == 'L2DISTANCE':
        sort = 'ASC'
    else:
        sort = 'DESC'
    query_vector = get_embedding(input=query)
    sql = '''SELECT TOP {k} "ID", "CATEGORY", "TEXT", "DATE", "LOCATION", "{metric}"("VECTOR", TO_REAL_VECTOR('{qv}')) AS SIM
        FROM "DBADMIN"."SOCIAL_CITIZEN_GENAI_PROCESSEDISSUES"
        ORDER BY "SIM" {sort}'''.format(k=k, metric=metric, qv=query_vector, sort=sort)
    hdf = cc.sql(sql)
    df_context = hdf.head(k).collect()
    return df_context

# 2. Prepare a text string sample
vector_str = """📢 Urgent Report for Public Attention 📢
Dear neighbours of Sagenai,
I hope this post finds you well. I am writing today to bring to your attention a pressing issue that requires immediate action from our local authorities. 🚮
In the heart of our beautiful neighbourhood, specifically at 27-3 Victoria Rd, London, UK, we are currently facing a problem that greatly affects our daily lives: an overflowing dustbin. 🗑️ The pungent odor, unsightly sight, and the potential for vermin and health hazards pose a significant inconvenience for all of us. 🤢
I kindly request our esteemed local administration to address this matter promptly, ensuring the cleanliness and hygiene we deserve in our shared spaces. 🙏🏼 Let's work together to maintain the charm and cleanliness of our beloved Sagenai!
Thank you for your attention and support in resolving this matter.
Best regards,
Concerned Citizen 🌟
Coordinates:(51.553842239632296,0.0041263312666776075)
https://preview.redd.it/8f6f8tumw"""

# 3. Perform Similarity Search
df = run_vector_search(query = vector_str, k = 10)
df

Please note that the bonuses above are part of the blog post showing you how to explore the Generative AI capabilities of SAP AI Foundation, along with proofs of concept in the form of use cases:

Replay: Embedding Business Context with the SAP HANA Cloud Vector Engine

Try it out

Bring Open-Source LLMs into SAP AI Core with Infinity:
Please refer to this manual to try out deploying and running open-source text embedding models with Infinity in SAP AI Core. The source code of this sample is released under the Apache 2.0 license. You are accountable for your own choice of commercially viable open-source LLMs/LMMs/text embedding models.

Summary

This blog post has explored the exciting potential of integrating open-source Text Embedding Models with SAP AI Core's Bring Your Own Model (BYOM) approach. We've delved into the technical details of deploying and running Infinity, a high-performance text embedding inference server, within your SAP AI Core environment.

We have been witnessing a fundamental transformation by generative AI in some industries and functions, such as education, media, software development, marketing, customer service etc. The open-source LLM community is evolving rapidly, and it has a role to play where data protection and privacy are needed.

For SAP developers who need to leverage generative AI in their solutions, SAP provides the Generative AI Hub for easy access to a wide range of leading LLMs, both proprietary and open-source, which should have most of your use cases covered.

For particular business cases where you need a different open-source LLM which is not yet available in Generative AI Hub, or you have fine-tuned an open-source model with your own data, SAP AI Core can be used to deploy and run it with a custom inference server or a ready-to-use open-source inference server, such as Infinity, in a manner of your choice and at your own responsibility.

Disclaimer: SAP notes that posts about potential uses of generative AI and large language models are merely the individual poster's ideas and opinions, and do not represent SAP's official position or future development roadmap. SAP has no legal obligation or other commitment to pursue any course of business, or develop or release any functionality, mentioned in any post or related content on this website.

4 Comments
Akaay
Discoverer

Hi,

I am using the llama3.1:8b model deployed on SAP AI Core via Ollama for embeddings, but I am facing an error.

For chat the model works fine. Could you please advise on this?

jagdish_chandrasekaran
Product and Topic Expert

Hi,

Can open-source multimodal embedding models (like imgbeddings, Sentence Transformers) also be deployed and used via SAP AI Core?

YatseaLi
Product and Topic Expert

@Akaay,

The llama 3.1 model family is for text generation, not for text embedding purposes.

Here are some text embedding models available in ollama:

  1. nomic-embed-text
  2. hellord/mxbai-embed-large-v1

Once the embedding model is pulled into ollama in SAP AI Core, you can access the embedding API through the endpoint /v1/api/embed. Please refer to ollama's Generate Embedding API for details.

YatseaLi
Product and Topic Expert

@jagdish_chandrasekaran,

For sentence-transformers, as mentioned in the Why Infinity section of its official GitHub repo:

  • Deploy any model from HuggingFace: deploy any embedding, reranking, clip and sentence-transformer model from HuggingFace

For imgbeddings, it is technically feasible to build a custom embedding API server with imgbeddings. Here is an example of how to build a custom transformer inference server in SAP AI Core for text generation; you can achieve the same for imgbeddings.
