Re: May Developer Challenge - Week 3: Vectorizing ...

ajmaradiaga · ‎2026 May 18

Welcome to Week 3! We're halfway through the challenge and the pipeline is taking shape. You can produce events, get them to a broker, and consume them from code/an integration platform. Now we get to the part that ties this challenge to AI: vectorization.

Links:
May's developer challenge blog post: https://community.sap.com/t5/integration-blog-posts/may-2026-developer-challenge-from-events-to-inte...
Week 1: Getting familiar with the events
Week 2: Connecting to the broker and consuming events
Week 3: Vectorizing the event payload
Week 4: Storing vectors and enabling RAG

This week, we take the data field from the Business Partner event payload and convert it into a vector embedding — a numerical representation of the content that captures its semantic meaning. This is the step that will later allow us to do similarity searches and power a RAG application.

A quick primer on embeddings

An embedding is a list of floating-point numbers — a vector — that represents the meaning of a piece of text in a high-dimensional space. Text with similar meaning ends up close together in that space. This is what makes semantic search possible: instead of matching exact keywords, you match meaning.
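To make "close together" a bit more tangible, here is a minimal sketch of cosine similarity, one common way to compare two embeddings. The three vectors are made-up 3-dimensional toy values, not output from any real model, and the variable names are purely illustrative.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy "embeddings" (assumed values for illustration only).
customer_lisbon = [0.9, 0.1, 0.2]
customer_porto = [0.8, 0.2, 0.3]
supplier_tokyo = [0.1, 0.9, 0.7]

# The first pair points in nearly the same direction, the second does not.
print(cosine_similarity(customer_lisbon, customer_porto))
print(cosine_similarity(customer_lisbon, supplier_tokyo))
```

A value near 1 means the vectors point in almost the same direction; real embeddings work the same way, just with hundreds or thousands of dimensions.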

To generate embeddings, you need an embedding model. You pass in a piece of text, and it returns a vector. For our purposes, we'll be embedding the content of the data field of our Business Partner events — typically after converting the JSON object to a string or extracting the most relevant fields.

Embedding models in SAP AI Core

For example, from this event payload:

{
  "BusinessPartner": "1003783",
  "BusinessPartnerUUID": "456872b9-b9a2-4b93-894d-dff37abd3070",
  "BusinessPartnerFullName": "Daniela-Anita Macedo",
  "BusinessPartnerCategory": "1",
  "BusinessPartnerGrouping": "BP02",
  "FirstName": "Daniela-Anita",
  "LastName": "Macedo",
  "IsNaturalPerson": "X",
  "CreationDate": "/Date(1518393600000)/",
  "CreatedByUser": "CC0000000002",
  "BusinessPartnerAddress": {
    "Country": "PT",
    "Region": "",
    "CityName": "Quarteira",
    "PostalCode": "1385-831",
    "StreetName": "Travessa de Sousa, 6",
    "HouseNumber": "681",
    "AddressTimeZone": "WEST"
  }
}

You might produce a string like:

BusinessPartner: 1003783. Name: Daniela-Anita Macedo. Category: 1. CityName: Quarteira.

And that string is what you send to the embedding model.
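As a rough sketch of how that string could be built, the helper below picks the four fields used in the example and skips any that are empty. The function name `business_partner_to_text` and the choice of fields are assumptions for illustration; pick whatever fields matter for your use case.

```python
def business_partner_to_text(data):
    """Build a short 'Label: value.' string from selected payload fields."""
    address = data.get("BusinessPartnerAddress") or {}
    fields = [
        ("BusinessPartner", data.get("BusinessPartner")),
        ("Name", data.get("BusinessPartnerFullName")),
        ("Category", data.get("BusinessPartnerCategory")),
        ("CityName", address.get("CityName")),
    ]
    # Skip empty or missing values so the text carries no blank labels.
    return " ".join(f"{label}: {value}." for label, value in fields if value)

# Subset of the sample payload shown above.
sample = {
    "BusinessPartner": "1003783",
    "BusinessPartnerFullName": "Daniela-Anita Macedo",
    "BusinessPartnerCategory": "1",
    "BusinessPartnerAddress": {"Country": "PT", "Region": "", "CityName": "Quarteira"},
}

print(business_partner_to_text(sample))
# → BusinessPartner: 1003783. Name: Daniela-Anita Macedo. Category: 1. CityName: Quarteira.
```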

Your task this week

👉 Extend your consumer from Week 2 so that, after receiving a Business Partner event, it generates a vector embedding of the event's data field.

Steps:

Receive the event (as you did in Week 2)
Extract the data field and prepare it as a string
Send that string to an embedding model and get back a vector
Log the vector (or a truncated version of it) to confirm it's working
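The steps above could be wired together roughly as follows. This is only a sketch: `handle_event`, `prepare_text`, `truncate_vector` and `fake_embed` are hypothetical names, the message body is assumed to be a JSON event with a `data` field, and `fake_embed` is a placeholder that you would swap for a call to a real embedding model.

```python
import json

def prepare_text(data):
    """Serialize the data field deterministically (sorted keys)."""
    return json.dumps(data, sort_keys=True, ensure_ascii=False)

def truncate_vector(vector, head=5):
    """Return a short printable preview of a vector."""
    preview = ", ".join(f"{x:.4f}" for x in vector[:head])
    return f"[{preview}, ...] ({len(vector)} dimensions)"

def handle_event(raw_body, embed):
    """Extract the data field, prepare text, embed it and log a preview.

    `embed` is any callable that maps a string to a list of floats.
    """
    event = json.loads(raw_body)
    data = event["data"]
    text = prepare_text(data)
    vector = embed(text)
    print(truncate_vector(vector))
    return vector

def fake_embed(text):
    """Placeholder embedder: NOT a real model, only useful for wiring tests."""
    return [float(len(text)), 0.0, 1.0]

body = json.dumps({"type": "example", "data": {"BusinessPartner": "1003783"}})
handle_event(body, fake_embed)
```

Passing the embedder in as a parameter keeps the consumer independent of the model, so you can switch between a local and a cloud-hosted option without touching the event handling.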

Embedding model options:

SAP options: SAP AI Core (via the Generative AI Hub) — models like text-embedding-3-small_autogenerated or similar are available depending on your setup
Open-source / cloud options: OpenAI Embeddings API, HuggingFace Sentence Transformers (fully local, no API key needed), Ollama with a local embedding model

If you want to run everything locally without any API keys, HuggingFace Sentence Transformers is an excellent option. A model like all-MiniLM-L6-v2 is small, fast, and produces 384-dimensional embeddings that are more than sufficient for this challenge.
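A minimal sketch of the local route, assuming the `sentence-transformers` package is installed (`pip install sentence-transformers`). The helper names `embed_locally` and `first_dimensions` are made up for this example; the model is downloaded on first use.

```python
def embed_locally(text, model_name="all-MiniLM-L6-v2"):
    """Embed one string with Sentence Transformers and return a list of floats."""
    # Imported lazily so the rest of the consumer loads without the package.
    from sentence_transformers import SentenceTransformer

    # Loading per call keeps the sketch short; cache the model in real code.
    model = SentenceTransformer(model_name)
    return model.encode(text).tolist()

def first_dimensions(vector, count=10):
    """Return the first `count` dimensions, rounded for logging."""
    return [round(float(x), 4) for x in vector[:count]]

# Usage (requires the package and a one-time model download):
# vector = embed_locally("BusinessPartner: 1003783. Name: Daniela-Anita Macedo.")
# print(len(vector), first_dimensions(vector))
```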

Add a comment in this discussion with:

A snippet of code/screenshot showing how you prepared the event data and called your embedding model
A truncated example of the vector embedding you received back (e.g., the first 10 dimensions)
Which embedding model/service you used and why

SAP solution note — I will share how I solved this using SAP AI Core (Generative AI Hub) in the comments below

Some food for thought:

Does the order or structure of the fields you include in your text affect the quality of the embeddings?
If a Business Partner record has missing fields, how would you handle that before sending it to the embedding model?
What is the trade-off between using a cloud-hosted embedding model vs. a locally hosted one?

JMV · ‎2026 May 19

Week 3 Submission

(In Week 1, we got the events routing into Solace from SAP, and in Week 2, we successfully consumed them.)

This week (Week 3), I enhanced the Python consumer to generate embeddings with a locally running embedding model (Ollama, nomic-embed-text) and push the vectorized events to the HANA Cloud vector DB.

Python Output

Vectorized Data on HANA Cloud

I then went ahead and consumed the vectorized data in Claude using MCP.

KanishkaDeshak · ‎2026 May 20

Hello,

Thanks for the challenge

As suggested, I tried doing it locally using the SAP BAS trial, but storage was an issue, so I'm doing it with the Cohere API, as it was available to consume for free.

Below are the steps performed.

1. Received the event

2. Converted the JSON into a string

3. Sent it to an embedding model (Cohere API)

4. Got back the vector and logged it into the data store

umberto_panico · ‎2026 May 20

Week 3 with SAP AI CORE:

Step 1:

Step 2:

Ihor_Haranichev · ‎2026 May 22

Thanks @ajmaradiaga for the challenge!

A snippet code:

The vector embedding response:

I used the SentenceTransformers library with the all-MiniLM-L6-v2 embedding model to generate vector representations of Business Partner event data. It runs locally without requiring API keys or external services. It is lightweight and fast (suitable for development and prototyping).

Joery · ‎2026 May 24

Hi,

For week 3 I created a vectorize_business_partner function that extracts key fields from the CloudEvent's data payload (ID, Name, Region, Country, and Industry) and formats them into a descriptive natural language string.

Here is a sample of a created vector:

[0.03595711290836334, 0.6024591326713562, -3.6318137645721436, -0.44531744718551636, 1.060988187789917, -0.374258428812027, 0.0002853244368452579, 1.3283621072769165, 0.19313935935497284, -1.1381871700286865]

The embedding model used in Ollama is nomic-embed-text
By using Ollama locally, I ensure that sensitive Business Partner data never leaves my environment.
The model is highly efficient and has a large context window (8192 tokens), making it well suited to most business data. It generates 768-dimensional vectors that are compatible with local vector stores like ChromaDB.

Concerning the food for thought points

Does the order or structure of the fields affect the quality of the embeddings?
Yes, it does. Since models like nomic-embed-text are based on transformer architectures, they are context-aware. A natural language sentence ("Name is X, located in Y") provides more semantic value than a raw list of attributes. The model's attention mechanism processes the relationship between words, so the structure helps in creating a more accurate representation in the vector space.
How would you handle missing fields in a Business Partner record?
In my implementation, I use a "Default/Unknown" strategy. For example, if the Industry field is missing, I default it to "General". This ensures the sentence structure remains consistent. Another approach would be to omit the missing field entirely to avoid introducing "noise," but for RAG purposes, I think keeping a stable template helps the model compare records more effectively.
What is the trade-off between cloud-hosted vs. locally hosted embedding models?
- Cloud-hosted (SAP Generative AI Hub, OpenAI)
  Offers the best performance and scalability, but introduces data privacy concerns and potential costs based on token consumption
- Locally hosted (Ollama)
  Guaranteed data privacy (data stays on my machine) and no variable costs. However, it requires local hardware (GPU/RAM) and the performance depends on the local machine's capability.
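
To make the two missing-field strategies mentioned above concrete, here is a small side-by-side sketch. It is not the implementation used in this post: the function names are hypothetical, and the field names are simply borrowed from the snippet further down.

```python
def describe_with_defaults(data):
    """Stable template: missing fields are replaced by placeholder values."""
    name = data.get("CustomerName") or "Unknown"
    country = data.get("Country") or "Unknown"
    industry = data.get("Industry") or "General"
    return f"Name is {name}, located in {country}. Industry: {industry}."

def describe_omitting_missing(data):
    """Variable template: clauses for missing fields are dropped."""
    parts = []
    if data.get("CustomerName"):
        parts.append(f"Name is {data['CustomerName']}")
    if data.get("Country"):
        parts.append(f"located in {data['Country']}")
    if data.get("Industry"):
        parts.append(f"Industry: {data['Industry']}")
    return ", ".join(parts) + "." if parts else ""

record = {"CustomerName": "Acme", "Country": "PT"}  # no Industry field
print(describe_with_defaults(record))
print(describe_omitting_missing(record))
```

Which variant produces better search results likely depends on the model and the data, so it may be worth embedding a few records both ways and comparing.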

Finally a code snippet


import requests

def vectorize_business_partner(data):
    # Extract the fields of interest, falling back to defaults when missing
    bp_id = data.get('BusinessPartner', 'Unknown')
    name = data.get('CustomerName', 'Unknown')
    country = data.get('Country', 'Unknown')
    region = data.get('Region', 'Unknown')
    industry = data.get('Industry', 'General')

    # Create descriptive string
    text = f"Business Partner {bp_id}: Name is {name}, located in {region}, {country}. Industry: {industry}."

    # Generate the embedding via the local Ollama API
    payload = {"model": "nomic-embed-text", "prompt": text}
    response = requests.post("http://localhost:11434/api/embeddings", json=payload, timeout=30)
    response.raise_for_status()  # surface HTTP errors before reading the body
    embedding = response.json()["embedding"]

    return embedding


Kr,
Joery

MioYasutake · ‎2026 May 26

@ajmaradiaga

My submission for week 3:

1. Just like in week 1, I published an event from Solace 'Try Me!'.

2. The vector embedding received from OpenAI Embedding API is shown below:

3. I used OpenAI "text-embedding-3-small" because I wanted to try out the AI adapter in Cloud Integration.

Andrii · ‎2026 May 29

I used SAP CPI to consume Business Partner events from Solace.
Extract the data field into a string and send it to OpenAI's text-embedding-3-small model.
The returned 1536-dimensional embedding vector was logged in the message properties.
Screenshots show the input, the iFlow, and the first 10 dimensions of the generated vector.

Laco · ‎2026 Jun 01

Steps:

Receive the event (as you did in Week 2), extract the data field and prepare it as a string, send that string to an embedding model and get back a vector:

import pika, sys, os, json
from ollama import embed

def createEmbedding(humanReadable):
    batch = embed(model='mistral',
                  input=[humanReadable])
    printTruncatedVector(batch['embeddings'])

def printTruncatedVector(vector):
    # vector is a batch of embeddings; only the first one is printed
    first = vector[0]
    print(" truncated embedding vector:")
    print(f"[{first[0]},{first[1]},{first[2]}, ........... ,{first[-3]},{first[-2]},{first[-1]}]")

def callback(ch, method, properties, body):
    print("\n [x] Received\n")
    bpData = json.loads(body)["data"]
    # Single quotes inside the f-strings keep this valid on Python versions before 3.12
    address = bpData['BusinessPartnerAddress']
    humanReadable = (
        f"BusinessPartnerUUID: {bpData['BusinessPartnerUUID']}"
        f" BusinessPartner: {bpData['BusinessPartner']}"
        f" BusinessPartnerFullName: {bpData['BusinessPartnerFullName']}"
        f" BusinessPartnerCategory: {bpData['BusinessPartnerCategory']}"
        f" BusinessPartnerGrouping: {bpData['BusinessPartnerGrouping']}"
        f" FirstName: {bpData['FirstName']}"
        f" LastName: {bpData['LastName']}"
        f" IsNaturalPerson: {bpData['IsNaturalPerson']}"
        f" Country: {address['Country']}"
        f" Region: {address['Region']}"
        f" CityName: {address['CityName']}"
        f" PostalCode: {address['PostalCode']}"
        f" StreetName: {address['StreetName']}"
        f" HouseNumber: {address['HouseNumber']}"
        f" AddressTimeZone: {address['AddressTimeZone']}")
    print(f" human readable:\n{humanReadable}\n")
    createEmbedding(humanReadable)

def followTheWhiteRabbit():
    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()
    channel.queue_declare(queue='SAPDEV2026MAY', durable=True, arguments={'x-queue-type': 'quorum'})
    channel.basic_consume(queue='SAPDEV2026MAY',
                          auto_ack=True,
                          on_message_callback=callback)
    print(' [*] Waiting for messages. To exit press CTRL+C')
    channel.start_consuming()

if __name__ == '__main__':
    try:
        followTheWhiteRabbit()
    except KeyboardInterrupt:
        print('Interrupted')
        try:
            sys.exit(0)
        except SystemExit:
            os._exit(0)

Log the vector (or a truncated version of it) to confirm it's working:

3. Which embedding model/service I used: the Mistral model, running locally with Ollama
