a week ago - last edited Monday
Welcome to Week 3! We're halfway through the challenge and the pipeline is taking shape. You can produce events, get them to a broker, and consume them from code or an integration platform. Now we get to the part that ties this challenge to AI: vectorization.
Links:
- May's developer challenge blog post: https://community.sap.com/t5/integration-blog-posts/may-2026-developer-challenge-from-events-to-inte...
- Week 1: Getting familiar with the events
- Week 2: Connecting to the broker and consuming events
- Week 3: Vectorizing the event payload
- Week 4: Storing vectors and enabling RAG
This week, we take the data field from the Business Partner event payload and convert it into a vector embedding — a numerical representation of the content that captures its semantic meaning. This is the step that will later allow us to do similarity searches and power a RAG application.
An embedding is a list of floating-point numbers — a vector — that represents the meaning of a piece of text in a high-dimensional space. Text with similar meaning ends up close together in that space. This is what makes semantic search possible: instead of matching exact keywords, you match meaning.
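To make "close together" concrete: a common way to measure how close two embeddings are is cosine similarity, which is the dot product of the two vectors divided by the product of their lengths. The minimal pure-Python sketch below uses made-up three-dimensional toy vectors for illustration only. Real models return hundreds of dimensions (384 for all-MiniLM-L6-v2, for example).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between vectors a and b (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors (hypothetical values, not produced by any real model).
customer_a = [0.9, 0.1, 0.0]
customer_b = [0.8, 0.2, 0.1]
unrelated = [0.0, 0.1, 0.9]

print(cosine_similarity(customer_a, customer_b))  # ≈ 0.98, similar meaning
print(cosine_similarity(customer_a, unrelated))   # ≈ 0.01, unrelated
```

A vector store performing semantic search is essentially ranking stored vectors by a score like this against the vector of the query text.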
To generate embeddings, you need an embedding model. You pass in a piece of text, and it returns a vector. For our purposes, we'll be embedding the content of the data field of our Business Partner events — typically after converting the JSON object to a string or extracting the most relevant fields.
Embedding models in SAP AI Core
For example, from this event payload:
{
  "BusinessPartner": "1003783",
  "BusinessPartnerUUID": "456872b9-b9a2-4b93-894d-dff37abd3070",
  "BusinessPartnerFullName": "Daniela-Anita Macedo",
  "BusinessPartnerCategory": "1",
  "BusinessPartnerGrouping": "BP02",
  "FirstName": "Daniela-Anita",
  "LastName": "Macedo",
  "IsNaturalPerson": "X",
  "CreationDate": "/Date(1518393600000)/",
  "CreatedByUser": "CC0000000002",
  "BusinessPartnerAddress": {
    "Country": "PT",
    "Region": "",
    "CityName": "Quarteira",
    "PostalCode": "1385-831",
    "StreetName": "Travessa de Sousa, 6",
    "HouseNumber": "681",
    "AddressTimeZone": "WEST"
  }
}
You might produce a string like:
BusinessPartner: 1003783. Name: Daniela-Anita Macedo. Category: 1. CityName: Quarteira.
And that string is what you send to the embedding model.
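A minimal sketch of this flattening step, assuming the event's data field has already been parsed into a Python dict shaped like the payload above. The selected fields and labels simply mirror the example string, and `payload_to_text` is a hypothetical helper name.

```python
def payload_to_text(data: dict) -> str:
    """Flatten selected Business Partner fields into one embedding-friendly string."""
    address = data.get("BusinessPartnerAddress") or {}
    # (label, value) pairs in a fixed order, so identical records yield identical text.
    parts = [
        ("BusinessPartner", data.get("BusinessPartner")),
        ("Name", data.get("BusinessPartnerFullName")),
        ("Category", data.get("BusinessPartnerCategory")),
        ("CityName", address.get("CityName")),
    ]
    # Skip missing or empty values instead of emitting "Label: None".
    return " ".join(f"{label}: {value}." for label, value in parts if value)


event_data = {
    "BusinessPartner": "1003783",
    "BusinessPartnerFullName": "Daniela-Anita Macedo",
    "BusinessPartnerCategory": "1",
    "BusinessPartnerAddress": {"CityName": "Quarteira", "Country": "PT"},
}
print(payload_to_text(event_data))
# BusinessPartner: 1003783. Name: Daniela-Anita Macedo. Category: 1. CityName: Quarteira.
```

Keeping the field order fixed matters: the same record should always produce the same text, and therefore the same vector.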
👉 Extend your consumer from Week 2 so that, after receiving a Business Partner event, it generates a vector embedding of the event's data field.
Steps:
- … data field and prepare it as a string
Embedding model options:
- … text-embedding-3-small_autogenerated or similar are available, depending on your setup
- If you want to run everything locally without any API keys, Hugging Face Sentence Transformers is an excellent option. A model like all-MiniLM-L6-v2 is small, fast, and produces 384-dimensional embeddings that are more than sufficient for this challenge.
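A minimal local sketch of the Sentence Transformers option. It assumes the `sentence-transformers` package is installed; the model is downloaded from Hugging Face on first use. `embed_texts` is a hypothetical helper, and the optional `model` argument is a design choice that lets you pass in an already-loaded model (or a stub in tests) instead of reloading it for every event.

```python
from typing import Optional, Sequence


def embed_texts(texts: Sequence[str], model: Optional[object] = None) -> list[list[float]]:
    """Embed texts locally with Sentence Transformers (no API key needed).

    `model` may be any object with an encode(list_of_str) method; when omitted,
    all-MiniLM-L6-v2 is loaded.
    """
    if model is None:
        # Imported lazily so this module still loads where the package is absent.
        from sentence_transformers import SentenceTransformer
        model = SentenceTransformer("all-MiniLM-L6-v2")
    vectors = model.encode(list(texts))
    # encode() returns a NumPy array by default; convert to plain floats for JSON/DB storage.
    return [[float(x) for x in row] for row in vectors]
```

For example, `embed_texts(["BusinessPartner: 1003783. Name: Daniela-Anita Macedo. Category: 1. CityName: Quarteira."])` returns a list holding one 384-float vector. In a consumer, loading the model once at start-up and passing it in is much cheaper than loading it per message.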
Add a comment in this discussion with:
SAP solution note — I will share how I solved this using SAP AI Core (Generative AI Hub) in the comments below
Some food for thought:
a week ago
Week 3 Submission
(In Week 1, we got the events routing from SAP into Solace, and in Week 2, we successfully consumed them.)
This week (Week 3), I enhanced the Python consumer to generate embeddings with a locally running embedding model (nomic-embed-text on Ollama) and push the vectorized events to SAP HANA Cloud.
Python Output
Vectorized Data on HANA Cloud
I then went ahead and consumed the vectorized data in Claude using MCP.
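For readers who want to reproduce the HANA Cloud step, here is a hedged sketch, not the author's code. It assumes the SAP HANA Cloud vector engine (the `REAL_VECTOR` column type and the `TO_REAL_VECTOR` SQL function), the `hdbcli` driver, and a hypothetical table `BP_EMBEDDINGS` sized for nomic-embed-text's 768 dimensions.

```python
# Connection (not executed here), using SAP's hdbcli driver:
#   from hdbcli import dbapi
#   conn = dbapi.connect(address="<host>", port=443, user="<user>",
#                        password="<password>", encrypt=True)
# Hypothetical target table, sized for nomic-embed-text (768 dimensions):
#   CREATE TABLE BP_EMBEDDINGS (BP_ID NVARCHAR(20), CONTENT NCLOB, EMBEDDING REAL_VECTOR(768))

INSERT_SQL = (
    "INSERT INTO BP_EMBEDDINGS (BP_ID, CONTENT, EMBEDDING) "
    "VALUES (?, ?, TO_REAL_VECTOR(?))"
)


def to_vector_literal(vector: list[float]) -> str:
    """Format an embedding as the '[x,y,...]' text passed to TO_REAL_VECTOR."""
    return "[" + ",".join(str(float(x)) for x in vector) + "]"


def insert_embedding(conn, bp_id: str, content: str, vector: list[float]) -> None:
    """Insert one vectorized Business Partner event and commit."""
    cursor = conn.cursor()
    try:
        cursor.execute(INSERT_SQL, (bp_id, content, to_vector_literal(vector)))
        conn.commit()
    finally:
        cursor.close()
```

Storing the source text next to the vector is what later lets a RAG application return readable context for each similarity hit.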
a week ago
Hello,
Thanks for the challenge.
As suggested, I tried doing it locally using the SAP BAS trial, but storage was an issue, so I'm using the Cohere API instead, as it was free to consume.
Below are the steps performed.
1. Received the event
2. Converted the JSON into a string
3. Sent it to an embedding model (Cohere API)
4. Got back the vector and logged it into a data store
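The four steps above map onto a small handler. This is a sketch under assumptions: the message body is a JSON CloudEvent with `id` and `data` attributes, `embed` is whatever function wraps your embedding API (the Cohere client code is not shown here), and a plain Python list stands in for the data store. `handle_event` is a hypothetical name.

```python
import json
from typing import Callable

Embedder = Callable[[str], list[float]]


def handle_event(raw_message: bytes, embed: Embedder, store: list[dict]) -> dict:
    """Run the four steps for one incoming CloudEvent message."""
    # 1. Receive the event: decode the broker message body as a JSON CloudEvent.
    event = json.loads(raw_message)
    # 2. Convert the data field into a string (sorted keys give a stable order).
    text = json.dumps(event.get("data", {}), sort_keys=True)
    # 3. Send it to an embedding model (Cohere, OpenAI, Ollama, ...).
    vector = embed(text)
    # 4. Keep the vector alongside its source text and event id in the "data store".
    record = {"id": event.get("id"), "text": text, "vector": vector}
    store.append(record)
    return record
```

Passing `embed` in as a function keeps the handler independent of the provider, so switching from Cohere to another model only changes the function you pass.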
a week ago
Week 3 with SAP AI Core:
Step 1:
Step 2:
Friday
Thanks @ajmaradiaga for the challenge!
A code snippet:
The vector embedding response:
I used the SentenceTransformers library with the all-MiniLM-L6-v2 embedding model to generate vector representations of Business Partner event data. It runs locally without requiring API keys or external services, and it is lightweight and fast (suitable for development and prototyping).
Sunday
Hi,
For week 3 I created a vectorize_business_partner function that extracts key fields from the CloudEvent's data payload (ID, Name, Region, Country, and Industry) and formats them into a descriptive natural language string.
Here is a sample of a created vector:
[0.03595711290836334, 0.6024591326713562, -3.6318137645721436, -0.44531744718551636, 1.060988187789917, -0.374258428812027, 0.0002853244368452579, 1.3283621072769165, 0.19313935935497284, -1.1381871700286865]
The embedding model used in Ollama is nomic-embed-text.
By using Ollama locally, I ensure that sensitive Business Partner data never leaves my environment.
nomic-embed-text is a highly efficient model with a large context window (8192 tokens), which makes it well suited to most business data. It generates 768-dimensional vectors that are compatible with local vector stores like ChromaDB.
Concerning the food for thought points:
Does the order or structure of the fields affect the quality of the embeddings?
Yes, it does. Since models like nomic-embed-text are based on transformer architectures, they are context-aware. A natural language sentence ("Name is X, located in Y") provides more semantic value than a raw list of attributes. The model's attention mechanism processes the relationship between words, so the structure helps in creating a more accurate representation in the vector space.
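One way to check this claim against your own model is to render the same record both ways and compare each rendering's similarity to a natural-language query. A sketch, assuming `embed` is any function returning a vector for a string; the helper names and the simplified record keys (`id`, `name`, `city`, `country`) are hypothetical. The outcome depends on the model, so treat it as an experiment rather than a rule.

```python
import math
from typing import Callable


def as_key_values(record: dict) -> str:
    """Raw attribute list with keys in sorted order."""
    return "; ".join(f"{key}={value}" for key, value in sorted(record.items()))


def as_sentence(record: dict) -> str:
    """Natural-language rendering of the same attributes."""
    return (f"Business Partner {record['id']}: Name is {record['name']}, "
            f"located in {record['city']}, {record['country']}.")


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def compare_templates(record: dict, query: str, embed: Callable[[str], list[float]]) -> dict:
    """Similarity of the query to each rendering; higher means closer in vector space."""
    query_vec = embed(query)
    return {
        "key_values": cosine(query_vec, embed(as_key_values(record))),
        "sentence": cosine(query_vec, embed(as_sentence(record))),
    }
```

Running this over a handful of records and queries with your chosen model gives evidence for (or against) preferring the sentence template.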
How would you handle missing fields in a Business Partner record?
In my implementation, I use a "Default/Unknown" strategy. For example, if the Industry field is missing, I default it to "General". This ensures the sentence structure remains consistent. Another approach would be to omit the missing field entirely to avoid introducing "noise," but for RAG purposes, I think keeping a stable template helps the model compare records more effectively.
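Both strategies can be sketched side by side. The "General" and "Unknown" defaults follow the answer above, but the keys (`name`, `city`, `region`, `country`, `industry`) and function names are illustrative assumptions. Empty strings, like the empty Region in the sample payload, are treated as missing too.

```python
FIELDS = ("name", "city", "region", "country", "industry")  # hypothetical keys
LABELS = {"name": "Name", "city": "City", "region": "Region",
          "country": "Country", "industry": "Industry"}
DEFAULTS = {"industry": "General"}  # every other field falls back to "Unknown"


def render_with_defaults(record: dict) -> str:
    """Stable template: every slot is always filled, so all records share one shape."""
    v = {f: record.get(f) or DEFAULTS.get(f, "Unknown") for f in FIELDS}
    return (f"Name is {v['name']}, located in {v['city']}, {v['region']}, "
            f"{v['country']}. Industry: {v['industry']}.")


def render_omitting_missing(record: dict) -> str:
    """Drop missing or empty fields so no placeholder words reach the model."""
    return " ".join(f"{LABELS[f]}: {record[f]}." for f in FIELDS if record.get(f))
```

For a record with name "Daniela-Anita Macedo", city "Quarteira", an empty region, country "PT", and no industry, the first function returns "Name is Daniela-Anita Macedo, located in Quarteira, Unknown, PT. Industry: General." and the second returns "Name: Daniela-Anita Macedo. City: Quarteira. Country: PT."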
What is the trade-off between cloud-hosted vs. locally hosted embedding models?
Cloud-hosted (SAP Generative AI Hub, OpenAI)
Offers the best performance and scalability, but may introduce data privacy concerns and potential costs based on token consumption.
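One way to keep this trade-off a configuration decision rather than a rewrite is to hide the backend behind a single function. A sketch under assumptions: the local option reuses the Ollama endpoint and nomic-embed-text model from the snippet below (Ollama running on its default port 11434), and the cloud option uses the OpenAI Python SDK with `OPENAI_API_KEY` set. SAP Generative AI Hub is not shown. `make_embedder` is a hypothetical name.

```python
from typing import Callable

Embedder = Callable[[str], list[float]]


def make_embedder(kind: str) -> Embedder:
    """Return an embedding function for a local or cloud-hosted backend."""
    if kind == "local":
        import requests  # Ollama's HTTP API: the text never leaves this machine

        def embed_local(text: str) -> list[float]:
            resp = requests.post(
                "http://localhost:11434/api/embeddings",
                json={"model": "nomic-embed-text", "prompt": text},
                timeout=30,
            )
            resp.raise_for_status()
            return resp.json()["embedding"]

        return embed_local
    if kind == "cloud":
        from openai import OpenAI  # reads OPENAI_API_KEY; the text is sent to the provider

        client = OpenAI()

        def embed_cloud(text: str) -> list[float]:
            result = client.embeddings.create(model="text-embedding-3-small", input=text)
            return result.data[0].embedding

        return embed_cloud
    raise ValueError(f"unknown embedder kind: {kind!r}")
```

Whichever backend you pick, store vectors from one model only: nomic-embed-text produces 768 dimensions and text-embedding-3-small 1536 by default, and vectors from different models are not comparable even when their sizes match.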
Finally, a code snippet:
import requests


def vectorize_business_partner(data):
    # Extract and prepare the fields (defaults keep the sentence template stable)
    bp_id = data.get('BusinessPartner', 'Unknown')
    name = data.get('CustomerName', 'Unknown')
    country = data.get('Country', 'Unknown')
    region = data.get('Region', 'Unknown')
    industry = data.get('Industry', 'General')
    # Create a descriptive natural-language string
    text = f"Business Partner {bp_id}: Name is {name}, located in {region}, {country}. Industry: {industry}."
    # Generate the embedding via the local Ollama API (nomic-embed-text)
    payload = {"model": "nomic-embed-text", "prompt": text}
    response = requests.post("http://localhost:11434/api/embeddings", json=payload, timeout=30)
    response.raise_for_status()  # fail loudly if Ollama is not running or the model is missing
    embedding = response.json()["embedding"]
    return embedding
Kr,
Joery
Tuesday
How do I connect the CPI iFlow to SAP AI Core? A link would be helpful.
Tuesday
My submission for week 3:
1. Just like in week 1, I published an event from Solace 'Try Me!'.
2. The vector embedding received from the OpenAI Embeddings API is shown below:
3. I used OpenAI "text-embedding-3-small" because I wanted to try out the AI adapter in Cloud Integration.