
May Developer Challenge - Week 3: Vectorizing the event payload

ajmaradiaga
Developer Advocate

Welcome to Week 3! We're halfway through the challenge and the pipeline is taking shape. You can produce events, get them to a broker, and consume them from code or an integration platform. Now we get to the part that ties this challenge to AI: vectorization.


This week, we take the data field from the Business Partner event payload and convert it into a vector embedding — a numerical representation of the content that captures its semantic meaning. This is the step that will later allow us to do similarity searches and power a RAG application.

A quick primer on embeddings

An embedding is a list of floating-point numbers — a vector — that represents the meaning of a piece of text in a high-dimensional space. Text with similar meaning ends up close together in that space. This is what makes semantic search possible: instead of matching exact keywords, you match meaning.
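To make "close together" concrete, here is a minimal sketch of cosine similarity, one common way to compare two embeddings. The three vectors are made-up 3-dimensional values for illustration only, not output from a real model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up "embeddings" purely for illustration; real models return
# hundreds of dimensions.
customer_lisbon = [0.9, 0.1, 0.2]
customer_porto = [0.8, 0.2, 0.3]
supplier_invoice = [0.1, 0.9, 0.4]

# The two customer vectors point in nearly the same direction,
# so their similarity is higher than customer vs. invoice.
print(cosine_similarity(customer_lisbon, customer_porto))
print(cosine_similarity(customer_lisbon, supplier_invoice))
```

A similarity search then boils down to computing this score between a query vector and every stored vector and returning the highest-scoring entries.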

To generate embeddings, you need an embedding model. You pass in a piece of text, and it returns a vector. For our purposes, we'll be embedding the content of the data field of our Business Partner events — typically after converting the JSON object to a string or extracting the most relevant fields.

Embedding models in SAP AI Core

For example, from this event payload:

{
  "BusinessPartner": "1003783",
  "BusinessPartnerUUID": "456872b9-b9a2-4b93-894d-dff37abd3070",
  "BusinessPartnerFullName": "Daniela-Anita Macedo",
  "BusinessPartnerCategory": "1",
  "BusinessPartnerGrouping": "BP02",
  "FirstName": "Daniela-Anita",
  "LastName": "Macedo",
  "IsNaturalPerson": "X",
  "CreationDate": "/Date(1518393600000)/",
  "CreatedByUser": "CC0000000002",
  "BusinessPartnerAddress": {
    "Country": "PT",
    "Region": "",
    "CityName": "Quarteira",
    "PostalCode": "1385-831",
    "StreetName": "Travessa de Sousa, 6",
    "HouseNumber": "681",
    "AddressTimeZone": "WEST"
  }
}

You might produce a string like:

BusinessPartner: 1003783. Name: Daniela-Anita Macedo. Category: 1. CityName: Quarteira.

And that string is what you send to the embedding model.
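One possible way to build that string in Python is sketched below. The function name `business_partner_to_text` and the choice of fields and labels are assumptions made for illustration; pick whichever fields matter for your use case. Empty or missing values are skipped so the text does not contain dangling labels.

```python
def business_partner_to_text(data: dict) -> str:
    """Flatten selected Business Partner fields into one 'Label: value.' string."""
    address = data.get("BusinessPartnerAddress") or {}
    # (label, value) pairs; this selection is an illustrative choice.
    candidates = [
        ("BusinessPartner", data.get("BusinessPartner")),
        ("Name", data.get("BusinessPartnerFullName")),
        ("Category", data.get("BusinessPartnerCategory")),
        ("CityName", address.get("CityName")),
    ]
    # Skip missing or empty values (e.g. Region is "" in the sample payload).
    parts = [
        f"{label}: {value}."
        for label, value in candidates
        if value not in (None, "")
    ]
    return " ".join(parts)


# A subset of the sample payload shown above.
data = {
    "BusinessPartner": "1003783",
    "BusinessPartnerFullName": "Daniela-Anita Macedo",
    "BusinessPartnerCategory": "1",
    "BusinessPartnerAddress": {"Country": "PT", "Region": "", "CityName": "Quarteira"},
}

print(business_partner_to_text(data))
# BusinessPartner: 1003783. Name: Daniela-Anita Macedo. Category: 1. CityName: Quarteira.
```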

Your task this week

👉 Extend your consumer from Week 2 so that, after receiving a Business Partner event, it generates a vector embedding of the event's data field.

Steps:

  1. Receive the event (as you did in Week 2)
  2. Extract the data field and prepare it as a string
  3. Send that string to an embedding model and get back a vector
  4. Log the vector (or a truncated version of it) to confirm it's working
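The four steps above can be sketched as a single handler. This is only an outline under stated assumptions: the message is assumed to be a JSON object with a top-level `data` field, `handle_event` is a hypothetical name, and the embedding model is passed in as a plain callable so that any of the options below can be plugged in. The `placeholder_embed` function is NOT a real embedding model; it only exists so the sketch runs offline.

```python
import json
import logging
from typing import Callable, Sequence

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("bp-consumer")


def handle_event(raw_message: str, embed: Callable[[str], Sequence[float]]) -> list[float]:
    """Run steps 1-4 for a single received message."""
    event = json.loads(raw_message)  # 1. the received event
    data = event["data"]  # 2. extract the data field ...
    text = json.dumps(data, sort_keys=True, ensure_ascii=False)  # ... as a string
    vector = list(embed(text))  # 3. call the embedding model
    # 4. log the size and only the first few dimensions
    logger.info("dims=%d first=%s", len(vector), vector[:5])
    return vector


def placeholder_embed(text: str) -> list[float]:
    # Placeholder only: a fixed-size dummy vector, not a semantic embedding.
    return [float(len(text) % 7), 0.0, 1.0, 0.5]


# Hypothetical message; the event type string is made up for the example.
sample = json.dumps({"type": "hypothetical.event.type", "data": {"BusinessPartner": "1003783"}})
handle_event(sample, placeholder_embed)
```

Passing the embedding function in as a parameter keeps the consumer logic independent of the model or service you choose.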

Embedding model options:

  • SAP options: SAP AI Core (via the Generative AI Hub) — models like text-embedding-3-small_autogenerated or similar are available depending on your setup
  • Open-source / cloud options: OpenAI Embeddings API, HuggingFace Sentence Transformers (fully local, no API key needed), Ollama with a local embedding model
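As a rough sketch of the Ollama option, the snippet below calls a local Ollama server over HTTP using only the standard library. It assumes Ollama is running on its default local address, that an embedding model such as nomic-embed-text has already been pulled, and that the `/api/embed` endpoint with its `embeddings` response field is available in your Ollama version; please check the API documentation for the version you have installed. The function names are hypothetical.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/embed"


def build_embed_request(text: str, model: str = "nomic-embed-text") -> urllib.request.Request:
    """Build the POST request without sending it (easy to inspect and test)."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed_with_ollama(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Send the text to Ollama and return the embedding vector."""
    request = build_embed_request(text, model)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    # Assumption: the response holds one vector per input under "embeddings".
    return payload["embeddings"][0]
```

With the server running, `embed_with_ollama("BusinessPartner: 1003783. Name: Daniela-Anita Macedo.")` should return a list of floats whose length depends on the model.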

If you want to run everything locally without any API keys, HuggingFace Sentence Transformers is an excellent option. A model like all-MiniLM-L6-v2 is small, fast, and produces 384-dimensional embeddings that are more than sufficient for this challenge.
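A minimal sketch of the Sentence Transformers route follows. It assumes the `sentence-transformers` package is installed (`pip install sentence-transformers`) and that the model can be downloaded on first use. The helper names `load_model`, `embed_locally`, and `preview` are hypothetical; the import is done lazily so the helpers can be defined even where the package is missing.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def load_model(model_name: str = "all-MiniLM-L6-v2"):
    """Load the model once and reuse it for every event."""
    # Lazy import: only needed when an embedding is actually requested.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_name)  # downloads the model on first use


def embed_locally(text: str) -> list[float]:
    """Return the embedding of `text` as a plain list of floats."""
    return load_model().encode(text).tolist()


def preview(vector: list[float], n: int = 10) -> str:
    """Format the first n dimensions for logging."""
    head = ", ".join(f"{x:.4f}" for x in vector[:n])
    suffix = ", ..." if len(vector) > n else ""
    return f"[{head}{suffix}] ({len(vector)} dims)"
```

For example, `print(preview(embed_locally("BusinessPartner: 1003783. Name: Daniela-Anita Macedo.")))` logs the first 10 dimensions and the total vector size, which is the kind of truncated output asked for in the submission.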

Share your work

Add a comment in this discussion with:

  1. A snippet of code/screenshot showing how you prepared the event data and called your embedding model
  2. A truncated example of the vector embedding you received back (e.g., the first 10 dimensions)
  3. Which embedding model/service you used and why

SAP solution note: I will share how I solved this using SAP AI Core (Generative AI Hub) in the comments below.


Some food for thought:

  1. Does the order or structure of the fields you include in your text affect the quality of the embeddings?
  2. If a Business Partner record has missing fields, how would you handle that before sending it to the embedding model?
  3. What is the trade-off between using a cloud-hosted embedding model vs. a locally hosted one?
3 REPLIES

JMV
Explorer

Week 3 Submission

(In Week 1, we got the events routing from SAP into Solace, and in Week 2, we successfully consumed them.)

This week (Week 3), I enhanced the Python consumer to vectorize those events with a locally running embedding model (nomic-embed-text via Ollama) and push them to the HANA Cloud vector DB.

Python Output

JMV_0-1779151044545.png

Vectorized Data on HANA Cloud

JMV_1-1779151084467.png

I also went ahead and consumed the vectorized data in Claude using MCP.

JMV_2-1779151141489.png

 


KanishkaDeshak
Explorer

Hello,

Thanks for the challenge.

As suggested, I tried doing it locally using the SAP BAS trial, but storage was an issue, so I'm using the Cohere API instead, as it was available to consume for free.

Below are the steps performed.

1. Received the event

KanishkaDeshak_1-1779282619678.png

2. Converted the JSON into a string

KanishkaDeshak_2-1779282668100.png

3. Sent it to an embedding model (Cohere API)

KanishkaDeshak_3-1779282728291.png

 

4. Got back the vector and logged it to the data store

KanishkaDeshak_4-1779282785070.png

 

 

 


umberto_panico
Participant

Week 3 with SAP AI Core:

Step 1:

umberto_panico_0-1779285179329.png

Step 2:

umberto_panico_1-1779285889824.png