Integration Forum

May Developer Challenge - Week 3: Vectorizing the event payload

ajmaradiaga
Developer Advocate

Welcome to Week 3! We're halfway through the challenge and the pipeline is taking shape. You can produce events, get them to a broker, and consume them from code or an integration platform. Now we get to the part that ties this challenge to AI: vectorization.


This week, we take the data field from the Business Partner event payload and convert it into a vector embedding: a numerical representation of the content that captures its semantic meaning. This is the step that will later allow us to do similarity searches and power a RAG application.

A quick primer on embeddings

An embedding is a list of floating-point numbers — a vector — that represents the meaning of a piece of text in a high-dimensional space. Text with similar meaning ends up close together in that space. This is what makes semantic search possible: instead of matching exact keywords, you match meaning.
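To make "close together" concrete, here is a minimal sketch of cosine similarity, one common way to compare two embeddings. The three-dimensional vectors and their names are made-up toy values for illustration, not output from any real model; real models typically return hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors (hypothetical values, for illustration only)
customer_lisbon = [0.9, 0.1, 0.2]
customer_porto = [0.8, 0.2, 0.3]
invoice_overdue = [0.1, 0.9, 0.1]

# The two "customer" vectors point in a similar direction, so they score higher
print(cosine_similarity(customer_lisbon, customer_porto))
print(cosine_similarity(customer_lisbon, invoice_overdue))
```

A score near 1 means the vectors point in almost the same direction; a score near 0 means they are unrelated.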

To generate embeddings, you need an embedding model. You pass in a piece of text, and it returns a vector. For our purposes, we'll be embedding the content of the data field of our Business Partner events — typically after converting the JSON object to a string or extracting the most relevant fields.

Embedding models in SAP AI Core

For example, from this event payload:

{
  "BusinessPartner": "1003783",
  "BusinessPartnerUUID": "456872b9-b9a2-4b93-894d-dff37abd3070",
  "BusinessPartnerFullName": "Daniela-Anita Macedo",
  "BusinessPartnerCategory": "1",
  "BusinessPartnerGrouping": "BP02",
  "FirstName": "Daniela-Anita",
  "LastName": "Macedo",
  "IsNaturalPerson": "X",
  "CreationDate": "/Date(1518393600000)/",
  "CreatedByUser": "CC0000000002",
  "BusinessPartnerAddress": {
    "Country": "PT",
    "Region": "",
    "CityName": "Quarteira",
    "PostalCode": "1385-831",
    "StreetName": "Travessa de Sousa, 6",
    "HouseNumber": "681",
    "AddressTimeZone": "WEST"
  }
}

You might produce a string like:

BusinessPartner: 1003783. Name: Daniela-Anita Macedo. Category: 1. CityName: Quarteira.

And that string is what you send to the embedding model.
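One possible way to build that string in Python is sketched below. The function name `business_partner_to_text` and the choice of four fields are illustrative assumptions; which fields you include is up to you. Skipping empty values is also just one option among several.

```python
def business_partner_to_text(data):
    """Turn a Business Partner payload (a dict) into a short labelled string."""
    address = data.get("BusinessPartnerAddress") or {}
    # (label used in the output text, value taken from the payload)
    fields = [
        ("BusinessPartner", data.get("BusinessPartner")),
        ("Name", data.get("BusinessPartnerFullName")),
        ("Category", data.get("BusinessPartnerCategory")),
        ("CityName", address.get("CityName")),
    ]
    # Skip fields that are missing or empty
    return " ".join(f"{label}: {value}." for label, value in fields if value)

# Subset of the payload shown above
sample = {
    "BusinessPartner": "1003783",
    "BusinessPartnerFullName": "Daniela-Anita Macedo",
    "BusinessPartnerCategory": "1",
    "BusinessPartnerAddress": {"Country": "PT", "Region": "", "CityName": "Quarteira"},
}

text = business_partner_to_text(sample)
print(text)
# BusinessPartner: 1003783. Name: Daniela-Anita Macedo. Category: 1. CityName: Quarteira.
```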

Your task this week

👉 Extend your consumer from Week 2 so that, after receiving a Business Partner event, it generates a vector embedding of the event's data field.

Steps:

  1. Receive the event (as you did in Week 2)
  2. Extract the data field and prepare it as a string
  3. Send that string to an embedding model and get back a vector
  4. Log the vector (or a truncated version of it) to confirm it's working
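The four steps above could be wired together roughly as follows. This is a sketch under assumptions: the event has already been parsed into a dict with a `data` key, the string is produced by plain JSON serialization, and `embed` is whatever callable wraps your chosen model. `fake_embed` is a placeholder that is not an embedding model; it exists only so the sketch runs without any service, and its numbers carry no meaning.

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("bp-consumer")

def prepare_text(data):
    """Step 2: serialize the data field deterministically as a JSON string."""
    return json.dumps(data, sort_keys=True, ensure_ascii=False)

def truncate(vector, n=10):
    """Step 4 helper: keep only the first n dimensions for logging."""
    return [round(v, 4) for v in vector[:n]]

def handle_event(event, embed):
    """Run steps 1-4 for one already-parsed event (a dict with a 'data' key)."""
    data = event.get("data")
    if data is None:
        raise ValueError("event has no 'data' field")
    text = prepare_text(data)
    vector = embed(text)  # Step 3: call the embedding model
    log.info("dims=%d first=%s", len(vector), truncate(vector))
    return vector

def fake_embed(text, dims=8):
    """Placeholder only: NOT a real model, the numbers carry no meaning."""
    return [(ord(c) % 10) / 10 for c in text[:dims].ljust(dims)]

# Replace fake_embed with a function that calls your real embedding model
vector = handle_event({"data": {"BusinessPartner": "1003783"}}, fake_embed)
```

Passing the embedder in as an argument keeps the consumer logic unchanged when you switch between a local and a hosted model.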

Embedding model options:

  • SAP options: SAP AI Core (via the Generative AI Hub) — models like text-embedding-3-small_autogenerated or similar are available depending on your setup
  • Open-source / cloud options: OpenAI Embeddings API, HuggingFace Sentence Transformers (fully local, no API key needed), Ollama with a local embedding model
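As an illustration of the Ollama option, the sketch below calls a local Ollama server using only the Python standard library. It assumes Ollama is listening on its default port 11434, that an embedding model such as nomic-embed-text has already been pulled, and that the server exposes the `/api/embed` endpoint returning an `embeddings` list; check these details against the Ollama version you run. The function names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embed"  # assumption: default local port

def build_embed_request(text, model="nomic-embed-text"):
    """Build the POST request for Ollama's embed endpoint."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def embed_with_ollama(text, model="nomic-embed-text", timeout=30):
    """Send the text to a locally running Ollama server and return one vector."""
    request = build_embed_request(text, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["embeddings"][0]  # one input string gives one embedding
```

With the server running, `embed_with_ollama("BusinessPartner: 1003783. Name: Daniela-Anita Macedo.")` should return a list of floats.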

If you want to run everything locally without any API keys, HuggingFace Sentence Transformers is an excellent option. A model like all-MiniLM-L6-v2 is small, fast, and produces 384-dimensional embeddings that are more than sufficient for this challenge.
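A minimal sketch of that local route, assuming the `sentence-transformers` package is installed (`pip install sentence-transformers`). The wrapper name `embed_locally` is hypothetical. The model files are downloaded the first time the model is loaded and run locally afterwards.

```python
def embed_locally(texts, model_name="all-MiniLM-L6-v2"):
    """Embed a list of strings with a local Sentence Transformers model."""
    # Imported lazily so this module still loads where the package is missing
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)  # downloads the model on first use
    # encode() returns a NumPy array with one row per input string
    return model.encode(texts).tolist()
```

Calling `embed_locally(["BusinessPartner: 1003783. Name: Daniela-Anita Macedo."])` should return a list containing one vector of 384 floats for this model. In a long-running consumer you would load the model once and reuse it rather than reloading it per event.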

Share your work

Add a comment in this discussion with:

  1. A snippet of code/screenshot showing how you prepared the event data and called your embedding model
  2. A truncated example of the vector embedding you received back (e.g., the first 10 dimensions)
  3. Which embedding model/service you used and why

SAP solution note: I will share how I solved this using SAP AI Core (Generative AI Hub) in the comments below.


Some food for thought:

  1. Does the order or structure of the fields you include in your text affect the quality of the embeddings?
  2. If a Business Partner record has missing fields, how would you handle that before sending it to the embedding model?
  3. What is the trade-off between using a cloud-hosted embedding model vs. a locally hosted one?
1 REPLY

JMV
Explorer

Week 3 Submission

(In Week 1, we got the events routing into Solace from SAP, and in Week 2, we successfully consumed them.)

This week (Week 3), I enhanced the Python consumer to vectorize those events with a locally running embedding model (nomic-embed-text on Ollama) and push them to the HANA Vector DB in HANA Cloud.

Python Output

JMV_0-1779151044545.png

Vectorized Data on HANA Cloud

JMV_1-1779151084467.png

I also went ahead and consumed the vectorized data in Claude using MCP.

JMV_2-1779151141489.png