Technology Blogs by SAP

KevinR
Developer Advocate

Artificial Intelligence is the most significant trend in 2024, and for good reason. The possibilities for users leveraging AI are immense, including generating code, AI-supported debugging, image and video generation, text generation, and process automation. Popular AI models like ChatGPT and DALL·E 3 showcase these capabilities. The real question for developers is how to harness the power of these models effectively. While these models are not typically trained on business-specific domains and may produce inaccuracies or "hallucinations," understanding and mitigating these limitations can unlock their full potential.

"AI hallucinations are incorrect or misleading results that AI models generate.
These errors can be caused by a variety of factors, including insufficient training data, incorrect assumptions made by the model, or biases in the data used to train the model."
- What are AI hallucinations? (Google Cloud)

We need a way to extend the knowledge of the Large Language Model (LLM) with information about the context of our business domain to create meaningful applications.

One way to do that is via so-called vector embeddings.

"Vector embeddings are mathematical representations used to encode objects into multi-dimensional vector space. These embeddings capture the relationships and similarities between objects.
SAP HANA Cloud Vector Engine will facilitate the storage and analysis of complex and unstructured vector data(embeddings) into a format that can be seamlessly processed, compared, and utilized in building various intelligent data applications and adding more context in case of GenAI scenarios."
- Vectorize your Data : SAP HANA Cloud's Vector Engine for Unified Data Excellence
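To make the idea concrete, here is a minimal sketch in plain Node.js of how similarity between two embedding vectors is typically measured with cosine similarity. The three-dimensional vectors are invented toy values purely for illustration; real embedding models produce hundreds or thousands of dimensions.

```javascript
// Cosine similarity: ~1 = pointing in the same direction (semantically
// similar), ~0 = unrelated. Computed as dot(a, b) / (|a| * |b|).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Toy 3-dimensional "embeddings" (illustrative values only).
const codejam = [0.9, 0.1, 0.2]
const roadshow = [0.8, 0.2, 0.25]
const cooking = [0.05, 0.9, 0.1]

console.log(cosineSimilarity(codejam, roadshow) > cosineSimilarity(codejam, cooking)) // true
```

Semantically related texts end up close together in the vector space, which is exactly what the vector engine exploits when retrieving relevant context.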

Using SAP HANA Cloud's Vector Engine is a convenient way for SAP developers to create context for the AI models provided through SAP AI Core. SAP HANA Cloud is already part of the SAP ecosystem, which makes the integration easier. SAP offers LLMs through its partner foundation models. That gives us all the tools we need to add context to an LLM like GPT-4.

With an LLM and the right contextual embeddings, a developer can build an application consuming the model's capabilities, exposing features or APIs. That allows for meaningful business software to improve user and developer experience.

Scenario

Today, we'll look into a scenario where a Cloud Application Programming Model (CAP) application will be implemented. The application has multiple API endpoints to expose the functionality of the SAP HANA Vector Engine and the SAP Gen AI features.

Imagine the following use case: For the past two years, Thomas Jung and Rich Heilman have done an ABAP and CAP CodeJam road show through Europe. The community wants to know where an event happens on a specific date. Instead of manually looking up the information, a call to an API endpoint executes a Retrieval-Augmented Generation (RAG) request, sending the stored vector embeddings and the user question to an LLM within SAP AI Core. The LLM provides the answer to the user's question.

The following steps are necessary for this use case to work:

  1. Create an instance of SAP AI Core.
  2. Create deployments for a model supporting chat completion (e.g., gpt-35-turbo or gpt-4) and for an embedding model (e.g., text-embedding-ada-002).
  3. Establish a connection to SAP AI Core via Destination Services.
  4. Create SAP HANA Cloud with Vector Engine (QRC 1/2024 or later).
  5. Implement the CAP service using the CAP LLM Plugin (Beta).
  6. Create an input document providing the contextual data for the vector embeddings.
  7. Create and store vector embeddings for SAP HANA Cloud Vector Engine.
  8. Send the RAG request with the needed vector embeddings to the AI model within SAP AI Core.
  9. Enjoy the response!

NOTE: Steps 1–4 aren't covered in this blog post. The needed resources are linked in the list above.

Architecture

End2End_Vector_Embedding_Solution_Diagram.png

A CAP application is connected to an SAP HANA Cloud instance on the SAP Business Technology Platform (BTP). The application interacts with the SAP HANA Cloud Vector Engine. The CAP LLM Plugin used within the application executes AI-specific tasks on the SAP AI Core services. The connection to SAP AI Core goes through BTP's destination service. SAP AI Core routes the requests through to the partner foundation models, for example Azure OpenAI Services, and sends back the response.

CAP LLM Plugin

Within the CAP service, the CAP LLM Plugin can be used not only to connect to SAP AI Core or the SAP HANA Cloud Vector Engine but also to execute operations like anonymization of data, creation of embeddings, similarity search, chat completion, and RAG responses. The plugin is available as an npm package, the documentation and samples are available on GitHub, and the plugin is currently in beta (not yet suitable for production use cases).

CAP LLM BD.png

Connection & LLM configuration

The ./.cdsrc.json file is used to configure which embedding and chat models the CAP LLM Plugin connects to via SAP AI Core.
"GENERATIVE_AI_HUB": {
        "CHAT_MODEL_DESTINATION_NAME": "AICoreAzureOpenAIDestination",
        "CHAT_MODEL_DEPLOYMENT_URL": "/v2/inference/deployments/d01dff41125cfa27",
        "CHAT_MODEL_RESOURCE_GROUP": "default",
        "CHAT_MODEL_API_VERSION": "2023-05-15",
        "EMBEDDING_MODEL_DESTINATION_NAME": "AICoreAzureOpenAIDestination",
        "EMBEDDING_MODEL_DEPLOYMENT_URL": "/v2/inference/deployments/d7b8e46fc3d5c25f",
        "EMBEDDING_MODEL_RESOURCE_GROUP": "default",
        "EMBEDDING_MODEL_API_VERSION": "2023-05-15"
      },
      "AICoreAzureOpenAIDestination": {
        "kind": "rest",
        "credentials": {
          "destination": "<destination name>",
          "requestTimeout": "300000"
        }
      }
}

 

 

 

Preparing the input data for the vector embeddings

By default, none of the partner foundation models knows anything about our specific business domain. That means we have to supply the model with the relevant context needed for a successful response. This context can be provided to the model via vector embeddings so that it can take the specifics of the business case into account. The information comes from a simple text document, which gets chunked and sent to SAP AI Core to create the needed vector embeddings. These then get stored in the vector engine of an SAP HANA Cloud instance.

Before creating the embeddings, it's interesting to see what the model returns for a question about the road show. Let's ask the model about the location of a road show event on April 19th, 2024. The response will most likely be a hallucination, or the question can't be answered at all (don't worry about how the request gets constructed; this is covered in the next section):
{
  "@odata.context": "$metadata#Edm.String",
  "value": {
    "completion": {
      "content": "I'm sorry, but I cannot provide the current or future locations of individuals unless this information is publicly available and well-known. If you have any other questions or need information on a different topic, feel free to ask!",
      "role": "assistant"
    },
    "additionalContents": []
  }
}

 

 

 

In the CAP application code, the LangChain Text Loader and Recursive Character Text Splitter are used to read and chunk the text file:
const path = require('path')
const { TextLoader } = require('langchain/document_loaders/fs/text')
const { RecursiveCharacterTextSplitter } = require('langchain/text_splitter')

let textChunkEntries = []
// Load the input document and split it into chunks of max. 500 characters
const loader = new TextLoader(path.resolve('path/input.txt'))
const document = await loader.load()
const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 500,
    chunkOverlap: 0,
    addStartIndex: true
})

const textChunks = await splitter.splitDocuments(document)
In this scenario the input text file is an excerpt of the SAP CodeJam Roadshow blog post written by Rich Heilman.

SAP CodeJam Roadshow 2024
Developer Advocate Rich Heilman
Yes, someone did say “Roadshow”! Attention SAP Developers!
Thomas Jung and I are hitting the road again this year for another SAP CodeJam Roadshow in Europe.
Last year, we drove about 1500km around Germany and the Netherlands, I don’t think we will be doing that again anytime soon.
This time the roadshow is a bit more spread out geographically, so it will be planes and trains for us this time.
We will have 5 stops on the roadshow from April 12th through April 22nd 2024.
Below is a listing of the SAP CodeJam events on the roadshow schedule:
- ABAP Cloud & RESTful Application Programming Model 04/12/2024 Amsterdam, Netherlands
- ABAP Cloud & RESTful Application Programming Model 04/15/2024 Bucharest, Romania
- ABAP Cloud & RESTful Application Programming Model 04/17/2024 Leverkusen, Germany
- ABAP Cloud & RESTful Application Programming Model 04/19/2024 Paris, France
- ABAP Cloud/RAP & CAP 04/22/2024 Madrid, Spain
I have a feeling these events will fill up fast, so if you want to join us for a day of coding awesomeness, make sure to register to secure your seat.
I’d also like to take a second and thank our hosts for partnering with us on these events.
Thanks to PVH and Partners in Technology in Amsterdam, IBM in Bucharest, Covestro in Leverkusen, VINCI Energies in Paris,
and of course our own Antonio Maradiaga at SAP Espana in Madrid.
See you all somewhere in Europe on the roadshow!

The resulting text chunks are sent to SAP AI Core via the CAP LLM Plugin. The embedding API returns the embeddings for the text chunks. Each embedding is then written into an object in the form of a vector, and the object is inserted into the database:
try {
    const vectorPlugin = await cds.connect.to('cap-llm-plugin')
    // For each text chunk, generate the embeddings
    for (const chunk of textChunks) {
        const embedding = await vectorPlugin.getEmbedding(chunk.pageContent)
        const entry = {
            "text_chunk": chunk.pageContent,
            "metadata_column": loader.filePath,
            "embedding": array2VectorBuffer(embedding)
        }
        textChunkEntries.push(entry)
    }
    // Insert the text chunks with embeddings into the db
    const insertStatus = await INSERT.into(DocumentChunk).entries(textChunkEntries)
    if (!insertStatus) {
        throw new Error("Insertion of text chunks into db failed!")
    }
    return `Embeddings stored successfully to db.`
} catch (error) {
    // Handle any errors that occur during the execution
    console.log('Error while generating and storing vector embeddings:', error)
    throw error
}
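The helper `array2VectorBuffer` used above converts the numeric embedding array into a binary buffer that the SAP HANA Cloud vector column can ingest. A possible implementation, sketched along the lines of the CAP LLM Plugin samples (a 4-byte little-endian dimension count followed by each value as a 4-byte float):

```javascript
// Convert an array of numbers into a buffer: a 4-byte little-endian
// dimension count, followed by each value as a 4-byte float.
const array2VectorBuffer = (data) => {
  const sizeFloat = 4
  const sizeDimensions = 4
  const buffer = Buffer.allocUnsafe(data.length * sizeFloat + sizeDimensions)
  buffer.writeUInt32LE(data.length, 0)
  data.forEach((value, index) => {
    buffer.writeFloatLE(value, index * sizeFloat + sizeDimensions)
  })
  return buffer
}
```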

 

 

 

The database table containing the text chunks will look something like this (entries are shortened for readability of this post):

 

 

 

{
  "@odata.context": "$metadata#DocumentChunk",
  "value": [
    {
      "text_chunk": "SAP CodeJam Roadshow 2024\nDeveloper Advocate Rich Heilman\nYes, someone did say “Roadshow”!",
      "metadata_column": null
    },
    {
      "text_chunk": "We will have 5 stops on the roadshow from April 12th through April 22nd 2024.\nBelow is a listing of the SAP CodeJam events on the roadshow schedule:\n- ABAP Cloud & RESTful Application Programming Model 04/12/2024 Amsterdam, Netherlands\n- ABAP Cloud & RESTful Application Programming Model 04/15/2024 Bucharest, Romania\n- ABAP Cloud & RESTful Application Programming Model 04/17/2024 Leverkusen, Germany\n- ABAP Cloud & RESTful Application Programming Model 04/19/2024 Paris, France",
      "metadata_column": null
    },
    {
      "text_chunk": "- ABAP Cloud/RAP & CAP 04/22/2024 Madrid, Spain",
      "metadata_column": null
    },
    {
      "text_chunk": "I have a feeling these events will fill up fast, ",
      "metadata_column": null
    }
  ]
}
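For completeness, the chunk table could be modeled in CDS along these lines. This is a sketch — the exact entity definition isn't shown in this post; the key field is an assumption, and `Vector(1536)` matches the dimensionality of text-embedding-ada-002:

```cds
entity DocumentChunk {
    key ID              : UUID;
        text_chunk      : LargeString;
        metadata_column : LargeString;
        embedding       : Vector(1536);
}
```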

 

 

 

NOTE: A tip is to implement a helper service within your CAP application that allows you to store and delete embeddings. This will make your life easier when experimenting with the embeddings.

 

 

 

service EmbeddingStorageService {
    entity DocumentChunk as projection on db.DocumentChunk excluding { embedding };

    function storeEmbeddings() returns String;
    function deleteEmbeddings() returns String;
}

 

 

 

Using the CAP LLM Plugin for the RAG response

The embedding vectors are stored and the connection to the LLM is configured. The last piece is to implement the application logic serving the RAG response to the user's question.

The CAP application simply defines OData functions that can be called via the service's API:
service RoadshowService {
    function getRagResponse() returns String;
    function executeSimilaritySearch() returns String;
}

 

 

 

The implementation uses the CAP LLM Plugin's APIs to execute the requests:
const cds = require('@sap/cds')
const tableName = 'SAP_ADVOCATES_DEMO_DOCUMENTCHUNK'
const embeddingColumn = 'EMBEDDING'
const contentColumn = 'TEXT_CHUNK'
const userQuery = 'In which city are Thomas Jung and Rich Heilman on April, 19th 2024?'

module.exports = function() {
    this.on('getRagResponse', async () => {
        try {
            const vectorplugin = await cds.connect.to('cap-llm-plugin')
            const ragResponse = await vectorplugin.getRagResponse(
                userQuery,
                tableName,
                embeddingColumn,
                contentColumn
            )
            return ragResponse
        } catch (error) {
            console.log('Error while generating response for user query:', error)
            throw error;
        }
    })

    this.on('executeSimilaritySearch', async () => {
        const vectorplugin = await cds.connect.to('cap-llm-plugin')
        const embeddings = await vectorplugin.getEmbedding(userQuery)
        const similaritySearchResults = await vectorplugin.similaritySearch(
            tableName,
            embeddingColumn,
            contentColumn,
            embeddings,
            'L2DISTANCE',
            3
        )
        return similaritySearchResults
    })
}
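The `'L2DISTANCE'` argument passed to `similaritySearch` selects Euclidean (L2) distance as the ranking metric. Conceptually, for two vectors it is just the straight-line distance in n-dimensional space, so smaller values mean more similar chunks:

```javascript
// Euclidean (L2) distance between two vectors of equal length.
// Smaller distance = the embeddings (and thus the texts) are more similar.
function l2Distance(a, b) {
  let sum = 0
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i]
    sum += diff * diff
  }
  return Math.sqrt(sum)
}

console.log(l2Distance([0, 0], [3, 4])) // 5
```

The final argument (`3`) limits the search to the three closest chunks, which is why three entries show up in the `additionalContents` of the response below.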

 

 

 

The call for the RAG response is straightforward: it expects the user query, the table name for the text chunks, the column where the embeddings are stored, and the content column. The response contains the LLM's answer to the user's question. After the model knows the context of the road show event, it will most likely answer with the correct information. Did you notice that I said "most likely"? If the chunking is not precise enough or your prompt isn't formulated properly, the LLM will still hallucinate.

NOTE: You need to experiment with the text chunking and the phrasing of the user query (prompt).

Executing the RAG response call results in the following response from the LLM:

 

 

 

{
  "@odata.context": "$metadata#Edm.String",
  "value": {
    "completion": {
      "content": "On April 19th, 2024, Thomas Jung and Rich Heilman will be in Paris, France during the SAP CodeJam Roadshow.",
      "role": "assistant"
    },
    "additionalContents": [
      {
        "score": 0.820276916027069,
        "pageContent": "SAP CodeJam Roadshow 2024\nDeveloper Advocate Rich Heilman\nYes, someone did say “Roadshow”!"
      },
      {
        "score": 0.791925715918883,
        "pageContent": "We will have 5 stops on the roadshow from April 12th through April 22nd 2024.\nBelow is a listing of the SAP CodeJam events on the roadshow schedule:\n- ABAP Cloud & RESTful Application Programming Model 04/12/2024 Amsterdam, Netherlands\n- ABAP Cloud & RESTful Application Programming Model 04/15/2024 Bucharest, Romania\n- ABAP Cloud & RESTful Application Programming Model 04/17/2024 Leverkusen, Germany\n- ABAP Cloud & RESTful Application Programming Model 04/19/2024 Paris, France"
      },
      {
        "score": 0.781783031331483,
        "pageContent": "I have a feeling these events will fill up fast, so if you want to join us for a day of coding awesomeness, make sure to register to secure your seat."
      }
    ]
  }
}

 

 

 

As you can see, each retrieved text chunk carries a score: the higher the score, the more relevant the chunk is to the query. Looking at the completion at the top, you can see that it contains the correct answer to the user's question.

The payload to the LLM looks like the following:

 

 

 

POST /v2/inference/deployments/d7b8e46fc3d5c25f/embeddings?api-version=2023-05-15
header: { 'Content-Type': 'application/json', 'AI-Resource-Group': 'default' }
[pool] - effective pool configuration: {
  min: 0,
  max: 100,
  testOnBorrow: true,
  fifo: false,
  acquireTimeoutMillis: 10000,
  softIdleTimeoutMillis: 30000,
  idleTimeoutMillis: 30000,
  evictionRunIntervalMillis: 60000,
  numTestsPerEvictionRun: 34
}
payload is {
  messages: [
    {
      role: 'system',
      content: ' undefined ``` SAP CodeJam Roadshow 2024\n' +
        'Developer Advocate Rich Heilman\n' +
        'Yes, someone did say “Roadshow”! Attention SAP Developers! \n' +
        'Thomas Jung and I are hitting the road again this year for another SAP CodeJam Roadshow in Europe.\n' +
        'Last year, we drove about 1500km around Germany and the Netherlands, I don’t think we will be doing that again anytime soon.\n' +
        'This time the roadshow is a bit more spread out geographically, so it will be planes and trains for us this time.,We will have 5 stops on the roadshow from April 12th through April 22nd 2024.\n' +
        'Below is a listing of the SAP CodeJam events on the roadshow schedule:\n' +
        '- ABAP Cloud & RESTful Application Programming Model 04/12/2024 Amsterdam, Netherlands\n' +
        '- ABAP Cloud & RESTful Application Programming Model 04/15/2024 Bucharest, Romania\n' +
        '- ABAP Cloud & RESTful Application Programming Model 04/17/2024 Leverkusen, Germany\n' +
        '- ABAP Cloud & RESTful Application Programming Model 04/19/2024 Paris, France,I have a feeling these events will fill up fast, so if you want to join us for a day of coding awesomeness, make sure to register to secure your seat.\n' +
        'I’d also like to take a second and thank our hosts for partnering with us on these events.\n' +
        'Thanks to PVH and Partners in Technology in Amsterdam, IBM in Bucharest, Covestro in Leverkusen, VINCI Energies in Paris,\n' +
        'and of course our own Antonio Maradiaga at SAP Espana in Madrid.\n' +
        '\n' +
        'See you all somewhere in Europe on the roadshow! ``` '
    },
    {
      role: 'user',
      content: 'In which city are Thomas Jung and Rich Heilman on April, 19th 2024?'
    }
  ]
}

 

 

 

The content that gets sent in the messages array within the payload consists of the retrieved text chunks (as the system message), and below it the user question is included as well. That allows the LLM to understand the context and answer the question without hallucinating.
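Stripped of the plugin internals, the assembly of such a payload can be sketched like this. This is an illustrative simplification, not the plugin's actual code; the function name is made up:

```javascript
// Assemble a chat-completion payload: the retrieved chunks become the
// system message (the context), the user's question the user message.
function buildRagPayload(retrievedChunks, userQuestion) {
  return {
    messages: [
      { role: 'system', content: '``` ' + retrievedChunks.join('\n') + ' ```' },
      { role: 'user', content: userQuestion }
    ]
  }
}
```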

An advantage of developing with CAP is the use of cds watch --profile hybrid. Running the watch command allows for instant feedback on changes. On top of that, if the binding is set, the service has a stable connection to the real SAP HANA Cloud instance on your BTP account.

Final words

I hope this blog post helped you get an understanding of how you can enhance the AI models behind SAP AI Core with embeddings using the SAP HANA Cloud Vector Engine and the CAP LLM Plugin. Visit the CAP LLM Plugin sample repository for more samples.

Nora von Thenen, the AI Advocate on my team, and I will host a livestream on YouTube, talking about and demoing a more extensive version of what I described in this blog post. Make sure to set a notification reminder for the livestream event on the SAP Developers YouTube channel.

I'm excited to see what you can create using the brand-new capabilities of CAP and SAP Generative AI Hub! If you have any feedback or questions, post them in the comments below this blog post.

With that, Keep Coding! 🧑‍💻
