Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
Authors: @amagnani |@cesarecalabria |@Eneveux |@jacobahtan |@MostafaSharaf |@thiago | @Trinidad

This blog post is part of a webinar series to show you how to explore the Generative AI capabilities of SAP AI Foundation, along with proof of concepts in the form of use cases:

Replay: Embedding Business Context with the SAP HANA Cloud Vector Engine

Here we will see how to ground and adapt Generative AI to business context through Retrieval Augmented Generation (RAG), extending to external domain knowledge by retrieving and injecting information via embeddings, thanks to the SAP HANA Cloud, Vector Engine.

Here you will find the source code of both prototypes: SAP-samples / btp-generative-ai-hub-use-cases

The basics

OK, let’s start with a quick warm-up on the basics: you know that LLMs are trained on vast amounts of text data from various sources, which allows them to generate responses and perform tasks across a wide range of topics.

However, they don't naturally understand the nuances of specific industries or domains. So, while LLMs can be valuable tools for generating text and performing certain tasks, they may require additional guidance or input from domain experts to ensure accuracy and relevance in business-specific contexts.

The question you might hear from customers worldwide and that I pose to you here is:

How do I make the large language model smarter
by adding details about my business, and stop it from hallucinating?

Here are some known key techniques to enhance the output of large language models (LLMs). We have categorized them based on how well they optimize context and the model itself.

VecEng-technics.jpg

Let's start with one of the fundamental strategies: prompt engineering.

Prompt engineering

Prompt engineering involves crafting specific instructions or queries to guide the model towards desired outputs. It's like giving clear directions to get the best results. One advantage is its simplicity: it requires less technical expertise and fewer computational resources. A downside is the trial and error needed to find the most effective prompts.

Model fine-tuning

Then we have model fine-tuning. This is all about training the model to better fit the requirements at hand, to perform a NEW SKILL or NEW TASK that is just hard to articulate in a prompt. It’s like showing the model HOW to do it, rather than TELLING it how to do it. It requires a good understanding of the model's inner workings and is time- and resource-consuming, therefore VERY COSTLY.

Retrieval Augmented Generation

Moving up on the context optimization axis we have Retrieval Augmented Generation (so-called RAG). This technique enhances the model's capabilities by providing more context for the given task. It's like giving the model access to a library of information to help it generate more accurate outputs. One advantage is its ability to generate relevant and domain-specific content, but it HEAVILY DEPENDS on the quality of the external corpus used for retrieval.

By using a mix of prompt engineering, model fine-tuning, and RAG, companies can leverage the strengths of each approach to further improve the performance of the LLM.

Let's focus on RAG. The key characteristics of RAG include:

Reduce Hallucinations: RAG can mitigate the occurrence of hallucinations or incorrect outputs by cross-referencing the generated content with the retrieved information, thereby improving the overall accuracy of the model's responses.

Increased Knowledge: Even if the base LLM has not been trained on certain information, as long as that information exists in the corpus used for retrieval, RAG can still provide relevant answers.

Flexibility: By changing the underlying corpus, RAG can be adapted to different domains or knowledge bases.

Memory Efficiency: Instead of having to increase the size of the language model to store more information, RAG leverages external data sources, keeping the model size manageable.

Let’s see that in action with a quick and very simple example here:

VecEng-prompt.jpg

I asked the LLM in which quarter SAP HANA started supporting a native vector datatype for running vector similarity. The answer was wrong: there was no such support in 2020.

VecEng-prompt-2.jpg

Then I provided the context, in this case the actual release date, and asked the same question. The model took the date, converted it to the right quarter, and provided the correct answer.

So, the concept is clear: if I want the model to be precise, I need to provide the information as context for the questions I want it to answer.
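
That grounding step can be sketched in a few lines: we simply prepend the retrieved context to the question before sending the prompt to the model. The function and template below are illustrative, not a Generative AI Hub API, and the release fact in the context string is only an example value.

```python
# Minimal sketch of grounding a prompt: the model only sees what we put in
# the context, so we prepend the relevant business fact to the question.
# build_grounded_prompt and the context fact are illustrative assumptions.

def build_grounded_prompt(context: str, question: str) -> str:
    """Combine retrieved business context with the user's question."""
    return (
        "Answer the question using ONLY the context below.\n"
        f"Context: {context}\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt(
    context="SAP HANA Cloud introduced a native vector datatype in Q1 2024.",
    question="Since which quarter does SAP HANA support a native vector datatype?",
)
print(prompt)
```

The resulting string is what actually gets sent to the LLM; without the `Context:` line, the model falls back on its training data and may hallucinate, exactly as in the example above.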

However, the challenge is: where and how do I store all that information I want to send to the model, and ALSO how do I retrieve the right piece for the question asked?

VecEng-store.jpg

Every company stores its unstructured information in different formats. Think about that bunch of PDFs, web pages, and Word documents. We want those to be our source for the RAG technique.

To store them in a format that is meaningful for the techniques we want to apply, we need to convert them to embeddings. Embeddings are a way to represent text in numerical format, typically as vectors.

Let’s use a few words as an example. In the first column, you'll notice we've grouped some of them together based on their meanings, and we've color-coded them for clarity. Alongside each word, you'll find its corresponding embedding.

We have simplified them into a 2-dimensional embedding representation for a better understanding of the concepts we want to demonstrate.

VecEng-text.jpg

If we were to plot them on a Cartesian graph, here's how they'd appear.

Notice how these words are all clustered together on the graph; it's like they're grouped up in the same area. We can also see their vector representations, which basically show their 'direction' in the space. Pretty interesting, right?

VecEng-vector.jpg

Alright, so how do we figure out how similar these words are? There are a few methods, but let's focus on two for now.

The first method uses the COSINE of their angle:

VecEng-cosine.jpg

And the second one uses the Euclidean distance:

VecEng-euclidean.jpg
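
Both measures are a few lines of arithmetic. Here is a sketch with made-up 2-D word embeddings like the ones in the simplified example above (real embeddings have hundreds or thousands of dimensions):

```python
import math

# Toy 2-D embeddings for a few words; the values are invented for
# illustration, mirroring the simplified 2-dimensional example above.
embeddings = {
    "cat": [1.0, 0.9],
    "dog": [0.9, 1.0],
    "car": [-0.8, 0.2],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def euclidean_distance(a, b):
    """Straight-line distance between two points: 0 means identical."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# "cat" and "dog" point in nearly the same direction; "car" does not.
print(cosine_similarity(embeddings["cat"], embeddings["dog"]))
print(euclidean_distance(embeddings["cat"], embeddings["car"]))
```

Running it shows the cat/dog pair scoring close to 1 on cosine similarity and close to 0 on Euclidean distance, while cat/car scores poorly on both, which is exactly the clustering behavior visible on the Cartesian graph.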

 

Understanding this is important because it lets us do a lot of useful things: classification, clustering, spotting outliers, semantic search, and even recommendations. We can, for example, group documents, or chunks of documents, together to find common topics of interest.

Let's break it down with a practical example. First, we need to split the documents into smaller sections like paragraphs or sentences, also known as chunks, and turn them into embeddings.

VecEng-convert.jpg

Then we save them as vectors, in a vector store.

VecEng-stor3.jpg

Once we receive a query from the user, we turn it into a vector and compare it with all the other vectors in our database. We then filter down to just a few: the most similar ones.

VecEng-compare.jpg

And with those, we enrich the prompt, just like you see here.

VecEng-enrich.jpg

Now, let's incorporate the technology into those steps. This diagram is something you'll definitely want to keep handy, like sticking it on your workstation:

VecEng-flow.jpg

In summary: we first generate the text chunks, then convert them to embeddings and save them as vectors in SAP HANA Cloud, using the Vector Engine. When the user’s query comes in, we generate its embedding and run a similarity search against the vectors we have stored.

The best matches go together with the query to the LLM, which provides the answer to the user.
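
The whole flow fits in a short sketch. The embedder below is a toy word-count function standing in for a real Text Embeddings model, and the chunk texts are invented; in the actual pipeline the embeddings come from a model served through the Generative AI Hub and the similarity search runs inside SAP HANA Cloud.

```python
import math

# End-to-end sketch: chunk, embed, store, retrieve the most similar chunk
# for a query, and build the enriched prompt. count_vector() is a toy
# stand-in for a real embeddings model; VOCAB and the chunks are invented.

VOCAB = ["park", "bench", "broken", "streetlight", "out", "pothole", "road"]

def count_vector(text):
    """Toy embedding: counts of vocabulary words in the text."""
    words = text.lower().split()
    return [words.count(w) for w in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# 1-2. Chunk the documents and store their vectors.
chunks = ["The park bench is broken", "A streetlight is out", "Pothole on the road"]
store = [(c, count_vector(c)) for c in chunks]

# 3-5. Embed the query, run the similarity search, keep the best match.
query = "broken bench in the park"
qvec = count_vector(query)
best = max(store, key=lambda item: cosine(qvec, item[1]))

# 6. Enrich the prompt with the retrieved chunk.
prompt = f"Context: {best[0]}\nQuestion: {query}"
print(prompt)
```

The query about the broken bench retrieves the bench chunk, not the streetlight or pothole ones, and only that chunk travels to the LLM as context.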

Now, this is very important to understand: in orange you see the steps where we need the Generative AI Hub to generate the embeddings and provide the answer.

VecEng-aiLaunchpad.jpg

The model responsible for converting text to embeddings is called a Text Embeddings model; in this case you see ADA-002. The model responsible for generating the text completion for the final answer is GPT-4. Both are accessible, again, through the Generative AI Hub that you saw in the first session of this series. These are the models we have chosen, and you can always opt for different ones available in the Generative AI Hub; please refer to this link to see the supported models.

Good, now we want to see how to store the embeddings in a vector format and run the similarity search. And this is the role of our SAP HANA Cloud. It will provide us with the right functions and data formats:

VecEng-hcve.jpg

SAP HANA Cloud, Vector Engine

The Vector Engine, released last March in SAP HANA Cloud, provides a new datatype to store high-dimensional vectors and two distance functions to compute vector similarity; Retrieval Augmented Generation is one of the use cases for those features.

The new datatype is called REAL_VECTOR and consists of real elements, supporting up to 65,000 dimensions.

VecEng-realvector.jpg

You can get a REAL_VECTOR by using the TO_REAL_VECTOR function, passing either a text, a binary representation, or an array.

VecEng-toRealvector.jpg

The first distance function available is the Euclidean distance, represented here by L2DISTANCE, which returns a double greater than or equal to zero. The closer this number is to zero, the more similar the vectors are.

VecEng-distance.jpg

And the second is COSINE_SIMILARITY, which returns a double between -1 and 1. The greater the result, the more similar the vectors are.

VecEng-cosinesim.jpg

You can consume them using standard SQLScript, Python through the hana-ml library or LangChain, and also using CAP.
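
To give a feel for the SQL side, here is a sketch of how one might build a top-k similarity-search statement to send to SAP HANA Cloud (for example over hdbcli or hana-ml). TO_REAL_VECTOR and COSINE_SIMILARITY are the functions described above; the table and column names (ISSUES, TEXT_CHUNK, EMBEDDING) are hypothetical.

```python
# Build a top-k similarity-search statement for the Vector Engine.
# Table/column names are hypothetical; the vector literal format follows
# the '[x,y,...]' text form accepted by TO_REAL_VECTOR.

def similarity_search_sql(query_vector, top_k=3):
    """Return SQL ranking stored embeddings by cosine similarity."""
    vec_literal = "[" + ",".join(str(x) for x in query_vector) + "]"
    return (
        "SELECT TOP {k} TEXT_CHUNK, "
        "COSINE_SIMILARITY(EMBEDDING, TO_REAL_VECTOR('{v}')) AS SIM "
        "FROM ISSUES ORDER BY SIM DESC"
    ).format(k=top_k, v=vec_literal)

sql = similarity_search_sql([0.1, 0.2, 0.3])
print(sql)
```

In a real pipeline the query vector would come from the Text Embeddings model, and the statement would be executed through whichever of the consumption methods above you choose.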

And we close the introduction by looking at the main benefit of the Vector Engine: it combines the results of fast vector search with business data to enable domain-specific use cases.

Next up, we will land this in a practical use case implementation with code, architecture, and a cool demo.

Use Case 1: Deduplication of citizen reported issues

We start from the previous session in February, where we presented a proof-of-concept application using the Generative AI Hub. The goal of the Citizen Reporting application is to assist the maintenance manager, Mary, by extracting insights from citizens' social media posts, classifying them using Generative AI, and allowing for the creation of maintenance notifications in the SAP S/4HANA Cloud tenant when the incident is approved.

VecEng-usecase1-0.jpg

However, a problem presented itself. In this scenario, the maintenance manager, Mary, faces a challenge where citizens frequently report incidents via social media. These reports often pertain to the same issue but originate from different individuals. Mary is tasked with reviewing both duplicate incidents and newly reported incidents in order to determine which ones should be submitted to the backend S/4HANA maintenance system.

The solution to help Mary in her task will be to utilize the SAP HANA Cloud, Vector Engine to analyze all incoming social media incidents and identify duplicates or similarities among reported incidents. When duplicates or similarities are identified, a notification is attached to the incident when viewed in the application, enabling Mary to easily distinguish them from new reports. This approach streamlines the incidents requiring Mary's review, enhancing efficiency in the overall process.

VecEng-usecase1-1.jpg

John reports an incident by creating a post on the city’s community page, using social media to raise attention to an issue in the community.

The citizen reporting application receives John's post, examining its similarity to other members' posts as identified in the SAP HANA Cloud vector database. If a duplicate incident is detected, the incident is flagged as “duplicated” so it can be easily identified by Mary, the maintenance manager, when she is viewing or managing incidents.

Both similar and dissimilar entries, as compared using the SAP HANA Cloud, Vector Engine, undergo processing and analysis by the corresponding large language model via the SAP Generative AI Hub. This analysis entails extracting key incident details such as issue summary, type, urgency, location, and sentiment.

Subsequently, Mary, as the Maintenance Manager, reviews both new incident details and flagged duplicate incidents. She can then approve the incident(s), reject them, or link a duplicate or similar incident to an existing SAP S/4HANA Cloud Maintenance Notification, thereby taking action on the citizens' reports without creating multiple Maintenance Notifications for the same reported incident in the backend system.
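
The duplicate check itself reduces to comparing the new post's embedding against the stored ones. The sketch below uses cosine similarity with an illustrative 0.9 threshold; in the PoC the vectors come from an embeddings model and the comparison runs inside the SAP HANA Cloud, Vector Engine, so the function names, threshold, and toy vectors here are assumptions.

```python
import math

# Flag a new post as a likely duplicate when its cosine similarity to any
# stored incident's embedding exceeds a threshold. The 0.9 threshold and
# 2-D toy vectors are illustrative, not the PoC's actual values.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def flag_duplicates(new_vec, stored, threshold=0.9):
    """Return the ids of stored incidents similar enough to be duplicates."""
    return [iid for iid, vec in stored if cosine(new_vec, vec) >= threshold]

stored_incidents = [("INC-1", [0.9, 0.1]), ("INC-2", [0.1, 0.9])]
print(flag_duplicates([0.88, 0.12], stored_incidents))
```

A post whose vector nearly matches INC-1 gets flagged against it, so Mary sees it marked as a duplicate instead of reviewing it as a brand-new incident.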

Now, let’s have a look at a live demo of the Citizen Reporting Application in action:

As mentioned earlier, in the previous SAP Generative AI session we utilized SAP Build Apps for the client front-end application, the orchestrating citizen reporting application on Cloud Foundry, as well as SAP HANA for storage. For this use case we have extended the application to use the SAP HANA Cloud, Vector Engine on top of the existing solution, vectorizing the posts for deduplication.

VecEng-solarch.jpg

Let’s look at the designed architecture and how the different services interact with each other.

Our Citizen Reporting app is composed of:

  • A server-side application that runs on SAP BTP Cloud Foundry. It can be developed in the language of your choice; in our proof of concept we implemented two versions: Node.js and Python.
  • A user interface developed with SAP Build Apps, that can be easily deployed in SAP Build Work Zone.
  • The server side consumes APIs from different BTP services, starting with the SAP AI Core service, which provides access to different foundation models (remote or hosted) in a trusted and controlled way through the Generative AI Hub capability.
  • And to securely connect to the SAP AI Core APIs: the Destination service is leveraged.
  • We also consume the SAP HANA Cloud, Vector Engine through the Cloud Application Programming Model to perform the similarity search needed for the deduplication of issues, as well as to store information about the incidents.
  • To create Maintenance Notifications in SAP S/4HANA Cloud we are consuming SAP S/4HANA Cloud OData APIs also secured through the Destination service.
  • The SAP Cloud Identity service allows us to share the same Identity Provider among the different components (including SAP S/4HANA Cloud) and to manage authorization and authentication for our application.

Use Case 2: Ask the City

Next up, let's explore the second use case: how the SAP HANA Cloud, Vector Engine can be utilized for citizens' queries!

Citizens like John have the opportunity to submit inquiries to the city through social media platforms like Reddit’s subreddit channel, Ask SAGenAICity, and subsequently receive responses directly on the same platform.

VecEng-solarch-uc2.jpg

This capability is facilitated by vectorizing SAGenAICity's knowledge repository, presently accessible through the city's website in formats such as HTML, PDF, and other documents, into the SAP HANA Cloud vector database.

Problem: John has questions that he needs to contact SAGenAICity about but he does not know whom or what departments to contact within the city offices.

Solution: To streamline the process of addressing John’s questions and guiding him toward appropriate resources, he asks his question(s) through SAGenAICity's community channel on Reddit. The city utilizes the SAP HANA Cloud vector database to search for similar questions and delivers relevant answers directly to John via Reddit.

Once John’s question is submitted and the citizen reporting application receives it, the system evaluates whether the question aligns with similar data within the city's knowledge base, which has been embedded within the SAP HANA Cloud vector database.

If similar information exists within the SAP HANA Cloud vector database, it is relayed to the LLM via the SAP Generative AI Hub, which then generates a response tailored to John, and the generated response is posted back to John via the Reddit social media platform.

By embedding SAGenAICity’s knowledge base into the SAP HANA Cloud vector database, many questions can be answered automatically without intervention by a SAGenAICity employee, freeing up city employees to handle only the exceptional questions that may not be part of the knowledge base. It also gives citizens reporting issues yet another channel to resolve them.

Now let’s have a look at a demo of the Ask SAGenAICity solution in action:

Here is the solution architecture diagram, similar to the previous one.

VecEng-solarch-uc22.jpg

The only difference is the Python microservice, deployed in the Cloud Foundry runtime, that actively listens for any new posts on the Reddit page. It is a pretty simple and straightforward solution for this use case.

Please note that the web pages have been chunked using the LangChain library for the text chunk generation.
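
As a rough illustration of that chunking step, here is a minimal pure-Python splitter. The PoC uses LangChain for this; the `max_chars` and `overlap` parameters below mirror the kind of options a text splitter exposes, and their values are illustrative, not the ones we used.

```python
# Toy text splitter: fixed-size character windows with overlap, standing in
# for LangChain's text splitters. Parameter values are illustrative.

def chunk_text(text, max_chars=100, overlap=20):
    """Split text into overlapping character windows."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks

page = ("word " * 60).strip()  # stand-in for a scraped city web page
chunks = chunk_text(page)
print(len(chunks), len(chunks[0]))
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side, which matters because each chunk is embedded and searched independently.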

Consumption Methods

As we mentioned earlier, there are various consumption methods for SAP HANA Cloud, Vector Engine:

VecEng-techs.jpg

If you are curious to learn more, we explain the source code of both use case implementations in the replay of our session here:

Replay: Embedding Business Context with the SAP HANA Cloud Vector Engine

Here you will find the source code of both prototypes: SAP-samples / btp-generative-ai-hub-use-cases

Do not miss it.
Stay tuned!

 
