In this blog post we will explore how SAP HANA Cloud is further extending its multi-model capabilities by also supporting RDF-based knowledge graphs and SPARQL querying, and we will look at a real-world business application of this. We will see how to organize and represent enterprise data in a graph structure with the HANA Cloud Knowledge Graph Engine, discover how this structure facilitates knowledge representation and reasoning, and learn how to leverage knowledge graphs to retrieve structured data and achieve greater accuracy and explainability in RAG scenarios.
This is a summary of the business use case I presented together with my colleague @merza in this webinar: Explicit knowledge representation and reasoning with Knowledge Graphs. Please note that this session is part of a series of webinars on the topic “Talk to Your Business with Generative AI”. Check the full calendar here to watch the recordings of past sessions and register for the upcoming ones!
The summary of the session is structured in two parts: this blog post, where I introduce RDF-based knowledge graphs and the business use case we used to demonstrate the new capabilities of HANA Cloud, and a second part with a deep dive into the proof-of-concept implementation we developed for the business use case.
As we always do in our sessions, we built a proof of concept, which I describe in more detail in the second blog post. This is not a product, just a prototype that we use to illustrate how to implement the services and concepts we talk about. We hope it will not only inspire you but also be of practical help, since the working code is available in the SAP-Samples repository here.
To understand the content of this blog post, it helps to be familiar with some SAP products, especially SAP BTP and SAP HANA Cloud, and to have some knowledge of generative AI and data science in general, as well as a basic knowledge of the JavaScript and Python programming languages.
Many thanks to our colleagues in the HANA Cloud Product Management Team for their great support in helping us develop this content: @mfath, @ChristophMorgen, @mkemeter, Stefan Uhrig, Tobias Mindnich!
The term knowledge graph itself is not new: it was coined in the 1970s, and in 2012 Google introduced its own Knowledge Graph. Nowadays, however, interest in knowledge graphs is growing with the rise of generative AI, and SAP is now introducing this technology into the SAP HANA Cloud database.
But why is interest in knowledge graphs growing with the rise of generative AI? We can mention several reasons:
Let’s introduce briefly what a knowledge graph is.
Today’s organizations generate massive volumes of data. Yet, without structure and context, that data rarely matures into actionable knowledge that enables informed decision-making and supports intelligent actions based on semantics and reasoning over raw data. This is where knowledge graphs come in.
A knowledge graph (KG) is a graph-based data model that captures entities (like people, products, places) and their relationships, with the aim of making knowledge understandable by machines. To do so, a formal knowledge representation is needed, in which knowledge is broken down into atomic units, known as facts, each expressed as a triple (subject-predicate-object), as in the following example:
By linking different facts together, they form a knowledge graph:
This type of representation is not limited to concrete instances like the author Arthur Conan Doyle or the character Sherlock Holmes, but it can be extended to include type semantics or classes like saying that Arthur Conan Doyle is a person and Sherlock Holmes is a fictional character (see image below).
At the core of knowledge graphs are semantic web standards that ensure interoperability, extensibility, and formal meaning. These standards support automated reasoning and complex queries in a machine-understandable way.
The Resource Description Framework (RDF) is the core triple-based data model for knowledge graphs. For example, the facts of the knowledge graph depicted above, expressed as RDF triples, would look like the following:
Note that RDF is a 1-to-1 representation of the triples of facts, and each part of a triple is a uniquely identified resource referenced by a uniform resource identifier (URI), abbreviated here by defining some namespace prefixes. (The object of a triple can also be a literal value, such as a string or a number.)
Now that we know how to express data using RDF, let's explore how we can actually use that data by querying an RDF-based knowledge graph.
SPARQL is the standard query language for RDF data. It works by matching graph patterns against the dataset. Pattern matching means looking for RDF triples that match a pattern in which variables can appear in any position (subject, predicate, object). SPARQL variables are bound to RDF terms, and the variables listed in the SELECT clause appear as columns in the result table. For example, consider the following query (which uses resources from the public knowledge base DBpedia) and the corresponding result:
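To make the idea of prefixed URIs concrete, here is a minimal, hypothetical Python sketch (not part of the PoC). It stores facts as (subject, predicate, object) tuples written with prefixed names, expands the prefixes into full URIs and prints the result in the W3C N-Triples syntax. The `ex:` namespace and the predicate `ex:created` are illustrative assumptions; only the `rdf:` namespace URI is the standard one.

```python
# Namespace prefixes used to abbreviate URIs.
# "ex:" is a hypothetical namespace for this sketch; "rdf:" is the standard RDF namespace.
PREFIXES = {
    "ex": "http://example.org/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

# Facts as (subject, predicate, object) triples; ex:created is an illustrative predicate.
TRIPLES = [
    ("ex:ArthurConanDoyle", "ex:created", "ex:SherlockHolmes"),
    ("ex:ArthurConanDoyle", "rdf:type", "ex:Person"),
    ("ex:SherlockHolmes", "rdf:type", "ex:FictionalCharacter"),
]


def expand(name: str) -> str:
    """Expand a prefixed name such as 'ex:Person' into its full URI."""
    prefix, local = name.split(":", 1)
    return PREFIXES[prefix] + local


def to_ntriples(triples) -> str:
    """Serialize URI-only triples as N-Triples: one '<s> <p> <o> .' line per fact."""
    return "\n".join(
        f"<{expand(s)}> <{expand(p)}> <{expand(o)}> ." for s, p, o in triples
    )


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
```

The prefixed form is what humans usually read and write (for example in Turtle or SPARQL), while the fully expanded URIs are what actually identify the resources.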
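The core idea of pattern matching can be illustrated with a tiny in-memory evaluator. This is a teaching sketch under simplifying assumptions (only basic graph patterns, no FILTER, OPTIONAL or aggregation, terms kept as prefixed strings, invented `ex:` data); it is not how the Knowledge Graph Engine or any real SPARQL engine is implemented.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_variable(term: str) -> bool:
    """SPARQL variables start with '?' (e.g. ?author)."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Unify one triple pattern with one triple, extending an existing binding."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if is_variable(term):
            if extended.get(term, value) != value:
                return None  # variable already bound to a different RDF term
            extended[term] = value
        elif term != value:
            return None  # constant term does not match
    return extended


def select(variables: List[str], patterns: List[Triple], data: List[Triple]) -> List[Tuple[str, ...]]:
    """Evaluate a basic graph pattern (all patterns must match with consistent
    bindings) and project the selected variables into result rows."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in data:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return [tuple(b[v] for v in variables) for b in solutions]


DATA = [
    ("ex:ArthurConanDoyle", "ex:created", "ex:SherlockHolmes"),
    ("ex:AgathaChristie", "ex:created", "ex:HerculePoirot"),
    ("ex:SherlockHolmes", "rdf:type", "ex:FictionalCharacter"),
    ("ex:HerculePoirot", "rdf:type", "ex:FictionalCharacter"),
]

# Equivalent to: SELECT ?author ?character
#                WHERE { ?author ex:created ?character .
#                        ?character rdf:type ex:FictionalCharacter }
rows = select(
    ["?author", "?character"],
    [("?author", "ex:created", "?character"),
     ("?character", "rdf:type", "ex:FictionalCharacter")],
    DATA,
)
print(rows)
```

Each triple pattern narrows the set of candidate bindings, and a variable that appears in several patterns (here `?character`) acts as the join condition between them.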
SPARQL supports filtering, aggregation, and even complex query logic. It can also reach out to remote endpoints via federated queries, e.g., fetching author birthplaces from DBpedia while querying local product catalogs.
RDF triples allow a simple representation of facts, but a list of facts is not enough to make full sense of the data. For this reason we need a formal description of the knowledge as a set of concepts, relationships and constraints. Let’s review some of the options available to build a conceptual model for a knowledge graph.
One simple option is RDF Schema (RDFS) that extends RDF by adding basic vocabulary for defining:
A simple example is given below:
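To illustrate what this vocabulary enables, here is a hypothetical sketch (class names invented, not from the PoC) of how `rdfs:subClassOf` lets a reasoner derive new `rdf:type` facts. It applies two of the RDFS entailment rules defined by the W3C, rdfs9 and rdfs11, in a naive forward-chaining loop until nothing new can be derived.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Forward-chain two RDFS entailment rules until no new triples appear:
    rdfs11: (A subClassOf B) and (B subClassOf C)  =>  (A subClassOf C)
    rdfs9:  (x type A) and (A subClassOf B)        =>  (x type B)
    """
    graph = set(triples)
    while True:
        subclass = {(s, o) for s, p, o in graph if p == SUBCLASS_OF}
        inferred = set()
        for a, b in subclass:
            for b2, c in subclass:
                if b == b2:
                    inferred.add((a, SUBCLASS_OF, c))
        for s, p, o in graph:
            if p == RDF_TYPE:
                for a, b in subclass:
                    if o == a:
                        inferred.add((s, RDF_TYPE, b))
        if inferred <= graph:  # fixpoint reached: nothing new was derived
            return graph
        graph |= inferred


facts = {
    ("ex:Novelist", SUBCLASS_OF, "ex:Writer"),
    ("ex:Writer", SUBCLASS_OF, "ex:Person"),
    ("ex:ArthurConanDoyle", RDF_TYPE, "ex:Novelist"),
}
closure = rdfs_closure(facts)
print(sorted(closure - facts))
```

Only one type is stated explicitly, yet the closure also contains the facts that Arthur Conan Doyle is a Writer and a Person. A production reasoner would use far more efficient strategies than recomputing everything in each round.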
What if more expressive power is needed to state more sophisticated conditions such as cardinality, class disjointness, or equivalence? For that, there is OWL, the Web Ontology Language, which builds on RDFS to represent rich and complex knowledge through richer axioms and reasoning rules. With OWL, you can express:
This enables reasoning and inference such as in the following example:
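As a hypothetical illustration of OWL-style inference (invented data, not the webinar example), the sketch below forward-chains two constructs that OWL provides, `owl:inverseOf` and `owl:TransitiveProperty`. Real OWL reasoners cover far more of the language; this only shows the principle of deriving implicit facts from axioms.

```python
RDF_TYPE = "rdf:type"
INVERSE_OF = "owl:inverseOf"
TRANSITIVE = "owl:TransitiveProperty"


def owl_closure(triples):
    """Naive forward chaining for two OWL constructs:
    - p owl:inverseOf q:                 (x p y)           =>  (y q x), and vice versa
    - p rdf:type owl:TransitiveProperty: (x p y), (y p z)  =>  (x p z)
    """
    graph = set(triples)
    while True:
        inverses = {(s, o) for s, p, o in graph if p == INVERSE_OF}
        transitive = {s for s, p, o in graph if p == RDF_TYPE and o == TRANSITIVE}
        inferred = set()
        for s, p, o in graph:
            for prop, inv in inverses:
                if p == prop:
                    inferred.add((o, inv, s))
                if p == inv:
                    inferred.add((o, prop, s))
            if p in transitive:
                for s2, q, o2 in graph:
                    if q == p and s2 == o:
                        inferred.add((s, p, o2))
        if inferred <= graph:  # fixpoint reached
            return graph
        graph |= inferred


facts = {
    ("ex:wrote", INVERSE_OF, "ex:writtenBy"),
    ("ex:ArthurConanDoyle", "ex:wrote", "ex:AStudyInScarlet"),
    ("ex:locatedIn", RDF_TYPE, TRANSITIVE),
    ("ex:BakerStreet", "ex:locatedIn", "ex:London"),
    ("ex:London", "ex:locatedIn", "ex:England"),
}
closure = owl_closure(facts)
print(sorted(closure - facts))
```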
Let’s quickly review the applications of RDF-based knowledge graphs.
One example application is a 360-degree customer relationship management app: with a KG we can link data from various customer touchpoints, such as sales, service interactions and marketing campaigns.
For example, e-commerce platforms can recommend products that are contextually related, not just products that match by text. More generally, knowledge graphs connect siloed data and surface relevant insights, leading to improved decision support.
GraphRAG is based on the same principle as the well-known vector-based retrieval augmented generation (RAG) (refer to the figure below).
Retrieval Augmented Generation (RAG) is a framework where a large language model receives external knowledge before generating the answer.
In a GraphRAG scenario, the generative AI app does not send the question directly to the large language model, but to a specific component (1) that converts the natural language query into a SPARQL query by using the LLM itself (text2SPARQL). This SPARQL query is executed against a triple store (2), where the knowledge is stored in the form of triples, to retrieve the needed information. Finally, this information is sent to the LLM as additional context together with the original question (3 and 4), and the LLM generates an accurate answer and sends it back to the app. GraphRAG is particularly suitable for grounding an LLM in specific business domains and their structured data, enhancing the accuracy of the LLM's answers.
Let’s now see what SAP HANA Cloud offers in terms of RDF-based knowledge graphs (NB: HANA Cloud has already supported another category of knowledge graphs, labeled property graphs, for quite some time. If you are interested, please have a look here).
HANA Cloud supports RDF-based knowledge graphs with the Knowledge Graph Engine, released in QRC 1/2025. The Knowledge Graph Engine is a high-performance graph analytics database system designed to store, manage and query RDF data, or triples; in other words, it is a triple store.
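The four steps can be outlined in Python as plain functions. This is a hypothetical sketch, not the PoC code: the LLM and the triple store are passed in as callables (in the setting described here they would wrap a model reached through Generative AI Hub and the HANA Cloud Knowledge Graph Engine), and the prompt wording is an assumption.

```python
from typing import Callable, Dict, List

# Hypothetical interfaces: in a real app these would wrap an LLM client and a
# triple store endpoint; here they are plain callables so the flow is easy to test.
LLM = Callable[[str], str]
TripleStore = Callable[[str], List[Dict[str, str]]]


def text_to_sparql(question: str, schema: str, llm: LLM) -> str:
    """Step 1 (text2SPARQL): ask the LLM to translate the question into SPARQL."""
    prompt = (
        "Translate the question into a SPARQL query for this schema.\n"
        f"Schema:\n{schema}\n"
        f"Question: {question}\n"
        "Return only the SPARQL query."
    )
    return llm(prompt).strip()


def format_context(rows: List[Dict[str, str]]) -> str:
    """Turn result rows (variable -> value) into plain text for the prompt."""
    return "\n".join(", ".join(f"{k}={v}" for k, v in row.items()) for row in rows)


def graph_rag_answer(question: str, schema: str, llm: LLM, store: TripleStore) -> str:
    sparql = text_to_sparql(question, schema, llm)  # step 1: NL -> SPARQL
    rows = store(sparql)                             # step 2: query the triple store
    prompt = (                                       # step 3: question + retrieved facts
        "Answer the question using only the context below.\n"
        f"Context:\n{format_context(rows)}\n"
        f"Question: {question}"
    )
    return llm(prompt)                               # step 4: grounded answer
```

Injecting the model and the store as parameters keeps the orchestration logic testable with stubs and makes it easy to swap either component.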
Let’s review the core capabilities of the Knowledge Graph Engine (KGE):
To embed SPARQL queries inside SQL statements, a public table function, SPARQL_TABLE, is provided. This function returns an internal table that the SQL engine can process further together with other SQL objects.
To embed SQL statements inside a SPARQL query, a function SQL_TABLE is provided to federate queries to the corresponding engine. The results returned by the SQL statement are converted into a graph pattern, with the SQL projections mapped to variables.
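A small helper that wraps a SPARQL query into a SQL statement calling SPARQL_TABLE could look like the sketch below. It assumes that SPARQL_TABLE takes the SPARQL query text as a single string argument and can be used in a FROM clause; please check the HANA Cloud Knowledge Graph Engine reference for the exact signature. The one subtlety handled here is escaping: a single quote inside a SQL string literal must be doubled.

```python
def sparql_table_sql(sparql: str, columns: str = "*") -> str:
    """Build a SQL statement that embeds a SPARQL query via SPARQL_TABLE.

    Assumption: SPARQL_TABLE accepts the SPARQL text as one string argument.
    Single quotes are doubled because that is how a quote is escaped
    inside a SQL string literal.
    """
    escaped = sparql.replace("'", "''")
    return f"SELECT {columns} FROM SPARQL_TABLE('{escaped}')"


# Hypothetical query; the predicate URI is illustrative only.
query = "SELECT ?s ?o WHERE { ?s <http://example.org/name> ?o FILTER(?o = 'Mary') }"
print(sparql_table_sql(query))
```

The resulting string could then be executed like any other SQL statement, for example through a cursor of the `hdbcli` Python driver.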
So how can we consume the Knowledge Graph Engine? There are several options:
Now that we have introduced RDF-based knowledge graphs and the HANA Cloud Knowledge Graph Engine, we can see what we can build with this technology.
As mentioned at the beginning, this is a summary of a webinar that is part of a mini-series of three sessions. When we conceived this program, we envisioned a single business use case that we would enhance over time with new features.
This means we reuse the same business use case from the first webinar, delivered in mid-March, but this time we expand its scope with Knowledge Graph technology.
So here we deal again with a typical customer support team, which is usually made up of two types of personas with different needs: the technical engineers, represented here by John, and the support team manager, represented here by Mary.
In general, John and the other technical engineers take customer incidents and questions and try to answer them and provide guidance. Mary, on the other hand, is more interested in managing the whole team properly and in ensuring customer satisfaction and optimal support processing.
Let's recap the use case and the proof of concept we implemented for the first webinar of the program. We imagined that the team receives incidents and questions from customers through a CRM system running on SAP HANA Cloud, in which the team collects and stores all the questions and incidents, as well as the comments and solutions provided.
We also imagined that the personas involved face some issues in their daily work: John struggles a lot when searching the knowledge base for past cases, and Mary needs to identify trends to better manage her team and follow the company priorities.
We addressed these two problems by implementing a specific application we named “Smart Advisory Companion”, which offers John and Mary several tools: John can run meaningful, semantic searches instead of just exact matches, and Mary can get insights from a cluster analysis view of incidents and queries.
To implement this solution, we basically vectorized the information from the customer questions, inquiries and provided solutions and we stored everything in the HANA Cloud Vector Engine.
All this was already very good, but this tool cannot properly answer some questions. For example, imagine John and Mary need to ask something like: “tell me the SAP employees who delivered a service of type ‘BTP technical advisory’ regarding ‘multi-tenancy’”.
To answer such a question, it's not enough to leverage the vectorized knowledge available in the HANA Cloud Vector Engine; we also need to leverage the structured data generated by, or related to, the customer support team's activity. But what structured data are we talking about? Let me remind you that this use case was suggested by our daily work experience as BTP solution architects.
We deal with partners, but that is not so different from interacting with customers, and we also have a sort of CRM system based on a data model in HANA Cloud. So let’s imagine that the data model shown in the figure below is the one used by our fictitious customer support team. We used our relational data model for the implementation of the PoC associated with this use case.
Let's take a closer look at this data model. There is one main fact table, the service request table, where each new request is tracked; here we collect information such as who created the service request, the associated partner and contact person, the associated use case, and so on. Then there are other tables, for example the service table, where we track all the services requested via a service request, and many other dimensions where we collect details about, for example, the partners, the countries and regions of the partners or where the services were delivered, the SAP products covered by the services, and the related industries and LoBs.
In the picture below you can see the actual scenario: basically our Smart Advisory Companion needs to be updated to leverage not only the vectorized knowledge base, but also the structured information coming from the relational database.
In this scenario we can recognize some new problems for the personas involved. For example, John needs to quickly access the data about submitted requests without browsing the CRM UI or writing complex SQL queries, while Mary needs to access the same data easily in natural language.
How can we then enhance the Smart Advisory Companion's capabilities to address these new needs? We need to introduce the Knowledge Graph Engine, our triple store, and map our relational data model to a custom knowledge graph that we store and query by means of the Knowledge Graph Engine. We also need to introduce Generative AI Hub because, to allow Mary to interact with this structured data in natural language, we need to leverage a large language model.
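One way to picture the mapping step is a direct, row-by-row translation of tables into triples. The sketch below is hypothetical and not the PoC implementation: the base IRI, the table name `ServiceRequest` and the columns `PARTNER_ID`, `TOPIC` and `CLOSED_AT` are invented for illustration, and the approach only loosely follows the W3C Direct Mapping.

```python
from typing import Dict, List, Optional, Tuple

BASE = "http://example.org/crm/"  # hypothetical base IRI for the generated graph
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def row_iri(table: str, key) -> str:
    """One IRI per row, built from the table name and primary key value."""
    return f"<{BASE}{table}/{key}>"


def literal(value) -> str:
    """Plain N-Triples literal with backslashes and double quotes escaped."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def map_table(table: str, pk: str, rows: List[Dict],
              foreign_keys: Optional[Dict[str, str]] = None) -> List[Tuple[str, str, str]]:
    """Map relational rows to triples: each row becomes a subject typed by its
    table, each non-null column a predicate, and foreign-key columns become
    links to rows of the referenced table."""
    foreign_keys = foreign_keys or {}
    triples = []
    for row in rows:
        subject = row_iri(table, row[pk])
        triples.append((subject, RDF_TYPE, f"<{BASE}{table}>"))
        for column, value in row.items():
            if column == pk or value is None:
                continue  # key is already encoded in the IRI; NULLs produce no triple
            predicate = f"<{BASE}{table}#{column}>"
            if column in foreign_keys:
                obj = row_iri(foreign_keys[column], value)
            else:
                obj = literal(value)
            triples.append((subject, predicate, obj))
    return triples


rows = [{"ID": 1, "PARTNER_ID": 42, "TOPIC": "multi-tenancy", "CLOSED_AT": None}]
triples = map_table("ServiceRequest", "ID", rows, {"PARTNER_ID": "Partner"})
for t in triples:
    print(" ".join(t) + " .")
```

Turning foreign keys into links is what makes multi-hop questions, such as the one about employees and service types, expressible as simple graph patterns.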
Now let's have a look at the functionalities we added to the original version of our Smart Advisory Companion. We are talking about 3 different modules (refer to image below).
The first one, which we named Knowledge Graph Discovery, lets users explore the custom knowledge graph obtained from the customer support team's relational data (see demo below).
Then there is the SPARQL Explorer, which allows John and all the technical personas to submit SPARQL queries that are executed against the Knowledge Graph Engine.
Finally, there is the enhanced version of the Advisory Buddy, which allows Mary or John to submit queries in natural language, leverages the interoperability between the Knowledge Graph Engine and the Vector Engine, and consumes the new knowledge inferred with the Knowledge Graph Engine.
Let's now look at the solution architecture used to implement all the functionalities shown in the previous demos, starting from the architecture of the original version of the Advisory Buddy PoC developed for the webinar we delivered in mid-March (see animation below).
Here we can recognize an app composed of a Fiori UI5 front end that relies on Python microservices to consume the SAP HANA Cloud services, both for persistence and for the PAL (Predictive Analysis Library) functions Mary needs for the cluster analysis view.
What changes when we introduce the capabilities we have seen? We need to add the Knowledge Graph Engine, and we also need Generative AI Hub to access a large language model that converts the query expressed in natural language into a SPARQL query. Of course, the Python microservices need to be updated to consume the new services; we will see later how they are built.
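The post does not detail how the Advisory Buddy combines the two engines. One simple pattern, shown here purely as an assumption, is to use a graph query to narrow down candidates (for example, service requests matching structured criteria) and embeddings to rank them by semantic similarity. In HANA Cloud the similarity would typically be computed in the database by the Vector Engine; the pure-Python cosine below only illustrates the logic, and all identifiers are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_search(query_vec: Sequence[float], embeddings: Dict[str, List[float]],
                  graph_ids: Set[str], k: int = 3) -> List[Tuple[str, float]]:
    """Keep only documents returned by the graph query, then rank by similarity."""
    scored = [(doc_id, cosine(query_vec, vec))
              for doc_id, vec in embeddings.items() if doc_id in graph_ids]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional embeddings; graph_ids stands for the result of a SPARQL query,
# e.g. requests of type 'BTP technical advisory'.
embeddings = {"SR-1": [1.0, 0.0], "SR-2": [0.0, 1.0], "SR-3": [1.0, 1.0]}
graph_ids = {"SR-1", "SR-3"}
result = hybrid_search([1.0, 0.0], embeddings, graph_ids)
print(result)
```

Filtering first with the graph keeps the structured constraints exact, while the vector ranking handles the fuzzy, meaning-based part of the question.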
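In practice, the LLM's reply often needs light post-processing before the generated query is sent to the Knowledge Graph Engine. The hypothetical helpers below (not taken from the PoC) remove a Markdown code fence the model may wrap around the query and apply a conservative check that rejects SPARQL 1.1 Update operations, so that a natural-language question cannot modify the graph.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the snippet fence-safe

# SPARQL 1.1 Update operations; their presence means the query could modify data.
UPDATE_KEYWORDS = re.compile(
    r"\b(INSERT|DELETE|LOAD|CLEAR|CREATE|DROP|COPY|MOVE|ADD)\b", re.IGNORECASE
)


def extract_sparql(llm_reply: str) -> str:
    """Strip an optional Markdown code fence (possibly tagged 'sparql') around the query."""
    text = llm_reply.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines).strip()
    return text


def is_read_only(query: str) -> bool:
    """Conservative guard: reject any query containing a SPARQL Update keyword.
    It may also reject harmless queries, e.g. one using a variable named ?add."""
    return UPDATE_KEYWORDS.search(query) is None


reply = FENCE + "sparql\nSELECT ?e WHERE { ?e ?p ?o }\n" + FENCE
query = extract_sparql(reply)
print(query, is_read_only(query))
```

A keyword blacklist is deliberately strict; a stricter alternative would be to execute generated queries with a database user that only has read privileges.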
If you want to know more about the technical implementation of our prototype, jump to the second blog post.