
Introduction

In the rapidly evolving world of AI, building applications that leverage the power of large language models (LLMs) has become increasingly essential. LangChain is an innovative framework that simplifies the development of these applications by providing robust tools and integrations for creating context-aware systems. It connects LLMs to various sources of context, enabling more accurate and meaningful responses to user queries. HANA Vector DB complements this by offering efficient storage and retrieval of vector embeddings, which are crucial for enhancing the capabilities of language models. Together with the Generative AI Hub SDK, these technologies enable the creation of sophisticated Retrieval Augmented Generation (RAG) applications, which integrate external knowledge to improve the quality and relevance of generated content.

There are several blog posts out there about HANA Vector DB, including this post by my colleague @YangYue01. Another insightful post by @MartinKolb demonstrates how to use LangChain with HANA Vector DB and Generative AI Hub SDK to develop a Retrieval-Augmented Generation (RAG) application. In this post, I'll walk you through building a Python RAG application using LangChain, HANA Vector DB, and Generative AI Hub SDK. We'll focus on the essential steps, rather than delving into details like prompt engineering and model parameters. Additionally, we'll use the LangChain Expression Language, a new syntax that simplifies the code needed to build our RAG chain.

Building Our RAG Application

Setting up Connections

To start off, we'll need to set up connections to a HANA Database as well as to the Generative AI Hub. The connection to the HANA Database can be configured using the hdbcli package:

 

from hdbcli import dbapi

# Connection to the HANA database (replace the placeholder values with your own connection details)
hana_conn = dbapi.connect(
    address="some-host-address",
    port=443,
    user="some-username",
    password="some-password",
    autocommit=True,
    sslTrustStore="some-certificate"
)
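Before moving on, it can be worth checking that the connection actually works. Here's a minimal sanity check that runs a trivial query against HANA's DUMMY table through the standard hdbcli cursor interface (just an illustrative sketch, not part of the RAG flow itself):

# Quick sanity check: run a trivial query over the new connection
cursor = hana_conn.cursor()
cursor.execute("SELECT CURRENT_USER FROM DUMMY")
print(cursor.fetchone())  # should print the user we connected with
cursor.close()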

 

Similarly, the connection to the Generative AI Hub can be configured using the generative-ai-hub-sdk package:

 

import os

from gen_ai_hub.proxy.core.proxy_clients import get_proxy_client

# Credentials of the SAP AI Core instance that backs the Generative AI Hub
os.environ["AICORE_AUTH_URL"] = "some-auth-url"
os.environ["AICORE_CLIENT_ID"] = "some-clientid"
os.environ["AICORE_CLIENT_SECRET"] = "some-clientsecret"
os.environ["AICORE_RESOURCE_GROUP"] = "some-resource-group"
os.environ["AICORE_BASE_URL"] = "some-ai-core-base-url"

proxy_client = get_proxy_client("gen-ai-hub")

 

With the Generative AI Hub proxy client set up, we could run the following code to get a list of deployments that are available:

 

proxy_client.deployments

 

If the connection is successful, we should see something like this in the response:

 

[Deployment(url='https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com/v2/inference/deployments/d1e30862f24f01ec', config_id='cb08ab6d-94d9-4534-a60e-922ec1be66ff', config_name='gemini-1.0-pro-config-1', deployment_id='d1e30862f24f01ec', model_name='gemini-1.0-pro', created_at=datetime.datetime(2024, 4, 27, 5, 34, 58, tzinfo=datetime.timezone.utc), additonal_parameters={'executable_id': 'gcp-vertexai', 'model_version': '001'}, custom_prediction_suffix=None),
 Deployment(url='https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com/v2/inference/deployments/d50f02e66f040e9f', config_id='2f34dd34-eb58-482d-a1c2-d1450011ac88', config_name='chat-bison-config-1', deployment_id='d50f02e66f040e9f', model_name='chat-bison', created_at=datetime.datetime(2024, 4, 27, 5, 34, 56, tzinfo=datetime.timezone.utc), additonal_parameters={'executable_id': 'gcp-vertexai', 'model_version': '002'}, custom_prediction_suffix=None),
...

 

Defining our Embeddings and Vectorstore Objects

After setting up our connections, we want to define our embeddings and vectorstore objects. The embeddings object is responsible for converting input texts into embedding vectors, while the vectorstore object handles CRUD operations against HANA Vector DB.

 

from gen_ai_hub.proxy.langchain.init_models import init_embedding_model
from langchain_community.vectorstores import HanaDB

# Embedding model served via the Generative AI Hub proxy
embeddings = init_embedding_model("text-embedding-ada-002", proxy_client=proxy_client)

# LangChain vectorstore backed by a table in HANA Vector DB
hana_vectordb = HanaDB(embedding=embeddings, connection=hana_conn, table_name="RAG_EXAMPLE_VECTORSTORE")

 

Some points to note:

  • The embedding model name text-embedding-ada-002 was obtained from the output of proxy_client.deployments shown previously.
  • The line that instantiates the HanaDB object will also create a new table in the HANA database if the specified table (i.e. RAG_EXAMPLE_VECTORSTORE) doesn't already exist.
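If you're curious about what the embeddings object actually produces, a quick way to see it is to embed a sample string through the standard LangChain embeddings interface. This is just an illustrative sketch and not part of the RAG flow:

# Embed a sample string; the result is a plain list of floats (the embedding vector)
sample_vector = embeddings.embed_query("Python is a high-level programming language.")
print(len(sample_vector))  # dimensionality of the embedding model
print(sample_vector[:5])   # first few components of the vector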

Writing Vectors to HANA Vector DB

Now that we have our embeddings and vectorstore objects defined, there are a few things we could do here. One is to write embedding vectors to the vector table in HANA. This is usually done offline in a separate process (e.g. in an ETL pipeline), so it technically isn't something that happens at inference time (when RAG is run). Still, I thought I would show a slightly different approach from what many other sources use when it comes to building up a vectorstore.

 

import pandas as pd
from langchain_community.document_loaders import DataFrameLoader

# Read the texts to be indexed from a CSV file; the "text" column becomes the document content,
# while the remaining columns are kept as document metadata
input_df = pd.read_csv("data/rag_example_inputs.csv")
loader = DataFrameLoader(data_frame=input_df, page_content_column="text")
documents_to_index = loader.load()

# Compute embeddings for the documents and write them to the HANA vector table
hana_vectordb.add_documents(documents_to_index)

 

In this code snippet, we read data from a simple CSV file and use the text content within it to generate the corresponding vectors, which are then written to our HANA Vector DB. Of course, depending on the use case, we could load the data in other ways, and LangChain provides a variety of document loaders for this purpose (one alternative is sketched below).
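For instance, if the source data were longer text files rather than short CSV rows, one option is to combine a document loader with a text splitter before indexing. The sketch below uses LangChain's TextLoader and RecursiveCharacterTextSplitter; the file path and chunking parameters are purely illustrative:

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load a (hypothetical) plain-text file and split it into overlapping chunks
loader = TextLoader("data/some_long_document.txt")
raw_documents = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(raw_documents)

# Index the chunks in the same way as before
hana_vectordb.add_documents(chunks)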

The main point here is that we write the vectors in a two-step approach: we first create the HanaDB object through the usual class instantiation, and then call its add_documents method. This differs from the hana_vectordb = HanaDB.from_documents() approach shown in many other resources. While the .from_documents() method is simpler, which makes it suitable for PoCs, using it on a vector table that already contains vectors could result in duplicated or unnecessary vectors being written, because instance creation and the writing of vectors are not separated. As such, I would advise using the two-step approach when writing vectors in a productive setting; one way to guard against duplicates when re-indexing is sketched below.
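If an indexing job does need to be re-run against an existing table, one possible pattern is to clear the affected vectors first using the delete method of the HanaDB integration, which accepts a metadata filter. The sketch below assumes we really do want to wipe the whole table before re-indexing:

# Remove all previously indexed vectors (an empty filter matches every document;
# pass a metadata filter instead to delete selectively), then re-index
hana_vectordb.delete(filter={})
hana_vectordb.add_documents(documents_to_index)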

Building our RAG Chain

Here we'll be building the chain with the components we need using the LangChain Expression Language. For a chain to do RAG, we'll need:

  • A retriever component, which fetches context relevant to the input query from HANA Vector DB
  • A prompt component, which contains the prompt structure we need for text generation
  • An LLM (Large Language Model) client component, which sends inference requests to an LLM
  • (Optionally) An output parser component, which reformats the generated output to fit our use case

 

from gen_ai_hub.proxy.langchain.init_models import init_llm
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

# Retriever that fetches relevant documents from HANA Vector DB
hana_vector_retriever = hana_vectordb.as_retriever()

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
output_parser = StrOutputParser()

# Feed the user question to the retriever (to fetch context) and pass it through unchanged for the prompt
setup_and_retrieval = RunnableParallel(
    {
        "context": hana_vector_retriever,
        "question": RunnablePassthrough(),
    }
)

# LLM client served via the Generative AI Hub proxy
llm = init_llm("gpt-4", proxy_client=proxy_client)

# Compose the RAG chain using the LangChain Expression Language
rag_chain = setup_and_retrieval | prompt | llm | output_parser

 

Some points to note:

  • Additional parameters can be passed to the hana_vectordb.as_retriever() method call to configure how retrieval is done. For example, we could configure the retriever to retrieve at most 4 documents by setting .as_retriever(search_kwargs={'k': 4}). More details can be found in LangChain's documentation for the .as_retriever() method (see also the sketch after this list).
  • The RunnableParallel and RunnablePassthrough classes are used here, as the question input is required by both the retriever and prompt components.
  • The LLM model name gpt-4 was obtained from the output of proxy_client.deployments shown previously.
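As a small illustration of the first point, here's a sketch of a more heavily configured retriever. The metadata filter assumes the indexed documents carry a 'language' metadata field (for example, loaded from an extra CSV column), which is purely hypothetical here:

# Retrieve at most 4 documents, restricted to those whose metadata matches the filter
filtered_retriever = hana_vectordb.as_retriever(
    search_kwargs={"k": 4, "filter": {"language": "python"}}
)
relevant_docs = filtered_retriever.invoke("What is Python typically used for?")
for doc in relevant_docs:
    print(doc.page_content)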

Running RAG with our Chain

Now that our chain is defined, we can send queries to perform RAG:

 

questions = [
    "What is Python typically used for?",
    "What is JavaScript typically used for?",
    "Which programming languages are dynamically typed?",
]

for question in questions:
    print(f"Question: {question}")
    print(f"RAG Answer: {rag_chain.invoke(question)}")
    print()

 

We should expect to get something like this for the response:

 

Question: What is Python typically used for?
RAG Answer: The text does not provide specific information on what Python is typically used for.

Question: What is JavaScript typically used for?
RAG Answer: JavaScript is typically used as a core technology of the Web, alongside HTML and CSS. It is used on the client side for webpage behavior in 99% of websites. JavaScript also has application programming interfaces (APIs) for working with text, dates, regular expressions, standard data structures, and the Document Object Model (DOM).

Question: Which programming languages are dynamically typed?
RAG Answer: The programming languages that are dynamically typed are Ruby and Python.
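Since the chain is a standard LangChain Runnable, invoke isn't the only way to call it. As a brief sketch, we could also stream an answer chunk by chunk (assuming the underlying LLM deployment supports streaming) or run several questions through the chain in one batch call:

# Stream the generated answer chunk by chunk instead of waiting for the full response
for chunk in rag_chain.stream("What is JavaScript typically used for?"):
    print(chunk, end="", flush=True)
print()

# Run several questions through the chain in a single batch call
answers = rag_chain.batch(questions)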

 

Conclusion

We've just built a straightforward Retrieval-Augmented Generation (RAG) application. While I've simplified some details for clarity, I encourage you to delve deeper on your own. To help with that, I've included some references that were useful to me, which I hope you'll find helpful as well. For more detailed examples, check out my GitHub repository, where I demonstrate how to implement RAG in both a Jupyter Notebook and a FastAPI application. I'd love to hear if this post has helped you, so feel free to share your thoughts in the comments. Happy coding! 😃

References
