In the rapidly evolving world of AI, building applications that leverage the power of large language models (LLMs) has become increasingly essential. LangChain is an innovative framework that simplifies the development of these applications by providing robust tools and integrations for creating context-aware systems. It connects LLMs to various sources of context, enabling more accurate and meaningful responses to user queries. HANA Vector DB complements this by offering efficient storage and retrieval of vector embeddings, which are crucial for enhancing the capabilities of language models. Together with the Generative AI Hub SDK, these technologies enable the creation of sophisticated Retrieval Augmented Generation (RAG) applications, which integrate external knowledge to improve the quality and relevance of generated content.
There are several blog posts out there about HANA Vector DB, including this post by my colleague @YangYue01. Another insightful post by @MartinKolb demonstrates how to use LangChain with HANA Vector DB and Generative AI Hub SDK to develop a Retrieval-Augmented Generation (RAG) application. In this post, I'll walk you through building a Python RAG application using LangChain, HANA Vector DB, and Generative AI Hub SDK. We'll focus on the essential steps, rather than delving into details like prompt engineering and model parameters. Additionally, we'll use the LangChain Expression Language, a new syntax that simplifies the code needed to build our RAG chain.
To start off, we'll need to set up connections to a HANA database as well as to the Generative AI Hub. The connection to the HANA database can be configured using the hdbcli package:
from hdbcli import dbapi

# Connection to the HANA database that hosts the vector table
hana_conn = dbapi.connect(
    address="some-host-address",
    port=443,
    user="some-username",
    password="some-password",
    autocommit=True,
    sslTrustStore="some-certificate",
)
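Optionally, we can run a quick sanity check on the connection. This is just a minimal sketch that executes a trivial query against HANA's DUMMY table using the standard DB API cursor:

# Optional sanity check: run a trivial query to verify the connection works
cursor = hana_conn.cursor()
cursor.execute("SELECT CURRENT_UTCTIMESTAMP FROM DUMMY")
print(cursor.fetchone())
cursor.close()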
Similarly, the connection to the Generative AI Hub can be configured using the generative-ai-hub-sdk package:
import os

from gen_ai_hub.proxy.core.proxy_clients import get_proxy_client

# Credentials of the SAP AI Core instance behind the Generative AI Hub
os.environ["AICORE_AUTH_URL"] = "some-auth-url"
os.environ["AICORE_CLIENT_ID"] = "some-clientid"
os.environ["AICORE_CLIENT_SECRET"] = "some-clientsecret"
os.environ["AICORE_RESOURCE_GROUP"] = "some-resource-group"
os.environ["AICORE_BASE_URL"] = "some-ai-core-base-url"

proxy_client = get_proxy_client("gen-ai-hub")
With the Generative AI Hub proxy client set up, we can run the following code to list the available deployments:
proxy_client.deployments
If the connection is successful, we should see something like this in the response:
[Deployment(url='https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com/v2/inference/deployments/d1e30862f24f01ec', config_id='cb08ab6d-94d9-4534-a60e-922ec1be66ff', config_name='gemini-1.0-pro-config-1', deployment_id='d1e30862f24f01ec', model_name='gemini-1.0-pro', created_at=datetime.datetime(2024, 4, 27, 5, 34, 58, tzinfo=datetime.timezone.utc), additonal_parameters={'executable_id': 'gcp-vertexai', 'model_version': '001'}, custom_prediction_suffix=None),
Deployment(url='https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com/v2/inference/deployments/d50f02e66f040e9f', config_id='2f34dd34-eb58-482d-a1c2-d1450011ac88', config_name='chat-bison-config-1', deployment_id='d50f02e66f040e9f', model_name='chat-bison', created_at=datetime.datetime(2024, 4, 27, 5, 34, 56, tzinfo=datetime.timezone.utc), additonal_parameters={'executable_id': 'gcp-vertexai', 'model_version': '002'}, custom_prediction_suffix=None),
...
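Since we'll be using an embedding model and a chat model later on, it can be handy to check that the corresponding deployments exist. This is just an illustrative snippet that filters the list above by model name:

# Check that the models we need later on are actually deployed
required_models = {"text-embedding-ada-002", "gpt-4"}
deployed_models = {d.model_name for d in proxy_client.deployments}
print(required_models - deployed_models)  # should print an empty set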
After setting up our connections, we want to define our embeddings and vectorstore objects. The embeddings object is in charge of converting input texts into embedding vectors, while the vectorstore object handles CRUD operations against HANA Vector DB.
from gen_ai_hub.proxy.langchain.init_models import init_embedding_model
from langchain_community.vectorstores import HanaDB

# Embedding model served via the Generative AI Hub proxy
embeddings = init_embedding_model("text-embedding-ada-002", proxy_client=proxy_client)

# LangChain vectorstore backed by a vector table in HANA
hana_vectordb = HanaDB(
    embedding=embeddings,
    connection=hana_conn,
    table_name="RAG_EXAMPLE_VECTORSTORE",
)
Now that we have our embeddings and vectorstore objects defined, one thing we can do is write embedding vectors to the vector table in HANA. This is usually done offline in a separate process (e.g. in an ETL pipeline), so it isn't strictly part of inference time (when the RAG chain runs), but I'd like to show a slightly different approach from what many other sources use when building up a vectorstore.
import pandas as pd

from langchain_community.document_loaders import DataFrameLoader

# Load the source texts from a CSV file and wrap them as LangChain documents
input_df = pd.read_csv("data/rag_example_inputs.csv")
loader = DataFrameLoader(data_frame=input_df, page_content_column="text")
documents_to_index = loader.load()

# Embed the documents and write the vectors to the HANA vector table
hana_vectordb.add_documents(documents_to_index)
In this code snippet, we read data from a simple CSV file and use its text content to generate the corresponding vectors, which are then written to our HANA Vector DB. Of course, depending on the use case, we could load our data differently, and LangChain provides various document loaders and text splitters for this purpose (see the sketch below).
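For example, if the source material were a long plain-text file rather than short CSV rows, a loader combined with a text splitter could be used instead. This is just a sketch; the file path and chunking parameters are illustrative:

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Hypothetical input file; split it into overlapping chunks before embedding
raw_documents = TextLoader("data/rag_example_article.txt").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
documents_to_index = splitter.split_documents(raw_documents)
hana_vectordb.add_documents(documents_to_index)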
The main point here is that we write the vectors in a two-step approach: we first create the HanaDB object via the usual class instantiation, and then call its add_documents method, rather than using the hana_vectordb = HanaDB.from_documents() approach shown in many other resources. While .from_documents() is simpler, which makes it suitable for PoCs, calling it against a vector table that already contains vectors can result in duplicated or unnecessary vectors being written, because instance creation and vector writing are not separated. For a productive use case, I would therefore advise the two-step approach, which also makes it easy to clean up existing vectors before re-ingesting (see the sketch below).
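If the table needs to be rebuilt from scratch (e.g. because the source data has changed), the existing vectors can be cleared first. A minimal sketch, assuming the HanaDB delete method with an empty filter removes all rows from the table:

# Assumption: delete(filter={}) removes all existing rows from the vector table
hana_vectordb.delete(filter={})

# Re-ingest the current set of documents
hana_vectordb.add_documents(documents_to_index)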
Here we'll build the chain using the LangChain Expression Language. For a chain to do RAG, we'll need a retriever over our vectorstore, a prompt template, an LLM, and an output parser:
from gen_ai_hub.proxy.langchain.init_models import init_llm
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

# Retriever that fetches the most similar documents from the HANA vector table
hana_vector_retriever = hana_vectordb.as_retriever()

# Prompt that grounds the LLM's answer in the retrieved context
template = """Answer the question based only on the following context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

output_parser = StrOutputParser()

# Run retrieval and pass the original question through in parallel
setup_and_retrieval = RunnableParallel(
    {
        "context": hana_vector_retriever,
        "question": RunnablePassthrough(),
    }
)

# Chat model served via the Generative AI Hub proxy
llm = init_llm("gpt-4", proxy_client=proxy_client)

rag_chain = setup_and_retrieval | prompt | llm | output_parser
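As a side note, LCEL also makes it straightforward to return the retrieved documents alongside the generated answer, which is useful for showing sources or for debugging retrieval quality. A minimal sketch building on the components above:

from operator import itemgetter

# Variant of the chain that returns both the answer and the retrieved documents
answer_chain = prompt | llm | output_parser
rag_chain_with_sources = setup_and_retrieval | RunnableParallel(
    {
        "answer": answer_chain,
        "sources": itemgetter("context"),
    }
)

# result is a dict with the keys "answer" and "sources"
result = rag_chain_with_sources.invoke("What is Python typically used for?")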
Now that our chain is defined, we can send queries to perform RAG:
questions = [
    "What is Python typically used for?",
    "What is JavaScript typically used for?",
    "Which programming languages are dynamically typed?",
]

for question in questions:
    print(f"Question: {question}")
    print(f"RAG Answer: {rag_chain.invoke(question)}")
    print()
We should expect to get something like this for the response:
Question: What is Python typically used for?
RAG Answer: The text does not provide specific information on what Python is typically used for.
Question: What is JavaScript typically used for?
RAG Answer: JavaScript is typically used as a core technology of the Web, alongside HTML and CSS. It is used on the client side for webpage behavior in 99% of websites. JavaScript also has application programming interfaces (APIs) for working with text, dates, regular expressions, standard data structures, and the Document Object Model (DOM).
Question: Which programming languages are dynamically typed?
RAG Answer: The programming languages that are dynamically typed are Ruby and Python.
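Because the chain is an LCEL Runnable, it also exposes the standard batching and streaming interfaces out of the box. For instance, the three questions above could be answered in a single call, or an answer could be streamed token by token (a small sketch):

# Answer several questions in a single batched call
answers = rag_chain.batch(questions)

# Or stream the answer to a single question as it is generated
for chunk in rag_chain.stream("What is Python typically used for?"):
    print(chunk, end="")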
We've just built a straightforward Retrieval-Augmented Generation (RAG) application. While I've simplified some details for clarity, I encourage you to delve deeper on your own. To help with that, I've included some references that were useful to me, which I hope you'll find helpful as well. For more detailed examples, check out my GitHub repository, where I demonstrate how to implement RAG in both a Jupyter Notebook and a FastAPI application. I'd love to hear if this post has helped you, so feel free to share your thoughts in the comments. Happy coding! 😃