Technology Blog Posts by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
cancel
Showing results for 
Search instead for 
Did you mean: 
HarukiSuzawa
Advisor
Advisor
469

image (4).jpeg

Introduction 

The implementation of Retrieval-Augmented Generation (RAG) is becoming more popular as a technique to provide accurate and reliable responses from Large Language Models (LLMs). This time, we examine PandasAI as a method to enable LLMs to handle more advanced tasks such as data analysis. PandasAI allows data search and visualization by generating Python code for data analysis in response to natural language queries. In this article, we explore how PandasAI can be integrated with Generative AI Hub, enabling developers to quickly create a chat application to interact with their business data using only Python.

Prerequisites

When using Generative AI Hub for the first time to access Language Models (LLMs), refer to the following document for a quick start:

https://pypi.org/project/generative-ai-hub-sdk/

 

from dotenv import load_dotenv
from gen_ai_hub.proxy.langchain.openai import ChatOpenAI
from gen_ai_hub.proxy.core.proxy_clients import get_proxy_client

load_dotenv(override=True)
proxy_client = get_proxy_client('gen-ai-hub')
model_name = 'gpt-4o'
llm = ChatOpenAI(proxy_model_name=model_name, proxy_client=proxy_client)

 

In the following document, you can find the models available in Generative AI Hub. Please note that you must deploy the model you want to use, such as GPT-4o, as described in the document.

https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/create-deployment-for-generative-ai-...

More details on how to deploy them are covered in steps 2 to 4 of the following tutorial:

https://developers.sap.com/tutorials/ai-core-generative-ai.html

Smart Excel Chat

Preparation (Back-end)

PandasAI class manages Excel file uploads and facilitates chat interactions based on their contents. Upon initialization, it creates two empty lists: files for storing DataFrames and filenames for the corresponding file names. The upload_excel method takes a list of file paths, reads each Excel file into a DataFrame, and appends them to files while also storing the file names in filenames. It returns a string of the uploaded file names for display in the interface.

 

import os
import pandas as pd
from typing import List, Unio

class PandasAI:
    def __init__(self):
        self.files = []
        self.filenames = []

    def upload_excel(self, file: List[str]) -> str:
        try:
            for filepath in file:
                df = pd.read_excel(filepath)
                self.files.append(df)
                self.filenames.append(os.path.basename(filepath))
            return "\\n".join(self.filenames)
        
        except Exception as e:
            return f"An error occurred: {str(e)}"

 

The chat_with_excel method allows users to interact with uploaded data through messages while maintaining a chat history. The Agent class from PandasAI facilitates interaction with dataframes for multi-turn conversations. This method is designed to handle both image and text outputs. Since the Agent outputs dataframes and numbers (not just strings), they are cast to strings in lines 29 and 31.

 

from pandasai import Agent

class PandasAI:
    def __init__(self):
        self.files = []
        self.filenames = []

    def upload_excel(self, file: List[str]) -> str:
        # Code omitted for brevity

    def chat_with_excel(self, message: str, chat_history: List) -> List[Union[str, List]]:
        agent = Agent(self.files, config={"llm": llm})
        prompt = f'''            
            User Message:
            {message}
            
            Chat History:
            {str(chat_history)}
        '''
        bot_message = agent.chat(prompt)

        if isinstance(bot_message, str) and os.path.exists(bot_message):                    
            img = gr.Image(bot_message)
            chat_history.append((message, None))
            chat_history.append((None, img))
            return ["", chat_history]

        elif isinstance(bot_message, pd.DataFrame):
            bot_message = bot_message.to_string(index=False)
        else:
            bot_message = str(bot_message)

        chat_history.append((message, bot_message))
        return ["", chat_history]

 

Preparation (Front-end)

We prepared the UI using Gradio, a Python library that allows developers to quickly create ML-based applications. 

 

import gradio as gr

with gr.Blocks() as excel_chatbot:
    gr.Markdown("## Smart Excel Chat")
    
    with gr.Row():
        with gr.Column():
            excel_file = gr.File(label="Upload Excel Sheet", file_count="multiple", file_types=[".xlsx"], type="filepath")
            upload_button = gr.Button("Upload")
            upload_status = gr.Textbox(label="Uploaded Files", interactive=False)
        with gr.Column():
            chatbot = gr.Chatbot()
            msg = gr.Textbox()
            clear = gr.ClearButton([msg, chatbot])
    
    pandasai = PandasAI()
    
    def reset_file():
        return None

    upload_button.click(pandasai.upload_excel, inputs=excel_file, outputs=upload_status)
    upload_button.click(reset_file, outputs=excel_file)

    msg.submit(
        pandasai.chat_with_excel,
        inputs=[msg, chatbot],
        outputs=[msg, chatbot]
    )

excel_chatbot.launch()

 

It includes components for uploading Excel sheets, a chat view, and a text input box.

Screenshot 2024-09-29 at 15.27.40.png

Performance Verification

I will demonstrate our prototype using the following sample data related to HR. We prepared two simple Excel sheets for the demonstration, even though it worked well with larger datasets as well.

files.png

The app can correctly answer user queries by considering the chat history.

Screenshot 2024-09-30 at 11.25.52.png

It can plot bar chart,

Screenshot 2024-09-19 at 11.07.35.png

and pie chart as well.

Screenshot 2024-09-19 at 11.07.46.png

Limitations

When we ask a query about something the LLMs shouldn't know, they hallucinate.

Screenshot 2024-09-19 at 11.07.28.png

However, it recognizes when it lacks the necessary information. By restructuring the prompt to ensure that the LLM has access to the relevant information and adjusting actions based on the output, we assume that this issue can be addressed.

Screenshot 2024-09-30 at 13.09.15.png

Conclusion

Integrating PandasAI with Generative AI Hub enhances data analysis by enabling seamless interaction with business data through natural language queries. This solution simplifies tasks like data search and visualization, making them more accessible and intuitive. It demonstrates the powerful potential of RAG to improve productivity and streamline workflows in business applications.