In the blog post Function Calling LLMs for SAP: Structured Outputs and API Calling, @MarioDeFelipe does a great job of explaining the why and how of LLMs' structured output with function calling for SAP API integration.

On 6 Aug 2024, OpenAI introduced Structured Outputs in the API with the new gpt-4o-2024-08-06, including a strict enforcement option that guarantees the model's JSON output complies with a given JSON schema. A few weeks earlier, Meta AI released LLaMa 3.1 with multilingual capabilities, a long context window (128K tokens), and tool usage (function calling). Back in May 2024, Mistral unveiled Mistral v0.3 with function calling support. In this blog post, we will discuss structured output with JSON mode and function calling in LLMs, walk through function calling with LLaMa 3.1 and Mistral v0.3 using Ollama (BYOM) in SAP AI Core, and examine some general use case patterns of LLM function calling in the SAP domain.

The evolution of LLMs' Structured Output from Unstructured Input

Structured output from unstructured input is essential for integrating LLMs into the realm of business applications like SAP. Simply put, real-life unstructured data such as customer review text, product images, and service call audio is very difficult to process for business applications, which are built on formal, explicit rules and designed to handle structured data in defined schemas such as customers, products, and sales orders. With LLMs/LMMs now able to understand and process such unstructured data, it is possible to bridge the gap between unstructured and structured data. More importantly, model outputs that adhere to a given JSON schema ensure reliable integration of business applications with LLMs.

In practice, there are several approaches to extracting structured output from unstructured input with LLMs. Taking OpenAI's GPT-4 as an example:

  • Prompting Alone: Instruct the LLM to output JSON via in-context learning. For example, we have discussed Prompt Engineering for Advanced Text Processing on Customer Messages. However, this doesn't guarantee a valid JSON response every time. In OpenAI's evals of output against a complex JSON schema, gpt-4-0613 scores less than 40%.

  • Structured Output (strict=false)

    • Function Calling: Introduced by OpenAI in June 2023, it allows you to connect models like gpt-4o to external tools and systems, and also serves as an easy way to extract structured data in JSON. However, there is no guarantee of valid JSON output.
    • JSON Mode: Released by OpenAI in Nov 2023, it enforces that the output is valid JSON, but it does not validate the output against a specified schema. Directly passing in a schema may not generate the expected JSON and may require additional careful formatting and prompting.
  • Structured Output (strict=true):
    Announced by OpenAI in Aug 2024, gpt-4o-2024-08-06 with Structured Outputs (strict=true) scores a perfect 100% on the same evals. It is applicable to both Function Calling and JSON Mode.
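For illustration, here is a minimal sketch of Structured Outputs with strict=true using the OpenAI Python SDK; it assumes an OPENAI_API_KEY is configured, and the sentiment schema is a hypothetical example, not taken from OpenAI's evals:

# a minimal sketch of Structured Outputs (strict=true) with the OpenAI Python SDK
# the sentiment schema below is a hypothetical example
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Analyze the sentiment of the customer message."},
        {"role": "user", "content": "The coffee machine stopped working after two days!"}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "sentiment_analysis",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
                    "reason": {"type": "string"}
                },
                "required": ["sentiment", "reason"],
                "additionalProperties": False
            }
        }
    }
)
print(response.choices[0].message.content)  # JSON guaranteed to conform to the schema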

OpenAI's evals of structured output with complex JSON schema

Okay, that is Structured Output with OpenAI. How about the other vendors?

Open-Source community

  • Open-weight models like LLaMa 3.1 and Mistral v0.3 support function calling.
  • llama.cpp (an open-source LLM inference server) uses grammars to provide such guarantees for open-weight models, interacting directly with the next-token logic so that only tokens matching the required schema are selected. Here is a sample of using grammars to ensure structured JSON output in customer message processing to:
    • Summarize the customer message into a title and a short description
    • Analyze the sentiment of the customer message
    • Extract entities from the customer message, such as customer, product, order number etc.
  • Ollama (an open-source LLM inference server) supports JSON mode with "format": "json" and function calling in its chat API and OpenAI-compatible chat completion API; however, it doesn't guarantee the validity and schema compliance of its JSON output.
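As a quick illustration of Ollama's JSON mode, here is a minimal sketch for the customer message processing tasks above, run against a local Ollama server; the inline schema and customer message are hypothetical examples, and since compliance is not guaranteed, the output should still be validated before further use:

# a minimal sketch of Ollama's JSON mode for customer message processing
# the inline schema and the customer message are hypothetical examples
import requests, json

payload = {
    "model": "llama3.1",
    "format": "json",  # enforce JSON output (validity/compliance not guaranteed)
    "stream": False,
    "messages": [{
        "role": "user",
        "content": 'Summarize the customer message into a title and short description, '
                   'analyze its sentiment, and extract entities. Reply in JSON with the schema '
                   '{"title": str, "description": str, "sentiment": "positive"|"neutral"|"negative", '
                   '"entities": [{"type": str, "value": str}]}. '
                   'Customer message: My espresso machine (order no. 10021) has leaked water '
                   'since day one. Please help!'
    }]
}
response = requests.post('http://localhost:11434/api/chat', json=payload)
print(json.loads(response.json()['message']['content']))  # parse and validate before further use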

Function calling vs JSON mode

In general, both function calling and JSON mode can be used to produce structured output with LLMs. With the strict option recently introduced in the OpenAI API, both can ensure that JSON output conforms to a desired schema. However, the validity and schema conformity of JSON output may vary across vendors and models. Here are some differences between JSON mode and Function Calling:

With JSON mode, you know exactly what to do with the unstructured input and define a desired output JSON schema, then hand both over to the LLM to produce the structured output, which enables further integration with your business applications. For instance, we can integrate the GPT-4 Chat API with SAP CAP for advanced text processing of customer messages with JSON mode, such as sentiment analysis, message summarization, and extraction of the customer, product, and transaction entities involved, for further integration with SAP S/4HANA Cloud.

With Function Calling, you can have the LLM automatically select among several functions or tools, where each function has a different purpose and a specified JSON schema for its arguments. This gives extra flexibility in application integration. For example, function calling can be very useful in chatbot development: a conversation intent can be represented as a function call, and no additional corpus training is required for intent identification, since the LLM automatically picks the best-fitting function for the input text based on its description and extracts the structured arguments, enabling further application integration to produce a more contextual and accurate reply. Another example is orchestrating process automation with function calls, which can identify and route downstream tasks to different APIs or tools.

In the rest of the blog post, we'll focus on function calling. Since there is already a heap of blog posts and articles about function calling with OpenAI, I will showcase function calling with open-source LLMs using Ollama in SAP AI Core; the same approach can be applied to the proprietary models in SAP Generative AI Hub.

Function Calling in Open-Source LLMs

Next, let's move on to custom function calling in open-source LLMs, namely LLaMa 3.1 and Mistral v0.3. We'll focus on custom function calls rather than built-in ones. To keep it simple, we'll use Ollama as the open-source LLM inference server and the basic, popular function calling sample of getting the current weather from a weather API. Ollama supports function calling through its chat API (recommended), its OpenAI-compatible chat completion API (recommended), and its completion API in raw mode.

Let's take the example of answering the question "What is the weather today in Melbourne, Australia?" with real data. The diagram illustrates the flow between the LLM, a custom weather chatbot as the orchestrator, and a weather API as the service provider.

(Diagram: flow between the LLM, the chatbot orchestrator, and the weather API)

     0. The user asks a question through the chatbot, such as "What is the weather today in Melbourne, Australia?", or in any of the many other ways to ask the same thing, like "Is it rainy in Melbourne?" or "Is it raining in Melbourne?".

  1. The orchestrator (chatbot) sends the question to LLaMa 3.1 or Mistral v0.3 along with the custom functions; the LLM identifies the matching custom function "get_current_weather" and extracts the parameters "location" and "format" needed to answer the question,
  2. then the orchestrator fetches the real weather data from a 3rd-party weather API (we'll use a mock-up API),
  3. and finally instructs the LLM to generate an answer to the original question with the API response as context.

Step 1: Custom Function Calling with the Ollama chat API

 

# test llama3.1 and mistral v0.3's function calling with Ollama
import requests, json

# for Ollama in SAP AI Core, please update chat_api_endpoint and headers accordingly
chat_api_endpoint = 'http://localhost:11434/api/chat'
headers = {'Content-Type': 'application/json'}

question = "What is the weather today in Melbourne, Australia?"
model = 'llama3.1' #'mistral'
json_data = {
  "model": model,
  "messages": [
    {
      "role": "user",
      "content": question
    }
  ],
  "stream": False,
  "format": "json", #enable JSON mode to assure valid json response for function call
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The location to get the weather for, e.g. San Francisco, CA"
            },
            "format": {
              "type": "string",
              "description": "The format to return the weather in, e.g. 'celsius' or 'fahrenheit'",
              "enum": ["celsius", "fahrenheit"]
            }
          },
          "required": ["location", "format"]
        }
      }
    }
  ]
}

response = requests.post(url=chat_api_endpoint, headers=headers, json=json_data)
print('Result:', response.text)

 

The response looks like:

 

{
  "model": "llama3.1",
  "created_at": "2024-08-01T06:35:31.535917Z",
  "message": {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      {
        "function": {
          "name": "get_current_weather",
          "arguments": {
            "format": "celsius",
            "location": "Melbourne, Australia"
          }
        }
      }
    ]
  },
  "done_reason": "stop",
  "done": true,
  "total_duration": 8040050292,
  "load_duration": 5870310667,
  "prompt_eval_count": 139,
  "prompt_eval_duration": 711313000,
  "eval_count": 50,
  "eval_duration": 1456128000
}

 

Step 2: Service fulfillment with external tools or APIs

We'll parse the result of the custom function call to identify the function "get_current_weather" and extract its required arguments "location" and "format", then invoke the 3rd-party weather API with the given location and format.

 

# parse the json response to retrieve the location and format, to be passed to 3rd-party weather API
resp_json = response.json()
func_dict = resp_json['message']['tool_calls'][0]['function']
func = func_dict['name']
args_dict = func_dict['arguments']
location = args_dict['location']
format = args_dict['format']

print('Function:', func)
print('Location:', location)
print('Format:', format)

# service fulfillment by the 3rd-party API with the given location and format...
# for this sample, let's assume the 3rd-party API returns a JSON weather condition like
# the one below; we'll instruct the LLM to answer the question with this service response
def get_current_weather(location, format):
    # Your actual API call goes here...
    response = { "condition": "Rainy", "temp_h": 15, "temp_l": 7, "temp_unit": "C" }
    return response
service_resp = get_current_weather(location, format)
service_resp_str = json.dumps(service_resp)

 

The mock-up API returns weather data in JSON like:

 

{ "condition": "Rainy", "temp_h": 15, "temp_l": 7, "temp_unit": "C" }

 

Step 3: Generate the final answer to the original question with API response as context

In this final step, we'll instruct the LLM to generate the answer to the original question with the API response as context.

 

# answering the original question with the service response as context
user_msg = """
context: {}

Answer the question with the context (weather API response in JSON) above, including the weather condition as an emoji and the temperature range: {}? Be concise.
""".format(service_resp_str,question)

json_data = {
  "model": model,
  "messages": [
    {
      "role": "user",
      "content": user_msg
    }
  ],
  "stream": False
}

response = requests.post(url=chat_api_endpoint, headers=headers, json=json_data)
resp_json = response.json()
print('Final Response JSON:', resp_json)

 

The final answer is generated as: "🌧️ Today in Melbourne, it's rainy with a temperature range of 15°C to 7°C"

 

{'model': 'llama3.1', 'created_at': '2024-08-01T06:53:35.756Z', 'message': {'role': 'assistant', 'content': "️ Today in Melbourne, it's rainy with a temperature range of 15°C to 7°C."}, 'done_reason': 'stop', 'done': True, 'total_duration': 1216658166, 'load_duration': 16187791, 'prompt_eval_count': 75, 'prompt_eval_duration': 432991000, 'eval_count': 25, 'eval_duration': 765913000}

 

Additional steps to make it a reliable conversation

Of course, more steps are required to handle the exceptions in the conversation, for example,

  • the input question may be about something else entirely, like a recipe ("How to cook a Hainanese Chicken Rice?") instead of a weather inquiry,
  • or it may be missing required information, as in "What is the weather?" without a location,
  • ...

All these exceptions in the conversation can be handled efficiently with the help of LLMs, for instance,

  • we can instruct the LLM in its system message to handle only weather-related questions and to reply to all other questions with a friendly apology, as sketched below.
  • For the case of missing information, we can ask the LLM to remind the user about the missing information, and add another function call such as get_location to retrieve it.
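Here is a minimal sketch of such a guardrail with a system message in the Ollama chat API; the prompt wording is illustrative, not a tested prompt:

# a minimal sketch of guarding the weather chatbot with a system message
# the system prompt wording is illustrative, not a tested prompt
import requests

chat_api_endpoint = 'http://localhost:11434/api/chat'
json_data = {
  "model": "llama3.1",
  "stream": False,
  "messages": [
    {
      "role": "system",
      "content": "You are a weather assistant. Only answer weather-related questions, "
                 "and reply to any other question with a friendly apology. If the "
                 "location is missing from a weather question, ask the user for it."
    },
    {"role": "user", "content": "How to cook a Hainanese Chicken Rice?"}
  ]
}
response = requests.post(url=chat_api_endpoint, json=json_data)
print(response.json()['message']['content'])  # expect a polite refusal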

The potential of function calling with LLMs in the SAP domain

Use Case Pattern #1: Application Integration for processing unstructured data

JSON mode and Function Calling are very useful for integrating business applications with unstructured input such as customer reviews, customer messages, and service call audio.

  • In this example of an Intelligent Ticketing Solution, we used GPT-4's JSON mode for customer message processing, such as sentiment analysis, summarization, and entity extraction of text-based customer messages in customer service, integrated with the SAP Field Service Management solution.
  • In another example, a Social Media Citizen Reporting app, we leveraged GPT-4's function calling to analyze social media posts about issues in public spaces and extract the issue description, priority, location etc. as JSON output for maintenance management integrated with SAP S/4HANA Cloud.

 

Use Case Pattern #2: Chatbot

You may know that SAP has released Joule as a digital assistant across all product lines. However, Joule may not be available to external users of a business, such as customers and contingent workers.

We have seen that function calling with LLMs can be very helpful in chatbot development. Similarly, we can replace the weather question with business questions, and the weather API with APIs to SAP systems such as SAP S/4HANA Cloud, SAP Sales Cloud, and SAP Service Cloud. In this way, we can complement or extend Joule with custom chatbot scenarios for external users through LLM function calling.

For instance, in a customer self-service chatbot use case, as a customer, you can help yourself with
"What is the delivery status of my order 198?",
"What is my account balance?",
"The descale light is solid on my coffee machine with serial no xxxxx, what should I do?"
...
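As a minimal sketch, the weather tool from Step 1 can be swapped for business tools; the function names and schemas below (get_delivery_status, get_account_balance) are hypothetical and would map to SAP S/4HANA Cloud or SAP Sales/Service Cloud APIs in practice:

# a minimal sketch of a customer self-service chatbot with multiple (hypothetical) tools
# get_delivery_status and get_account_balance stand in for real SAP API wrappers
import requests

tools = [
  {"type": "function", "function": {
    "name": "get_delivery_status",
    "description": "Get the delivery status of a customer's sales order",
    "parameters": {
      "type": "object",
      "properties": {"order_id": {"type": "string", "description": "The sales order number"}},
      "required": ["order_id"]
    }}},
  {"type": "function", "function": {
    "name": "get_account_balance",
    "description": "Get the current account balance of the customer",
    "parameters": {"type": "object", "properties": {}, "required": []}
  }}
]

json_data = {
  "model": "llama3.1",
  "stream": False,
  "format": "json",
  "messages": [{"role": "user", "content": "What is the delivery status of my order 198?"}],
  "tools": tools
}
response = requests.post('http://localhost:11434/api/chat', json=json_data)
# expected: a tool_call to get_delivery_status with arguments {"order_id": "198"}
print(response.json()['message'].get('tool_calls'))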

(The diagram is just an illustration of the flow. Delivery status may need to consider order items and other factors.)

Use Case Pattern #3: Orchestrating Process Automation

Another use case pattern is using LLM function calling to match a user question, email, or support ticket to its target tool (process or automation), extract the required information as structured output, and route it to a downstream automation process or to human intervention.

Let's look at an example: an unattended bot in SAP Build Process Automation monitors the customer service email account and acts as a customer service digital orchestrator.

  1. It identifies and routes each incoming email to a different process with function calls to LLMs, for instance:
    1. An inquiry about a customer account balance is identified and routed to an API call to SAP S/4HANA Cloud or SAP Graph.
    2. An RFQ with a PDF attachment is identified and routed to the RFQ process, invoking Document Information Extraction to extract the RFQ as JSON output and create an RFQ in SAP S/4HANA Cloud.
    3. A product troubleshooting issue is identified and routed to the customer support process, looking for a resolution with a RAG query API against the knowledge base.
    4. ...
  2. For a simple service fulfillment, it invokes the API call to the API service providers (see the dispatch sketch below).
  3. (Optional) For a more complex automated downstream task, it can route to another process or automation in SAP Build Process Automation, or to an integration flow in SAP Integration Suite. For workflows that require human intervention, it routes to a human agent.
  4. It generates the reply with the service response or task result as context.
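Here is a minimal dispatch sketch for step 2; the handler names are hypothetical stand-ins for SAP Build Process Automation processes, SAP Integration Suite flows, or direct API calls:

# a minimal sketch of routing an identified function call to downstream handlers
# the handlers are hypothetical stand-ins for processes, integration flows, or API calls
def check_account_balance(args):   # e.g. API call to SAP S/4HANA Cloud or SAP Graph
    return {"balance": "..."}

def process_rfq(args):             # e.g. RFQ process with Document Information Extraction
    return {"rfq": "..."}

def escalate_to_human(func):       # fallback: route to a human agent
    return {"routed_to": "human agent", "function": func["name"]}

handlers = {
    "check_account_balance": check_account_balance,
    "process_rfq": process_rfq,
}

def dispatch(tool_call):
    """Route the function call extracted from the LLM response to its handler."""
    func = tool_call["function"]
    handler = handlers.get(func["name"])
    if handler is None:
        return escalate_to_human(func)
    return handler(func["arguments"])

# usage, with the tool_calls parsed as in Step 2:
# result = dispatch(resp_json['message']['tool_calls'][0])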

(Diagram: email-driven customer service orchestration with function calls)

Use Case Pattern #4: Autonomous Agent

Agent frameworks like Microsoft's AutoGen, MetaGPT, and crewAI are gaining popularity; they aim to solve complex tasks autonomously and collaboratively with role-based multi-agents. As illustrated in the last diagram above, it is possible to route a downstream task to multiple agents with their roles and responsibilities clearly defined, and let the agents work together towards the final goal. However, using autonomous agents in real business is still at a very early stage, due to the limited planning and reasoning capability of current LLMs, the safety and trust issues of autonomous decisions, and the complexity of business decisions themselves.

Additional thoughts on the practical use of Function Calling

In a complex use case involving hundreds of functions, it is impractical to send all of them as the tool list to the LLM. It then makes sense to shortlist or filter the functions down to a few as a pre-processing step before relaying the function calling request to the LLM. Here are some ideas as food for thought for you to explore:

  • Categorize the functions by business function or business process, or even organize them in a hierarchy. In a use case like a customer service chatbot, the bot can ask the end user to choose a category at the beginning of the conversation to narrow down the choice of functions, e.g.
    "What can I help you with today?
    1. Product Inquiry
    2. Check the delivery status of my order
    3. Log a complaint
    ..."

  • Embed the name, description, category etc. of each function with an embedding model like OpenAI's text-embedding-3-large, and persist the function calls in a vector database like SAP HANA Cloud, then use Retrieval Augmented Generation (RAG) to find the most suitable function from the long list; a minimal sketch follows below. Additionally, you may have a look at Microsoft's GraphRAG, a structured, hierarchical approach to Retrieval Augmented Generation (RAG).
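Here is a minimal sketch of such a shortlist using Ollama's embeddings API as a stand-in for text-embedding-3-large; it assumes an embedding model such as nomic-embed-text is pulled, the tool catalogue is hypothetical, and in production the vectors would be persisted in SAP HANA Cloud rather than kept in memory:

# a minimal sketch of shortlisting function calls by embedding similarity
# assumes the nomic-embed-text model is pulled in Ollama; the catalogue is hypothetical
import requests
import numpy as np

def embed(text, model='nomic-embed-text'):
    resp = requests.post('http://localhost:11434/api/embeddings',
                         json={"model": model, "prompt": text})
    return np.array(resp.json()['embedding'])

# hypothetical tool catalogue: function name -> description
catalogue = {
    "get_delivery_status": "Get the delivery status of a customer's sales order",
    "get_account_balance": "Get the current account balance of the customer",
    "log_complaint": "Log a customer complaint about a product or service",
}
tool_vectors = {name: embed(desc) for name, desc in catalogue.items()}

def shortlist(question, top_k=2):
    """Return the top_k function names most similar to the user question (cosine similarity)."""
    q = embed(question)
    score = lambda v: float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
    ranked = sorted(tool_vectors, key=lambda name: score(tool_vectors[name]), reverse=True)
    return ranked[:top_k]

print(shortlist("Where is my order 198?"))  # expect get_delivery_status ranked first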

  • Implement a retry mechanism in your application to increase the likelihood of correct JSON output from the LLM, if the selected LLM doesn't fully guarantee JSON output conforming to the target schema. For instance, if an LLM returns correct JSON only 80% of the time, and invalid or non-conforming JSON in the remaining 20% of cases, then with up to 5 attempts the probability of overall failure shrinks exponentially from 20% to 0.2^5 = 0.00032 = 0.032%. In other words, 99.97% of cases end with correct JSON output.
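As a minimal sketch of such a retry loop, where chat_with_llm is a hypothetical placeholder for the function calling request from Step 1 and validation uses the jsonschema package:

# a minimal sketch of retrying until the LLM returns schema-compliant JSON
# chat_with_llm() is a hypothetical placeholder for the request from Step 1
import json
from jsonschema import validate, ValidationError

def call_llm_for_json(chat_with_llm, schema, max_retries=5):
    for _ in range(max_retries):
        raw = chat_with_llm()  # returns the model's raw text output
        try:
            data = json.loads(raw)
            validate(instance=data, schema=schema)  # raises if the schema is violated
            return data
        except (json.JSONDecodeError, ValidationError):
            continue  # invalid or non-compliant JSON: retry
    raise RuntimeError(f"no valid JSON after {max_retries} attempts")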


Summary

As we have seen from the samples, JSON mode and Function Calling are very helpful in turning unstructured inputs (like customer reviews, service tickets, chats, emails, service call audio etc.) into structured output such as JSON compliant with a supplied schema; integrating API calls to business applications or 3rd-party systems; generating answers with real data from API calls in a chatbot; or invoking downstream tasks with the extracted structured output for automation. Structured output with JSON mode or function calling is available in most popular LLMs, like the latest gpt-4o from OpenAI, Claude from Anthropic, and Gemini from Google. With Ollama (BYOM) in SAP AI Core, we can now also leverage function calling with open-source LLMs like LLaMa 3.1 and Mistral v0.3.