Artificial Intelligence and Machine Learning Blogs
Explore AI and ML blogs. Discover use cases, advancements, and the transformative potential of AI for businesses. Stay informed of trends and applications.
MarioDeFelipe

Introduction

Large Language Models (LLMs) are evolving from passively serving knowledge in chatbots to actively interacting with applications and services through JSON, YAML or other structured outputs. But in our world, this idea is as old as the hills.

What is new are agentic systems, co-pilots, plugins, function calling and tool use, all of them steps in this direction, bringing innovative solutions to the old problem of remote function calling. Yes, RFC.

The logical next step in this evolution is toward autonomous LLM-powered microservices, services, and applications.

Yes, LLMs have been around for some years, but function-calling models are a relatively new concept. Function calling was first "released" by OpenAI in mid-2023 to enable large language models (LLMs) to reliably interact with external tools and APIs directly from the model. We even have a fantastic blog by @Yogananda at SAP on using "OpenAI Function Calling" with SAP BTP. Start there, because here I go some steps further.

Function calling works like this: developers describe custom functions to the models, which can then output JSON containing the arguments to call those functions when needed, based on the user's input. This is called "in-context learning"; in other words, we tell the model what we want it to know, we give it knowledge, in context.

By doing this, we extend the capabilities of LLMs beyond text-based reasoning. Function calling enables the models to access real-time information, interact with databases and APIs, extract structured data from text, solve multi-step problems, and more. This matters for the enterprise, because we have multiple sources of information and multiple tools that, if we are lucky, we can access through an API. Take SAP as an example: think of how many APIs we have in BTP, and how many custom APIs we have built on S/4HANA.

And I go one step further than limiting this to OpenAI. Recently, other companies like Anthropic and Google, as well as open-source projects, have been releasing models with function calling abilities, but which of them (if any) has been trained on SAP APIs is absolutely fundamental. Beyond that, every time I hear about SAP "collaboration" with other vendors on generative AI models, such as the collaborations with Google Gemini, OpenAI, IBM, Aleph Alpha, Anthropic or Cohere, I wonder: what is that collaboration actually about? Are they training a model on SAP APIs? Why is this important? I will explain in this blog.

---

If the goal is to build systems that talk to other systems, and not chatbots, then you want complete control over your prompts. Many libraries pre-write the prompts, making it easy to start but requiring significant work to undo these pre-written elements months into a project. When OpenAI function calling was introduced, the use of JSON Schema led to new ways of prompting and working with language models, for example with Pydantic, a widely used library.

OpenAI has trained these models to call functions, so they inherently know how to do it, unlike open-source models where you have to explain everything in the prompts or in the training.

 

Function calling, but we do it, not the model

Don't get confused: one could think that function calling in a model means that the model calls functions. That would be a great feature for a model to have, but the model has no ability to call functions when you use function calling. Function calling is all about controlling the format of the output from the model. Once you get the response in a format you expect, you can reliably call a function. But it is the user, or the application, that calls the function. The model does not.
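To make this concrete, here is a minimal sketch of that loop using the OpenAI Python SDK (the model name and the toy weather function are my own placeholders): the model only proposes a function name plus JSON arguments, and it is our application code that actually executes the call.

import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_current_weather(city: str) -> str:
    # Placeholder implementation; a real system would call a weather API here.
    return f"18 C and cloudy in {city}"

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "What's the weather in Barcelona?"}],
    tools=tools,
)

# The model only returns the *intent* to call a function...
tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)

# ...and the application decides to dispatch it.
if tool_call.function.name == "get_current_weather":
    print(get_current_weather(**args))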

This expansion of LLM capabilities extends, with some luck, to third-party service providers (e.g., Slack, Gmail, Dropbox, etc.), enabling seamless interactions between users and services through personalized LLM workflows. These third-party agents would allow users to communicate with and through them using customized LLM-powered workflows, integrating a wide range of services.

Cool, but everything you just read is LLM dependent

You must inform the model about the functions, and how you do this is model dependent. This is how you let the model know which schema the output should stick to. The chat completion message shows the output string formatted as JSON, so the model must know the schema.

In OpenAI, this is the purpose of the function block.

But every model has its own requirements. If you are curious about what it takes to "teach" an LLM how to call an API, check this:

LLM requirements for API call

The Llama 3 model requires the following. Forgive me for bringing code into the blog, I don't like doing it, but it shows the complexity involved 🙏🏼

 

 

[
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "This function gets the current weather in a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city, e.g., San Francisco"
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use."
                    }
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_clothes",
            "description": "This function provides a suggestion of clothes to wear based on the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "temperature": {
                        "type": "string",
                        "description": "The temperature, e.g., 15 C or 59 F"
                    },
                    "condition": {
                        "type": "string",
                        "description": "The weather condition, e.g., 'Cloudy', 'Sunny', 'Rainy'"
                    }
                },
                "required": ["temperature", "condition"]
            }
        }
    }    
]

 

 

A functions/ directory contains the function files; each function needs its own JSON file with a specific structure that describes the function and its sample prompts and responses.

Will the LangChain Output Parser help?

The LangChain output parser can be used to create structured output, and JSON can be the choice if desired.

The two main implementations of the LangChain output parser are:

Get format instructions: A method which returns a string containing instructions for how the output of a language model should be formatted.

Parse: A method which takes in a string (assumed to be the response from a language model) and parses it into some structure.
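As a minimal sketch of those two methods with LangChain's PydanticOutputParser (the Pydantic model and the example response string are my own illustration):

from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field

class WeatherQuery(BaseModel):
    city: str = Field(description="The city to look up")
    format: str = Field(description="celsius or fahrenheit")

parser = PydanticOutputParser(pydantic_object=WeatherQuery)

# "Get format instructions": text to splice into the prompt so the model
# knows which JSON schema its output should follow.
print(parser.get_format_instructions())

# "Parse": turn the raw model response back into a validated object.
result = parser.parse('{"city": "Barcelona", "format": "celsius"}')
print(result.city)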

The problem is that the JSON is quite prone to errors in its formation, or it might not be formed at all, and in the enterprise we must control the output. That is where Pydantic comes in.

What is Pydantic doing in all this?

Getting JSON output from Llama 3 by setting the model format to JSON and defining a JSON schema helps extract information in a structured format. Using a JSON output parser then returns a dictionary.

The problem with this approach is that responses are then obtained using Ollama functions and Pydantic to define a class schema. We have control, but it is not automatic: every schema needs to be defined by hand. This is not enough.
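As a rough sketch of that hand-defined approach, assuming a local Ollama server with a Llama 3 model pulled and Pydantic v2 (the model name and schema are my own example):

import ollama
from pydantic import BaseModel

class WeatherCall(BaseModel):
    city: str
    format: str

response = ollama.chat(
    model="llama3",
    format="json",  # forces the model to emit JSON
    messages=[{
        "role": "user",
        "content": "Return a JSON object with keys 'city' and 'format' for: "
                   "what's the weather in Barcelona, in celsius?",
    }],
)

# Validate the raw string against the schema we defined by hand.
call = WeatherCall.model_validate_json(response["message"]["content"])
print(call.city, call.format)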

And on top of Pydantic, we use Instructor and Marvin

Instructor elevates Pydantic to create a more intuitive and streamlined experience when working with language models. Instructor is limited to OpenAI Function Calling, but if that is not a problem for you, it lets developers work with structured data. Despite its advantages, it still has complexities around JSON Schema. And this is exactly the point: an Instructor implementation requires storing the data in a vector DB, defining a Pydantic model to specify the structure of the extracted data, passing the document or instruction and the model to Instructor, and finally calling the LLM through Instructor, mostly OpenAI.
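The core of it looks roughly like this (a sketch; the model name and the extracted fields are my own example, and the vector DB step is omitted):

import instructor
from openai import OpenAI
from pydantic import BaseModel

class MaterialInfo(BaseModel):
    material: str
    plant: str

# Patch the OpenAI client so responses are parsed straight into Pydantic models.
client = instructor.from_openai(OpenAI())

info = client.chat.completions.create(
    model="gpt-4o-mini",          # placeholder model name
    response_model=MaterialInfo,  # Instructor builds the function schema from this
    messages=[{"role": "user",
               "content": "Material MAT-001 is produced in plant 1010."}],
)
print(info.material, info.plant)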

Marvin is more mature than Instructor and has more tools. It is probably less focused on a single goal, especially if your objective is using OpenAI Function Calling, but its tools are simple and self-documenting, and they help with complex challenges like entity extraction, classification, and generating synthetic data.
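For example, entity extraction and classification look roughly like this with Marvin's 2.x API, as far as I understand it (the text and labels are my own illustration):

import marvin
from pydantic import BaseModel

class Address(BaseModel):
    city: str
    country: str

# Entity extraction into a Pydantic type.
addresses = marvin.extract("SAP is headquartered in Walldorf, Germany.", target=Address)

# Simple classification into labels.
label = marvin.classify("The delivery arrived two weeks late.",
                        labels=["positive", "negative", "neutral"])

print(addresses, label)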

Let's forget frameworks then, I want models to be good at Function calling

Everything described above will require a significant amount of development, especially if we have large API datasets and very clearly defined APIs such as the ones we have in an SAP workflow.

OpenAI without Function Calling

 

No API ⬆️

With API ⬇️

OpenAI with weather function calling

It would be good to have some chance of rain in Barcelona 🥹

In February 2024, Fireworks AI released FireFunction V1, a model with GPT-4-level function-calling accuracy. Open-source models like Mistral-7B and Dolphin-2.7 also demonstrated function calling by early 2024.

Will the tiny, small models save the day?

Keep in mind that large language models are leaky abstractions! You will have to use an LLM with sufficient capacity to generate well-formed JSON; not every model will be useful. Take Gemma-2B as an example, and take our API: generate 1,000 "correct" API function call responses simply by placing only your API calls in the pre-prompt and then prompting the model.
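A rough sketch of that synthetic-data idea, assuming the Hugging Face transformers pipeline and the Gemma-2B instruct checkpoint (the prompt wording and the question list are my own, not a tested recipe):

from transformers import pipeline

generator = pipeline("text-generation", model="google/gemma-2b-it")

API_SPEC = """You can only answer by emitting a JSON function call.
Available function:
  get_current_weather(city: string, format: "celsius" or "fahrenheit")"""

questions = ["What's the weather in Barcelona?", "How hot is it in Madrid, in F?"]

dataset = []
for q in questions:  # in practice, ~1,000 varied questions
    prompt = f"{API_SPEC}\n\nUser: {q}\nFunction call (JSON):"
    out = generator(prompt, max_new_tokens=64)[0]["generated_text"]
    dataset.append({"prompt": q, "completion": out[len(prompt):].strip()})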

To create a function call for comparing two bills of materials (BOMs) using the SAP S/4HANA On-Premise API, we need to refer to the specific API endpoints and methods provided in the documentation. Based on the provided sources, here is the JSON structure for the function call that can be used for OpenAI function calling, in-context learning, or fine-tuning:

JSON Structure for Function Call API ⬇️

 

 

{
  "functionCalls": [
    {
      "question": "How do I compare two bills of materials (BOMs)?",
      "request": {
        "method": "POST",
        "endpoint": "/sap/opu/odata/sap/API_BILL_OF_MATERIAL_SRV/CompareBOMs",
        "headers": {
          "Content-Type": "application/json"
        },
        "body": {
          "BOMComparison": {
            "BOM1": {
              "BOMHeader": {
                "BOM": "BOM_ID_1",
                "Plant": "PLANT_1"
              },
              "BOMItems": [
                {
                  "ItemNumber": "0010",
                  "Material": "MATERIAL_1",
                  "Quantity": "10"
                },
                {
                  "ItemNumber": "0020",
                  "Material": "MATERIAL_2",
                  "Quantity": "20"
                }
              ]
            },
            "BOM2": {
              "BOMHeader": {
                "BOM": "BOM_ID_2",
                "Plant": "PLANT_2"
              },
              "BOMItems": [
                {
                  "ItemNumber": "0010",
                  "Material": "MATERIAL_3",
                  "Quantity": "15"
                },
                {
                  "ItemNumber": "0020",
                  "Material": "MATERIAL_4",
                  "Quantity": "25"
                }
              ]
            }
          }
        }
      },
      "response": {
        "status": 200,
        "body": {
          "ComparisonResult": {
            "Differences": [
              {
                "ItemNumber": "0010",
                "BOM1Material": "MATERIAL_1",
                "BOM2Material": "MATERIAL_3",
                "BOM1Quantity": "10",
                "BOM2Quantity": "15"
              },
              {
                "ItemNumber": "0020",
                "BOM1Material": "MATERIAL_2",
                "BOM2Material": "MATERIAL_4",
                "BOM1Quantity": "20",
                "BOM2Quantity": "25"
              }
            ]
          }
        }
      }
    }
  ]
}

 

This JSON structure is designed to be used for OpenAI function calling, in-context learning, or fine-tuning, based on the API details provided in the SAP S/4HANA On-Premise documentation; it will not serve other models.
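If you wanted to fine-tune on examples like this, one way is to convert each question/request pair into OpenAI's chat-format fine-tuning records. Below is a sketch under my own assumptions: the file names and the compare_boms tool name are placeholders, not part of the SAP API.

import json

with open("sap_function_calls.json") as f:   # the structure shown above
    data = json.load(f)

with open("train.jsonl", "w") as out:
    for call in data["functionCalls"]:
        record = {
            "messages": [
                {"role": "user", "content": call["question"]},
                {"role": "assistant", "content": None, "tool_calls": [{
                    "id": "call_1",
                    "type": "function",
                    "function": {
                        "name": "compare_boms",  # assumed tool name
                        "arguments": json.dumps(call["request"]["body"]),
                    },
                }]},
            ]
        }
        out.write(json.dumps(record) + "\n")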

Note that they use "functional tokens" in training - they convert a function to a particular, previously unused tokenization, and refer to it that way. They claim this speeds up inference (I'm sure it does). They don't make any claims as to whether or not it changes their accuracy (I bet that it does). It definitely makes the system more fragile / harder to train for large and very large APIs.

Outcome: highly capable single API function call LLM. They say you could do it with as little as 100 training inputs if you really wanted.

I think this is interesting, but not world-shattering. I could imagine building a nice little service company on it, basically just "send us a git repo and you'll get a helpful function call API for this version of your code which you can hook up to an API endpoint / chatbot".

Limitations are going to be largely around Gemma-2B's skills -- A 2B model isn't super sophisticated. And you can see they specify "<30 tokens" for the prompt. But, I imagine this could be trained quickly enough that it could be part of a release CI process. There are a number of libraries I use that I would like to have access to such a model.

I'd be interested in something that has general knowledge of a large set of packages for a language, and could pull in / finetune / MoE little models for specific repositories I'm coding on. Right now I would rely on either a very large model and hope its knowledge cutoff is right (Claude/GPT-4), or using a lot of a large context window. There might be some Goldilocks version in the middle here which would be helpful in a larger codebase but be faster and more accurate than the cloud monopoly providers.

 

IBM Granite Model

IBM introduced a new set of models called Granite, which are specific to code generation and completion.

Comparison of Granite-8B-Code (Base/Instruct) with other open source (code) LLMs of similar size on HumanEvalPack

 

Coding agents for enterprise AI are not that many:

  • Meta Llama family
  • Google Gemma
  • DBRX (Databricks)
  • Arctic (Snowflake)
  • Grok
  • Mixtral 8x22B (MistralAI)
  • Command R+ (Cohere)

What really catches my eye are, for example, models that have been trained on the ABAP programming language, since most of my questions will require an ABAP function module call. Granite indicates it was trained on ABAP. Bravo!

IBM Granite trained on ABAP?

Granite was trained on ABAP; this is good, let's see how:


 

The problem is that none of these datasets contain the ABAP language beyond one or two functions.

In many enterprise contexts, code LLM adoption can be further complicated by factors beyond the performance of the models. For instance, even open models are sometimes plagued by a lack of transparency about the data sources and data processing methods that went into the model, which can erode trust in mission-critical and regulated contexts. Furthermore, license terms in today's open LLMs can encumber and complicate an enterprise's ability to use a model.

But, this is not Code Generation, this is about Function Calling

For calling functions and tools, we must check the Berkeley Function-Calling Leaderboard (BFCL), which evaluates an LLM's ability to call functions and tools. BFCL is a function-calling dataset with 1,700 functions across 4 categories: simple, multiple, parallel, and parallel multiple function calls.

Applications and services should by default expect their APIs to be chained with each other when used by agents. To support such a scenario, there needs to be a way to express which APIs can be commutative, associative or distributive with a given set of APIs.


What we might want is a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction.

How to achieve this?

This can be done in a self-supervised way, requiring nothing more than a handful of demonstrations for each API. Such an approach incorporates a range of tools, including a calculator, a Q&A system, a search engine, a translation system, and a calendar.

For LMs to adopt this basic paradigm of tool use, current works mainly leverage inference-time prompting and training-time learning methods.

Inference-time prompting leverages the ability of LMs to learn in context, while training-time learning makes the LM learn from examples that use these tools during training. Ideally, the use of tools should be learned in a self-supervised way without requiring large amounts of human annotation.

For complex queries that require multiple tools to solve, the common approach is to break down the task and tackle each step sequentially by selecting and using tools with intermediate contexts. 

Tools are mainly aggregated from existing datasets or public APIs, but these benchmarks are limited in domains. Several works scrape more APIs from online sources such as Public APIs, RESTful APIs or the OpenAI plugin list. Nonetheless, as tools are collected from heterogeneous sources, it is challenging to select the best benchmark or unify all these varied benchmarks. 

LLMs can be trained to use tools by combining traditional fine-tuning methods with in-context learning. One approach, called ToolkenGPT, represents each tool as a token ("toolken") and learns an embedding for it, allowing the LLM to generate tool calls like regular word tokens.
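As a rough illustration of the toolken idea (only the setup step, not the full training loop from the paper; the base model and tool names are my own placeholders):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")          # placeholder base model
model = AutoModelForCausalLM.from_pretrained("gpt2")

tool_tokens = ["<get_current_weather>", "<get_clothes>"]   # one "toolken" per tool
tokenizer.add_tokens(tool_tokens, special_tokens=True)
model.resize_token_embeddings(len(tokenizer))

# Freeze everything except the embedding matrix, which now contains the new
# toolken rows; learning "an embedding per tool" trains only these entries.
for param in model.parameters():
    param.requires_grad = False
model.get_input_embeddings().weight.requires_grad = True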

Training LLMs to use tools requires careful prompt engineering. Key methods for training LLMs on tool use include learning tool embeddings, parameter-efficient fine-tuning, in-context learning from examples, and prompt engineering. Newer approaches aim to combine the strengths of these techniques for flexible and efficient tool integration.


If our intention is to build an API database or dataset, we must do it in four stages: data pre-processing, API database creation, instruction generation, and data validation.


 

1. Data pre-processing: this stage involves filtering out malformed files or malfunctioning endpoints, and extracting information about the API and its endpoints.

2. API database creation: an API database is built. This database contains instances in JSON format, each holding all relevant information about an endpoint and the API it belongs to. API call examples should be included in each instance (see the example instance after this list).

3. Instruction generation: create high-quality instruction examples and generate instruction candidates. The method involves selecting endpoints, bootstrapping their information into templates, and refining the instructions; at this stage the selected LLM is used to generate instructions based on the examples.


4. Data validation: verify that the API calls are valid HTTP request examples, check that the instructions are of high quality, and select the best-quality instruction for training. The LLM is used for various tasks in this stage, such as labeling instructions as good or bad and calculating the likelihood of an LLM recreating the input text used to generate each instruction.
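For the API database stage, a single instance might look roughly like this (a hypothetical shape; the field names are my own illustration, not a fixed standard), reusing the BOM comparison endpoint from earlier:

api_db_instance = {
    "api_name": "API_BILL_OF_MATERIAL_SRV",
    "endpoint": "/sap/opu/odata/sap/API_BILL_OF_MATERIAL_SRV/CompareBOMs",
    "method": "POST",
    "description": "Compare two bills of materials and return the differences.",
    "parameters": {
        "BOM1": "Header and items of the first BOM",
        "BOM2": "Header and items of the second BOM",
    },
    "call_example": {"body": {"BOMComparison": {"BOM1": "...", "BOM2": "..."}}},
    "sample_instruction": "How do I compare two bills of materials?",
}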

It is clear that LLMs like GPT-4 require significant computational resources, making them inefficient; for coding purposes there is a need for smaller, task-oriented LLMs that maintain functionality while reducing costs. However, smaller models carry a higher risk of errors or "hallucinations" and have issues with precise output formatting, which is critical for robust software applications. So the solution is heading towards LLM training and inference using a dataset of thousands of widely used APIs from purpose-built datasets, covering the diverse functionalities of the application we need, with the goal of improving the LLMs' ability to select appropriate API functions. This might be done with small models like CodeLlama-7B, Google's Gemma 7B & 2B, and Stable Code 3B.

The technical limitation lies in the third step. If I try to create an OpenAPI spec for a single SAP API, take for instance the Material Serial Number service (service name API_MATERIALSERIALNUMBER), used to create, read and update serial numbers for a material: this API, even in its most basic OpenAPI spec form, will require the actual schemas for the respective entities and actions. In other words, for a single SAP API we need to adjust it further than this;

 

 

 

openapi: 3.0.0
info:
  version: '1.0.0'
  title: 'Material Serial Number API'
  description: 'API to create, read, update, and manage material serial numbers'
servers:
  - url: 'https://api.sap.com/api/OP_API_MATERIALSERIALNUMBER_0001/resource'
paths:
  /MaterialSerialNumber:
    get:
      summary: 'Get entities from MaterialSerialNumber'
      description: 'Read data for all material serial numbers.'
      parameters:
        - in: query
          name: Material
          schema:
            type: string
          description: 'Indicates the unique number that identifies a material.'
          required: true
        - in: query
          name: SerialNumber
          schema:
            type: string
          description: 'Indicates the number of the serialized equipment.'
          required: true
      responses:
        '200':
          description: 'OK'
          content:
            application/json:
              schema:
                # Add schema for MaterialSerialNumber entity
                type: object
    post:
      summary: 'Add new entity to MaterialSerialNumber'
      description: 'Create a material serial number.'
      requestBody:
        content:
          application/json:
            schema:
              # Add schema for MaterialSerialNumber entity
              type: object
      responses:
        '201':
          description: 'Created'
          content:
            application/json:
              schema:
                # Add schema for MaterialSerialNumber entity
                type: object
  /MaterialSerialNumber/SAP__self.CreateMassMaterialSerialNumber:
    post:
      summary: 'Invoke action CreateMassMaterialSerialNumber'
      description: 'Create Mass Material Serial Numbers'
      requestBody:
        content:
          application/json:
            schema:
              # Add schema for CreateMassMaterialSerialNumber action
              type: object
      responses:
        '200':
          description: 'OK'
          content:
            application/json:
              schema:
                # Add schema for MaterialSerialNumber entity
                type: object
  /MaterialSerialNumber/{Material}/{SerialNumber}:
    get:
      summary: 'Get entity from MaterialSerialNumber by key'
      description: 'Read data for a particular material serial number.'
      parameters:
        - in: path
          name: Material
          schema:
            type: string
          required: true
        - in: path
          name: SerialNumber
          schema:
            type: string
          required: true
      responses:
        '200':
          description: 'OK'
          content:
            application/json:
              schema:
                # Add schema for MaterialSerialNumber entity
                type: object
    patch:
      summary: 'Update entity in MaterialSerialNumber'
      description: 'Update StockInformation for an existing material serial number using PATCH request.'
      parameters:
        - in: path
          name: Material
          schema:
            type: string
          required: true
        - in: path
          name: SerialNumber
          schema:
            type: string
          required: true
      requestBody:
        content:
          application/json:
            schema:
              # Add schema for MaterialSerialNumber entity
              type: object
      responses:
        '200':
          description: 'OK'
          content:
            application/json:
              schema:
                # Add schema for MaterialSerialNumber entity
                type: object
  /MaterialSerialNumber/{Material}/{SerialNumber}/SAP__self.ChangeMaintenancePlant:
    post:
      summary: 'Invoke action ChangeMaintenancePlant'
      requestBody:
        content:
          application/json:
            schema:
              # Add schema for ChangeMaintenancePlant action
              type: object
      responses:
        '204':
          description: 'No Content'
  # Add other paths and operations similarly
  /MaterialSerialNumber/{Material}/{SerialNumber}/_Partner:
    get:
      summary: 'Get entities from related _Partner'
      responses:
        '200':
          description: 'OK'
          content:
            application/json:
              schema:
                # Add schema for _Partner entity
                type: object
    post:
      summary: 'Add new entity to related _Partner'
      requestBody:
        content:
          application/json:
            schema:
              # Add schema for _Partner entity
              type: object
      responses:
        '201':
          description: 'Created'
          content:
            application/json:
              schema:
                # Add schema for _Partner entity
                type: object
  /MaterialSerialNumberPartner/{Material}/{SerialNumber}/{Equipment}/{PartnerFunction}/{EquipmentPartnerObjectNmbr}/_EquipmentMaterialSerialNumber:
    get:
      summary: 'Get related _EquipmentMaterialSerialNumber'
      responses:
        '200':
          description: 'OK'
          content:
            application/json:
              schema:
                # Add schema for _EquipmentMaterialSerialNumber entity
                type: object
  # Add other paths and operations similarly
  /$batch:
    post:
      summary: 'Send a group of requests'
      requestBody:
        content:
          application/json:
            schema:
              # Add schema for batch request
              type: object
      responses:
        '200':
          description: 'OK'
          content:
            application/json:
              schema:
                # Add schema for batch response
                type: object
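
One way to make a spec like this useful for function calling is to derive tool definitions from it programmatically. Below is a rough sketch (the file name and the tool-naming convention are my own assumptions, and the generated names would still need sanitizing and the schema placeholders filling in):

import yaml

with open("material_serial_number.yaml") as f:
    spec = yaml.safe_load(f)

tools = []
for path, operations in spec["paths"].items():
    for method, op in operations.items():
        params = {
            p["name"]: {"type": p["schema"]["type"],
                        "description": p.get("description", "")}
            for p in op.get("parameters", [])
        }
        tools.append({
            "type": "function",
            "function": {
                "name": f"{method}_{path.strip('/').replace('/', '_')}",
                "description": op.get("description", op.get("summary", "")),
                "parameters": {
                    "type": "object",
                    "properties": params,
                    "required": [p["name"] for p in op.get("parameters", [])
                                 if p.get("required")],
                },
            },
        })

print(f"Derived {len(tools)} candidate tools from the spec")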

 

 

Conclusion

I have described what function calling is, what capability it brings to the models, and which models have been tested with function calling mechanisms. I have been trying to understand whether any of the existing models have been trained with SAP function calling capabilities, and what it would take to fine-tune a small model on SAP APIs, in case you want to explore it.

We discussed several options: multi-modal models like Octopus; tool-augmented LLMs, which is the most explored option and has been around for a while; model chaining and API agents, which is a very recent methodology; and tool-usage planning, which requires further investigation. All of them will require a lot of knowledge from the SAP team to make them work correctly. I also started creating an SAP API dataset and made it public on Hugging Face, which I will describe in a future blog.


It's almost June 2024 and we are entering what Gartner calls the "Trough of Disillusionment". Although this methodology is not yet enough for SAP, keep the faith, don't lose trust in generative AI, we'll get there.

 
