Updates
- 15.03.23 – Added details on the newly available GPT-4 model
- 01.03.23 – Added details on the newly available ChatGPT model (GPT-3.5) and API endpoint
- 27.02.23 – Added a link to part 3.
- 24.02.23 – Added a link to part 2 and updated the title to reflect that the series will have 3 posts (not only 2 as initially planned).
ChatGPT has been making waves across the internet lately. In this small series of blog posts, I'll delve into the conceptual and technical details of building a ChatGPT-like chat app using the SAP Cloud Application Programming Model, SAPUI5 and the OpenAI API. While I won't be providing step-by-step instructions or explaining everything in full detail, I will highlight key aspects such as what ChatGPT actually is, what our chat app looks like, and how you can access the open source code. I will also take a closer look at the OpenAI API and how our app makes use of it.
In the following posts I will cover more technical topics like the repository setup and its usage as a monorepo with pnpm, the CAP backend's data model and service layer, and some best practices we (@p36) use in our larger CAP projects. I'll also explore the TypeScript-based SAPUI5 frontend and its features like custom controls, the usage of external libraries, etc.
If you want to skip the theoretical part, you can head directly to the p36 GitHub account and check out the project. The repository includes detailed instructions on how to set things up for local development and how to deploy the app to SAP BTP Cloud Foundry.
=> Public GitHub Repository
The final version of our chat app
Our app allows users to create chats and engage in conversations with an AI. The AI will respond to whatever you ask or instruct it to do. You will most probably be impressed by how much the AI knows and by its ability to interact with you like a human being.
Conversations with the AI are encapsulated in chats. A chat is created by providing a subject and an (OpenAI) model, which is responsible for generating the responses. Users can also select a personality that instructs the AI to respond in a certain way, such as adopting a developer (or even a pirate-like) persona (harrr, matey 🏴☠️). The app provides a SAPUI5-based user interface, where users can write messages and have the whole history of a chat available.
Some impressions
The following two animated images show different conversations with the AI. In contrast to other chatbot tools like SAP Conversational AI, the model behind the chat is already trained on a very large (!) dataset and able to understand and respond to a very wide (!) range of queries.
Let's talk code on SAPUI5
In our app, the AI is able to understand, generate, display and talk about code, and the app is also capable of applying code formatting. I won't join the discussion about the correctness and quality of the generated code in this blog post, but it is widely discussed on the internet and even in the SAP Community (e.g. Are we ready to use AI for ABAP development, How ChatGPT answer made me lose my time). So, let's leave out the quality aspect here, be amazed that the AI can translate my instructions into real code (which seems to be correct in that scenario) and cheer about the app applying code formatting (Shiver me timbers, that be a mighty fine feature 🏴☠️).
Asking to generate code for SAPUI5
Let's talk SAP BTP and Analytics
In the other example, I am using the pirate persona to tell me something about the benefits of SAP BTP and concrete products in the analytics space. Even pirate AIs seem to acknowledge that the Business Technology Platform and tools like SAP Analytics Cloud are great products.
A chat about SAP BTP and its analytic capabilities
If you are familiar with ChatGPT, you will see that our app looks and behaves very similarly to its big brother. The answers are not as precise and detailed, but most of the time still very good. And hey, ChatGPT does not provide this awesome pirate personality out of the box (harrr 🏴☠️).
Since our app seems to be almost as capable as the original one, and I am totally not able to build such a fancy AI, we have to dig a little into the magic behind the tools to understand what is really going on.
What actually is ChatGPT?
ChatGPT in its current state is an app, very much like the one we built, with a chat UI and most probably a server component. The reason why it is breaking large parts of the internet right now (its existence has already sparked blogs and discussions in the SAP Community, e.g. ChatGPT for SAP developments, ABAP Code Refactoring with ChatGPT, “Hello, world!” your crafted chat GPT bot!) is that it is also a disruptive AI technology.
The AI behind ChatGPT is based on a state-of-the-art Deep Learning architecture for Natural Language Processing (NLP) called GPT (Generative Pre-trained Transformer). GPT is a large language model trained on vast amounts of text data, enabling it to generate sophisticated, contextually relevant responses to a wide range of inputs. One of the key technical features of the GPT model is its ability to generate text that is highly coherent and contextually appropriate, even in situations for which it has not been specifically trained. ChatGPT is based on selectable advanced versions of that model that are highly optimized for language and dialog: GPT-3.5 and GPT-4.
GPT is provided by a company called OpenAI, which is an industry leader in AI technologies. Next to GPT, OpenAI also develops other AI models for different use cases. For example, they also have a model called Codex, which focuses on understanding and generating source code and powers GitHub's AI pair programmer Copilot.
I won't go much into the details of the GPT model, as I have no deep knowledge of the related aspects like NLP and Machine Learning techniques, but the capabilities of GPT-3, and even more so GPT-4, to understand, process and respond to inputs are absolutely mind-blowing and will most likely have an impact on how we do many things with AI support in the future.
The other disruptive aspect of those models, next to their state-of-the-art processing, is that their power is available through an API, specifically the OpenAI API.
The OpenAI API
Developers can integrate the GPT models into their own applications using the OpenAI API. One of the major benefits of using the OpenAI API is that it abstracts away much of the complexity of working with a large, pre-trained language model. Developers can send input text to the API and receive natural language responses back without needing to worry about the underlying details of the model's architecture or training data.
While the API is easy to connect to on a technical level, working with the OpenAI API requires some familiarity with how GPT models work internally. We will cover some of this later, when we talk about completions.
The API offers endpoints for various tasks, but we'll be focusing on three in particular, since they are used within our application:
- /v1/models
- /v1/completions (for GPT-3)
- /v1/chat/completions (for GPT-3.5 and GPT-4)
Models (/v1/models)
OpenAI provides a set of different models that can be used to analyze the incoming data and create a response. For GPT there is a set of specific models available, which are trained differently, and some are more advanced than others. The following table shows only the latest models; others are available via the API as well.
| Model | Description | Training Data |
|---|---|---|
| gpt-4 (in limited beta) | More capable than any GPT-3.5 model, able to do more complex tasks, and optimized for chat. | Up to Sep 2021 |
| gpt-4-32k (in limited beta) | Same capabilities as the base gpt-4 model but with 4x the context length. | Up to Sep 2021 |
| gpt-3.5-turbo | Most capable GPT-3.5 model and optimized for chat at 1/10th the cost of text-davinci-003. | Up to Sep 2021 |
| text-davinci-003 | GPT-3 model. Can do any language task with better quality, longer output, and more consistent instruction-following than other GPT-3 models (curie, babbage, or ada). | Up to Jun 2021 |
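To make the models endpoint a bit more tangible, here is a minimal TypeScript sketch. It is not taken from our repository: the helper names `buildModelsRequest` and `selectChatModels` are made up for illustration, and the code only assumes that the response of `/v1/models` carries a `data` array whose entries have an `id`. The request is only described, not sent, so you can pass it to whatever HTTP client you prefer.

```typescript
// Relevant part of a GET /v1/models response (other fields are ignored here).
interface ModelList {
  data: { id: string }[];
}

// Hypothetical helper: describes the HTTP request without sending it.
function buildModelsRequest(apiKey: string) {
  return {
    url: "https://api.openai.com/v1/models",
    method: "GET",
    headers: { Authorization: `Bearer ${apiKey}` },
  };
}

// Hypothetical helper: keeps only those wanted models that the API
// actually lists for the current account, in the order of `wanted`.
function selectChatModels(list: ModelList, wanted: string[]): string[] {
  return wanted.filter((id) => list.data.some((model) => model.id === id));
}
```

A chat app could use something like `selectChatModels` to fill the model dropdown only with models the account really has access to (gpt-4, for example, is in limited beta).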
A word about costs
The descriptions of those models mention costs. While the API (of course) has a pricing model for productive usage, you can try out the API for free for 3 months or up to $18 worth of API calls. After that, you have to sign up for a commercial account.
Costs are calculated based on tokens, which cover both the incoming input and the generated responses. You can read more about tokens in the official documentation.
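As a rough illustration of the token metric: the completion responses contain a `usage` object with the token counts of the request and the response. The following sketch turns that into a cost estimate. The function name `estimateCost` and the single flat price per 1,000 tokens are assumptions for illustration only; look up the actual price of the model you use on the OpenAI pricing page.

```typescript
// Token counts as reported in the `usage` field of a completion response.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Hypothetical helper: estimates the cost of one API call.
// `pricePer1kTokens` is a placeholder that the caller has to provide;
// it is not a real price.
function estimateCost(usage: Usage, pricePer1kTokens: number): number {
  return (usage.total_tokens / 1000) * pricePer1kTokens;
}
```

Since the whole conversation is sent with every request (more on that below), the prompt tokens grow with each message in a chat.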
Completions (/v1/completions) – GPT-3
The OpenAI API supports different ways to interact with its models. The one we are using within our chat is completions. Before I go into the details of what completions are, let's have a quick look at the API itself.
A simple call to the completions endpoint would look like this:
POST https://api.openai.com/v1/completions
{
  "model": "text-davinci-003",
  "prompt": "What do you think of LCNC?",
  "max_tokens": 7,
  "temperature": 0,
  "top_p": 1,
  "n": 1,
  "stream": false,
  "logprobs": null,
  "stop": "\n"
}
I won't fully describe all parameters, since some are not relevant for our use case. But here is a list of the important ones, which we are using within our chat application (you can look up the others in the official API documentation):
- model: The name of the model to use for generating text.
- prompt: The text prompt to generate text from.
- max_tokens: The maximum number of tokens (words or sub-words) to generate in the response.
- temperature: A value that controls the randomness of the generated text. Higher values result in more random responses, while lower values result in more conservative responses.
- top_p: A value between 0 and 1 that controls the diversity of the generated text. Lower values result in more conservative responses, while higher values result in more diverse responses.
- n: The number of responses to generate.
- stop: A string that, when encountered in the generated text, indicates that the response should end.
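The parameters above could be typed and assembled in TypeScript roughly like this. Treat it as a sketch under assumptions, not as the code of our app: `buildCompletionRequest` and `toHttpRequest` are hypothetical helper names, and the default values (for example `max_tokens: 256`) are purely illustrative.

```typescript
// The subset of completion parameters discussed above.
interface CompletionRequest {
  model: string;
  prompt: string;
  max_tokens: number;
  temperature: number;
  top_p: number;
  n: number;
  stop: string | string[];
}

// Hypothetical helper: merges illustrative defaults with caller overrides.
function buildCompletionRequest(
  prompt: string,
  overrides: Partial<CompletionRequest> = {}
): CompletionRequest {
  const defaults: CompletionRequest = {
    model: "text-davinci-003",
    prompt,
    max_tokens: 256, // illustrative value, not a recommendation
    temperature: 0,
    top_p: 1,
    n: 1,
    stop: "\n",
  };
  return { ...defaults, ...overrides };
}

// Hypothetical helper: describes the HTTP call without sending it.
function toHttpRequest(apiKey: string, body: CompletionRequest) {
  return {
    url: "https://api.openai.com/v1/completions",
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(body),
  };
}
```

The returned object can be handed to any HTTP client; keeping the request construction separate from the network call also makes it easy to unit test.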
Most of the fields are easy to understand from their descriptions, but two of them (prompt and stop) need a more detailed explanation to understand how completions work internally.
Completion vs. Chat
When we think of the behavior of a chatbot, we might think of a system that processes individual questions and answers within a conversation and keeps some kind of conversation ID to hold the context together across all requests. But when we look closely at the API definition of the completions endpoint, there is no such conversation ID: every request is stateless and does not remember anything that was requested before.
And if you just sent single prompts, the completions API would respond like this:
Completions do not work like the chat pattern above. They are in fact far more powerful, and chat-like functionality is just a subset of their capabilities. There is a specific part in the official documentation on completions that explains everything in great detail. I will just provide a short summary with a focus on our chat use case:
Completions receive a prompt, and the underlying model will generate one or multiple alternative texts to complete whatever context or pattern is included in that prompt.
For our chat use case this means we need to provide the full chat context (the whole conversation) in each request to the completions endpoint. This allows the AI model to understand the conversation's full context and generate a relevant answer to the latest question.
Next to the full context, we also need to follow a specific pattern. This pattern provides both semantic meaning to the conversation and a stop sequence, so that the model knows when to start and stop generating text. The stop words we use (Human and AI) also need to be part of the request (as the stop parameter). They are used by the AI internally, but will not be included in the response.
The pattern itself then looks like this, making the AI respond to the latest question in the context of the whole conversation:
Human: <First asked question>
AI: <First answer>
Human: <Second asked question>
AI:
Knowing that we have to provide text enriched with context, semantics, and patterns to the completions endpoint, we can enhance our chatbot functionality and provide even more context. For example, in our messenger app, we offer the option to select a pirate or developer personality, which adds specific instructions to the full chat context that instruct the GPT-3 model to respond in a particular manner. This allows us to add information on how the AI partner of the dialogue behaves and potentially many other things as well.
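The pattern and the personality instructions described above can be sketched in a few lines of TypeScript. This is an illustration under assumptions rather than the code of our app: the names `ChatMessage`, `buildPrompt` and `STOP_SEQUENCES` are hypothetical, and the exact spelling of the stop sequences (with the trailing colon) is my assumption.

```typescript
// One message of the stored conversation.
interface ChatMessage {
  sender: "Human" | "AI";
  text: string;
}

// Stop words sent as the `stop` parameter (exact spelling is an assumption).
const STOP_SEQUENCES: string[] = ["Human:", "AI:"];

// Hypothetical helper: turns the whole conversation into a single prompt.
// Optional instructions (e.g. a personality) are placed in front, and the
// prompt ends with "AI:" so that the model completes the next answer.
function buildPrompt(history: ChatMessage[], instructions?: string): string {
  const preamble: string[] = instructions ? [instructions, ""] : [];
  const lines = history.map((message) => `${message.sender}: ${message.text}`);
  return preamble.concat(lines, ["AI:"]).join("\n");
}
```

Because the endpoint is stateless, a function like `buildPrompt` would have to be called with the complete history for every new message of the user.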
An example request-response cycle using the pirate personality would then look like this:
Completions (/v1/chat/completions) – GPT-3.5 and GPT-4
On March 1st 2023, OpenAI introduced a new API endpoint to also access its more advanced GPT-3.5 and GPT-4 models. While the overall concept of a completion is exactly the same, the API is explicitly built for chat conversations and thus offers a more structured endpoint.
Instead of sending the whole text as a prompt with stop words, the individual messages have to be provided via JSON using a pre-defined vocabulary:
- Instructions (like our personality) can be provided by sending the content with a role called system
- User-provided messages need to be assigned to the user role
- The AI will always answer as assistant
The GPT-3.5 endpoint also supports additional parameters to configure the AI response, but those are omitted in the following example showcasing the new message format:
POST https://api.openai.com/v1/chat/completions
{
  "model": "gpt-3.5-turbo",
  "messages": [
    { "role": "system", "content": "Assistant should act like a pirate" },
    { "role": "user", "content": "Can you explain the benefits of using SAP BTP?" },
    { "role": "assistant", "content": "Arrr, ye be askin'..." },
    { "role": "user", "content": "Does the platform have LCNC capabilities" }
  ]
}
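Mapping a stored conversation to this message format could look roughly like the following TypeScript sketch. The names `StoredMessage`, `toApiMessages` and `extractReply` are hypothetical and not taken from our repository; the only things assumed about the API are the three roles described above and that the answer is returned in `choices[0].message.content`.

```typescript
type Role = "system" | "user" | "assistant";

interface ApiMessage {
  role: Role;
  content: string;
}

// Hypothetical shape of a message as a chat app might store it.
interface StoredMessage {
  fromAi: boolean;
  text: string;
}

// Hypothetical helper: converts the stored history into the `messages`
// array, with the optional personality as leading system message.
function toApiMessages(history: StoredMessage[], personality?: string): ApiMessage[] {
  const messages: ApiMessage[] = [];
  if (personality) {
    messages.push({ role: "system", content: personality });
  }
  for (const message of history) {
    messages.push({ role: message.fromAi ? "assistant" : "user", content: message.text });
  }
  return messages;
}

// Relevant part of a chat completion response.
interface ChatCompletionResponse {
  choices: { message: ApiMessage }[];
}

// Hypothetical helper: reads the first generated answer, if there is one.
function extractReply(response: ChatCompletionResponse): string | undefined {
  return response.choices.length > 0 ? response.choices[0].message.content : undefined;
}
```

Just like with the GPT-3 endpoint, the full history still has to be sent with every request; only the format changed from one text block to structured messages.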
Beyond chatting
While chat-like applications are a popular use case for completions, it's important to note that they are just one of many potential applications. Conversations are particularly effective at enriching the context with each message, allowing the AI to generate better answers. However, chat-like applications are generic and heavily dependent on user input and the capabilities of the model.
Beyond the chat-conversation-experience, there are many other use cases where general completions can be applied. In my opinion, these will have a significant impact on the way we work in the future, including:
- AI-supported pattern-based classification of data and information
- AI-supported pattern-based generation of texts and data
- AI-supported pattern-based transformation of texts and data
Although these use cases are outside the scope of our app and blog series, it's important to recognize the disruptive potential of GPT (and similar models). A chat-like application like ours provides just a glimpse of what's possible with these powerful AI technologies.
Closing comment of part 1
In the first part of the small series, we introduced ChatGPT and explored the underlying concepts behind our chat application. In the next post, we'll dive into the technical details of the messenger application, where we'll discuss the internal structure and architecture of both the backend (CAP) and frontend (SAPUI5).
Although the application itself isn't incredibly complex, you might still find interesting things within the technical deep dive. I applied some concepts from our larger projects at p36 into the application, which go beyond the usual simplified tutorial complexity.
Avast ye! We'll be settin' sail for part 2. Harrr! 🏴☠