Generative AI is a pretty new set of technologies for SAP.
In this series of blogs, which I believe will need to be updated frequently, I will take a deep dive into why and how to apply Generative AI to our SAP systems using AWS.
First and foremost, I will introduce Foundation Models in generative AI. What are large language models (LLMs), how do they work, where do they come from, and what other types of generative AI exist? We'll learn the basics of foundation model customization and how to evaluate a generative model's performance.
Essentially a Foundation Model is a machine learning model designed to cover many different tasks, such as Text Generation, Q&A, Image generation and classification, and code generation (I have not seen ABAP yet, but Java, Node, and Python might be useful for BTP).
Generative AI has a potent impact on our lives since it will radically change how we interact with machines. Since the release of this generation of Foundation Models, we can now ask a machine what we want from it, and it delivers.
This could significantly change SAP and the traditional way we have interacted with ERP or CRM software, based on multiple transactions and structured data presented in tables or ALV grids. How we interact with software will fundamentally change: we will simply ask the system to approve a particular order, and that request could be text, audio, or video. Although we are still not there, we can build exciting use cases today.
Traditional machine learning models use a classification or regression model to solve one or two tasks. In contrast, Foundation Models are more powerful because they're trained on many different sets of data. These massive datasets allow a Foundation Model to do classification, question answering, or summarization, and the model can be adjusted to our patterns and needs.
There are many ways to customize a foundation model; once we pick one and start working with it, most of us will want to customize it in some way. There are core trade-offs in customizing a Foundation Model, and the main techniques are Pre-Training, Fine-Tuning, and Prompting at inference time, where the model makes predictions from the new input we give it without any change to its weights.
Complexity and cost are among the main trade-offs we will discuss; on the other side, we have the accuracy of the results.
Options available on AWS today; SageMaker
Let's start with the available options on AWS. Until Amazon Bedrock becomes Generally Available, we will not use it, so our choice is the well-known SageMaker. SageMaker is a good fit for enterprise customers since our data always stays within our own VPC.
Amazon SageMaker provides machine learning (ML) capabilities that are purpose-built for data scientists and developers to efficiently prepare, build, train, and deploy high-quality ML models.
Do we need to be Data Scientists and get skills in Jupyter Notebooks to start with SageMaker? Fortunately, no. SageMaker allows us to do complicated tasks such as building our own models, but it also offers many built-in algorithms that can be used to quickly train and run inference. SageMaker Autopilot automatically builds, trains, and tunes the best machine learning models based on the provided data. The most exciting new feature, though, is SageMaker JumpStart, which allows us to get started with ML using pre-built Computer Vision and NLP models, as well as proprietary and open-source FMs that can be deployed in a few clicks.
You might already have your favorites, like AI21 Labs Jurassic-2, because it gives us a lot of flexibility in model size. If we decide to use the open-source model Falcon, developed by the Technology Innovation Institute (TII) and available through Hugging Face, it's well-documented, available, and ready to use on our own VPC.
Suppose that, to reduce our scope, we focus only on Language Foundation Models, many of which are Generative Pre-trained Transformers (GPT), and we pick Falcon, a good Foundation Model, just to interact with it. For most customers, Prompt Engineering alone can quickly boost the accuracy and performance of the model. For others this will not be enough, and that is where retrieval augmented generation (RAG) comes in.
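To make the idea of retrieval augmented generation more concrete, here is a minimal sketch in Python. It is not how SageMaker or Falcon implement anything: the word-overlap retrieval, the function names, the prompt wording, and the sample SAP-style snippets are all assumptions made up for illustration. A real setup would normally use embeddings and a vector store, and the resulting prompt would then be sent to the deployed model.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, top_k=2):
    """Return the top_k documents sharing the most words with the question.

    Word overlap is a stand-in for a real similarity search (assumption).
    """
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(question, documents, top_k=2):
    """Assemble a prompt that grounds the question in retrieved context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, top_k))
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Hypothetical business snippets, invented for this example only.
DOCUMENTS = [
    "Purchase order 4500001234 is waiting for approval by the cost center owner.",
    "Sales order 1001 was delivered to the customer last week.",
    "Invoices are posted after goods receipt is confirmed.",
]

if __name__ == "__main__":
    print(build_prompt("Who must approve purchase order 4500001234?", DOCUMENTS))
```

The point of the sketch is the shape of the flow: retrieve our own business data first, then put it into the prompt, so the model answers from our context instead of only from what it learned during pre-training.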
We can start by picking a generative model and sending prompts to it. We can send 10 prompts, or a few hundred if we already know our business prompts; this lets us collect the responses to these prompts and categorize them.
Prompt Engineering is the simplest way to start. We take our prompts, send them to the generative model, and find that the model gives many different responses. For each prompt, we then rank the responses from best to worst, picking our favorites, and use those rankings to train a reward model; that, again, deserves a whole lecture of its own.
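As a rough illustration of the ranking step, the sketch below turns one best-to-worst ranking into pairs of a preferred and a rejected response, which is a common input format for training a reward model. The function name, the dictionary keys, and the sample responses are assumptions for this example, not a format required by any specific AWS service.

```python
from itertools import combinations

def ranking_to_pairs(prompt, ranked_responses):
    """Turn a best-to-worst ranking into (chosen, rejected) training pairs.

    combinations() keeps the input order, so the first element of each pair
    is always ranked higher than the second one.
    """
    pairs = []
    for better, worse in combinations(ranked_responses, 2):
        pairs.append({"prompt": prompt, "chosen": better, "rejected": worse})
    return pairs

if __name__ == "__main__":
    # Hypothetical prompt and responses, already ranked by a human reviewer.
    ranked = [
        "Order 1001 is approved and released for delivery.",
        "Order 1001 approved.",
        "I cannot help with that.",
    ]
    for pair in ranking_to_pairs("Approve sales order 1001", ranked):
        print(pair["chosen"], ">", pair["rejected"])
```

Three ranked responses give three pairs, and in general n responses give n * (n - 1) / 2 pairs, which is why a few hundred ranked business prompts can already produce a useful dataset.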
If we want to start with Cohere, which I believe is an excellent proprietary LLM, it offers a Playground feature, which is a way to interact with the model and test it before we deploy it in our landscape and decide if we want to continue.
Studio notebooks run on a Jupyter server managed by AWS, which we can customize by selecting the kind of accelerator we want to use.
We will associate a Foundation Model with a specific Domain; a Domain includes a space for us and lets us select which users or applications can access that space.
We can easily select and play with the open-source Falcon and the proprietary Cohere model. We can also easily interact with pre-trained industry-related models, for example for Demand Forecasting. These are not generative text models, but they spare us the compute needed to train a model from scratch while we feed them accurate datasets, in our case any information coming from SAP or other relevant data from our business.
SageMaker Studio lets us choose from various frameworks, such as TensorFlow, Hugging Face, or PyTorch, among others.
Selecting a Foundation Model
Starting with a suitable foundation model is essential. We select the appropriate model by considering things such as size, accuracy, ease of use, licensing, industry precedent, and external benchmarks, just to name a few criteria.
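One simple way to compare candidates across these criteria is a weighted score. The sketch below is only an illustration: the weights, the 1-to-5 ratings, and the model names are made up, and each team would need to choose its own criteria and numbers.

```python
def score_model(ratings, weights):
    """Return the weighted average of the ratings, using the given weights."""
    total_weight = sum(weights.values())
    return sum(weights[name] * ratings[name] for name in weights) / total_weight

def best_model(candidates, weights):
    """Return the name of the candidate with the highest weighted score."""
    return max(candidates, key=lambda name: score_model(candidates[name], weights))

# Hypothetical weights: how much each criterion matters to us.
WEIGHTS = {"size": 2, "accuracy": 3, "ease_of_use": 1, "licensing": 2}

# Hypothetical ratings from 1 (poor) to 5 (excellent).
CANDIDATES = {
    "model_a": {"size": 4, "accuracy": 3, "ease_of_use": 5, "licensing": 5},
    "model_b": {"size": 2, "accuracy": 5, "ease_of_use": 3, "licensing": 2},
}

if __name__ == "__main__":
    for name, ratings in CANDIDATES.items():
        print(name, score_model(ratings, WEIGHTS))
    print("best:", best_model(CANDIDATES, WEIGHTS))
```

With these invented numbers, the smaller and more permissively licensed model wins even though the other one is more accurate, which shows how much the outcome depends on the weights we choose.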
The quality of a model, and how easy it is to run, varies with how widely adopted that model is. Language models and vision models are so good today because of years of hard, dedicated R&D. For any other arbitrary modality we can definitely try, but it might be challenging until those models are more widely adopted.
Whether we choose Forecasting or Anomaly Detection, it helps if pre-trained base artifacts already exist, trained on multiple terabytes of images relevant to the business. That may not be the case, and we can still use many Foundation Models that can, for example, do text summarization; for those, the size of the model matters.
The number of accelerators, that is, the number of GPUs, will be significant depending on how we want to interact with the models. Foundation Models take physical space, so if a model fits in 3 GB, it can run on remote servers or on the edge, for example for image classification. If the model is smaller, like Stable Diffusion, it can be deployed on other machines, such as on-prem servers or a laptop. At the same time, the model might be significantly optimized and still give us fantastic functionality.
Convolutional neural networks (CNNs) are fantastic for vision: they capture visual structure, learning the structural relationships between pixels and the objects those pixels form. Transformers don't necessarily do that as well, so deploying a CNN can be the more accessible option. We also have some flexibility in deployment, since we can run these models on a single machine with one NVIDIA GPU rather than on a cluster.
Many models allow us to create a training job to fit the model to our data. This applies to pre-trained models whose parameters we want to fine-tune instead of training from scratch.
Fine-tuning can produce accurate models with smaller datasets and less training time.
With fine-tuning, also known as transfer learning, SageMaker handles the technical details; we can learn more on this blog.
Smaller models bring lower carbon emissions and costs, so it will be crucial to select the correct Model.
We can run 20-billion-parameter models on a single server with 8 GPU accelerators, so they still fit in a single box, which keeps our costs under control. 20 billion parameters already allow exciting use cases; we can interact well with such a model.
50 billion parameters allow us to handle complex datasets, and models bigger than 100 billion parameters are costly both to train and to run inference on. At the same time, they give excellent results.
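A back-of-the-envelope calculation helps to see why parameter count drives cost. The sketch below estimates only the memory needed to hold the model weights, as parameters times bytes per parameter. The 24 GB per GPU figure is an assumption for illustration, and real deployments need extra memory for activations and caches, and considerably more for training.

```python
# Bytes used to store one parameter at common numeric precisions.
BYTES_PER_PARAMETER = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_memory_gb(num_parameters, precision="fp16"):
    """Memory needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_parameters * BYTES_PER_PARAMETER[precision] / 1e9

def fits_on_server(num_parameters, gpus, gpu_memory_gb, precision="fp16"):
    """True if the weights fit in the combined GPU memory of one server."""
    return weight_memory_gb(num_parameters, precision) <= gpus * gpu_memory_gb

if __name__ == "__main__":
    # Assumed server: 8 GPUs with 24 GB each (hypothetical sizing).
    for billions in (20, 50, 100):
        params = billions * 1_000_000_000
        print(
            f"{billions}B parameters: {weight_memory_gb(params):.0f} GB in fp16,",
            "fits" if fits_on_server(params, 8, 24) else "does not fit",
        )
```

Under these assumptions, a 20-billion-parameter model needs about 40 GB for its weights in 16-bit precision and fits comfortably in a single 8-GPU box, while a 100-billion-parameter model needs about 200 GB and no longer fits in the same box.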
Open Source versus Proprietary
There is a vibrant and lucrative industry right now around Foundation Models.
There are two main types of LLMs: open-source and proprietary. Open-source LLMs are freely available, and anyone can use them, while proprietary LLMs are owned by a company and are only available to customers who purchase a license.
Open-source LLMs provide more flexibility for enterprises to deploy the models on their own infrastructure, whether it’s on-premises or in a private VPC. With Open-Source models organizations will always have full control over their data, ensuring that sensitive information remains within their network and reducing the risk of data breaches or unauthorized access.
Another reason is that open-source LLMs generally do not carry licensing fees. They are also easier to adapt: for example, an enterprise may want to add specific features to the LLM or train it on a specific dataset. With an open-source LLM, this is possible. With a proprietary LLM, this might be limited or not possible at all, and the enterprise would have to work with the vendor to make these changes.
Reduced Vendor Dependency
Adopting proprietary LLMs may lead to vendor lock-in, where enterprises become reliant on a single provider for updates, maintenance, and support.
Open-source LLMs allow organizations to avoid this dependency by leveraging community contributions and engaging with multiple service providers or internal teams for ongoing development and support.
This flexibility enables enterprises to have more control over their technology stack and make strategic decisions based on their specific needs.
Some popular open-source Large Language Models available on AWS
BLOOM (BigScience Large Open-science Open-access Multilingual Language Model) by BigScience, a collaboration coordinated by Hugging Face
LLaMA (Large Language Model Meta AI) by Meta
Dolly by Databricks
However, it’s important to note that while open-source LLMs offer advantages, there may be cases where proprietary LLMs are more practical for our enterprise, such as when specific commercial support, proprietary datasets, or domain-specific expertise are crucial for an enterprise’s needs.
Proprietary LLMs typically provide a faster time to market. OpenAI models such as GPT-4 and DALL-E, which are not available on AWS, expose unified API endpoints that developers can interact with. They are also fully managed, so we don’t need to worry about setting up a self-hosting environment. If you have already played with DALL-E, you will have noticed that the cost is per generated image, while other FMs charge per token.
On AWS, proprietary models such as the following will be accessible through an API using Amazon Bedrock. It lets us use Generative AI at scale, since we don't manage the infrastructure needed for training or deploying the models, and it still covers many use cases for the enterprise.
Stability AI (hosted Stable Diffusion)
Anthropic Claude and Claude 2
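To compare the two pricing styles of fully managed models, per generated image and per token, a small calculator is enough. All prices and volumes below are invented placeholders, not real vendor prices; the only point is the shape of the calculation.

```python
def cost_per_image(num_images, price_per_image):
    """Total cost when the vendor charges per generated image."""
    return num_images * price_per_image

def cost_per_token(input_tokens, output_tokens, price_per_1k_input, price_per_1k_output):
    """Total cost when the vendor charges per 1,000 input and output tokens."""
    return (
        (input_tokens / 1000) * price_per_1k_input
        + (output_tokens / 1000) * price_per_1k_output
    )

if __name__ == "__main__":
    # Hypothetical prices and volumes, for illustration only.
    print("images:", cost_per_image(num_images=100, price_per_image=0.02))
    print("tokens:", cost_per_token(
        input_tokens=2000,
        output_tokens=500,
        price_per_1k_input=0.01,
        price_per_1k_output=0.03,
    ))
```

Because token pricing usually separates input from output, long prompts that carry a lot of business context are a cost driver we should estimate up front.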
Many startups leverage API-based, proprietary models in their experimental phase to kickstart product development. Once they find product-market fit, it might make sense for some of them to transition into self-hosting, especially for startups that might want to fine-tune their models or target high-throughput use cases.
Latency can then be considered a downside of proprietary models, especially larger-scale models. A proprietary model hosted behind an inference API with a response time of up to 20 seconds will negatively impact the user experience in real-time industry use cases.
Security & Governance
Overall, there are a lot of gaps around the security and governance of large language models (LLMs) and generative models. Proprietary and open-source models both carry risks, in different respects. Proprietary models offer built-in filtering and content moderation capabilities that flag and prevent sensitive, violent, hateful, or any other content that violates regular content policy.
However, due to data compliance and security concerns, many enterprises avoid using or fine-tuning proprietary models. Although out-of-the-box open-source models lack security and governance capabilities, they can be brought within businesses’ security perimeter and securely fine-tuned on local data.
The following architecture diagram shows how SageMaker manages ML training jobs and provisions Amazon EC2 instances on behalf of SageMaker users. As SageMaker users, we can save our training dataset to Amazon S3. We can then train a model using one of the available SageMaker built-in algorithms, or bring our own training script with a model built on popular machine learning frameworks.
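To give a feeling for what such a training job looks like, the sketch below only builds the request as a Python dictionary and does not call AWS. The field names follow the SageMaker CreateTrainingJob API as I understand it; the job name, image URI, role ARN, S3 paths, instance type, volume size, and runtime limit are placeholders and assumptions that must be replaced with real values.

```python
def build_training_job_request(job_name, image_uri, role_arn, train_s3_uri,
                               output_s3_uri, instance_type="ml.m5.xlarge",
                               instance_count=1):
    """Build a CreateTrainingJob-style request without sending it to AWS."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,      # built-in algorithm or own container
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,                 # role SageMaker assumes on our behalf
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,       # where our training dataset lives
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,   # instances provisioned for the job
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 30,            # assumed disk size
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},  # assumed limit
    }

if __name__ == "__main__":
    # Placeholder values only; none of these resources exist.
    request = build_training_job_request(
        job_name="sap-demo-training-job",
        image_uri="<account>.dkr.ecr.<region>.amazonaws.com/<image>:latest",
        role_arn="arn:aws:iam::<account>:role/<sagemaker-role>",
        train_s3_uri="s3://<bucket>/train/",
        output_s3_uri="s3://<bucket>/output/",
    )
    print(sorted(request))
```

With valid values and credentials, a dictionary like this could be passed to boto3 as `boto3.client("sagemaker").create_training_job(**request)`; the higher-level SageMaker Python SDK wraps the same idea in its estimator classes.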
As we move from running individual artificial intelligence and machine learning (AI/ML) projects to using AI/ML to transform our business at scale, the discipline of ML Operations (MLOps) can help.
Like DevOps, MLOps relies on a collaborative and streamlined approach to the machine learning development lifecycle where the intersection of people, process, and technology optimizes the end-to-end activities required to develop, build, and operate machine learning workloads.
Although MLOps can provide valuable tools to help us scale our business, we might face certain issues as we integrate MLOps into our machine learning workloads. To avoid them, it's a good idea to implement project management capabilities, where ML team members organize the most important ML resources into a single, ordered system. This includes code repositories, experiments, pipelines, registered models, and endpoints. With project templates, we automate model building, training, and deployment, effectively industrializing our model lifecycle and CI/CD process using the provided templates, and we can also create our own templates.
In this Blog, we discussed the benefits of using the AWS platform for Generative AI use cases with SAP, how to deploy a Foundation Model in our own VPC, and how to select a suitable Foundation Model.
In the next blog of this series, I will discuss how to adapt the models through Prompt Engineering or Fine-Tuning, how to pre-train a model on AWS with our own enterprise data, and why and how to apply reinforcement learning from human feedback (RLHF), as well as how to monitor the performance and quality of our models and detect bias. We will also use DataSphere to save the data on Amazon S3. Stay tuned!
To learn more about generative AI on AWS, visit the SageMaker landing page and the Amazon Bedrock page.