
Welcome back to our series "SAP AI Core is All You Need"! Unfortunately, this is our final blog in the series 😕.
In this post, we'll deploy the fine-tuned model for Shakespeare style text transfer and demonstrate how to consume both the text generator and style transfer models. We've come a long way, and now it's time to put our models into action. By deploying them and creating an application to consume and evaluate their performance, you'll solidify your understanding of model deployment and gain valuable hands-on experience with SAP AI Core.
In this blog, you will gain practical insights into deploying the fine-tuned model, consuming both the text generator and the style transfer model, and evaluating their behavior through a simple application.
Let's get started!
In this last blog 😔, we'll create an application (which you can deploy anywhere you like) to consume the models we deployed and compare their performance. However, before we do that, I suggest you deploy the fine-tuned model yourself – by this point, you've learned a lot about SAP AI Core and will be able to do it. I believe in you!
Understanding the Changes for Fine-Tuned Model Deployment
Let’s give you some additional knowledge and explanations about the differences in deploying the fine-tuned model. First, let’s see what changed in the code (spoiler alert: not much).
Since this model is now performing a text style transfer task, we will create another executable for our API to receive (and show) the prompt, which wasn’t needed before. For this, we have changed the main.py file, specifically, the functions below:
@app.route('/v2/generate', methods=["POST"])
def generate_text():
    data = request.get_json()
    max_tokens = 256  # capped by the pre-trained model's context size
    temperature = float(data.get('temperature', 1.0))
    top_k = int(data.get('top_k', 0))
    top_p = float(data.get('top_p', 0.9))
    prompt = data.get('prompt', None)

    if prompt is None or prompt == "None":
        return jsonify({'ERROR': 'Prompt is required for model tst-model.'}), 400

    generator = Generator(model_manager, max_tokens, temperature, top_k=top_k, top_p=top_p)
    generated_text = generator.generate(prompt)
    processed_text = generator.post_process_text(generated_text)
    lines = [line.strip() for line in processed_text.split('.') if line.strip()]

    model_details = {
        'model_name': 'shakespeare-style-transfer',
        'temperature': generator.temperature,
        'length': generator.length,
        'top_k': generator.top_k,
        'top_p': generator.top_p,
    }

    response = {
        'prompt': prompt,
        'completion': lines,
        'model_details': model_details
    }
    return jsonify(response)
The key change here is the addition of handling a prompt parameter when the API is called. This ensures that for the TST Model, a prompt must be provided.
prompt = data.get('prompt', None)
if prompt is None or prompt == "None":
    return jsonify({'ERROR': 'Prompt is required for model tst-model.'}), 400
While the text generator returns a simple structure with the generated text and model details, this endpoint differentiates the response based on the model being used: it echoes the prompt alongside the completion, providing more context in the response for the style transfer task.
Now, let’s move to the generator.py file to check what has been changed.
ModelManager Class
Text Generation Logic:
Previously, the pre-trained text generator seeded sampling from a fixed start token:

idx = torch.full((1, 1), 4, dtype=torch.long, device=self.model_manager.serving_params.device)
with torch.inference_mode():
    completion = self.tokenizer.decode(self.__sample_from_model(idx)[0].tolist())

For the style transfer task, generate now builds the sampling context from the user's prompt, wrapped in the special tokens the model was fine-tuned on:
def generate(self, modern_sentence):
    try:
        input_sequence = "<ME>" + modern_sentence + "<STYLE_SHIFT>"
        enc_context = self.tokenizer.encode(input_sequence)
        context = torch.tensor(enc_context.ids, dtype=torch.long, device=self.model_manager.training_params.device).unsqueeze(0)
        sampled_indices = self.__sample_from_model(context)
        completion = self.__get_completion(sampled_indices, context)
        self.length = len(self.tokenizer.encode(completion).ids)
        print(completion)
        self.model_manager.logging.info(f"Text generated successfully with length: {self.length}")
        self.model_manager.logging.info(f"With max tokens set to: {self.max_tokens}")
        self.model_manager.logging.info(f"With temperature set to: {self.temperature}")
        self.model_manager.logging.info(f"With top k set to: {self.top_k}")
        self.model_manager.logging.info(f"With top p set to: {self.top_p}")
        return self.post_process_text(completion)
    except Exception as e:
        self.model_manager.logging.error(f"Error during text generation: {str(e)}")
        raise
Completion Generation:
def __get_completion(self, sampled_ids, context):
    sequence = sampled_ids[0, context.size(1):].tolist()
    return self.tokenizer.decode(sequence)
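To see why the slicing matters, here is a minimal sketch with made-up token ids: the sampled sequence contains the encoded prompt followed by the newly generated tokens, and __get_completion keeps only the part that comes after the prompt.

import torch

# Hypothetical token ids: the first five belong to the encoded prompt
# "<ME> ... <STYLE_SHIFT>", the rest were produced by sampling.
context = torch.tensor([[11, 42, 7, 93, 5]])                  # shape (1, 5): encoded prompt
sampled_ids = torch.tensor([[11, 42, 7, 93, 5, 17, 88, 3]])   # prompt + 3 generated tokens

completion_ids = sampled_ids[0, context.size(1):].tolist()    # drop the prompt tokens
print(completion_ids)  # [17, 88, 3] -> only these are decoded into the Shakespearean completion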
And that’s it, not that hard, right? Cool. Now you just have to sync the GitHub changes, create a configuration, and deploy it 😉.
If you’ve been following along with the past blogs, you're very likely to see something like this:
Then create a configuration for each executable, selecting the corresponding artifact: the one we used to deploy the Shakespeare Language Model and a new one for the TST model.
Don't forget to put model.pkl and style-transfer-model.pkl in the corresponding paths in S3. Remember that we created an artifact with the ai://shakespeare/deployments path for the Shakespeare Language Model; for the fine-tuned model you may want to create ai://shakespeare/deployments_tst. The copy command looks something like this:
aws s3 cp s3://<bucket>/shakespeare/executions/<execution_id>/model/style-transfer-model.pkl s3://<bucket>/shakespeare/<deployment_source_folder>/
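Once the file is in place, you still need to register the new artifact so it can be selected in the configuration. Below is a minimal sketch using the AI Core REST API; the API URL, token, scenario ID, artifact name, and resource group are placeholders, so adapt them to your own setup (you can equally do this through SAP AI Launchpad).

import requests

# Placeholders: adapt to your environment
AI_API_URL = "https://api.ai.<region>.ml.hana.ondemand.com"
TOKEN = "<bearer token obtained with your AI Core service key>"

payload = {
    "name": "shakespeare-style-transfer-model",   # illustrative name
    "kind": "model",
    "url": "ai://shakespeare/deployments_tst",    # the S3 path we just copied the .pkl into
    "scenarioId": "<your-scenario-id>",
    "description": "Fine-tuned Shakespeare style transfer model"
}

resp = requests.post(
    f"{AI_API_URL}/v2/lm/artifacts",
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "AI-Resource-Group": "<your-resource-group>",
        "Content-Type": "application/json",
    },
    json=payload,
)
print(resp.status_code, resp.json())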
Now, just create a deployment and you'll be ready for the steps that follow.
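If you prefer to create the deployment through the API rather than SAP AI Launchpad, a sketch along these lines should work; the configuration must already exist and reference the TST executable and artifact, and the IDs and URLs below are placeholders.

import requests

AI_API_URL = "https://api.ai.<region>.ml.hana.ondemand.com"  # placeholder
TOKEN = "<bearer token>"                                      # placeholder
headers = {
    "Authorization": f"Bearer {TOKEN}",
    "AI-Resource-Group": "<your-resource-group>",
    "Content-Type": "application/json",
}

# Create the deployment from an existing configuration
resp = requests.post(
    f"{AI_API_URL}/v2/lm/deployments",
    headers=headers,
    json={"configurationId": "<your-configuration-id>"},
)
print(resp.json())  # contains the deployment id; wait until its status is RUNNING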
Note: We implemented different executables for each model (pre-trained and fine-tuned); however, there are many ways of doing that. For instance, you may want to implement a conditional logic based on the model you pick from the same executable. Anyway, it's up to you 😉.
In addition, the deployment logs are very helpful for understanding what's happening behind the scenes.
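If you want to pull those logs programmatically instead of reading them in SAP AI Launchpad, a sketch like the one below should work; it assumes the AI Core deployment logs endpoint, and the deployment ID, token, and resource group are placeholders.

import requests

AI_API_URL = "https://api.ai.<region>.ml.hana.ondemand.com"   # placeholder
DEPLOYMENT_ID = "<your-deployment-id>"                        # placeholder
TOKEN = "<bearer token>"                                      # placeholder

resp = requests.get(
    f"{AI_API_URL}/v2/lm/deployments/{DEPLOYMENT_ID}/logs",
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "AI-Resource-Group": "<your-resource-group>",
    },
)
print(resp.json())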
Okay, now it's time to create an application, and we'll do that by using Streamlit to consume both deployed models and check their performance by playing around with them. To get started, you may remember that we posted some requests against the Shakespeare text generator model in our previous blog. So, before we start consuming the models from the app, let's test the text style transfer model using Python in Google Colab. We walked through all of these steps in the previous blog, so you should be able to do this on your own; if not, just go back there and take a look. Agreed? Good!
The only difference between that code and this one is the payload, which now contains the ‘prompt’.
max_tokens = 256  # the deployed endpoint caps generation at 256 tokens anyway
temperature = 0.5
top_k = 0
top_p = 0.9
prompt = 'Can you teach me how to do it?'

# Create payload for model inference
payload = {
    'prompt': prompt,
    'max_tokens': max_tokens,
    'temperature': temperature,
    'top_k': top_k,
    'top_p': top_p
}
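For completeness, here is a sketch of how that payload can be sent to the deployment; the token retrieval and headers mirror the InferenceClient shown later in this blog, and the auth URL, credentials, deployment URL, and resource group are placeholders you take from your own service key and deployment details.

import requests

# Placeholders: take these from your AI Core service key and deployment details
AICORE_AUTH_URL = "https://<your-subaccount>.authentication.<region>.hana.ondemand.com"
CLIENT_ID = "<client id>"
CLIENT_SECRET = "<client secret>"
DEPLOYMENT_URL = "https://api.ai.<region>.ml.hana.ondemand.com/v2/inference/deployments/<deployment-id>"
RESOURCE_GROUP = "<your-resource-group>"

# Get an OAuth token, as in the previous blog
token = requests.post(
    f"{AICORE_AUTH_URL}/oauth/token",
    auth=(CLIENT_ID, CLIENT_SECRET),
    params={"grant_type": "client_credentials"},
).json()["access_token"]

headers = {
    "Content-Type": "application/json",
    "AI-Resource-Group": RESOURCE_GROUP,
    "Authorization": f"Bearer {token}",
}

# POST the payload defined above to the style transfer endpoint
response = requests.post(f"{DEPLOYMENT_URL}/v2/generate", headers=headers, json=payload)
print(response.json())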
And here's the result:
By just making this change, you should be able to run it! As we can see, it looks like the model did a good job. Of course, you can find other well-performing combinations of hyperparameters, but here are the ones we have chosen for all the demonstrations that follow.
These hyperparameters were selected to balance model complexity and performance. For training the Shakespearean Language Models:
- 10 million parameters ensure sufficient learning capacity;
- 6 layers provide depth for feature extraction;
- 384-dimensional embeddings and 64-dimensional heads balance richness of representation and efficiency;
- 6 attention heads enhance parallel processing of different representation subspaces;
- a batch size of 256 optimizes training stability and speed;
- a learning rate of 3.0 × 10⁻⁴ ensures effective convergence without overshooting.
For training, those numbers were broken down as follows:
If you want to try generating the above summary, use torchinfo.
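A minimal sketch of how that could look is below, assuming `model` is your trained Transformer instance; the block size and batch size values are illustrative, so adjust them to what you actually used.

import torch
from torchinfo import summary

# Assumptions: `model` is the trained Shakespearean language model,
# block_size is its context length and batch_size the training batch size.
block_size = 256
batch_size = 256
dummy_input = torch.zeros((batch_size, block_size), dtype=torch.long)  # batch of token indices
summary(model, input_data=dummy_input)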
When selecting these hyperparameters, it's important to balance the model's complexity with the available computational resources and choose a learning rate that allows the optimizer to converge efficiently without causing instability.
Just for comparison, take a look at the hyperparameters OpenAI used for training GPT-3, as described in the "Language Models are Few-Shot Learners" paper.
Our dataset is roughly 1 million tokens with our own tokenizer, or about 300,000 tokens in the OpenAI vocabulary, while the GPT-3 models were trained on a total of 300 billion tokens. Huge difference, right? Despite this, we've still managed to develop an impressive language model, given the circumstances 😊. Well, this is what we're going to see now.
Streamlit is an open-source app framework designed for creating and sharing beautiful, custom web apps for machine learning and data science projects. With Streamlit, you can turn data scripts into interactive web applications in just a few lines of code, making it a powerful tool for quickly developing and deploying AI applications.
In this blog section, we'll cover the main functionalities of the application and how it leverages Streamlit's capabilities to create a user-friendly interface. The code below demonstrates how to build a Streamlit application for interacting with two AI models: a Shakespeare text generator and a Shakespeare style transfer model. Its main functionalities include selecting a model, adjusting the sampling parameters in the sidebar, authenticating against SAP AI Core, sending inference requests, and displaying the generated or transformed text along with its metadata.
import json
import requests
import streamlit as st
from templates.custom_css import CSSGenerator


class InferenceClient:
    def __init__(self, model_name):
        with open('./config/env.json', 'r') as file:
            self.env_vars = json.load(file)
        self.uua_url = self.env_vars["AICORE_AUTH_URL"]
        self.client_id = self.env_vars["AICORE_CLIENT_ID"]
        self.client_secret = self.env_vars["AICORE_CLIENT_SECRET"]
        self.tst_url = self.env_vars["TST_URL"]
        self.slm_url = self.env_vars["SLM_URL"]
        self.resource_group = self.env_vars["RESOURCE_GROUP"]
        self.model_name = model_name

    def get_token(self):
        params = {"grant_type": "client_credentials"}
        resp = requests.post(f"{self.uua_url}/oauth/token",
                             auth=(self.client_id, self.client_secret),
                             params=params)
        return resp.json()["access_token"]

    def get_headers(self):
        return {
            'Content-Type': 'application/json',
            'AI-Resource-Group': self.resource_group,
            'Authorization': f'Bearer {self.get_token()}'
        }

    def get_inference_url(self):
        suffix = '/v2/generate'
        if self.model_name == 'shakespeare-text-generator':
            return self.slm_url + suffix
        elif self.model_name == 'shakespeare-style-transfer':
            return self.tst_url + suffix
        else:
            raise ValueError("Invalid model name")

    def get_payload(self, max_tokens, temperature, top_k, top_p, prompt=None):
        if self.model_name == 'shakespeare-text-generator':
            return {
                'max_tokens': max_tokens,
                'temperature': temperature,
                'top_k': top_k,
                'top_p': top_p
            }
        elif self.model_name == 'shakespeare-style-transfer':
            return {
                'prompt': prompt,
                'max_tokens': max_tokens,
                'temperature': temperature,
                'top_k': top_k,
                'top_p': top_p
            }
        else:
            raise ValueError("Invalid model name")


def Run():
    st.title("Shakespearean Language Model")

    st.sidebar.header("Model")
    model_name = st.sidebar.selectbox("Select the Shakespeare model you want:", ['shakespeare-text-generator', 'shakespeare-style-transfer'])

    st.sidebar.header("Parameters")
    max_tokens = st.sidebar.slider("Max Tokens", min_value=0, max_value=4096, value=250, step=10)
    temperature = st.sidebar.slider("Temperature", min_value=0.0, max_value=2.0, value=0.5)
    top_k = st.sidebar.slider("Top-K", min_value=0, max_value=50, value=0)
    top_p = st.sidebar.slider("Top-P", min_value=0.0, max_value=1.0, value=0.9, step=0.1)

    infc = InferenceClient(model_name)

    custom_css = CSSGenerator.generate_custom_css(max_tokens)
    st.markdown(custom_css, unsafe_allow_html=True)

    if model_name == 'shakespeare-style-transfer':
        prompt = st.text_input("Prompt", key="prompt_input")
        if st.session_state.prompt_input:
            headers = infc.get_headers()
            inference_url = infc.get_inference_url()
            payload = infc.get_payload(max_tokens, temperature, top_k, top_p, prompt)
            response = requests.post(inference_url, headers=headers, json=payload)
            if response.status_code == 200:
                data = response.json()
                st.subheader("Shakespeare Style Text:")
                # st.write(f"Prompt: {data['prompt']}")
                lines = data['completion']
                formatted_text = "<br>".join(lines)
                styled_text = f'<div class="section"><ul class="list">{formatted_text}</ul></div>'
                st.markdown(styled_text, unsafe_allow_html=True)
                st.subheader("Metadata:")
                metadata_html = (
                    f"Model Name: {data['model_details']['model_name']}<br>"
                    f"Temperature: {data['model_details']['temperature']}<br>"
                    f"Length: {data['model_details']['length']}<br>"
                    f"Top-K: {data['model_details']['top_k']}<br>"
                    f"Top-P: {data['model_details']['top_p']}"
                )
                st.markdown(metadata_html, unsafe_allow_html=True)
            else:
                st.error(f"Error: {response.status_code} - {response.text}")
    else:
        prompt = None

    if st.sidebar.button("Generate") or (st.session_state.prompt_input if model_name == 'shakespeare-style-transfer' else False):
        headers = infc.get_headers()
        inference_url = infc.get_inference_url()
        payload = infc.get_payload(max_tokens, temperature, top_k, top_p, prompt)
        response = requests.post(inference_url, headers=headers, json=payload)
        if response.status_code == 200:
            data = response.json()
            st.subheader("Generated Text:")
            lines = data['generated_text']
            formatted_text = "<br>".join(lines)
            styled_text = f'<div class="section"><ul class="list">{formatted_text}</ul></div>'
            st.markdown(styled_text, unsafe_allow_html=True)
            st.subheader("Metadata:")
            metadata_html = (
                f"Model Name: {data['model_details']['model_name']}<br>"
                f"Temperature: {data['model_details']['temperature']}<br>"
                f"Length: {data['model_details']['length']}<br>"
                f"Top-K: {data['model_details']['top_k']}<br>"
                f"Top-P: {data['model_details']['top_p']}"
            )
            st.markdown(metadata_html, unsafe_allow_html=True)
        else:
            st.error(f"Error: {response.status_code} - {response.text}")


if __name__ == "__main__":
    Run()
The application starts by defining an InferenceClient class that manages authentication and API requests. This class reads credentials from a configuration file, obtains access tokens, and constructs headers for API calls. It also determines the appropriate API endpoint based on the selected model and constructs payloads for the inference requests.
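The configuration file itself is not shown above; based on the keys the class reads, a minimal ./config/env.json could look like the sketch below, where every value is a placeholder taken from your AI Core service key and the two deployment URLs.

{
  "AICORE_AUTH_URL": "https://<your-subaccount>.authentication.<region>.hana.ondemand.com",
  "AICORE_CLIENT_ID": "<client id>",
  "AICORE_CLIENT_SECRET": "<client secret>",
  "SLM_URL": "https://api.ai.<region>.ml.hana.ondemand.com/v2/inference/deployments/<slm-deployment-id>",
  "TST_URL": "https://api.ai.<region>.ml.hana.ondemand.com/v2/inference/deployments/<tst-deployment-id>",
  "RESOURCE_GROUP": "<your-resource-group>"
}

With that file in place, and assuming you saved the script above as app.py, you can run the app locally with `streamlit run app.py`.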
The Run function is the core of the Streamlit app. It sets up the interface, allowing users to select a model and adjust parameters through sidebar widgets. When users input a prompt or adjust parameters, the app sends a request to the AI Core inference API, processes the response, and displays the generated or transformed text along with metadata.
Custom CSS generated by the CSSGenerator class is applied to the interface to improve the visual aesthetics, ensuring that the app is not only functional but also visually pleasing. That's it! Now you have a convenient front end for the APIs we deployed from scratch.
Pretty exciting, right? Well, if you want to deploy the app on BTP Cloud Foundry, just try the "Create an Application with Cloud Foundry Python Buildpack" guide, and you’ll be good to go.
And here’s our application:
If you don’t want to deploy on BTP, you can choose any other platform or even keep the consumption local to check your model’s performance. No problem at all - these options are just proposed to make your model's results analysis easier. You can also use Postman, curl, etc., for consuming the API. It’s really up to you, okay?
Our app uses this model to generate text that mimics the style of Shakespeare. Given that we trained it on only 40,000 words, we can't expect extraordinary results 😉 . However, we can reasonably expect it to capture some aspects of Shakespeare's writing and style. We’ll focus on three key parameters that influence the sampling process: temperature, top_k, and top_p. By adjusting these parameters, we can significantly alter the output, showcasing the influence of these techniques. Let’s dive into three different scenarios to see how each combination of these parameters affects the generated text.
Scenario 1: Conservative and Coherent
Parameters: temperature = 0.7, top_k = 40, top_p = 0.9
In this scenario, we set a moderate temperature of 0.7 to maintain some creativity while ensuring coherence. By setting top_k to 40, we limit the sampling pool to the top 40 predictions, and with top_p at 0.9, we include enough variability to produce natural-sounding text.
The generated text in this scenario is structured and maintains coherence throughout. The language is consistent with the Shakespearean style, featuring clear and logical progression. For example, "Save thou hast too done, by thy love, And make a pedlar to death of thy blood" demonstrates a well-formed sentence that adheres to the thematic elements of Shakespeare's works. Additionally, phrases like "For when thou desirest, and the seat of thee, Or else thy father's last of slaughters" are (almost) clear and contribute to a cohesive narrative.
Scenario 2: Creative and Diverse
Parameters: temperature = 1.0, top_k = 50, top_p = 1.0
Here, we increase the temperature to 1.0, allowing the model to be more adventurous with its predictions. By setting top_k to 50 and top_p to 1.0, we expand the sampling pool slightly, encouraging more diversity in the generated text.
With the increase in temperature and sampling pool, the text becomes more diverse and creative. This is evident in the unusual word choices and varied sentence structures. For instance, "Shall have it vengeance to him and here! NORTHUMBERLAND: And her I'll have staggeret again" showcases a mix of inventive phrasing and new vocabulary. The line "LORD ROSS: For I protest to his friends, can princely It were stinkle, in arms" reflects this diversity, although it introduces some less coherent elements. This scenario balances creativity with coherence, resulting in text that is engaging and varied but sometimes challenging to follow.
Scenario 3: Highly Creative and Unpredictable
Parameters: temperature = 1.5, top_k = 0 (unrestricted), top_p = 0.8
In this last scenario, we set the temperature to a high 1.5, pushing the model towards highly creative and less predictable outputs. By not limiting top_k and setting top_p to 0.80, we allow for a wider range of possibilities, resulting in text that is rich in variety but may occasionally lose coherence.
In this highly creative and unpredictable scenario, the text generated is rich with imaginative expressions and whimsical phrases. For example, "I have not like me: thou hast sad, one to my boy; A likelen in sports it was my grief" shows a departure from traditional coherence, favoring creativity. The phrase "Be no chident, that is a monthly sound In suspicing here to death at deserves" highlights the increased randomness and less structured output. The overall text is filled with bold and unexpected combinations of words, such as "Too say so soon any more fourt of great sight Only Phoebus, grantio, here" showcasing the model's creative potential. However, this comes at the expense of clarity and logical flow, making some parts nonsensical.
More Shakespeare generated text:
What about the fine-tuned model? Let's examine some of its completions to get a sense of how it's performing:
At first glance, it seems promising, but it's still worth delving deeper to understand its outputs. Of course, there are several metrics we can use, such as BLEU Score (measures the overlap of n-grams between the generated text and a reference text), ROUGE Score (evaluates the quality of the generated text based on recall, precision, and F1 score against a set of reference texts), or Perplexity (measures how well a probability model predicts a sample). However, let's stick with "Human Evaluation," which directly addresses our goal: checking if it worked effectively.
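If you do want a quantitative signal alongside human judgment, a sentence-level BLEU score is easy to wire up with NLTK; the reference and candidate sentences below are purely illustrative, and for a real evaluation you would use a held-out set of modern/Shakespearean pairs.

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Illustrative example: compare the model's completion against a reference rendering
reference = "can thou instruct me in the doing of it ?".split()      # hypothetical reference
candidate = "canst thou teach me how it should be done ?".split()    # hypothetical model output

# Smoothing avoids zero scores when higher-order n-grams have no overlap
score = sentence_bleu([reference], candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")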
Human judges evaluate the output based on certain criteria. This method, though subjective, provides valuable insights into the quality of the transformation.
Criteria 1: Style Accuracy
Example 1: "I'm so thankful for your friendship."
Example 2: "Can you help me understand?"
Criteria 2: Content Preservation
Example 1: "It's a wonderful morning."
Example 2: "He's cooking dessert."
Criteria 3: Fluency and Readability
Example 1: "I need to finish this."
Example 2: "Let's go for a walk."
While our fine-tuned model demonstrates good performance in generating stylistically accurate Shakespearean text, further refinements can enhance its performance in several areas. Nonetheless, it's already quite impressive, isn't it?
Congratulations on successfully deploying and consuming your fine-tuned Shakespeare style transfer model! In this blog, we walked through the deployment process, explained the key changes in the code, and built a Streamlit application to interact with your models. Let's recap: we deployed the fine-tuned model, adapted the serving code to accept a prompt, built an application to consume both models, and evaluated their outputs under different sampling settings.
As you wrap up your journey with SAP AI Core and Transformers, keep exploring advanced options to further enhance your models and workflows and to dive deeper into the capabilities of AI and machine learning.
We appreciate your dedication and enthusiasm throughout this series. By completing these blogs, you've equipped yourself with the tools and knowledge to leverage the full potential of SAP AI Core and SAP AI Launchpad. We look forward to seeing the incredible AI solutions you'll create.
Stay curious, keep learning, and continue pushing the boundaries of what's possible with AI. Together, let's shape the future of technology and innovation.
Let's get started on your next AI adventure!