NOTE: The views and opinions expressed in this blog are my own.
I recently wrote a blog, "Share corporate info with an LLM using Embeddings", which gives a very high-level overview of Embeddings for LLMs using SAP AI Core.
There are several different styles of Embeddings.
All the attention (excuse the pun) these days is on the Embeddings from Transformer models.
Check out the Massive Text Embedding Benchmark (MTEB) Leaderboard.
Now, as we go a little deeper in our attempts to compare Embeddings on enterprise data, let's add two more models: WhereIsAI/UAE-Large-V1 and Word2Vec (word2vec-google-news-300).
Both are small enough to run on CPU, which is great for some preliminary tests.
Yes, you can run them both on your laptop, but that's not enterprise grade, so let's run them on SAP AI Core.
The deployment process is fairly simple, so I won't go through every step here; for more detail on the steps involved, I recommend the tutorial "Deploy a custom ML Python service on SAP AI CORE".
The prerequisites are SAP AI Core (preferably the Extended plan), a Docker Hub account and a private Git repository connected to SAP AI Core.
Create a requirements.txt file:
Flask
angle_emb
torch
gensim
Create a server.py file:
# This code uses the AnglE library (https://github.com/WhereIsAI/angle_emb)
# based on the following paper:
# Li, Xianming and Li, Jing. AnglE-optimized Text Embeddings. arXiv preprint arXiv:2309.12871, 2023.
# https://huggingface.co/WhereIsAI/UAE-Large-V1
# This code also refers to https://radimrehurek.com/gensim/models/word2vec.html
from flask import Flask, request, jsonify
# AnglE
import torch
from angle_emb import AnglE, Prompts
# Word2Vec
import numpy as np
import gensim.downloader as genapi

app = Flask(__name__)

# Load the AnglE model. Only move it to the GPU if one is available:
# the "basic" AI Core resource plan used below is CPU-only.
angle = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pooling_strategy='cls')
if torch.cuda.is_available():
    angle = angle.cuda()
# Prompts.C is the retrieval prompt suggested on the UAE-Large-V1 model card.
# set_prompt()/encode() follow the angle_emb API at the time of writing;
# newer releases may differ, so pin a compatible version if this fails.
angle.set_prompt(prompt=Prompts.C)

# Load the pre-trained Word2Vec model (downloaded on first start)
word2vec_model = genapi.load("word2vec-google-news-300")


@app.route('/embeddings/uae', methods=['POST'])
@app.route('/v1/embeddings/uae', methods=['POST'])
def uae():
    try:
        # Get JSON data from the request
        data = request.get_json()
        # Wrap the text in the dict format expected by the prompt template
        prompt = {'text': data['text']}
        # Encode the prompt to a vector (one 1024-dimensional row per input)
        vec = angle.encode(prompt, to_numpy=True)
        # Convert the vector to a string, e.g. "[0.1, -0.2, ...]"
        vec_str = str(vec[0].tolist())
        # Return the vector string as JSON response
        return jsonify({'vector': vec_str})
    except Exception as e:
        return jsonify({'error': str(e)}), 500


@app.route('/embeddings/w2v', methods=['POST'])
@app.route('/v1/embeddings/w2v', methods=['POST'])
def w2v():
    try:
        # Get JSON data from the request
        data = request.get_json()
        # Split the text into whitespace-separated words
        words = data['text'].split()
        vectors = []
        for word in words:
            try:
                vectors.append(word2vec_model[word])
            except KeyError:
                # If the word is not found in the vocabulary, skip it
                pass
        # Calculate the mean of all word vectors to get the sentence vector
        if vectors:
            sentence_vector = np.mean(vectors, axis=0)
        else:
            # If no word vectors were found, return a vector of zeros
            sentence_vector = np.zeros(word2vec_model.vector_size)
        # Format the vector as a comma-separated string with 8 decimal places
        vec_str = np.array2string(sentence_vector, separator=',',
                                  formatter={'float_kind': lambda x: "%.8f" % x})
        # Return the vector string as JSON response
        return jsonify({'vector': vec_str})
    except Exception as e:
        return jsonify({'error': str(e)}), 500


if __name__ == '__main__':
    # Listen on all interfaces on port 5000 (the port exposed in the Dockerfile)
    app.run(host='0.0.0.0', port=5000, debug=False)
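The /embeddings/w2v endpoint builds one sentence vector by averaging the Word2Vec vectors of the words it finds in the vocabulary. To show that mean-pooling step without downloading the Google News model, here is a minimal plain-Python sketch; `toy_vocab` and `sentence_vector` are hypothetical names, and the 3-dimensional vectors are made-up numbers, not real Word2Vec weights.

```python
from typing import Dict, List

# Hypothetical 3-dimensional "word vectors" for illustration only;
# real word2vec-google-news-300 vectors have 300 dimensions.
toy_vocab: Dict[str, List[float]] = {
    "black": [1.0, 0.0, 2.0],
    "panther": [3.0, 2.0, 0.0],
}


def sentence_vector(text: str, vocab: Dict[str, List[float]], size: int) -> List[float]:
    """Average the vectors of in-vocabulary words; unknown words are skipped."""
    vectors = [vocab[word] for word in text.split() if word in vocab]
    if not vectors:
        # Same fallback as the server: a vector of zeros
        return [0.0] * size
    # Column-wise mean, equivalent to np.mean(vectors, axis=0)
    return [sum(column) / len(vectors) for column in zip(*vectors)]


print(sentence_vector("black panther", toy_vocab, 3))  # [2.0, 1.0, 1.0]
print(sentence_vector("pink panther", toy_vocab, 3))   # [3.0, 2.0, 0.0]
print(sentence_vector("unknown words", toy_vocab, 3))  # [0.0, 0.0, 0.0]
```

As in the server, the text is split on whitespace only and each token is looked up as-is, so a token such as "panther," (with the comma) is treated as a different word from "panther".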
Create a Dockerfile:
# This Dockerfile builds a container with a Flask app that uses the AnglE library
# based on the following paper:
# Li, Xianming and Li, Jing. AnglE-optimized Text Embeddings. arXiv preprint arXiv:2309.12871, 2023.
# https://huggingface.co/WhereIsAI/UAE-Large-V1
# Base image
FROM python:3.9
# Set the working directory in the container to /app
WORKDIR /app
# Copy the application files into the container at /app
COPY server.py /app/
COPY requirements.txt /app/
# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
# Enable permission to execute anything inside the folder app
RUN chgrp -R 65534 /app && \
    chmod -R 777 /app
# Expected with AI Core deployments
RUN mkdir -p /nonexistent
RUN chmod -R 777 /nonexistent
EXPOSE 5000
# Run server.py when the container launches (host and port are set in server.py)
CMD ["python", "server.py"]
Now build, test and push the Docker image to Docker Hub:
docker build -t <your account>/sap-ai-test-embeddings:latest .
docker run -p 5000:5000 <your account>/sap-ai-test-embeddings:latest
docker login -u <your account>
docker push <your account>/sap-ai-test-embeddings:latest
NOTE: You can test locally by running server.py directly or by running the Docker image:
### Test Embedding UAE
POST http://localhost:5000/embeddings/uae
Content-Type: application/json
{
"text" : "black panther"
}
### Test Embedding W2V
POST http://localhost:5000/embeddings/w2v
Content-Type: application/json
{
"text" : "black panther"
}
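If you prefer to script the local test, here is a minimal sketch using only the Python standard library. It assumes the server is reachable at localhost:5000 as above; `embed` and `parse_vector` are hypothetical helper names. Because both endpoints return the vector as a string inside the JSON response, the sketch parses that string back into floats with `json.loads`, which accepts both the UAE format ("[0.1, -0.2]") and the Word2Vec `array2string` format (comma-separated, possibly wrapped over several lines).

```python
import json
import urllib.request
from typing import List


def parse_vector(vec_str: str) -> List[float]:
    """Turn the server's vector string, e.g. "[0.1, -0.2]", into a list of floats."""
    return [float(x) for x in json.loads(vec_str)]


def embed(text: str, model: str = "uae", base_url: str = "http://localhost:5000") -> List[float]:
    """POST text to /embeddings/<model> ("uae" or "w2v") and return the parsed vector."""
    req = urllib.request.Request(
        f"{base_url}/embeddings/{model}",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # A non-2xx status raises urllib.error.HTTPError
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    if "error" in body:
        raise RuntimeError(body["error"])
    return parse_vector(body["vector"])
```

With the server running, `len(embed("black panther", "w2v"))` gives the Word2Vec dimension count and `len(embed("black panther"))` the UAE one.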
Now, in a private Git repository linked to SAP AI Core, create the serving template custom-embeddings/service-custom-embeddings.yaml:
apiVersion: ai.sap.com/v1alpha1
kind: ServingTemplate
metadata:
  name: custom-embeddings
  annotations:
    scenarios.ai.sap.com/description: "Run a custom embeddings server on AI Core"
    scenarios.ai.sap.com/name: "custom-embedding-scenario"
    executables.ai.sap.com/description: "Run a custom embeddings server on AI Core"
    executables.ai.sap.com/name: "custom-embedding-executable"
  labels:
    scenarios.ai.sap.com/id: "custom-embedding-server"
    ai.sap.com/version: "0.1"
spec:
  template:
    apiVersion: "serving.kserve.io/v1beta1"
    metadata:
      annotations: |
        autoscaling.knative.dev/metric: concurrency
        autoscaling.knative.dev/target: 1
        autoscaling.knative.dev/targetBurstCapacity: 0
      labels: |
        ai.sap.com/resourcePlan: basic
    spec: |
      predictor:
        minReplicas: 1
        maxReplicas: 1
        containers:
        - name: kserve-container
          image: <your account>/sap-ai-test-embeddings
          ports:
            - containerPort: 5000
              protocol: TCP
In SAP AI Core Administration, add a new application that points to this Git repository.
After a short while, you should see the new ML Operations Scenario appear.
Next, create the configuration.
Next, deploy it.
Finally, check that the deployment is running.
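The configuration and deployment steps can also be scripted against the SAP AI Core REST API instead of the UI. The sketch below only builds the requests (standard library only) and is written under assumptions you should check against the SAP AI Core API reference: the `/v2/lm/configurations` and `/v2/lm/deployments` endpoints, the `AI-Resource-Group` header, and that the executable ID is the template's `metadata.name` (custom-embeddings) while the scenario ID is its `scenarios.ai.sap.com/id` label (custom-embedding-server). `AI_API_URL`, `TOKEN`, the configuration name and the helper function names are hypothetical placeholders.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical placeholders: take the real values from your AI Core service key
AI_API_URL = "https://<your AI API URL>"
TOKEN = "<bearer token>"
RESOURCE_GROUP = "default"


def ai_core_request(path: str, payload: Optional[dict] = None,
                    method: str = "POST") -> urllib.request.Request:
    """Build (but do not send) an AI Core API request with the usual headers."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        AI_API_URL + path,
        data=data,
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "AI-Resource-Group": RESOURCE_GROUP,
            "Content-Type": "application/json",
        },
        method=method,
    )


def configuration_request(name: str = "custom-embeddings-config") -> urllib.request.Request:
    # IDs must match the serving template: executable = metadata.name,
    # scenario = the scenarios.ai.sap.com/id label
    return ai_core_request("/v2/lm/configurations", {
        "name": name,
        "executableId": "custom-embeddings",
        "scenarioId": "custom-embedding-server",
        "parameterBindings": [],
        "inputArtifactBindings": [],
    })


def deployment_request(configuration_id: str) -> urllib.request.Request:
    # Start a deployment from the configuration created above
    return ai_core_request("/v2/lm/deployments", {"configurationId": configuration_id})


def deployment_status_request(deployment_id: str) -> urllib.request.Request:
    # Poll this to see whether the deployment is running and read its URL
    return ai_core_request(f"/v2/lm/deployments/{deployment_id}", method="GET")
```

To actually send one of these requests, pass it to `urllib.request.urlopen` and read the JSON body; the `id` returned when creating the configuration is what `deployment_request` expects.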
Now let's check that the two new Embeddings work when running on SAP AI Core:
@auth = Bearer ......
### Test Embedding UAE
POST <your deployment url>/v1/embeddings/uae
Content-Type: application/json
Authorization: {{auth}}
ai-resource-group: default
{
"text" : "black panther"
}
### Test Embedding W2V
POST <your deployment url>/v1/embeddings/w2v
Content-Type: application/json
Authorization: {{auth}}
ai-resource-group: default
{
"text" : "black panther"
}
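The `......` after Bearer above stands for an OAuth access token. A common way to obtain one is the OAuth2 client-credentials grant against the `url` in your AI Core service key. The sketch below assumes a service key with `clientid`, `clientsecret`, `url` and `serviceurls.AI_API_URL` fields (the usual SAP BTP layout; the values shown are hypothetical placeholders) and, like the requests above, only builds the token request and an embedding call so you can inspect them. `token_request` and `embedding_request` are hypothetical helper names.

```python
import base64
import json
import urllib.parse
import urllib.request

# Hypothetical example of the fields found in an SAP AI Core service key
service_key = {
    "clientid": "<client id>",
    "clientsecret": "<client secret>",
    "url": "https://<subdomain>.authentication.<region>.hana.ondemand.com",
    "serviceurls": {"AI_API_URL": "https://<your AI API URL>"},
}


def token_request(key: dict) -> urllib.request.Request:
    """Build an OAuth2 client-credentials token request using HTTP Basic auth."""
    credentials = f"{key['clientid']}:{key['clientsecret']}".encode("utf-8")
    return urllib.request.Request(
        key["url"] + "/oauth/token",
        data=urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii"),
        headers={
            "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def embedding_request(deployment_url: str, token: str, model: str, text: str,
                      resource_group: str = "default") -> urllib.request.Request:
    """Build the same POST as the requests above: <deployment url>/v1/embeddings/<model>."""
    return urllib.request.Request(
        f"{deployment_url}/v1/embeddings/{model}",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "AI-Resource-Group": resource_group,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the token request with `urllib.request.urlopen` returns JSON whose `access_token` field is the value to put after Bearer.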
If all went well, you should get Embedding results back: a 1024-dimensional vector from UAE-Large-V1 and a 300-dimensional vector from Word2Vec.
The good news is that if you followed this blog and the previous one, you will have four Embedding models running that you can test with your business scenarios:
Model | Dimensions | Hosting | Parameters | Resource plan | MTEB rank (at time of writing) |
---|---|---|---|---|---|
Azure OpenAI text-embedding-ada-002 | 1536 | Hosted externally | Not publicly disclosed | n/a | 38 |
Ollama running microsoft/phi-2 | 2560 | SAP AI Core | 2.7B | infer.s [GPU] | Not ranked |
WhereIsAI/UAE-Large-V1 | 1024 | SAP AI Core | 335M | basic [CPU] | 6 |
Word2Vec (google-news-300) | 300 | SAP AI Core | n/a | basic [CPU] | Not ranked |
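Testing these models on a business scenario usually comes down to measuring how close two Embedding vectors are, and cosine similarity is a common choice. Below is a minimal plain-Python sketch (the `cosine_similarity` name is my own, not from any library used above). Vectors are only comparable when they come from the same model, because the dimensions differ, for example 1024 for UAE-Large-V1 versus 300 for Word2Vec.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors produced by the same embedding model."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # e.g. the all-zeros Word2Vec fallback vector: treat as "no similarity"
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # approximately 1.0 (same direction)
```

Returning 0.0 for a zero vector is a design choice of this sketch; raising an error instead would make "no known words" cases more visible.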
The bad news is that this blog is getting a bit long, so I will test and compare them in the next blog in the series.
In the meantime, I welcome your feedback below.
SAP notes that posts about potential uses of generative AI and large language models are merely the individual poster’s ideas and opinions, and do not represent SAP’s official position or future development roadmap. SAP has no legal obligation or other commitment to pursue any course of business, or develop or release any functionality, mentioned in any post or related content on this website.