Meta released its state-of-the-art model, Llama 3.1 405B, a few days ago, and I have some exciting news. If you are thinking "Why should I care, I work in SAP?", I will give you 6 reasons why you should consider it.
Meta has been training this model since January on tens of thousands of GPUs, so let's give Zuckerberg some credit: that is many millions of dollars we don't need to spend to train a model like it ourselves, because it is open source.
Remember, there are 3 things we care about in AI:
1. The compute
2. The Algorithm
3. The Data and use case
Don't even consider compute and the algorithm. As we mentioned, Meta invested billions of dollars developing and training this model, so you can focus on number 3 (the data and the use case).
For running the model, I will provide a couple of options, but model inference is not simple. Be suspicious of oversimplified explanations of "how to easily run an LLM". Large models are large, and performance degrades fast once they are under load (and we want them to be used), not to mention the cost of storage, compute, and RAG. The best tip is always to focus on the quality of the data and the use case you want to give this software.
Llama is the kind of model we would use internally in a company, because corporations don't want users to casually hand a sensitive PDF to OpenAI and get a summary out of it. Corporations protect their employees from providing regulated data to open endpoints. They keep it private.
To call a Llama model served by Amazon Bedrock from your SAP BTP application, assuming you already have an Amazon Bedrock account and the API ready to be called, follow the steps to set up SAP AI Core as a proxy for Bedrock, and then expose Bedrock via the SAP AI Core proxy to your application.
Llama 3.1 405B available on US West 2 (Oregon). By Author
The official documentation for this is here https://aws.amazon.com/blogs/awsforsap/power-your-business-with-secure-and-scalable-generative-ai-se...
The unofficial information is here https://community.sap.com/t5/technology-blogs-by-members/generative-ai-for-sap-vi-consume-amazon-bed...
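If you skip the AI Core proxy and call Bedrock directly, the consumption side can be sketched in a few lines. This is a minimal sketch, assuming boto3 and the Bedrock request format for Llama models; the model ID and payload fields are my assumptions and should be verified against the Bedrock documentation:

```python
import json

# Assumed Bedrock model ID for Llama 3.1 405B Instruct (verify in your account)
MODEL_ID = "meta.llama3-1-405b-instruct-v1:0"

def build_llama_request(prompt: str, max_gen_len: int = 512,
                        temperature: float = 0.5) -> str:
    """Serialize the request body in the Llama-on-Bedrock payload format."""
    return json.dumps({
        "prompt": prompt,
        "max_gen_len": max_gen_len,
        "temperature": temperature,
    })

def invoke_llama(prompt: str) -> str:
    """Call Bedrock via boto3; needs AWS credentials and model access granted."""
    import boto3  # imported here so the payload builder stays testable offline
    client = boto3.client("bedrock-runtime", region_name="us-west-2")
    response = client.invoke_model(modelId=MODEL_ID,
                                   body=build_llama_request(prompt))
    return json.loads(response["body"].read())["generation"]
```

Behind the AI Core proxy the endpoint and authentication change, but the payload you build for the model stays the same shape.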
Running the Llama 3.1 405B model requires a significant amount of RAM: 128 GB at a minimum. That is just the floor, though, and more RAM significantly improves the model's performance. In fact, it can run on 64 GB, but that makes the model very slow; 128 GB is still slow, and 256 GB is decent. This is a big model, and the requirements are significant.
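As a back-of-the-envelope check (my own arithmetic, not an official sizing guide), the weights alone dominate that footprint, and the precision you pick roughly sets the RAM you need:

```python
# Approximate RAM needed just to hold the 405B weights at different
# precisions; real usage is higher (KV cache, activations, OS overhead).
PARAMS = 405e9  # 405 billion parameters

def weights_gb(bytes_per_param: float) -> float:
    """Weight footprint in GB for a given storage precision."""
    return PARAMS * bytes_per_param / 1e9

print(weights_gb(2.0))  # FP16: 810.0 GB
print(weights_gb(1.0))  # INT8: 405.0 GB
print(weights_gb(0.5))  # 4-bit quantization: 202.5 GB
```

This is why a quantized build is the only realistic way to approach the 128-256 GB range discussed here.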
SAP AI Core has a limited set of resource plans (as of August 2024). They map to AWS g4 family instances with NVIDIA T4 Tensor Core GPUs:
Plan | GPUs | vCPUs | Memory (GB)
Infer-S | 1 T4 | 3 | 10
Infer-M | 1 T4 | 7 | 26
Infer-L | 1 T4 | 15 | 58
Distributed Llama is a project that allows us to run an LLM across multiple devices. It uses tensor parallelism and is optimized to keep the amount of data required for synchronization low. Distributed Llama distinguishes between two types of nodes that you can run on your devices: a root node and worker nodes.
For us in BTP this is good, because we don't have many GPUs but we do have some more CPUs, and Distributed Llama supports only CPU inference (as of August 2024).
AI cluster topology, 4 devices, total 256 GB RAM. Image by Author
The root node on the first device and 3 worker nodes on the remaining devices provide the required 256 GB, since Distributed Llama splits RAM usage across all devices.
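A tiny sketch of that split (the power-of-two device count is a Distributed Llama constraint worth double-checking in its README):

```python
# With N devices, tensor parallelism leaves each node holding roughly
# 1/N of the model, so the per-device RAM requirement drops accordingly.

def per_device_ram_gb(total_gb: float, devices: int) -> float:
    """RAM each node must contribute for a given total budget."""
    if devices & (devices - 1) != 0:
        raise ValueError("device count must be a power of two")
    return total_gb / devices

print(per_device_ram_gb(256, 4))  # 64.0 GB per device in the topology above
```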
Follow the steps detailed in this GitHub repository until, finally, you run Llama on the master device:
./dllama-api \
--model models/llama3_1_405b_instruct_q40/dllama_model_llama3_1_405b_instruct_q40.m \
--tokenizer models/llama3_1_405b_instruct_q40/dllama_tokenizer_llama3_1_405b_instruct_q40.t \
--buffer-float-type q80 \
--max-seq-len 2048 \
--nthreads 4
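Once dllama-api is up, it can be consumed over HTTP. Here is a minimal client sketch, assuming the server listens on port 9990 with an OpenAI-style chat completions route (both are assumptions; check the Distributed Llama README for your build):

```python
import json
from urllib import request

# Assumed endpoint of the dllama-api server started on the master device
API_URL = "http://localhost:9990/v1/chat/completions"

def build_chat_payload(question: str, max_tokens: int = 256) -> bytes:
    """Encode a single-turn chat request in the OpenAI-style format."""
    return json.dumps({
        "messages": [{"role": "user", "content": question}],
        "max_tokens": max_tokens,
    }).encode()

def ask_llama(question: str) -> str:
    """Send the question to the cluster; requires the nodes to be running."""
    req = request.Request(API_URL, data=build_chat_payload(question),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```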
My basic test for LLMs is to ask for the mandatory fields of the Document Info Record Service API.
SAP API sourced knowledge by Llama. Image by Author
This is not an easy test; the only model that has not failed on this question is GPT-4.
The use case of LLMs for function calling is described in this blog post.
I give this code to the LLMs; my expectation is that they discover the incorrect syntax around the CATCH statement outside the loop, something almost all models miss:
DATA: lt_mara TYPE TABLE OF mara,
ls_mara TYPE mara,
lv_matnr TYPE mara-matnr.
SELECT matnr, maktx FROM mara INTO TABLE lt_mara.
LOOP AT lt_mara INTO ls_mara.
IF ls_mara-matnr = lv_matnr.
WRITE: / ls_mara-matnr, ls_mara-maktx.
ENDIF.
ENDLOOP.
CATCH cx_sy_itab_line_not_found INTO DATA(lx_itab_error).
WRITE: / 'Internal table error:', lx_itab_error->get_text( ).
ENDTRY.
Let me ask the model whether the above statement's syntax is correct:
ABAP Syntax check. Image by Author
Corrected syntax:
DATA: lt_mara TYPE TABLE OF mara,
ls_mara TYPE mara,
lv_matnr TYPE mara-matnr.
lv_matnr = 'some_value'. " Initialize lv_matnr with a value
SELECT * FROM mara INTO TABLE lt_mara.
LOOP AT lt_mara INTO ls_mara.
IF ls_mara-matnr = lv_matnr.
WRITE: / ls_mara-matnr, ls_mara-maktx.
ENDIF.
ENDLOOP.
IF sy-subrc <> 0.
WRITE: / 'No records found in MARA table'.
ENDIF.
I am no ABAPer, but I believe this is more decent code.
I ask the LLM:
Can I update 2 columns from 2 tables joined by foreign key with one statement in SAP HANA?
The answer should be NO, but sometimes I get this:
Incorrect response from MistralAI Large. Image by Author
And this is the correct response;
Llama 3.1 getting it right. Image by Author
I am quite satisfied with it. Llama got it right: I can't update columns of two different tables with a single statement in SAP HANA.
Llama 3.1 405B is a beast. It is a large model and requires a significant amount of resources if we want to run it ourselves, but services like Amazon Bedrock do the heavy lifting of running LLMs for us and simply expose a private API for us alone.
Although Llama 3.1 has not been released specifically for the enterprise segment, it is exceeding by far all the business tests I have executed in my day-to-day SAP activities. Meta has done a very good job.
Meta AI Blog : https://llama.meta.com/
Meta Llama 3.1 : https://ai.meta.com/research/publications/the-llama-3-herd-of-models/
Model Accessibility: https://llama.meta.com/llama-downloads/
Try on Huggingface: https://huggingface.co/chat/
Usage Llama3.1 : https://llama.meta.com/docs/getting-the-models/405b-partners/
Research document Link : https://www.rivista.ai/wp-content/uploads/2024/07/452387774_1036916434819166_4173978747091533306_n.p...