Technology Blog Posts by SAP

Generative AI has transformed how developers work by streamlining coding tasks and accelerating the transition from concepts to functional code. ABAP is SAP’s proprietary programming language and a critical foundation for SAP’s ERP systems and SAP S/4HANA Cloud, making it a crucial target for specialized AI.

Earlier this year, we unveiled the general availability of Joule for Developers in ABAP Development Tools (ADT). To bring this to life, we built an ABAP large language model (LLM) trained on more than 300 million lines of internal ABAP code as well as natural language documentation. A key capability of Joule for Developers is predictive code completion, which suggests code snippets in real time in a context-aware fashion.

In this blog post we introduce how we leverage Constrained Decoding for code generation to 1) accelerate development time, 2) reduce errors, and 3) increase compliance and reliability. Throughout this process, we collaborated closely with NVIDIA to leverage NVIDIA NIM microservices for efficient inference.

 

Figure 1: Joule for Developers integrated in ABAP Development Tools (ADT)

 

The importance of generating reliable and compliant ABAP syntax

Because ABAP is a closed-source and low-resource language, general-purpose commercial LLMs often struggle to produce correct syntax and follow ABAP Cloud programming best practices. Their performance is subpar compared to their capabilities in more commonly used languages such as Python and Java. To close the performance gap of available LLMs on ABAP, we trained our own ABAP foundational model using internal ABAP code as well as high-quality, manually labeled data. Given the complexity of the business logic involved, even minor syntax or logical errors can lead to costly disruptions. Adhering to SAP-specific standards is therefore crucial.

Introducing constrained code generation using an ultra-fast ABAP parser

Constrained decoding revolves around verifying each token or entire statements in real time against ABAP’s strict grammar rules and SAP’s evolving standards. If a partial suggestion seems wrong, whether due to syntax, language-version differences (such as ABAP Cloud), or SAP-specific compliance rules, the parser instantly flags and refines it, ensuring that only correct suggestions are presented to the user.

Figure 2 shows how constrained decoding works using a simple example. Every potential token sequence (e.g., “value1 + value2”) is evaluated against the ABAP grammar. If the most likely next token or statement would violate ABAP syntax (such as prematurely ending a class, or creating an unbalanced block boundary), the parser flags it as invalid and discards that path. This pruning process guarantees that the final output strictly adheres to the defined ABAP rules and structure.

In detail, let {x₁, x₂, …, xᵢ₋₁} be the partial sequence generated so far. At step i, the language model defines a probability distribution over potential next tokens t as:

P_model (t | x₁, …, xᵢ₋₁).

We denote by G(x₁, …, xᵢ₋₁, t) ∈ {True, False} a grammar-validation function powered by the ABAP parser that evaluates whether the sequence (x₁, …, xᵢ₋₁, t) adheres to ABAP syntax. The set of valid next tokens is then:

Vᵢ = { t | G(x₁, …, xᵢ₋₁, t) = True }.

The constrained decoding step then selects the highest-probability token from the valid set. By ensuring that t must lie in Vᵢ, the process discards any token violating ABAP syntax, thereby boosting the grammar compliance of the sequence. The same procedure can be applied to restrict generations to specific ABAP language versions, such as ABAP Cloud-compliant objects and APIs.
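The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the production implementation: the toy grammar and all names (`constrained_decode_step`, `toy_validates`) are hypothetical stand-ins for the model's logits and the ABAP parser's validation function G.

```python
import math

def constrained_decode_step(logits, prefix, grammar_validates, vocab):
    """Select the highest-probability next token whose addition keeps
    the sequence grammar-valid (the set V_i from the text)."""
    best_token, best_score = None, -math.inf
    for token, score in zip(vocab, logits):
        # G(x_1, ..., x_{i-1}, t): keep only tokens the parser accepts
        if grammar_validates(prefix + [token]) and score > best_score:
            best_token, best_score = token, score
    return best_token

# Toy grammar standing in for the ABAP parser: an expression must
# alternate identifiers (even positions) and '+' (odd positions)
def toy_validates(seq):
    for i, tok in enumerate(seq):
        if (tok == "+") != (i % 2 == 1):
            return False
    return True

vocab = ["value1", "value2", "+"]
logits = [0.1, 2.0, 1.5]  # the model prefers "value2", then "+"
print(constrained_decode_step(logits, ["value1"], toy_validates, vocab))  # +
```

Note that the model's top-ranked token ("value2") is discarded because appending it would violate the grammar; the decoder falls back to the best token inside Vᵢ instead.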

 

Figure 2: Token-wise Constrained Decoding for ABAP

 

This approach hinges on the speed and accuracy of the ABAP parser, which must evaluate each potential next token during inference. To ensure this works effectively, we implemented a Rust-based parser and collaborated with NVIDIA to integrate it into the inference engine in NVIDIA NIM.

Evaluating Constrained Generation for ABAP

To quantify how well constrained decoding contributes to generating working ABAP code, we assembled a diverse benchmark of real-world ABAP Cloud programming tasks. Each programming task is accompanied by several unit tests that verify correct, cloud-compliant implementation. The simplest, most objective metric then becomes: does the generated code pass all its unit tests? We refer to this as the Unit Test Pass Rate (Figure 3).
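The metric itself is straightforward; the sketch below shows the computation under the assumption (stated in the text) that a task counts as solved only if every one of its unit tests passes. The sample data is purely illustrative.

```python
def unit_test_pass_rate(results):
    """results: one inner list of booleans per task, one boolean per
    unit test. A task counts as passed only if all its tests pass."""
    passed = sum(1 for tests in results if all(tests))
    return passed / len(results)

# Hypothetical benchmark outcome for three tasks
results = [[True, True], [True, False], [True, True, True]]
print(unit_test_pass_rate(results))  # 2 of 3 tasks pass -> ~0.667
```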

 

Figure 3: Performance benchmark for a diverse set of ABAP code generation tasks

 

GPT-4, a standard state-of-the-art LLM, passes just over half of our ABAP Cloud tests, underscoring how niche language constructs and SAP-specific APIs can trip up generic models. Simply switching to our internally trained ABAP model lifts the pass rate to 76%: this model has seen the ABAP grammar, definitions, and APIs that are relevant for day-to-day development. Through token-wise and statement-wise constrained decoding and sampling, we further boosted the performance to 84%, enabling additional tasks to be solved automatically.

 

Collaborating with NVIDIA to Add Support for SAP ABAP Constrained Decoding to NIM

Deploying constrained decoding for ABAP code generation hinges on having a flexible, high-performance LLM inference framework – so we’ve collaborated with NVIDIA to leverage NIM. NVIDIA NIM is a set of easy-to-use microservices designed to accelerate the deployment of generative AI models across the cloud, data centers, and workstations. Together with NVIDIA, SAP designed and developed a custom guided decoding capability leveraging NIM, which provided the key features below for an effective implementation of ABAP constrained decoding in the LLM inference pipeline.

 

Seamless Integration

NIM's API allows for straightforward integration of our custom logits processor and ABAP parser. Figure 4 shows the architectural diagram of how ABAP constrained decoding is integrated and deployed via a NIM container.


Figure 4: Architecture of ABAP Constrained Decoding in NIM

 

 

By implementing the logit processing functionality through a Python interface, we inject our grammar validation logic directly into the token generation process. This means that each potential token can be evaluated against ABAP grammar rules in real-time, effectively pruning invalid paths before they're suggested to the developer. Our custom implementations can be stored as packaged directories and loaded into NIM at runtime.
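The shape of such a logits processor can be sketched as follows. This is a hedged illustration of the general pattern (mask grammar-invalid tokens with −∞ before sampling), not NIM's actual interface; the class name, the `parser_accepts` callback standing in for the Rust parser binding, and the toy validation rule are all assumptions.

```python
import math

class GrammarLogitsProcessor:
    """Sketch of a grammar-masking logits processor: any token whose
    addition the parser rejects gets score -inf, so it can never be
    sampled. `parser_accepts` stands in for the ABAP parser."""

    def __init__(self, parser_accepts, id_to_token):
        self.parser_accepts = parser_accepts
        self.id_to_token = id_to_token

    def __call__(self, generated_tokens, scores):
        for token_id in range(len(scores)):
            candidate = generated_tokens + [self.id_to_token[token_id]]
            if not self.parser_accepts(candidate):
                scores[token_id] = -math.inf  # prune the invalid path
        return scores

# Toy parser rule: this METHOD block may not close before it has a body
def accepts(seq):
    return seq != ["METHOD add.", "ENDMETHOD."]

id_to_token = {0: "ENDMETHOD.", 1: "rv_sum = iv_a + iv_b."}
proc = GrammarLogitsProcessor(accepts, id_to_token)
scores = proc(["METHOD add."], [1.8, 0.4])
print(scores)  # [-inf, 0.4] -> the body statement wins despite a lower logit
```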

 

Flexible Runtime Selection

One significant advantage of NIM is its support for multiple custom decoding backends, with the ability to switch between them on the fly per user query. This flexibility allows us to employ different parsing strategies based on the specific ABAP context (such as switching between standard ABAP and ABAP Cloud constraints) without needing a separate deployment for each ABAP language version.
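Conceptually, per-query backend selection amounts to a dispatch from a request field to a validation strategy. The sketch below illustrates that idea only; the backend names, the request field, and the placeholder validators are assumptions, not NIM's API (though the rule shown is real: the classic WRITE statement is not permitted in ABAP Cloud).

```python
# Placeholder validators standing in for full parser configurations
BACKENDS = {
    "abap_standard": lambda seq: True,
    "abap_cloud":    lambda seq: "WRITE" not in seq,  # classic WRITE is banned
}

def validator_for(request):
    """Pick the grammar backend named by the query; default to standard."""
    backend = request.get("decoding_backend", "abap_standard")
    return BACKENDS[backend]

check = validator_for({"decoding_backend": "abap_cloud"})
print(check(["WRITE"]))  # False: not cloud-compliant
```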

 

Optimized Performance

Despite the additional validation steps, the Rust-based ABAP parser integration through NIM maintains high throughput and low latency. The highly optimized infrastructure handles the token-by-token validation efficiently, ensuring that code suggestions appear promptly in the developer's environment. Our benchmarks show that the constrained decoding approach adds minimal overhead compared to unconstrained generation, while significantly improving the quality and compliance of the suggestions.

When compared with an alternative baseline inference solution for a typical server request workload, our co-designed NIM achieves up to 31% better latency and up to 26% better throughput. Figure 5 shows various normalized latency and throughput metrics.


Figure 5: Latency and Throughput Comparison of Co-Designed NIM Solution vs. Alternative

  

State Management

NIM's state management capabilities proved invaluable for complex ABAP syntax validation that requires context tracking (such as nested code blocks or statement boundaries). By leveraging the context variables feature, our parser can maintain awareness of the current syntax state throughout the token generation process, ensuring that suggestions remain contextually appropriate.
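The kind of context the validator must carry can be pictured as a stack of open blocks, updated statement by statement. The sketch below is a hypothetical simplification of such state tracking (the real parser handles far more of the grammar); the block keywords shown are genuine ABAP pairs.

```python
# Pairs of ABAP block openers and their required closers
OPENERS = {"IF": "ENDIF", "LOOP": "ENDLOOP", "METHOD": "ENDMETHOD"}

def update_state(stack, statement):
    """Return the new block stack, or None if the statement is invalid
    in the current context (e.g. closing a block that is not open)."""
    keyword = statement.split()[0].rstrip(".")
    if keyword in OPENERS:
        return stack + [OPENERS[keyword]]
    if keyword.startswith("END"):
        if not stack or stack[-1] != keyword:
            return None  # unbalanced block boundary -> prune this path
        return stack[:-1]
    return stack  # ordinary statement: state unchanged

state = update_state([], "METHOD add.")
state = update_state(state, "IF iv_a > 0.")
print(update_state(state, "ENDMETHOD."))  # None: ENDIF is still pending
print(update_state(state, "ENDIF."))      # ['ENDMETHOD']
```

Because the stack is carried along with each partial generation, a closing statement that would unbalance the blocks can be rejected the moment it is proposed rather than after the snippet is complete.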

 

Summary

At SAP, we continually innovate to improve ABAP developer productivity, and constrained decoding is one such technique. This approach elevates Joule for Developers' code completion functionality by narrowing the model's output space with a millisecond-fast Rust parser. With NVIDIA, we co-designed our solution to leverage a custom decoding backend, delivering constrained decoding with optimized performance powered by NIM. Looking ahead, we will extend the grammar layer to accommodate additional constraints, updated APIs, and evolving SAP guidelines.

 

 

Authored by: Abdelaziz Ben Othman, Salma Sohrabi-Jahromi, Karthikeyan Asokkuma, Philipp Herzig; and NVIDIA: Yi Fan, Neal Vaidya, Jeff Farris, Bhuvan Agrawal, Arun Venkatesan, Tricia Barr.


The following people (listed in alphabetical order) contributed substantially to developing the implementation discussed in this post: Abdelaziz Ben Othman, Akhilesh Kakade, Ankit Shrivastava, Kai Patrick Reisert, and Sai Madhav Jonnada.