Technology Blogs by Members
Jörg_Brandeis
Contributor
When loading data from source systems into SAP BW, the same field-level requirements come up again and again. Above all, this includes logic for cleaning up and preparing the data, such as:

  • Add and remove leading zeros

  • Case distinctions

  • Remove spaces at the beginning or end

  • Conversion to upper case

  • Deriving one field from fixed-position substrings of another, e.g. COAREA ==> COUNTRY

  • Derivation of time characteristics
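Each of these requirements maps onto an existing SAP HANA SQL function. The following is a minimal sketch run against the DUMMY table; the literal inputs ('4711', 'DE01' and so on) are made-up values standing in for columns of :intab, not data from a real BW system.

```sql
-- Illustrative literals only; in a transformation routine these would be columns of :intab.
SELECT LPAD('4711', 10, '0')                                  AS add_leading_zeros,    -- '0000004711'
       LTRIM('0000004711', '0')                               AS remove_leading_zeros, -- '4711'
       CASE LEFT('DE01', 2) WHEN 'DE' THEN 'X' ELSE '' END    AS case_distinction,     -- 'X'
       TRIM('  abc  ')                                        AS trimmed,              -- 'abc'
       UPPER('abc')                                           AS upper_case,           -- 'ABC'
       LEFT('DE01', 2)                                        AS country_from_coarea,  -- 'DE'
       TO_VARCHAR(TO_DATE('20200928', 'YYYYMMDD'), 'YYYY')   AS calyear               -- '2020'
  FROM DUMMY;
```

Each expression is a single built-in function call, which is exactly why these rules are cheap for SAP HANA as long as they stay inline in the field list.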


The following listing shows a few examples from the source code of a transformation routine:
outTab = SELECT TO_VARCHAR(TO_DATE(calday), 'YYYYMM') AS calmonth,
"/BIC/SRCSYS",
...
"/BIC/CURTYP",
LEFT("/BIC/COAREA", 2) AS "/BIC/COUNTRY",
SUBSTRING("/BIC/PROFCT", 9, 2) as "/BIC/PCACCID",
CASE SUBSTRING("/BIC/PROFCT", 8, 3)
when '643' then '1'
when '655' then '1'
when '641' then '2'
when '651' then '2'
when '643' then '3'
when '655' then '3'
else ''
end as "/BIC/PRBY",
...
FROM :intab;

Typical expressions in the field list of a transformation routine


What all these requirements have in common is that they

  • can be easily implemented in SQLScript using the existing SQL functions

  • repeat themselves often

  • can be executed very quickly by SAP HANA as long as they are implemented directly in the transformation routines.


DRY – Don’t Repeat Yourself


The well-known DRY principle should really come into play here, so that you don't have to repeat the same source code over and over. For expressions in the field list, scalar user-defined functions (UDFs) are the tool for this. Until recently, however, they were not available for transformation routines in BW, because they can only be created with the AMDP framework as of AS ABAP release 7.53. Yet scalar UDFs would be ideal for precisely these requirements: they ensure a uniform implementation, and the source code moves from a technical description of the expressions to a business view:

outTab = SELECT "ZCL_CALDAY=>TO_CALMONTH"(calday) AS calmonth,
"/BIC/SRCSYS",
...
"/BIC/CURTYP",
"ZCL_COAREA=>TO_COUNTRY"("/BIC/COAREA") AS "/BIC/COUNTRY",
"ZCL_PROFCT=>TO_PCACCID"("/BIC/PROFCT") as "/BIC/PCACCID",
"ZCL_PROFCT=>TO_PRBY"("/BIC/PROFCT") as "/BIC/PRBY",
...
FROM :intab;


Outsourcing the expressions to a scalar UDF


The version with outsourced functions looks much more elegant, and it is also superior from a maintenance point of view: if another profit center is added to the list, exactly one function needs to be adjusted. Using ZCL_PROFCT=>TO_PRBY as an example, this is what such a scalar AMDP function looks like:





CLASS zcl_profct DEFINITION
  PUBLIC
  FINAL
  CREATE PUBLIC.

  PUBLIC SECTION.
    INTERFACES if_amdp_marker_hdb.
    METHODS to_prby IMPORTING VALUE(iv_profct) TYPE char10
                    RETURNING VALUE(rv_prby)   TYPE char1.
ENDCLASS.

CLASS zcl_profct IMPLEMENTATION.
  METHOD to_prby BY DATABASE FUNCTION FOR HDB LANGUAGE SQLSCRIPT
                 OPTIONS DETERMINISTIC READ-ONLY.
    rv_prby = CASE SUBSTRING(:iv_profct, 8, 3)
                WHEN '643' THEN '1'
                WHEN '655' THEN '1'
                WHEN '641' THEN '2'
                WHEN '651' THEN '2'
                -- '643' and '655' are already matched above, so these two branches are never reached
                WHEN '643' THEN '3'
                WHEN '655' THEN '3'
                ELSE ''
              END;
  ENDMETHOD.
ENDCLASS.





As the example shows, the function is just a thin wrapper around a CASE expression. By the way, moving the logic into the ABAP world also makes it easy to write unit tests for these functions. And since the database functions are only generated from ABAP on their first call, running the unit tests is also a convenient way to generate them.
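Once the function has been generated (for example by a unit test run), it can also be checked directly in the SQL console, using the same quoted name as in the routine above. This is a minimal sketch that assumes the console session is connected to the ABAP system's database schema in which the AMDP function was generated; the profit center values are made up, chosen so that characters 8 to 10 hit a known branch of the CASE.

```sql
-- Hypothetical smoke test: characters 8-10 of '1234567641' are '641', which maps to '2'.
SELECT "ZCL_PROFCT=>TO_PRBY"('1234567641') AS prby_641,   -- expected '2'
       "ZCL_PROFCT=>TO_PRBY"('1234567999') AS prby_other  -- falls through to the ELSE branch
  FROM DUMMY;
```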

Elegant but slow


As elegant as moving logic into UDFs is, the approach is unfortunately unusable for large data volumes, because the UDFs have a massive impact on the runtime. I would like to demonstrate this with another expression from the example above:

TO_VARCHAR(TO_DATE(calday), 'YYYYMM')

This simple expression converts an ABAP DATS value into a SQLScript DATE, which is then formatted as a YYYYMM string, the format of the popular InfoObject CALMONTH.

Example: 20200928 ==> 202009
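The conversion can be tried out on its own in the SQL console. Because an ABAP DATS value is stored on the database as an 8-character string in the form YYYYMMDD, a string literal can stand in for the column in this minimal sketch:

```sql
-- DATS values are stored as 8-character strings (YYYYMMDD) on the database.
SELECT TO_VARCHAR(TO_DATE('20200928'), 'YYYYMM') AS calmonth  -- '202009'
  FROM DUMMY;
```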

For this, I created an AMDP UDF following the pattern above:





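* Declaration assumed to follow ZCL_PROFCT above (parameter types are an assumption):
*   METHODS to_calmonth IMPORTING VALUE(iv_calday) TYPE d RETURNING VALUE(rv_result) TYPE char6.
METHOD to_calmonth BY DATABASE FUNCTION FOR HDB LANGUAGE SQLSCRIPT OPTIONS DETERMINISTIC READ-ONLY.
  rv_result = TO_VARCHAR(TO_DATE(:iv_calday), 'YYYYMM');
ENDMETHOD.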





I ran two queries in the SQL console on a small BW table with about 3 million entries: one with the expression written inline and one with the function call.


Without UDF






do begin
select budat,
TO_VARCHAR(TO_DATE(budat), 'YYYYMM'),
account,
amount,
curr
from "/BIC/AZBR_E1_S2";
end;





Console output

Statement 'do begin select budat, TO_VARCHAR(TO_DATE(budat), 'YYYYMM'), account, amount, curr from ...'
successfully executed in 501 ms 237 µs (server processing time: 801 ms 664 µs)
Fetched 1000 row(s) in 102 ms 468 µs (server processing time: 0 ms 677 µs)
Result limited to 1000 row(s) due to value configured in the Preferences

With UDF










do begin
select budat,
"ZCL_CALDAY=>TO_CALMONTH"(BUDAT),
account,
amount,
curr
from "/BIC/AZBR_E1_S2";
end;





Console output
Statement 'do begin select budat, "ZCL_CALDAY=>TO_CALMONTH"(BUDAT), account, amount, curr from ...'
successfully executed in 2:09.418 minutes (server processing time: 4:18.632 minutes)
Fetched 1000 row(s) in 103 ms 144 µs (server processing time: 0 ms 675 µs)
Result limited to 1000 row(s) due to value configured in the Preferences
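Putting the two console outputs side by side gives the order of magnitude. The arithmetic below uses only the reported figures (2:09.418 minutes = 129.418 s, 4:18.632 minutes = 258.632 s) and the approximate table size of 3 million rows:

```latex
\frac{129\,418\ \text{ms}}{501.237\ \text{ms}} \approx 258
\qquad
\frac{258\,632\ \text{ms}}{801.664\ \text{ms}} \approx 323
\qquad
\frac{129.418\ \text{s}}{3 \times 10^{6}\ \text{rows}} \approx 43\ \mu\text{s per row}
```

Whether measured by statement time or by server processing time, the version with the UDF is more than two orders of magnitude slower on this table.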






The runtimes differ so much that you almost have to assume a bug, because the example is about as simple as it gets: with any less complexity, I wouldn't need a UDF at all. What I find astonishing:

  • There are only 365 distinct values of BUDAT in the table. With the keyword DETERMINISTIC, the system knows that the same input always produces the same output (see the ABAP documentation). So the function should be executed at most 365 times, with all other results read from the buffer, right?

  • I would also have expected only the column's dictionary to be converted, i.e. each distinct value exactly once.

  • If you replace the function call with the function body, which even a primitive text editor can do with a simple search and replace, the query is over 100 times faster. Seven years after the introduction of scalar UDFs, the SAP HANA optimizer should be able to do this, right? The first blog post on the topic, by Rich Heilman, dates from 2013.
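The expectation in the first point can be checked, and imitated by hand as an experiment: count the distinct BUDAT values, then call the UDF only once per distinct value and join the result back to the rows. This is an untested sketch of a manual workaround, not a measured result. The NO_INLINE hint is used on the assumption that it keeps the table variable from being merged into the final query, which would otherwise risk evaluating the UDF per row again.

```sql
-- 1) How many distinct inputs does the UDF actually see?
SELECT COUNT(DISTINCT budat) AS distinct_budat
  FROM "/BIC/AZBR_E1_S2";

-- 2) Hypothetical manual memoization: evaluate the UDF once per distinct BUDAT
--    and join the result back to all rows.
DO BEGIN
  lt_calmonth = SELECT budat,
                       "ZCL_CALDAY=>TO_CALMONTH"(budat) AS calmonth
                  FROM (SELECT DISTINCT budat FROM "/BIC/AZBR_E1_S2") AS d
                  WITH HINT (NO_INLINE);  -- keep this step materialized

  SELECT t.budat,
         m.calmonth,
         t.account,
         t.amount,
         t.curr
    FROM "/BIC/AZBR_E1_S2" AS t
    INNER JOIN :lt_calmonth AS m
      ON m.budat = t.budat;
END;
```

With 365 distinct values and about 3 million rows, this would reduce the number of UDF calls by a factor of roughly 8,000, provided the optimizer actually evaluates the function on the DISTINCT result only.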


To document this completely, I also analyzed both queries with PlanViz.


PlanViz of the query without UDF



PlanViz of the query with UDF


It is clearly visible that, by and large, the same execution plan is chosen. However, a single JECalculate node (marked in red) accounts for the entire runtime of the UDF calls. This should at least be parallelized.

I also came across hints. The hint INLINE is supposed to trigger exactly the desired behavior: the optimization should be done globally. Unfortunately, the documentation says nothing about functions; everything refers to procedures. And that matches my observation: the hint changes absolutely nothing. The query remains slow.
do begin
select budat,
"ZCL_CALDAY=>TO_CALMONTH"(BUDAT),
account,
amount,
curr
from "/BIC/AZBR_E1_S2"
WITH HINT(INLINE);
end;

Conclusion


Why is that? Is there a mistake in my example? Can I still optimize this? Or are UDFs simply this slow? While researching in forums, I saw that I am not alone with this problem, and that there are several other ways to slow down the system with unfavorable UDFs. Some examples from answers.sap.com:

In any case, there is still great potential for improvement, at least for simple expressions that do not query other DB tables.
I know that some requirements are too complex to be reduced to a single expression. But for those, at least parallel processing would be desirable.