Sometimes you might want to test a slow-running query by executing it directly from the command line. And surprisingly, now that the query is executed directly on the database, it has a much better runtime. How can that be? Since this is a question that I have heard repeatedly, I will outline what’s behind this behavior.
To be able to understand why the directly executed query can be faster and can have a better access plan than the query coming from the SAP code, we need to know how queries are executed by SAP code. In this blog post, we will have a closer look at parameter markers and at distribution statistics and their usage inside Db2.
You will learn what parameter markers are, when they can be an advantage or disadvantage during SQL query execution, and – spoiler alert! -why they are still useful.
How Are Queries executed by the SAP ABAP Code?
If we search for a certain query in the SQL trace (SAP transaction ST05 or ST12) or in the SQL cache of the DBA Cockpit (transaction DBACOCKPIT), we will very likely see something like this:
SELECT *
FROM "ZOPT_TEST"
WHERE "KEY1" = ? AND "KEY2" = ? AND "KEY3" = ? WITH CS
-- OPTLEVEL( 5 ) -- QUERY_DEGREE( ANY ) -- LOCATION( ZOPT_TEST_SELECT , 30 )
-- SYSTEM( XXX , SAPXXX )
So, we can see that instead of actual values, question marks “?” are used in the WHERE clause of the SQL query. These question marks are called parameter markers.
What Are Parameter Markers and What Are Their Advantages?
Parameter markers are placeholders for actual values. The actual values will be bound to the parameter markers when the query is executed.
With parameter markers, Db2 needs to prepare the query only once and can reuse the access plan of the query for each execution. Especially for often-executed queries, this is more beneficial than preparing the query for each execution with the actual values.
Therefore, SAP systems use parameter markers especially for standard queries in an SAP ERP environment.
Testing SQL Queries with Parameter Markers
If we want to execute the query now with the parameter markers directly on the database, we will get an error:
$ db2 set current schema sapxxx
DB20000I The SQL command completed successfully.
$ db2 "select * from zopt_test where key1 = ? and key2 = ? and key3 = ?"
SQL0313N The number of variables in the EXECUTE statement, the number of
variables in the OPEN statement, or the number of arguments in an OPEN
statement for a parameterized cursor is not equal to the number of values
required. SQLSTATE=07004
So, we must replace the parameter markers with real values to be able to execute the query.
For example:
$ db2 "select * from zopt_test where key1 = 1 and key2 = 1 and key3 = 1"
KEY1 KEY2 KEY3 CONTENT
--------------------------------- ------ ------ ------ -----------
1 1 1 Some content.
1 record(s) selected.
Now, let’s return to our original question why a query is now suddenly faster when it’s executed with real values directly on the database. Before we can answer this question, we must take a closer look at distribution statistics for Db2 first and then at possible disadvantages of the usage of parameter markers.
Distribution Statistics
Db2 creates and stores statistics values about all tables in the database. The database statistics include information about the size of the tables and indexes, and they can also contain information about the distribution of certain values in a table.
To show you the effect of distribution statistics, I created test table ZOPT_TEST with three key columns (KEY1 to KEY3) and one content field. Key column KEY1 of table ZOPT_TEST has two different values: 1 and 2. The complete table ZOPT_TEST has 100 entries. If now I select all values from table ZOPT_TEST with KEY1 = 1, I get 90 rows back. And for KEY1 = 2 I get 10 rows back.
From the distribution statistics, Db2 can get the information how often a certain value is stored in the different columns of a table. With this information, Db2 can calculate more accurately how much data will be returned from a certain query.
Let’s take the following example:
Example: Access Plan from DBA Cockpit
Here you can see the access plan of query SELECT * FROM ZOPT_TEST WHERE KEY1 = 1 as it’s shown in the DBA Cockpit. To see the estimate of returned rows, we must look at the first number after an index or table name, num_rows. For table ZOPT_TEST, we see 9.0000E+01, that means 90 rows. So, Db2 correctly estimates the number of rows returned from this query.
Disadvantages of Parameter Markers
But what happens if we use parameter markers instead? Here’s the same access plan, but now with parameter markers:
Example: Access Plan with Real Values from DBA Cockpit
With parameter markers, Db2 estimates that only 50 rows (5.0000E+01) are returned if we have a restriction on KEY1. How is this possible?
The explanation is that with parameter markers, Db2 cannot use the distribution statistics. Therefore, it calculates using an even data distribution. In our example, Db2 knows that there are two distinct values in column KEY1 (1 and 2) and that the table has 100 rows. So, Db2 calculates with 50 rows if the query contains a restriction on column KEY1.
And exactly this fact can be the explanation why a query executed on the command line is suddenly faster than the query coming from the SAP code. We must add actual values to be able to execute the query. With real values, Db2 can use the distribution statistics. In some cases, this can lead to a better access plan because DB2 can estimate the amount of data returned from the query in more detail.
Conclusion
If you want to find out why a query with parameter markers is slow and if it’s possible to make it faster, a test with real values instead of the parameter markers can be a good idea. From this test and especially from the access plan chosen with real values, you can get helpful information about a possible good plan. Also, you might find out with this test why the optimizer is choosing the current non-optimal plan. However, you must also take the advantages of the parameter markers into account. In most cases, parameter markers are the best choice when a query is frequently executed. Therefore, you should look if, for example, a new index or changing an existing index will help Db2 to execute the query faster. Forcing the regeneration of the access plan with the real values inside, for example by using REOPT ALWAYS, is only needed in very rare cases.