Technology Blog Posts by SAP
cancel
Showing results for 
Search instead for 
Did you mean: 
Yogananda
Product and Topic Expert
Product and Topic Expert
1,349

In this blog post, I will share few Best practices guides you query performance tuning tips that one of your Data Analysts and other SQL users can apply to improve DBSQL Serverless performance.

2026-02-23_18-47-07.png

In below suggested tips using one of the integrated landscape - SAP Datasphere integrated with SAP Business Data Cloud and SAP Databricks2026-02-22_13-18-30.png

Choose the Right Warehouse Size & Enable Autoscaling

The Databricks SQL Serverless engine powers SAP Databricks compute. Selecting the right warehouse size (S, M, L, etc.) and autoscaling policies ensures your queries have adequate compute without over‑provisioning. Proper resource optimization is foundational to performance.

2026-02-22_12-56-11.png

Optimize Storage Using Delta Lake

Under the hood, SAP Databricks uses Databricks’ optimized open storage formats. By using Delta Lake, your queries benefit from columnar storage, data skipping, ACID transactions, and optimized IO pathways—dramatically boosting speed.

Avoid SELECT *

SELECT * FROM my_table will retrieve all the columns and rows from the given table and it is an Input-Output intensive operation for a large table.

Scanning unnecessary columns slows queries and inflates compute costs. Always specify the exact columns you need—especially when querying wide SAP analytical datasets.2026-02-22_12-57-29.png

Integer vs String Data Types

Use integer and not to use String on Primary Key

2026-02-22_13-00-50.png

2026-02-22_13-01-39.png

Optimize Data Layout (Statistics, OPTIMIZE, VACUUM)

Three essential housekeeping operations:

  • ANALYZE TABLE collects table statistics for better planning.
  • OPTIMIZE compacts small files and organizes data.
  • VACUUM cleans unused files to improve scan speed.

These drastically improve performance, especially with frequently refreshed SAP operational data.

ANALYZE collects the statistics of a specified table. These statistics are used by the DBSQL query optimiser to generate a better execution plan.

ANALYZE TABLE demo_19538.cashflow_dp.zcashflow COMPUTE STATISTICS;

OPTIMIZE statement optimizes the layout of the Delta Lake data files. Optionally you can Z-Order the data to colocate the related data in the same set of files.

OPTIMIZE demo_19538.cashflow_dp.zcashflow;
OPTIMIZE demo_19538.cashflow_dp.zcashflow ZORDER BY (COMPANYCODE);

2026-02-22_13-04-39.png

VACUUM removes unused files from the table directory. It removed all the data files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold. The default threshold is 7 days. If you run Vacuum on a delta table, you lose the ability to time travel back to the older versions of the table but it helps in reducing the storage cost.

VACUUM demo_19538.cashflow_dp.zcashflow DRY RUN;
VACUUM demo_19538.cashflow_dp.zcashflow;

Use Query Profiles & EXPLAIN Plans

Databricks provides rich diagnostics:

  • Query Profile highlights slow operators, joins, and shuffles.
  • EXPLAIN FORMATTED shows the logical and physical execution plan.

Use these tools regularly to spot bottlenecks in SAP data models and adjust your SQL accordingly

It is very important to explore and analyse your query plan before executing it. This enables you to understand how the code will actually be executed and is useful for optimising your queries. The DBSQL optimiser automatically generates the most optimised physical plan which is then executed.

EXPLAIN 
SELECT
COMPANYCODE, count(CASHFLOW) AS TotalTrip
FROM
demo_19538.cashflow_dp.zcashflow
GROUP BY COMPANYCODE;

2026-02-22_13-07-14.png

Reduce Shuffle Volume with Better Partitioning

Partition SAP datasets carefully:

  • Avoid too many small partitions.
  • Avoid over‑partitioning tables under 1TB.
  • Use partitioning only when your access patterns require it.

This reduces shuffle operations and speeds aggregation-heavy SAP reporting queries.

Use Caching Strategically

Caching can cut repeated query runtimes by 50–70%, especially in BI scenarios. Databricks caching options include:

  • Delta Cache on SSD-backed nodes
  • DataFrame caching (cache() / persist())

Monitor cache hit rates and pre‑warm caches for SAP dashboards.

2 Comments