In this blog post, I will share few Best practices guides you query performance tuning tips that one of your Data Analysts and other SQL users can apply to improve DBSQL Serverless performance.
In below suggested tips using one of the integrated landscape - SAP Datasphere integrated with SAP Business Data Cloud and SAP Databricks
The Databricks SQL Serverless engine powers SAP Databricks compute. Selecting the right warehouse size (S, M, L, etc.) and autoscaling policies ensures your queries have adequate compute without over‑provisioning. Proper resource optimization is foundational to performance.
Under the hood, SAP Databricks uses Databricks’ optimized open storage formats. By using Delta Lake, your queries benefit from columnar storage, data skipping, ACID transactions, and optimized IO pathways—dramatically boosting speed.
SELECT * FROM my_table will retrieve all the columns and rows from the given table and it is an Input-Output intensive operation for a large table.
Scanning unnecessary columns slows queries and inflates compute costs. Always specify the exact columns you need—especially when querying wide SAP analytical datasets.
Use integer and not to use String on Primary Key
Three essential housekeeping operations:
These drastically improve performance, especially with frequently refreshed SAP operational data.
ANALYZE collects the statistics of a specified table. These statistics are used by the DBSQL query optimiser to generate a better execution plan.
ANALYZE TABLE demo_19538.cashflow_dp.zcashflow COMPUTE STATISTICS;OPTIMIZE statement optimizes the layout of the Delta Lake data files. Optionally you can Z-Order the data to colocate the related data in the same set of files.
OPTIMIZE demo_19538.cashflow_dp.zcashflow;
OPTIMIZE demo_19538.cashflow_dp.zcashflow ZORDER BY (COMPANYCODE);VACUUM removes unused files from the table directory. It removed all the data files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold. The default threshold is 7 days. If you run Vacuum on a delta table, you lose the ability to time travel back to the older versions of the table but it helps in reducing the storage cost.
VACUUM demo_19538.cashflow_dp.zcashflow DRY RUN;
VACUUM demo_19538.cashflow_dp.zcashflow;Databricks provides rich diagnostics:
Use these tools regularly to spot bottlenecks in SAP data models and adjust your SQL accordingly
It is very important to explore and analyse your query plan before executing it. This enables you to understand how the code will actually be executed and is useful for optimising your queries. The DBSQL optimiser automatically generates the most optimised physical plan which is then executed.
EXPLAIN
SELECT
COMPANYCODE, count(CASHFLOW) AS TotalTrip
FROM
demo_19538.cashflow_dp.zcashflow
GROUP BY COMPANYCODE;Partition SAP datasets carefully:
This reduces shuffle operations and speeds aggregation-heavy SAP reporting queries.
Caching can cut repeated query runtimes by 50–70%, especially in BI scenarios. Databricks caching options include:
cache() / persist())Monitor cache hit rates and pre‑warm caches for SAP dashboards.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
| User | Count |
|---|---|
| 94 | |
| 80 | |
| 30 | |
| 28 | |
| 25 | |
| 22 | |
| 21 | |
| 21 | |
| 20 | |
| 20 |