on ‎2017 Apr 06 7:41 AM
I am joining two tables on HANA and, to get some statistics, I am LEFT joining the items table 3 times to get a total count, number of entries processed and number of errors, as shown below.
This is a dev system and the items table has only 1500 items. But the query below runs for 17 seconds. When I remove any of the three aggregation terms (but leave the corresponding JOIN in place), the query executes almost immediately.
I have also tried adding indexes on the fields used in the specific JOINs, but that makes no difference.
Also, the query is only slow if, for the two latter aggregations, ( COUNT DISTINCT RP2.GUID and COUNT DISTINCT RP3.GUID), there are actually records. When the count is 0 (because no items have yet been selected), the query executes quickly.
I am attaching the execution trace for this statement as well: xmlplv.txt (Just change the extension).
select rk.guid, rk.run_id, rk.status, rk.created_at, rk.created_by,
count( distinct rp.guid ),
count( distinct rp2.guid ),
count( distinct rp3.guid )
from zbsbpi_rk as rk
left join zbsbpi_rp as rp
on rp.header = rk.guid
left join zbsbpi_rp as rp2
on rp2.header = rk.guid
and rp2.processed = 'X'
left join zbsbpi_rp as rp3
on rp3.header = rk.guid
and rp3.result_status = 'E'
where rk.run_id = '0000000010'
group by rk.guid, run_id, status, created_at, created_by
This is the execution plan of the long-running statement.

This is th execution plan when taking out one of the aggregations:

Request clarification before answering.
You might want to check with explain plan or planviz what's happening, when your query runs.
Gut feeling tells me, that the triple outer join is not the best approach here - especially when done before aggregation.
Could be a good idea to aggregate first and join later...
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Thanks for the reply, Lars. The funny thing is that, as explained, when I take any of the aggregations out, but leave the joins in, it's quick. I just attached the execution trace (I don't know how to read it).
How would you aggregate first and join later? Can you give me an example of what you mean? Thanks, Martin
The first join (the one you simply changed into an INNER JOIN, which changes the meaning of your data model) joins over your grouping column.
So instead of
select rk.guid, rk.run_id, rk.status, rk.created_at, rk.created_by,count(distinct rp.guid )
from zbsbpi_rk as rk
left outer join zbsbpi_rp as rp
on rp.header= rk.guid;You could do
select rk.guid, rk.run_id, rk.status, rk.created_at, rk.created_by, rp.cnt_guid
from zbsbpi_rk as rk
left outer join
(SELECT header, count(distinct guid) as cnt_guid)
FROM zbsbpi_rp
GROUP BY header) rp
on rp.header= rk.guid;
| User | Count |
|---|---|
| 15 | |
| 9 | |
| 6 | |
| 5 | |
| 4 | |
| 4 | |
| 3 | |
| 2 | |
| 2 | |
| 2 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.