cancel
Showing results for 
Search instead for 
Did you mean: 

Performance issue with mix of left outer and inner joins

1,424

Hi All,

I am working on a performance issue involving a select query with a mix of left outer join and inner joins.

SQLA configuration:

OS: Windows server 2012 R2 standard
Page size: 8K
SQL Anywhere version: 12.0.1.4124
Edition: workgroup
Cache size: 80 GB

Below is the query where I have replaced the original table names with alternate names as I am not allowed to share the original query:

Query:

SELECT ltbl2.col1,
       ltbl1.*,
       ltbl3.col1,
       ltbl3.col2,
       view1.col1,
       view1.col2,
       view1.col3
FROM   ltbl1 left outer join view1 on ltbl1.key1 =  view1.key, 
       ltbl2, 
       ltbl3
WHERE (ltbl1.key2 = ltbl3.key ) AND
      ( ltbl1.key3 = ltbl2.key ) AND
      ( ltbl3.col1 not in ( 1,6 ) ) AND
      ( ltbl1.date_column is not null ) AND
      ( ltbl1.key2 = 600260151039) AND
      ( convert(varchar(1), ltbl1.col1) <> '5')

Row counts:

ltbl1 - 13989140
ltbl2 - 329159
ltbl3 - 607

Issue Description:

Scenario 1: When we include only the first left outer join between table - ltbl1 and view - view1, the optimizer estimate - "RowsReturned" is approximately 80% correct (estimate 7054, actual 8936) for table - ltbl1.

Scenario 2: If we add the second table - ltbl2 with an inner join into the first left outer join, the estimate of the - "RowsReturned" deteriorate very quickly to almost 0% (estimate 63, actual 8936).

Scenario 3: If we add the third table - ltbl3 with an inner join into the previous mix of left outer join and inner join the estimate - RowsReturned further deteriorates to 34 vs 8936.

When I view the query plan in all the three scenarios, I can see that the optimizer chooses to use an index scan. The index happens to be the one created for the foreign key constraint between ltbl1 and ltbl2. The index statistics section of all three plans shows the estimate and actual for RowsReturned are same.

The post scan predicates and scan predicates section also shows column and join statistics being used and they evaluate to true 100%.

I can see that the inner join is messing up the estimate but unable to answer why the optimizer estimate for the largest table - ltbl1 continues downhill when inner joins are added.

NOTE: If we change the inner joins for the remaining two table(s) ltbl2 and ltbl3 to a left outer join the optimizer estimates for ltbl1 RowsReturned is back to normal (estimate 7054, actual 8936).

Could someone please help me understand what is happening when we add inner joins into a left outer join and the effect of it on the optimizer estimates?

Many thanks.

Accepted Solutions (0)

Answers (1)

Answers (1)

awitter
Participant
0 Kudos

The where clausule has this: ltbl3.col1 not in ( 1,6 )

This means it canot use the index optimally (at least not as expected) (can be rewritten as ltbl3.col1 <> 1 and ltbl3.col1 not in ( 1,6 ) <> 6 )

Maybe start cleaning up your query: SELECT ltbl2.col1, ltbl1.*, ltbl3.col1, ltbl3.col2, view1.col1, view1.col2, view1.col3 FROM ltbl1 inner join ltbl2 on ltbl2.key = ltbl1.key3 inner join ltbl3 on ltbl3.key = 600260151039 and ltbl3.col1 not in ( 1,6 ) left outer join view1 on view1.key = ltbl1.key1 WHERE ( ltbl1.date_column is not null ) AND ( ltbl1.key2 = 600260151039) AND ( convert(varchar(1), ltbl1.col1) <> '5')

Now you see that the nner join to ltbl3 really isn't an inner join, but just an added table with parameters. so I still wonder if the joins as stated are correct (key2 and key 3 interchanged?):

FROM   ltbl1 
       inner join ltbl2 on ltbl2.key = ltbl1.key3
       inner join ltbl3 on ltbl3.key = ltbl1.key2 and ltbl3.col1 not in ( 1,6 )
       left outer join view1 on view1.key = ltbl1.key1

Also, have you tried to reset the statistics for these tables? After a few inserts and deletes it might be that the statistics are out-of-line. look for statement Reset statistics...