cancel
Showing results for 
Search instead for 
Did you mean: 

Are duplicate values for IN problematic?

MCMartin
Participant
2,566

having a query like this:

select * from T where ID in (1,1,1,1,1)

does it have any negative effect, if inside the IN list an entry appears multiple times like in the example above.

Accepted Solutions (1)

Accepted Solutions (1)

Former Member

In a word, no. The predicates ID in (1,1,1,1) and ID in (1) are semantically equivalent.

The server software automatically eliminates duplicates in IN-lists where the size of the list is "sufficiently small" that we can incur the overhead of the sort (to eliminate the duplicate values) at compile time. This can only be done if all of the IN-list elements are known at optimization time - they must either be literal values, variables, or hostvars.

VolkerBarth
Contributor
0 Kudos

What about IN with a subquery that returns duplicate values - like this one:

select * from T where ID in ((select SomeNonUniqueColumn from T1))

Can I conclude that the optimizer will eliminate duplicates in "SomeNonUniqueColumn" as well (treating this somewhat like "select distinct SomeNonUniqueColumn")?

VolkerBarth
Contributor
0 Kudos

Interesting test with v12.0.1.3726: When using

select * from T where ID in
  ((select SomeNonUniqueColumn from T1 where SomeNonUniqueColumn = 1))

this obviously isn't treated as

select * from T where ID in ((select 1 from T1))
Breck_Carter
Participant
0 Kudos

Why would it be? What if SomeNonUniqueColumn = 1 doesn't exist in T1?

Former Member
0 Kudos

No. The duplicate elmination is performed only when the IN-list contains only constant values (or values from variabies that are known and fixed at compile time).

Former Member
0 Kudos

Correct, though we could enhance the product to do so if we found this to be a construction commonly generated by SQL generators. We have done so for other types of syntactic constructions (eg ISNULL(X,X) is one of these) but not this one.

VolkerBarth
Contributor
0 Kudos

So would it make sense to use "in ((select distinct ...))" in such cases?

VolkerBarth
Contributor
0 Kudos

Yes, I agree - it surely does exist in my tests but that can't be generalized. (And no, I won't delete my somewhat dumb question:))

Former Member

The optimizer has a number of rewrite and execution-time strategies for computing the result of nested queries, and it knows that, semantically, it doesn't matter if the subquery contains DISTINCT or not. So it doesn't really matter if you include DISTINCT or not in the original statement.

VolkerBarth
Contributor
0 Kudos

Thanks for the clarification: So my underlying general assumption "The optimizer is smart enough to handle duplicates in subqueries for EXISTS/IN tests as well" seems to hold. As expected: "You do the hard part."

Answers (1)

Answers (1)

Former Member

Indeed, I second the motion. Duplicates in an IN clause are NOT a problem.

However, I can say from experience that an IN clause on large number of items performs worse than an equivalent join ..

So ..

instead of doing:

select * from DataTable where ID in ( select ID from LookupTable )

try this instead:

select dt.* from ( select distinct ID from LookupTable ) x, DataTable dt where x.ID = dt.id