cancel
Showing results for 
Search instead for 
Did you mean: 

Joining vs calling a deterministic function

2,261

11.0.1.2467

I'm coding a web-based application where a lot of selects typically go like this=

select cols from work_table bt 
      inner join this_user_table user1 on user1.col1 = bt.col1
      inner join this_language_table l on l.col = bt.col2
      inner join this_workflow_table w on w.col = bt.col3
      left outer join this_user_table user2 on user2.col2 = bt.col4

I only need one column from each of the joined tables.

Most work_tables have typically less than 5000 rows

  • this_user_table: about 10.000 rows
  • this_language_table: about 7500 rows
  • this_workflow_table < 10 rows

The columns in the join conditions are all primary or foreign keys, so they're indexed.

From what I gather based on incomplete production data, the values in the join conditions tend to repeat themselves quite a bit. For instance, there may be 10.000 users, but only 100 or so satisfy the condition, and about a dozen users cover about 80% of all cases. The same applies to the language table.

I need to generate database views with the selects above for each of my work tables.

In terms of overall performance, what's better in your expert opinion?

Views with joins as above or inlined deterministic functions on the work table to retrieve the column I need from each of the other three tables?

View Entire Topic
Former Member

SQL user-defined functions add considerable overhead to computation that each and every query at execution time must pay for. In addition, query optimizers (including SQL Anywhere's) are in the main unable to effectively optimize queries containing user-defined functions because it is exceedingly complicated to ensure equivalent semantics under all conditions with even basic rewritings. An exception with SQL Anywhere (at least) is function in-lining, but this can only be done if the UDF contains only a SELECT statement with no other procedural logic.

So I would caution against using UDFs in "ordinary" queries without doing some performance testing. In some cases I can envision the use of a UDF leading to performance gains because the overall statement then breaks down into separate components, making it easier for the query optimizer to find an efficient plan. But these situations would, in my view, occur rarely.

I have written about some of the tradeoffs of SQL rewritings along these lines in this article on my blog.