cancel
Showing results for 
Search instead for 
Did you mean: 

Source code for SQL Optimizer of MaxDB 7.4

Former Member
0 Kudos

Hello,

I am browsing through the MaxDB 7.4.03.32 source code downloaded at the [SAPDB website|http://www.sapdb.org/7.4/develop/dev_linux.htm|http://www.sapdb.org/7.4/develop/dev_linux.htm].

I need to understand how MaxDB estimates the selectivity for a given predicate (of the form <column name> <less than|equal to|...> <value>). As far as I know, the optimizer uses the (B+-tree of the) index of the column to estimate selectivity and uses this selectivity estimate in choosing an appropriate plan. I want to see how exactly this is done. I also want to see how the several plans are generated, how one of them wins and how the outputs of the three EXPLAIN vaiants are calculated.

I looked in the folder SAPDB_ORG/sys/src/SAPDB, especially the cpp files in the folder DataAccess, which had some details of how the statistics are calculated by sampling. However, I could not find the source code that answers the above questions. The other folders in SAPDB seemed irrelevant to my question (judging by the name) and the other folders in src have cryptic names that do not give a clue. I did an egrep on the entire src folder looking for "purpose:optim*" ignoring case without finding anything.

Am I looking in the wrong place, or doing it the wrong way?

~maximus

Accepted Solutions (0)

Answers (2)

Answers (2)

Former Member
0 Kudos

Hi Lars,

I am trying the following experiment -

For a query with joins and several predicates, I choose some predicate, of the form <colname> less than <value>, and vary <value> so that the selectivity of the predicate varies from 10% to 100% of the rows of the table. I now compare the plans chosen by the optimizer for different selectivities.

So, if I want a selectivity of x%, I must use a <value> v such that given the predicate <colname> less than v, MaxDB will calculate the selectivity as x% (never mind if the 'correct' <value> was something else). So, I need to know how the MaxDB optimizer calculates the selectivity of predicates so that I can mimic that behavior.

Btw, I read on Wikipedia that MaxDB 7.6 and later are closed source? Is that correct?

regards,

maximus

lbreddemann
Active Contributor
0 Kudos

Hi Maximus,

I've got to think about your questions and no time for that now.

I'll come back on that later.

Anyhow, just creating the testcases you mentioned should give a good first insight into the join optimization.

Try to follow the necessary decisions: 1. Possible join orders, 2. Possible access strategies for the join orders.

> Btw, I read on Wikipedia that MaxDB 7.6 and later are closed source? Is that correct?

Yes, that's correct. MaxDB 7.6 Build 36 was the last version for which the source code was downloadable from the MySQL website.

regards,

Lars

Former Member
0 Kudos

Hi there,

Didn't find this post earlier.

Actually I am working on similar line and would like to know how predicate selectivity is handled by MAXDB Optimizer from v.7.6 onwards

lbreddemann
Active Contributor
0 Kudos

> Actually I am working on similar line and would like to know how predicate selectivity is handled by MAXDB Optimizer from v.7.6 onwards

There is no general selectivtiy for predicates (as it is in Oracle when using Bind-variables w/o peeking).

The selectivity of predicates is estimated by sampling a bit of the table B*tree.

That's all that can be known outside the MaxDB development.

If you've got specific questions you may post them and we'll see whether we can discuss them.

As MaxDB is not Open Source anymore, there is no way to discuss kernel source code here.

Anyhow, what makes me wonder is: 'working on the inner workings of the MaxDB optimizer' - what kind of assignement is that for someone who's not actually developing that stuff?

Wouldn't it be more productive to 'work on the optimization of some SQL' instead and post questions about that?

I'm just curious here - what would you do with that knowledge?

Just like in other DBMS the optimizer is constantly changed (hopefully improved) and therefore there is never the situation where one can say: "Now I know exactly what the optimizer will do and I can tune my SQL with that knowledge".

There are books about the optimizer in e.g. Oracle and even after reading these books people cannot know everything about the optimizer (e.g. how exactly does the sampling works?).

regards,

Lars

lbreddemann
Active Contributor
0 Kudos

> Am I looking in the wrong place, or doing it the wrong way?

Both of it.

The internal structure of MaxDB is pretty complex and components that belong together from a functional point of view are located at different folders.

On [http://home.snafu.de/~dittmar/] you can find a old description of the folder/component mapping.

If this helps to understand how the numbers you see are made up - I'm pretty sure that it won't.

The other point here is: the source code available to you is very old. MaxDBs optimizer has much evolved since then. So whatever you learn about it in version 7.4 - it has nearly no relevance for 7.6 and higher versions.

Although I really understand that it can be interesting to see how this works, I don't see what this information should help you with.

Basically the "costs" of any plan are meaningless in general and only relate to one single sql.

So, to say "SQL A with costs of 3" is three times faster than "SQL B with costs of 9" is nonsense.

SQL B could be faster, run at the same speed or be slower than SQL A.

Knowing how the optimizer makes up the numbers does not make you write better SQL for MaxDB.

If you have any specific questions, post them and we'll try to answer them.

regards,

Lars