I am browsing through the MaxDB 7.4.03.32 source code downloaded from the [SAPDB website|http://www.sapdb.org/7.4/develop/dev_linux.htm|http://www.sapdb.org/7.4/develop/dev_linux.htm].
I need to understand how MaxDB estimates the selectivity of a given predicate (of the form <column name> <less than|equal to|...> <value>). As far as I know, the optimizer uses the (B+-tree of the) index on the column to estimate selectivity and uses this estimate to choose an appropriate plan. I want to see exactly how this is done. I also want to see how the candidate plans are generated, how one of them wins, and how the outputs of the three EXPLAIN variants are calculated.
I looked in the folder SAPDB_ORG/sys/src/SAPDB, especially at the cpp files in the folder DataAccess, which contain some details of how the statistics are calculated by sampling. However, I could not find the source code that answers the questions above. The other folders in SAPDB seemed irrelevant to my question (judging by their names), and the other folders in src have cryptic names that give no clue. I ran a case-insensitive egrep for "purpose:optim*" over the entire src folder but found nothing.
Am I looking in the wrong place, or doing it the wrong way?
I am trying the following experiment:
For a query with joins and several predicates, I choose some predicate, of the form <colname> less than <value>, and vary <value> so that the selectivity of the predicate varies from 10% to 100% of the rows of the table. I now compare the plans chosen by the optimizer for different selectivities.
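To pick candidate values for this experiment from the actual data, one can sort the column's values (or a sample of them) and read off cutoffs. The Python sketch below is illustrative only: `cutoff_for_selectivity` is a hypothetical helper, it assumes a numeric column already loaded into memory, and it computes the *true* fraction of rows satisfying `col < v`, not whatever MaxDB's optimizer estimates.

```python
from bisect import bisect_left


def cutoff_for_selectivity(values, target_percent):
    """Hypothetical helper: return (v, achieved) where achieved is the true
    fraction of values strictly less than v. target_percent is an int 1..100.
    Assumes a numeric column (v may be max + 1)."""
    s = sorted(values)
    n = len(s)
    # Integer ceiling of target_percent * n / 100, avoiding float rounding.
    k = (target_percent * n + 99) // 100
    if k >= n:
        v = s[-1] + 1          # above the maximum: every row satisfies col < v
    else:
        v = s[k]               # at most k rows lie strictly below the k-th smallest value
    achieved = bisect_left(s, v) / n   # rows strictly below v, as a fraction
    return v, achieved


for pct in (10, 50, 100):
    print(pct, cutoff_for_selectivity(range(100), pct))
```

With duplicate values the achieved fraction can fall below the target (for `[1, 1, 1, 1, 2, 2, 3, 3, 3, 3]` and 50%, the cutoff is 2 and only 40% of the rows are below it), so it is worth checking `achieved` rather than assuming it.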
So, if I want a selectivity of x%, I must use a <value> v such that, given the predicate <colname> less than v, MaxDB calculates the selectivity as x% (never mind if the 'correct' <value> is something else). So I need to know how the MaxDB optimizer calculates the selectivity of predicates, so that I can mimic that behavior.
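Once the estimator is known, choosing v amounts to inverting it. How MaxDB actually computes the estimate is exactly what is unknown here, so the sketch below assumes the common textbook uniform-distribution model: the selectivity of `col < v` is interpolated linearly between the column's minimum and maximum. This is an assumption, not MaxDB's documented method, and both function names are hypothetical.

```python
def uniform_selectivity(v, lo, hi):
    """Hypothetical estimator (NOT MaxDB's): fraction of rows with col < v,
    assuming values are spread uniformly between min (lo) and max (hi)."""
    if hi <= lo:
        # Degenerate column with a single distinct value.
        return 1.0 if v > lo else 0.0
    return min(1.0, max(0.0, (v - lo) / (hi - lo)))


def value_for_selectivity(x, lo, hi):
    """Invert uniform_selectivity: the v whose estimated selectivity is x (0..1)."""
    return lo + x * (hi - lo)


v = value_for_selectivity(0.25, 0, 200)
print(v, uniform_selectivity(v, 0, 200))
```

On skewed data the v produced by such a model and the v that yields the same *true* selectivity can differ considerably; that gap is precisely why the experiment needs the optimizer's own formula rather than the data distribution.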
By the way, I read on Wikipedia that MaxDB 7.6 and later versions are closed source. Is that correct?
> Am I looking in the wrong place, or doing it the wrong way?
Both.
The internal structure of MaxDB is pretty complex and components that belong together from a functional point of view are located at different folders.
On [http://home.snafu.de/~dittmar/] you can find an old description of the folder/component mapping.
Whether this helps you understand how the numbers you see are made up is another question; I'm pretty sure it won't.
The other point here is that the source code available to you is very old. MaxDB's optimizer has evolved a lot since then, so whatever you learn about it in version 7.4 has almost no relevance for versions 7.6 and higher.
Although I understand that it can be interesting to see how this works, I don't see how this information would help you.
Basically, the "costs" of a plan are meaningless in general; they only relate to one single SQL statement.
So, saying that "SQL A with costs of 3" is three times faster than "SQL B with costs of 9" is nonsense.
SQL B could be faster, run at the same speed or be slower than SQL A.
Knowing how the optimizer makes up the numbers does not make you write better SQL for MaxDB.
If you have any specific questions, post them and we'll try to answer them.