on 2011 Dec 31 7:44 AM
I have a table like described below with about 1E6 records.
create table SXE_QUOTE ( isin varchar(12) not null, mic varchar(4) not null, period_start timestamp not null, period_end timestamp not null, ticks_count smallint not null, -- some other fields .... constraint SXE_QUOTE_PK primary key (isin, mic, period_start) , constraint SXE_QUOTE_U1 unique (isin, mic, period_end) , constraint SXE_QUOTE_C01 check (period_start <= period_end) );
And I select last period_end for each group <isin, mic=""> with a select:
select q.isin as @q_isin, q.mic as @q_mic, max(q.period_end) as @q_period_end from sxe_quote q group by q.isin, q.mic
I took a look at execution plan and was appalled, because I have expected, that the index SXE_QUOTE_U1 will be used, but instead I saw only a parallel table scan.
So the question for me: why index is not used? And it's possible at all by ASA 11?
Request clarification before answering.
Here's another wild guess; this query appears to work OK in V11.0.1, but whether or not it works BETTER on millions of rows is anyone's guess:
create table SXE_QUOTE ( isin varchar(12) not null, mic varchar(4) not null, provider smallint not null, period_start timestamp not null, period_end timestamp not null, ticks_count smallint not null, -- some other fields .... constraint SXE_QUOTE_PK primary key (isin, mic, provider, period_start) , constraint SXE_QUOTE_U1 unique (isin, mic, provider, period_end) , constraint SXE_QUOTE_C01 check (period_start <= period_end) ); INSERT sxe_quote VALUES ( 'A', 'A', 1, '2001-01-01', '2001-01-01', 0 ); INSERT sxe_quote VALUES ( 'A', 'A', 1, '2001-01-02', '2001-01-02', 0 ); INSERT sxe_quote VALUES ( 'A', 'A', 2, '2001-01-03', '2001-01-03', 0 ); INSERT sxe_quote VALUES ( 'A', 'A', 2, '2001-01-04', '2001-01-04', 0 ); INSERT sxe_quote VALUES ( 'A', 'A', 3, '2001-01-05', '2001-01-05', 0 ); INSERT sxe_quote VALUES ( 'A', 'A', 3, '2001-01-06', '2001-01-06', 0 ); COMMIT; SELECT isin, mic, provider, period_end FROM ( SELECT isin, mic, provider, period_end, rank ( ) OVER partition_window AS entry_rank FROM sxe_quote WINDOW partition_window AS ( PARTITION BY isin, mic, provider ORDER BY period_end DESC ) ) AS ranked_sxe_quote WHERE entry_rank = 1 ORDER BY isin, mic, provider; isin,mic,provider,period_end 'A','A',1,'2001-01-02 00:00:00.000' 'A','A',2,'2001-01-04 00:00:00.000' 'A','A',3,'2001-01-06 00:00:00.000'
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Many thanks. That shows great promise and looks much better as my current workaround. Just to fullfil the whole bild:
-- do it only first time and keep it select distinct isin, mic, provider into tt from sxe_quote; -- select tt.isin, tt.mic, tt.provider, (select max(s.period_end) from sxe_quote s where s.isin = tt.isin and s.mic = tt.mic and s.provider = tt.provider) as period_end from tt order by tt.isin, tt.mic, tt.provider;P.S. SQL might be incorrect, because I wrote it blindly, and only to show the idea behind it. Anyway, such optimization strategy might be a very useful improvement for ASA 13/14.
FWIW the following Help topic may hold a clue about way MAX() and GROUP BY are handled (or not handled) when it comes to optimization: Optimization for MIN and MAX functions http://dcx.sybase.com/index.html#1100/en/dbusage_en11/queryopt-sect2-5356394.html
The clue is this statement: "must not contain a GROUP BY clause".
Thanks. Apparently this restriction isn't dictated by a nature of applied sql pattern but rather by a product. And after some test I found out, that SA miss some usefull optimization strategies applicable on an index. AFAIK some competitors can use indexes much ofter to extract partial key values from an index.
http://publib.boulder.ibm.com/infocenter/iseries/v7r1m0/index.jsp?topic=%2Frzajq%2Frzajqgroupidx.htm
So I have just to hope that this gap isgoing to be closed in one of next releases.
I tested suggested optimization with ASA 11. Sadly I had no luck. I couldn't even get detailed execution plan. Hier is execution plan based on estimation: http://www.megafileupload.com/en/file/351427/not-better-saplan.html
FWIW, Breck has asked for an according product enhancement in his blog:
SXE_QUOTE_U1 is a unique constraint on (isin, mic, period_end). That seems like a reasonable candidate for an index-based access plan to me since index entries would be clustered by (isin, mic) and the engine just needs to find the highest period_end for each unique (isin, mic). One of the query folks would know better but I'd bet the choice of plan was just based on heuristics in the cost-based optimizer and might also be related to the size of the index hash for varchar(12)+varchar(4) (which would also depend on the database collation). A different plan might be chosen if more of the table pages is in cache or if calibration has been performed. I'm just a guessing, really. As I said, a query team member would know better.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
The query you posted originally:
select q.isin as @q_isin, q.mic as @q_mic, max(q.period_end) as @q_period_end from sxe_quote q group by q.isin, q.mic
matches the index : constraint SXE_QUOTE_U1 unique (isin, mic, period_end)
but, query in your bad_plan.saplan is different, has an extra column ("provider")in SELECT stmt and GROUP BY:
select isin, mic, provider, max(period_end) from sxe_quote group by isin, mic, provider
have you tried to create an index that will actually match this e.g. : constraint SXE_QUOTE_U2 unique (isin, mic, provider, period_end)
and then see what plan/performance you get.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
I posted my first question in December 2011, and since that the data model has undergone some changes, that I haven't mentioned, and indeed, a primary key was extended by a column 'provider', but also unique index U1 has the same extension. I guess, you can just oversee that - it's really negligible.
"it's really negligible" ...are you trying to make it easy, or hard, for folks to help you? If your answer is "hard" then just keep going like you have been... there is no risk or obligation to you, there is a Money Back Guarantee.
If, on the other hand, you really want help, try posting your query as a new question, with the matching plan, and the matching CREATE TABLE.
First you should make sure, that you have the latest EBF applied and that you have refreshed your statistics (see Create Statistics).
Multiple ideas come to my mind:
1) Based on the data I would not expect to much from the optimizer, the optimizer doesn't have the knowledge, that only 16 of all isin+mic tuples are distinct. So maybe what you can do is split your table into two, one having just the isin+mic as a unique entry and then a foreign key relationship to the second table with the period_end entries. Then the optimizer wouldn't have to find the distinct tuples together with the max(period_end) but instead can concentrate on that desired task.
2) As your data rate seems rather high an alternative could be to use insert triggers and update an already aggregated table with just the distinct isin+mic + the max(period_end), so that don't have to visit the SXE_QUOTE table for your query at all.
3) Last but not least have you tried to force the usage of the index?
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Just some wild guesses:
a) To your first suggestion: Would an index solely on (isin, mic) be appropriate (though John's reasoning would imply that the existing index on SXE_QUOTE_U1 would do here as well).
b) To your second suggestion: An immediate materialized view would come to mind as a "blackbox" implementation of your approach.
Now we are waiting on mikron to check these alternatives:)
Last EBF (revision 2803) was applied before. 1) I thought that an index has a distribution statistic. And I would expect, that at least after first full table scan statistics get updated.
2) table data is very volatile and get updated 5 times per second. And select mentioned before will be executed only 6 times per hour. So I gess, the total cost shoulb be more with a mat. view.
3) No, i didn't right now but I give them a try.
I have a close look on documentation about optimization methods in execution plan and haven't saw any proper optimization method. May be I'm spoiled by other RDBMS vendors.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
I had already posted saplan. Hier again:
http://www.megafileupload.com/en/file/350815/bad-plan-zip.html
User | Count |
---|---|
75 | |
30 | |
9 | |
8 | |
7 | |
6 | |
6 | |
5 | |
5 | |
5 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.