cancel
Showing results for 
Search instead for 
Did you mean: 

Why include a primary key in a composite index?

0 Kudos
4,849

We have a composite index of three columns (date, time, pk) where pk is the primary key.

The pk column has its own index so why include pk in the composite index?

Composite index documentation

Accepted Solutions (1)

Accepted Solutions (1)

Former Member

Including PK in the index is unlikely to be of much use. To be used as a matching predicate to facilitate indexed retrieval, your query must contain a sargable predicate against the PK column. If the predicate is equality, eg PK = <value>, then the PK index alone will serve you well. If an inequality, eg PK > <value> or a range, eg. PK BETWEEN <value> AND <value> then the index may be useful, but of course only if the WHERE clause also contains strict equality conditions with the date and time values. In my experience, the latter (strict equality on date/time) is fairly rare, and the former (range comparisons on PK values) is also rare, particularly if the PK values are surrogate values.

I would expect, in the main, for applications to reference rows using PK equality conditions, or ranges of date/time values, neither of which would be helped by this composite index; an index on the date/times alone would be sufficient, in addition to the PK index that is already extant.

Two further points:

  1. You can try SQL Anywhere's index consultant to see what it would recommend as indexes for your workload.
  2. As SQL Anywhere now supports index-only retrieval, it can make sense to include additional columns in the index so that the server can answer the query directly using the index alone, without referring to the underlying rows in the table. However, even with index-only retrieval I don't think the column order in your composite index is all that helpful.
VolkerBarth
Contributor
0 Kudos

Just to understand: As for the typical date/time range queries (say "All entries that were inserted yesterday/24 hours ago" and the like), a query on them would usually make use of an according index, right?

Former Member

Probably, but because the date/time is two separate columns (rather than a TIMESTAMP) only the date column will be used if the query contains an inequality condition that references the date field.

VolkerBarth
Contributor
0 Kudos

I see - so my recommendation to use one timestamp field instead of two separate columns (cf. my answer) makes sense:)

0 Kudos

Glenn, thank you. Including the pk seems redundant considering your comments and the fact that we don't narrow queries further by pk if we're querying by (equality or range of) date and time. In our table other columns now seem much better candidates, particularly when considering index-only retrieval.

Any disadvantages to leaving pk in? Does it affect physical size or layout of index?

The index consultant was silent on the queries I tried.

Former Member
0 Kudos

Disadvantages? Sure - increased index key size means fewer entries per page, and the reduced fanout may require the index to be an extra level, which would mean a (potential) additional extra I/O for every index lookup.

The index would also have to be maintained if the PK value changes, though this is unlikely since it is rarely if ever good practice to change primary keys.

As for the index consultant - the index consultant does not make recommendations about PK or FK indexes. Were any indexes listed as "unused" after you ran your workload?

Answers (1)

Answers (1)

VolkerBarth
Contributor

Well, I would think that particular index might make sense if the following sentence (taken from the doc page you refered to) is true:

A composite index is useful if the first column alone does not provide high selectivity.

In your case, it would make sense if there are lots of entries with identical date and time values, and you would frequently select those and have to filter (or order) them further based on the pk. But then the index order would be important, too - i.e. if you sort by date/time chronologically and would use the pk in ascending order, too, the ASC/DESC order in the index columns should be set accordingly.

However, I would expect that an index without the pk column would be nearly as good for that particular situation, as the engine could additionally use the default pk index.

In case you don't use that particular filtering/sorting, I would recommend an index without the pk column (and would usually recommend to use a datetime/timestamp field instead of separate date/time columns).

As always: Some according tests with typical workloads (including looking at the plans) should show the advantages/disadvantages of the different index implementations. You might also use the Index Consultant.

0 Kudos

Volker, thanks for your response. We do have many identical date time values. We query by date-time, but don't filter further by pk in the query, so I wondered if including the pk had some desirable effect on the index behind the scenes. I'll test it without the pk as you suggest.