cancel
Showing results for 
Search instead for 
Did you mean: 

GUID as Primary Key

Former Member
5,027

I see heated discussion all over the internet about whether a GUID is a poor choice for a primary key. Usually, these conversations all revolve around SQL Server.

A lot of people say that the size difference is an issue. (GUID = 16bytes, BIGINT = 8bytes, INT = 4bytes). Is this due to physical HDD space, or page space (fragmentation?)

For sake of conversation, lets say we have a 'PEOPLE' table:

create table "people"
(
     ID            uniqueidentifier  not null default newid(),
     first_name    varchar(100),
     last_name     varchar(100),
     ssn           varchar(9)    
);

alter table "people"
    add constraint PK_PEOPLE primary key (ID);

How bad of a design is that over using a BIGINT or an INT?

We have a lot of new development going on in a legacy application, and we are tossing GUIDs on every table. We do have to take into account replication (which is why we went with GUIDs as opposed to INTs and 'Hold Keys', as we call them)

Thanks!
Calvin

Accepted Solutions (1)

Accepted Solutions (1)

VolkerBarth
Contributor

I would generally prefer GLOBAL AUTOINCREMENT over GUIDs as the latter should perform much worse - they are stored as BINARY(16) which won't perform as fast as 32 or 64 bit numbers on common platforms.

So, I personally agree with the reasoning in the docs. I would add that for many applications, even in a replication setup, INT may still be enough for a PK type if there are "only" some hundred remotes and you expect, say only a few ten thousand rows per table. That will do in many applications where data is added manually.

E.g., a few years ago, we have ported a client-server app with one central database into a SQL Remote setup by simply redefining the PK type (int) to a DEFAULT GLOBAL AUTOINCREMENT with a 200000 partition size. That gives use the opportunity to support some thousand remotes - much more than we have now. And the apps (and all the FKs) hadn't to be changed at all as they still work with INT fields. Needless to say, I was really fond of that easy migration:)

Of course YMMV.

In addition to Breck's response, I even prefer comparing INT/BIGINT values starting with 1 to those starting with higher values. At the moment, I have to check the migration of a system with miminal 8digit PK values, and that is really harder than with smaller values. But it's definitely much easier than comparing GUIDs.

Former Member
0 Kudos

How does the partition size work if you have two tables defined with DEFAULT GLOBAL AUTOINCREMENT? Does the database manage two different ranges, one for each table or does it use the same partition size(range) across all tables?

VolkerBarth
Contributor
0 Kudos

@Calvin: You can define the partition size for each column independently, s. the DEFAULT GLOBAL AUTOINCREMENT [(partition-size)] syntax. However, as with DEFAULT AUTOINCREMENT, different columns don't get distinct values, i.e. each DEFAULT AUTOINCREMENT column will start with 1, 2, 3. With GLOBAL AUTOINCREMENT, they would start with (<global_database_id> * partition_size) + 1. So GLOBAL AUTOINCREMENT is aimed to generate distinct values for different databases, but not for different tables/columns - as that is not necessary for a RDBMS. However

VolkerBarth
Contributor
0 Kudos

Forget my last "However"...

Former Member
0 Kudos

This is fantastic! I can't believe we never researched this stuff!

Lets say I have a parent/child table relationship. If the user inserts the parent, then multiple children at once and hits Save, how would I be able to assign the parent id to the children using this? Is there a method to obtain the next number BEFORE the data actually hits the database?

Thanks alot for your help!

Former Member
0 Kudos

Nevermind, figured it out - get_identity('table')

Answers (4)

Answers (4)

Breck_Carter
Participant

GUIDs may be attractive if you don't want to use DEFAULT GLOBAL AUTOINCREMENT for avoiding primary key collisions in a replicating/synchronizing setup.

IMO the big problem with GUIDs will be administrative... every time anyone has to track down a problem involving data they'll be faced with this giant un-readable un-typable and generally incomprehensible string of crap.

Or dozens, hundreds, thousands of strings of crap.

Think Windows registry. Pity the maintenance programmer.

GLOBAL AUTOINCREMENT values also have their readability problems, but 100000001234 versus 200000000789 is not nearly as bad as {1ec17912-d2b4-48f9-9a47-5da6bbe695fb} versus {8dcb5d9f-5a52-43ef-82c3-2254e4d0c163}.

And that's if you're lucky enough to see the dashes, which isn't always true when you're in a hurry.

GLOBAL AUTOINCREMENT values are also somewhat more difficult to implement than GUIDs which, as you say, can be tossed around like road salt.

The phrase "tossing GUIDs on every table" does make me somewhat nervous. But then again, I always get nervous when I hear "we are unconditionally [making some dramatic change] to every single [thing] in our application". Gives me the screaming abdabs 🙂

Everything has unintended consequences. My advice, get promoted to management before the GUIDs go into production 🙂

VolkerBarth
Contributor
0 Kudos

@Breck: By the way, are GUID values always distinct when created on the same computer (i.e. is it theoretically impossible to generate the same value twice or is this only "very improbable")?

Former Member
0 Kudos

Great answer, thanks!

Breck_Carter
Participant
VolkerBarth
Contributor
0 Kudos

@Breck: Thanks for the pointer - so the answer may be dependent on one own's sense of paranoia:)

Former Member

Here is a trick I use all the time when I have to type out those big INt values... remember that 200000000789 = 2e11+789 and later is WAY easier to type correctly the first time. SELECT * FROM MYTABLE WHERE RECORD_ID = 2E11+789

Although a big int is 2e17, not 2e11 (I just copied one of breck's numbers)

Former Member

I have written about this subject in the past: here and here.

There are no right answers with this discussion, only tradeoffs. In my opinion, the convenience of GUIDs are trumped by usability considerations. GUIDs as a key may make sense if the table has another candidate key, but all too often this other candidate key is itself a surrogate identifier and there doesn't, at least to me, seem to be much point in having a table with two surrogate identifiers. One will do.

0 Kudos

Excellent articles.

Former Member

Advantages to using GUIDs are that if your database will contain all GUID primary keys, you can seamlessly merge two instances of your database, which is pretty neat if you work with distributed offices and want to collect data in a central place. You can have disconnected users (like salesmen on a laptop on the road without internet) run on their own separated little version of the database and have them create their own new primary keys and still seamlessly integrate it all back into the main database (sync at the office at the end of the day/week). You can also create primary keys in your applications without making round trips to the database, use those to define relationships between objects and then at your leisure save it all to the database without having to worry about having to update the PKs in your objects.

We currently use autoincrement PK values in our database, but on a next project I'd be tempted to give GUIDs a good try. Right now we are facing that our company is spreading from one building into three: the existing one, one down the road and one out of state. I wish we had a way of splitting our DB into those three locations and just sync back at the end of the day, but with the way we handle the PKs there's just no way. GUID PKs would have seamlessly allowed for that.

VolkerBarth
Contributor

That's a good reason. - However, if you know (or suspect) beforehand that you are going to merge databases, you could as well use GLOBAL AUTOINCREMENTs to generate disjunct primary keys - and they would still have to advantages of readable and efficient numbers. They need to be setup (in contrast to GUIDs) but this does reduce to set the according database opion.

Breck_Carter
Participant

That's a good reason. - However, if... no, wait a minute, Volker already said that 🙂

Former Member
0 Kudos

I just saw these comments. GLOBAL AUTOINCREMENT is a nice concept, but the setup seems cumbersome. Would this still be an option if there were many databases (like hundreds or thousands vs just two or three)?

MCMartin
Participant

Some disadvantages of GUIDs which come to my mind:

  • It might be a performance question if you have a lot of inserts, because the calculations needed to create a GUID are more complex than a simple increment.

  • It mights be also a performance question if you have a lot of transformations between text representation and internal representation of GUIDs during program execution.

  • After the insert you can get the value of the autoincrement field of the just inserted row by using "select @@identity", this is not working for the GUID. (Maybe a good question how to get the last inserted GUID?, I will create a thread for it.)

Advantages I see:

  • you don't have to think about ranges of values if you have a two-way replication.

  • fix size of the GUID in a text form

  • user is not expecting any sequence (questions like: where is row id 1024? (gone by a rollback) will not be asked by users)

VolkerBarth
Contributor
0 Kudos

The fact that AUTOINCREMENTs are increasing is often an advantage (and not a disadvantage) IMHO, it makes debugging (e.g. finding out the newest records) waaaay easier.