on 2010 Jan 15 3:40 AM
How is performance impacted when you have a table that, lets say could have over 1M rows, and there are 3-5 columns that are consistently NULL.
Is it bad design to have data structures in place that support this? Should we look at normalizing it a little more?
Thanks! -Calvin
P.S. Love this place.
Request clarification before answering.
Glad you like this place! It's only going to get better, when more people show up.
I'm not sure what you are asking in your second sentence, in particular I'm not sure what NULL has to do with normalization.
Tackling the first question...
I don't think NULL values cause performance problems per se, at least not that you'd notice. I know that historically, especially with those icky legacy DBMS products like [expletive deleted] and [bodily harm], folks used to worry a lot about fixed-versus-variable length columns, physical ordering of columns in rows, [excrement] like that.
What CAN hurt performance somewhat is inserting rows with all nulls, all zeroes, all empty character strings, and then later updating rows to fill in the values: instant row splits across multiple pages, instant fragmentation of your physical database storage. At least, that was true historically with SQL Anywhere... I am not absolutely sure V11 hasn't eased that problem, I will let Mr. Paulley et al fill in the blanks here.
I haven't run any INSERT/UPDATE/rowsplit tests lately, and when I did it wasn't easy to demonstrate the problem, you had to work at it. And it wasn't directly caused by NULLs per se, but by increasing value sizes over time.
I guess I'm saying, there are lots of reasons to hate NULLs but performance isn't one of them.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
I guess the question is: Is is good design to put similar things in only one table when some of them can't have values for a few fields whereas others can, or should one put the addional fields in a separate child table, i.e. the common question of OOP-Types/Subtypes and how to map them to a RDM. - Well, there's a whole ORM industry as an answer:)
Breck is correct about causing row splits due to updates of columns that were initially NULL. Some of this can be mitigated through the use of new INLINE column specification modifier introduced in Version 10, but the behaviour remains.
@Breck: I can't say much about ORMs, as I haven't used one so far. Guess I share your point of view of prefering SQL over some abstraction on that, as long as the data model isn't too complex. W.r.t. nulls, I think they are useful in certain circumstances, and for that, I prefer them. But I agree, one shouldn't use them too frequently...if a column mustn't accept nulls, it should be NOT NULL. - Well, this answer might be more serious than your questions:)
In addition to Breck's and Martin's responses, Nulls are quite ignorable w.r.t. to storage size:
From the SA 11 docs:
Column space utilization for NULL values is 1 bit per column and space is allocated in multiples of 8 bits. The NULL bit usage is fixed based on the number of columns in the table that allow NULL values.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
50 | |
9 | |
8 | |
6 | |
5 | |
5 | |
5 | |
5 | |
4 | |
4 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.