on 2016 Jul 29 10:22 AM
I would like to have a table that would be de-normalized. This makes for a fairly large table with almost 600 columns.
The question I have is that the columns are typically varying in length. For flexibility I am thinking of just making them all varchar 256 even though most of them could easily be varchar 40
Is there an issue with performance by doing this?
I recently worked on converting a database from SQL Anywhere 9 to Microsoft SQL Server where one of the tables contained 5,110 columns, most of which were defined as LONG VARCHAR.
It contained 456,811 rows, 472M total = 459M table + 3.1M ext + 9.5M index, 1,083 bytes per row, so it was clearly sparse. The "bytes per row" includes all disk space allocated to the table (all data and index pages) so it overstates the total amount of column data in the rows.
As far as I know it was a key table with no performance issues reported to me.
FWIW MSS was utterly incapable of handling such a [cough] curiosity without major surgery (on the table, not MSS 🙂
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Here are a couple of excerpts from my book:
1.5.1 A String Is a String: BINARY, CHARACTER, LONG All character and binary columns are stored as varying length character strings regardless of how they are declared. The maximum length specifies a limit on the byte size of the data portion of the string, with a default of 1 byte. The LONG VARCHAR and LONG BINARY types have an implied maximum length of 2GB. <string_type> ::= <char_type> [ "(" <maximum_length> ")" ] | LONG BINARY | LONG VARCHAR <char_type> ::= BINARY | CHAR [ VARYING ] | CHARACTER [ VARYING ] | VARBINARY | VARCHAR <maximum_length> ::= integer literal in the range 1 to 32767 Tip: All these data types, including LONG VARCHAR and LONG BINARY, may be used for local and global variables in stored procedures and other SQL scripts, as well as for columns in tables. Storage requirements depend on the current length of each column value rather than the maximum length. Long strings are split and require more overhead than short strings, whereas short strings are stored efficiently even if they are declared as LONG VARCHAR. Here’s how it works: String values up to 254 bytes in length are always stored together with the other columns in the row. When the length grows to 255 bytes or larger the value is partitioned into two pieces; the first piece is 254 bytes long and remains where it was, while the remainder is called a blob continuation and is placed on one or more separate pages called extension pages. These extension pages are kept separate so that a query or sequential scan that doesn’t need to look at the long values won’t have to retrieve all these pages. This arrangement is described in more detail in Section 10.6.2, “Table Fragmentation.” From a SQL programming point of view, a string is a string in SQL Anywhere 9 and you don’t have to worry about the declared data type. For example, if you think all company names will fit into 30 characters but you are concerned about exceptions, there is no performance penalty for using CHARACTER ( 100 ) or even 1000. Similarly, a description column that will usually require only a few hundred characters can be declared as LONG VARCHAR to handle those special cases; your database won’t grow in size until you actually store very long values. Exactly the same data may be stored in either CHARACTER or BINARY columns. In particular, the zero byte (hexadecimal 00) may be stored in a CHARACTER column and it is treated as data, not a string terminator. Tip: In some programming environments the zero byte string terminator is called “null.” This is not the same as the database NULL value implemented by SQL Anywhere 9; database NULLs require special handling when they are used in applications. There are a few exceptions to the assumption “a string is a string.” First, sorting and comparisons involving BINARY columns always use the actual binary values, whereas CHARACTER columns are sorted and compared according to the database collation sequence and case sensitivity. For example, in a case-insensitive database (the default) the CHARACTER values 'a' and 'A' are treated as being equal, whereas the BINARY 'a' is treated as being less than the BINARY 'A' when they are compared or sorted. Tip: Use the CAST function when you need to perform case-sensitive comparisons in a case-insensitive database; e.g., IF CAST ( char1 AS BINARY ) = CAST ( char2 AS BINARY ). This also works in the WHERE and ORDER BY clauses, and can be used when you need to ignore the database collation sequence. Note: This book doesn’t discuss multi-byte character sets, except to note that some techniques, like the Tip above, are only intended for single-byte character sets.
Continued...
Second, a few functions only work on the first 255 bytes of the character string arguments: SOUNDEX, SIMILAR, and all the date and time functions ignore anything past 255 bytes.Third, a conversion from string to numeric will also ignore any data past 255 bytes.
Fourth, an attempt to assign a long string value to a column declared with a shorter maximum length will result in right truncation. This truncation will happen silently when only spaces are being truncated by an INSERT or UPDATE command, or when non-spaces are truncated but the STRING_RTRUNCATION option is still set to the default 'OFF'. To generate an error message when non-spaces are truncated you must set STRING_RTRUNCATION to 'ON'. Note that trailing spaces are significant, and are never truncated unless they won’t fit in the declared maximum length.
Tip: The LTRIM, RTRIM, and TRIM functions can be used to get rid of leading and trailing spaces.
Fifth, some application development tools generate different code and user interface elements depending on how a string is declared. In some cases a column declared as CHAR may be treated as a fixed-length string even though SQL Anywhere 9 does not implement it that way.
Note: Other database products may implement CHAR columns as fixed-length strings, and that might affect you if you’re sending data back and forth via proxy tables or MobiLink synchronization.
Finally, there are some performance implications to declaring a string column with a maximum length far larger than you need. The declared width of columns in an index is used to determine if a compressed B-tree index can be used instead of a hash B-tree index. Subquery and function caching may be turned off in cases where the total declared maximum length of the columns and arguments is very large. Also, the query optimizer may be able to get better information from the column statistics for columns with a declared maximum size less than 8 bytes. Some of these topics are discussed in Chapter 10, “Tuning.”
Otherwise, a string is still a string, and you can happily store and copy and assign values with different declared string types without any problems.
=====
1.5.3 Exact Decimals
Exact decimals give exact results when they are combined in arithmetic expressions, making them ideal for financial calculations.
The DECIMAL and NUMERIC data types are one and the same: an exact decimal number up to 127 digits in length (the precision). The number of digits to the right of the decimal point (the scale) can range from zero to all of the digits.
<exact_numeric_type> ::= <exact_keyword> [ "(" <exact_precision> [ "," <exact_scale> ] ")" ]
<exact_keyword> ::= DEC | DECIMAL | NUMERIC
<exact_precision> ::= integer literal in the range 1 to 127
<exact_scale> ::= integer literal in the range 1 to <exact_precision>
If both the precision and scale are omitted, the default is ( 30, 6 ). If the precision is specified but the scale is omitted, the scale defaults to zero. If both the precision and scale are specified, the scale must be in the range 0 <= scale <= precision. In other words, the decimal point cannot be shifted away from the actual digits of precision.
The storage requirements depend on the precision and scale of the actual values stored, not the declared precision and scale. For example, 123456789 will take more storage than 123, and 1.123456789 will take more disk space than 1.1. The formula for the bytes required is:
2 + TRUNCNUM ( ( p - s + 1 ) / 2, 0 ) + TRUNCNUM ( ( s + 1 ) / 2, 0 )
where p and s are the actual precision and scale after any leading and trailing zeroes are ignored.
Within the server there will be little to no difference since the server only stores the number of bytes that are actually present in each column.
There will be a client side difference though. When a column is described the server will tell the client that the column could be as large as 256 bytes. This will result in the client allocating (at least) 256 bytes for the column (note: exact number will depend on API). This also affects the stream layer within the client library: the low level stream will be expecting up to 256 bytes for each column and therefore will need to allocate a large enough buffer to hold N rows (when N will vary) when prefetching the rows from the server. There are limits on the size of buffer that will be allocated so depending on the number of rows being fetched it could limit the number of rows that are prefetched (and hence could reduce throughput if you are fetching thousands or millions of rows).
HTH
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
62 | |
10 | |
7 | |
7 | |
6 | |
6 | |
6 | |
5 | |
5 | |
5 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.