cancel
Showing results for 
Search instead for 
Did you mean: 

Table Size and Performance varchar

Former Member
2,732

I would like to have a table that would be de-normalized. This makes for a fairly large table with almost 600 columns.

The question I have is that the columns are typically varying in length. For flexibility I am thinking of just making them all varchar 256 even though most of them could easily be varchar 40

Is there an issue with performance by doing this?

VolkerBarth
Contributor
0 Kudos

Just to hint at some similar FAQs - yes, I'm aware you have not asked for CHAR or LONG VARCHAR:) but there are further details when the length might matter or not:


As to the "de-normalized" table: Have you considered to use a materialized view (in case that would be of help?)

Accepted Solutions (0)

Answers (2)

Answers (2)

Breck_Carter
Participant

I recently worked on converting a database from SQL Anywhere 9 to Microsoft SQL Server where one of the tables contained 5,110 columns, most of which were defined as LONG VARCHAR.

It contained 456,811 rows, 472M total = 459M table + 3.1M ext + 9.5M index, 1,083 bytes per row, so it was clearly sparse. The "bytes per row" includes all disk space allocated to the table (all data and index pages) so it overstates the total amount of column data in the rows.

As far as I know it was a key table with no performance issues reported to me.

FWIW MSS was utterly incapable of handling such a [cough] curiosity without major surgery (on the table, not MSS 🙂

Former Member
0 Kudos

On a side note what is the impact of using varchar(20) vs numeric (20,10)

Breck_Carter
Participant
0 Kudos

Here are a couple of excerpts from my book:

1.5.1 A String Is a String: BINARY, CHARACTER, LONG

All character and binary columns are stored as varying length character strings
regardless of how they are declared. The maximum length specifies a limit on
the byte size of the data portion of the string, with a default of 1 byte. The
LONG VARCHAR and LONG BINARY types have an implied maximum
length of 2GB. 

<string_type> ::= <char_type> [ "(" <maximum_length> ")" ]
                | LONG BINARY
                | LONG VARCHAR

<char_type> ::= BINARY
                | CHAR [ VARYING ]
                | CHARACTER [ VARYING ]
                | VARBINARY
                | VARCHAR

<maximum_length> ::= integer literal in the range 1 to 32767

Tip: All these data types, including LONG VARCHAR and LONG BINARY, may
be used for local and global variables in stored procedures and other SQL
scripts, as well as for columns in tables.

Storage requirements depend on the current length of each column value rather
than the maximum length. Long strings are split and require more overhead than
short strings, whereas short strings are stored efficiently even if they are
declared as LONG VARCHAR. Here’s how it works: String values up to 254
bytes in length are always stored together with the other columns in the row.

When the length grows to 255 bytes or larger the value is partitioned into two
pieces; the first piece is 254 bytes long and remains where it was, while the
remainder is called a blob continuation and is placed on one or more separate
pages called extension pages. These extension pages are kept separate so that a
query or sequential scan that doesn’t need to look at the long values won’t have
to retrieve all these pages. This arrangement is described in more detail in Section
10.6.2, “Table Fragmentation.”

From a SQL programming point of view, a string is a string in SQL Anywhere
9 and you don’t have to worry about the declared data type. For example,
if you think all company names will fit into 30 characters but you are concerned
about exceptions, there is no performance penalty for using CHARACTER
( 100 ) or even 1000. Similarly, a description column that will usually require
only a few hundred characters can be declared as LONG VARCHAR to handle
those special cases; your database won’t grow in size until you actually store
very long values.
Exactly the same data may be stored in either CHARACTER or BINARY
columns. In particular, the zero byte (hexadecimal 00) may be stored in a
CHARACTER column and it is treated as data, not a string terminator.
Tip: In some programming environments the zero byte string terminator is
called “null.” This is not the same as the database NULL value implemented by
SQL Anywhere 9; database NULLs require special handling when they are used
in applications.

There are a few exceptions to the assumption “a string is a string.” First, sorting
and comparisons involving BINARY columns always use the actual binary values,
whereas CHARACTER columns are sorted and compared according to the
database collation sequence and case sensitivity. For example, in a case-insensitive
database (the default) the CHARACTER values 'a' and 'A' are treated as
being equal, whereas the BINARY 'a' is treated as being less than the BINARY
'A' when they are compared or sorted.

Tip: Use the CAST function when you need to perform case-sensitive comparisons
in a case-insensitive database; e.g., IF CAST ( char1 AS BINARY ) = CAST
( char2 AS BINARY ). This also works in the WHERE and ORDER BY clauses, and
can be used when you need to ignore the database collation sequence.

Note: This book doesn’t discuss multi-byte character sets, except to note that
some techniques, like the Tip above, are only intended for single-byte character
sets.

Breck_Carter
Participant
0 Kudos

Continued...

Second, a few functions only work on the first 255 bytes of the character string
arguments: SOUNDEX, SIMILAR, and all the date and time functions ignore
anything past 255 bytes.

Third, a conversion from string to numeric will also ignore any data past 255 bytes.

Fourth, an attempt to assign a long string value to a column declared with a shorter maximum length will result in right truncation. This truncation will happen silently when only spaces are being truncated by an INSERT or UPDATE command, or when non-spaces are truncated but the STRING_RTRUNCATION option is still set to the default 'OFF'. To generate an error message when non-spaces are truncated you must set STRING_RTRUNCATION to 'ON'. Note that trailing spaces are significant, and are never truncated unless they won’t fit in the declared maximum length.

Tip: The LTRIM, RTRIM, and TRIM functions can be used to get rid of leading and trailing spaces.

Fifth, some application development tools generate different code and user interface elements depending on how a string is declared. In some cases a column declared as CHAR may be treated as a fixed-length string even though SQL Anywhere 9 does not implement it that way.

Note: Other database products may implement CHAR columns as fixed-length strings, and that might affect you if you’re sending data back and forth via proxy tables or MobiLink synchronization.

Finally, there are some performance implications to declaring a string column with a maximum length far larger than you need. The declared width of columns in an index is used to determine if a compressed B-tree index can be used instead of a hash B-tree index. Subquery and function caching may be turned off in cases where the total declared maximum length of the columns and arguments is very large. Also, the query optimizer may be able to get better information from the column statistics for columns with a declared maximum size less than 8 bytes. Some of these topics are discussed in Chapter 10, “Tuning.”

Otherwise, a string is still a string, and you can happily store and copy and assign values with different declared string types without any problems.

=====

1.5.3 Exact Decimals

Exact decimals give exact results when they are combined in arithmetic expressions, making them ideal for financial calculations.

The DECIMAL and NUMERIC data types are one and the same: an exact decimal number up to 127 digits in length (the precision). The number of digits to the right of the decimal point (the scale) can range from zero to all of the digits.

<exact_numeric_type> ::= <exact_keyword> [ "(" <exact_precision> [ "," <exact_scale> ] ")" ]

<exact_keyword> ::= DEC | DECIMAL | NUMERIC

<exact_precision> ::= integer literal in the range 1 to 127

<exact_scale> ::= integer literal in the range 1 to <exact_precision>

If both the precision and scale are omitted, the default is ( 30, 6 ). If the precision is specified but the scale is omitted, the scale defaults to zero. If both the precision and scale are specified, the scale must be in the range 0 <= scale <= precision. In other words, the decimal point cannot be shifted away from the actual digits of precision.

The storage requirements depend on the precision and scale of the actual values stored, not the declared precision and scale. For example, 123456789 will take more storage than 123, and 1.123456789 will take more disk space than 1.1. The formula for the bytes required is:

2 + TRUNCNUM ( ( p - s + 1 ) / 2, 0 ) + TRUNCNUM ( ( s + 1 ) / 2, 0 )

where p and s are the actual precision and scale after any leading and trailing zeroes are ignored.

MarkCulp
Participant

Within the server there will be little to no difference since the server only stores the number of bytes that are actually present in each column.

There will be a client side difference though. When a column is described the server will tell the client that the column could be as large as 256 bytes. This will result in the client allocating (at least) 256 bytes for the column (note: exact number will depend on API). This also affects the stream layer within the client library: the low level stream will be expecting up to 256 bytes for each column and therefore will need to allocate a large enough buffer to hold N rows (when N will vary) when prefetching the rows from the server. There are limits on the size of buffer that will be allocated so depending on the number of rows being fetched it could limit the number of rows that are prefetched (and hence could reduce throughput if you are fetching thousands or millions of rows).

HTH