cancel
Showing results for 
Search instead for 
Did you mean: 

Data type for text

1,207

which data type is best to use for texts? The table will have a simple structure:

Material number Char(15) Text (approx. 2 DIN A 4 pages) ??? Creation date DateTime User created Char(30) Change date DateTime User changed Char(30)

/Franz

VolkerBarth
Contributor

FWIW, those maintenance columns are candidates for the according special values defaults that automatically add resp. update those values, such as:

CreatedAt datetime default current timestamp,
CreatedBy varchar(30) default current user,
LastModifiedAt datetime default timestamp,
LastModifiedBy varchar(30) default last user

Accepted Solutions (0)

Answers (1)

Answers (1)

Breck_Carter
Participant

The short answer is VARCHAR ( <maximum_length> ).

The long answer is in section 1.5.1 of The Book


1.5.1 A String Is a String: BINARY, CHARACTER, LONG

All character and binary columns are stored as varying length character strings regardless of how they are declared. The maximum length specifies a limit on the byte size of the data portion of the string, with a default of 1 byte. The LONG VARCHAR and LONG BINARY types have an implied maximum length of 2GB.

<string_type> ::= <char_type> [ "(" <maximum_length> ")" ]
| LONG BINARY
| LONG VARCHAR

<char_type> ::= BINARY
| CHAR [ VARYING ]
| CHARACTER [ VARYING ]
| VARBINARY
| VARCHAR

<maximum_length> ::= integer literal in the range 1 to 32767

Tip: All these data types, including LONG VARCHAR and LONG BINARY, may be used for local and global variables in stored procedures and other SQL scripts, as well as for columns in tables.

Storage requirements depend on the current length of each column value rather than the maximum length. Long strings are split and require more overhead than short strings, whereas short strings are stored efficiently even if they are declared as LONG VARCHAR. Here’s how it works: String values up to 254 bytes in length are always stored together with the other columns in the row. When the length grows to 255 bytes or larger the value is partitioned into two pieces; the first piece is 254 bytes long and remains where it was, while the remainder is called a blob continuation and is placed on one or more separate pages called extension pages. These extension pages are kept separate so that a query or sequential scan that doesn’t need to look at the long values won’t have to retrieve all these pages. This arrangement is described in more detail in Section 10.6.2, “Table Fragmentation.”

From a SQL programming point of view, a string is a string in SQL Anywhere 9 and you don’t have to worry about the declared data type. For example, if you think all company names will fit into 30 characters but you are concerned about exceptions, there is no performance penalty for using CHARACTER ( 100 ) or even 1000. Similarly, a description column that will usually require only a few hundred characters can be declared as LONG VARCHAR to handle those special cases; your database won’t grow in size until you actually store very long values.

Exactly the same data may be stored in either CHARACTER or BINARY columns. In particular, the zero byte (hexadecimal 00) may be stored in a CHARACTER column and it is treated as data, not a string terminator.

Tip: In some programming environments the zero byte string terminator is called “null.” This is not the same as the database NULL value implemented by SQL Anywhere 9; database NULLs require special handling when they are used in applications.

There are a few exceptions to the assumption “a string is a string.” First, sorting and comparisons involving BINARY columns always use the actual binary values, whereas CHARACTER columns are sorted and compared according to the database collation sequence and case sensitivity. For example, in a case-insensitive database (the default) the CHARACTER values 'a' and 'A' are treated as being equal, whereas the BINARY 'a' is treated as being less than the BINARY 'A' when they are compared or sorted.

Tip: Use the CAST function when you need to perform case-sensitive comparisons in a case-insensitive database; e.g., IF CAST ( char1 AS BINARY ) = CAST ( char2 AS BINARY ). This also works in the WHERE and ORDER BY clauses, and can be used when you need to ignore the database collation sequence.

Note: This book doesn’t discuss multi-byte character sets, except to note that some techniques, like the Tip above, are only intended for single-byte character sets.

Second, a few functions only work on the first 255 bytes of the character string arguments: SOUNDEX, SIMILAR, and all the date and time functions ignore anything past 255 bytes.

Third, a conversion from string to numeric will also ignore any data past 255 bytes.

Fourth, an attempt to assign a long string value to a column declared with a shorter maximum length will result in right truncation. This truncation will happen silently when only spaces are being truncated by an INSERT or UPDATE command, or when non-spaces are truncated but the STRING_RTRUNCATION option is still set to the default 'OFF'. To generate an error message when non-spaces are truncated you must set STRING_RTRUNCATION to 'ON'. Note that trailing spaces are significant, and are never truncated unless they won’t fit in the declared maximum length.

Tip: The LTRIM, RTRIM, and TRIM functions can be used to get rid of leading and trailing spaces.

Fifth, some application development tools generate different code and user interface elements depending on how a string is declared. In some cases a column declared as CHAR may be treated as a fixed-length string even though SQL Anywhere 9 does not implement it that way.

Note: Other database products may implement CHAR columns as fixed-length strings, and that might affect you if you’re sending data back and forth via proxy tables or MobiLink synchronization.

Finally, there are some performance implications to declaring a string column with a maximum length far larger than you need. The declared width of columns in an index is used to determine if a compressed B-tree index can be used instead of a hash B-tree index. Subquery and function caching may be turned off in cases where the total declared maximum length of the columns and arguments is very large. Also, the query optimizer may be able to get better Chapter 1: Creating 9 information from the column statistics for columns with a declared maximum size less than 8 bytes. Some of these topics are discussed in Chapter 10, “Tuning.”

Otherwise, a string is still a string, and you can happily store and copy and assign values with different declared string types without any problems.

VolkerBarth
Contributor
0 Kudos

Just to add: Breck's book is about version 9. Version 10 introduced Unicode data types like NVARCHAR and the like. Unless you are using a UTF encoding by default or are just using strings in one language, it may be preferable to use (LONG) NVARCHAR instead of (LONG) VARCHAR.

0 Kudos

Thank you for the very good description! /Franz