on 2010 Sep 23 9:03 PM
According to this reply, a LONG VARCHAR COMPRESSED column value shorter than 256 bytes will not be compressed because the length is shorter than the default INLINE 256: http://sqlanywhere-forum.sap.com/questions/1038/column-compression/1039#1039
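In other words (a minimal sketch, assuming the INLINE 256 PREFIX 8 defaults mentioned later in this thread), these two declarations should behave the same way:

CREATE TABLE t1 ( ccc LONG VARCHAR COMPRESSED );                      -- defaults assumed: INLINE 256 PREFIX 8
CREATE TABLE t2 ( ccc LONG VARCHAR COMPRESSED INLINE 256 PREFIX 8 );  -- the same thing, spelled out
-- With these settings, a value shorter than 256 bytes (like the 152-byte average below)
-- is stored inline and is not compressed.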
Suppose I have a LONG VARCHAR column with the following characteristics (in particular, avg_length = 152):
SELECT * FROM sa_column_stats (
   tab_name  = 'ttt',
   col_name  = 'ccc',
   tab_owner = 'DBA',
   max_rows  = 1000000000 );

table_owner                 'DBA'
table_name                  'ttt'
column_name                 'ccc'
num_rows_processed          9757647
num_values_compressed       0
avg_compression_ratio       (NULL)
avg_length                  151.89775032807609
stddev_length               2348.2860711857497
min_length                  0
max_length                  170
avg_uncompressed_length     151.89775032807609
stddev_uncompressed_length  2348.2860711857497
min_uncompressed_length     0
max_uncompressed_length     170
What are the pros and cons of forcing compression with COMPRESSED INLINE 0 PREFIX 0?
Will that even work? ( see, I'm not afraid to look stoopid 🙂
Edit: Yes, it seems to work ( not completely stoopid, at least not this time 🙂
Here is a small test, first of an uncompressed table fff:
CREATE TABLE fff (
   pkey INTEGER NOT NULL DEFAULT AUTOINCREMENT PRIMARY KEY,
   ccc  LONG VARCHAR NULL );

INSERT fff ( ccc )
   SELECT REPEAT ( 'ABCDEFGHIJ', 15 )
     FROM sa_rowgenerator ( 1, 10000 );

COMMIT;
Foxhound shows it uses up 1.7M of disk space (181 bytes per row) in an 11.0.1.2276 database with a 4K page size:
CREATE TABLE DBA.fff ( -- 10,000 rows, 1.7M total = 1.7M table + 0 ext + 16k index, 181 bytes per row
   pkey /* PK */ INTEGER NOT NULL DEFAULT autoincrement,
   ccc           LONG VARCHAR NULL,
   CONSTRAINT ASA85 PRIMARY KEY ( -- 16k
      pkey ) );
Here is what Foxhound says for table ttt, which is identical except for COMPRESSED INLINE 0 PREFIX 0:
CREATE TABLE DBA.ttt ( -- 10,000 rows, 868k total = 344k table + 452k ext + 72k index, 89 bytes per row
   pkey /* PK */ INTEGER NOT NULL DEFAULT autoincrement,
   ccc           LONG VARCHAR COMPRESSED INLINE 0 PREFIX 0 NULL,
   CONSTRAINT ASA84 PRIMARY KEY ( -- 72k
      pkey ) );
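For completeness, here is a sketch of how ttt would be created and loaded for this comparison (the same script as fff, with only the compression clauses added):

CREATE TABLE ttt (
   pkey INTEGER NOT NULL DEFAULT AUTOINCREMENT PRIMARY KEY,
   ccc  LONG VARCHAR COMPRESSED INLINE 0 PREFIX 0 NULL );

INSERT ttt ( ccc )
   SELECT REPEAT ( 'ABCDEFGHIJ', 15 )
     FROM sa_rowgenerator ( 1, 10000 );

COMMIT;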
Here are the compression statistics (in particular, avg_compression_ratio = 77%):
SELECT @@VERSION, * FROM sa_column_stats (
   tab_name  = 'ttt',
   col_name  = 'ccc',
   tab_owner = 'DBA',
   max_rows  = 1000000000 );

@@VERSION                   '11.0.1.2276'
table_owner                 'DBA'
table_name                  'ttt'
column_name                 'ccc'
num_rows_processed          10000
num_values_compressed       10000
avg_compression_ratio       76.66666666666667
avg_length                  150.0
stddev_length               0.0
min_length                  150
max_length                  150
avg_uncompressed_length     (NULL)
stddev_uncompressed_length  (NULL)
min_uncompressed_length     (NULL)
max_uncompressed_length     (NULL)
So... it looks like COMPRESSED INLINE 0 PREFIX 0 can reduce disk usage quite a bit, even for a relatively short column.
What are the cons?
There are three cons:

If you use prefix 0 inline 0, then no uncompressed data is stored at all. This means that every access to every piece of your string requires decompression. If you do

   select substr(ccc,1,1) from ttt

we would have to decompress the first chunk of each and every string. If you have a non-zero value for prefix, no decompression of any string would be necessary. If you have prefix 0 but a non-zero inline value, then strings shorter than the inline value would not need to be decompressed, but longer ones would.

Determining whether or not to use compression is, possibly counterintuitively, more a question of performance than disk savings, since disk space is really cheap these days. Saving a few megabytes here and there is less compelling if all your queries slow down (especially if you can get a terabyte for less than $100). If you have a fast enough processor, you likely want to compress as much as possible, since decompression is going to be cheaper than extra disk I/Os (i.e. reading 20 compressed pages and decompressing them all may be faster than reading 200 uncompressed pages). If you have a fast disk and a slow processor, decompression may be slower, but doing some extra reads may not be that big a deal.
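To make those cases concrete, here is a minimal sketch of the three declarations being described (the settings are illustrative only, using the syntax shown elsewhere in this thread):

-- Case 1: a non-zero prefix; SUBSTR ( ccc, 1, 1 ) can be answered from the uncompressed prefix,
-- so no decompression is needed for that query.
CREATE TABLE t_prefix ( ccc LONG VARCHAR COMPRESSED INLINE 256 PREFIX 8 NULL );

-- Case 2: prefix 0 but a non-zero inline value; values shorter than 256 bytes are stored
-- uncompressed, so only longer values need decompressing.
CREATE TABLE t_inline ( ccc LONG VARCHAR COMPRESSED INLINE 256 PREFIX 0 NULL );

-- Case 3: inline 0 prefix 0; no uncompressed copy is kept anywhere, so every access
-- decompresses at least the first chunk of the value.
CREATE TABLE t_none ( ccc LONG VARCHAR COMPRESSED INLINE 0 PREFIX 0 NULL );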
Graeme alludes to a third con but doesn't explicitly state it:
@Graeme: What if 99.9% of the queries don't involve the compressed column, and what if INLINE 0 PREFIX 0 reduces the size of the data portion of the row by one third (even if not COMPRESSED)? Performance is not just CPU, performance is disk I/O... sometimes it's ALL ABOUT disk I/O, and having that cheapo terabyte disk doesn't help if it takes forever to read it. (and yes, in fact, I am using a pair of cheapo terabyte disks for the testing, but alas, they cost more than $100 when I bought them a couple of years ago 🙂
@Graeme: Folks sometimes put the blobs in separate tables... INLINE 0 PREFIX 0 has a somewhat similar effect without all the DDL sturm und drang... and possibly better performance. Caveat: Putting blobs in a separate table is still necessary if the point is to avoid unnecessary MobiLink upload traffic if the other columns are frequently uploaded but the blobs aren't.
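For comparison, the separate-blob-table pattern being referred to might look something like this (a sketch only; the table and column names are made up):

CREATE TABLE parent (
   pkey INTEGER NOT NULL DEFAULT AUTOINCREMENT PRIMARY KEY,
   name VARCHAR ( 100 ) NULL );   -- the frequently used (and frequently uploaded) columns stay here

CREATE TABLE parent_blob (
   pkey INTEGER NOT NULL PRIMARY KEY REFERENCES parent ( pkey ),
   ccc  LONG VARCHAR NULL );      -- the blob lives in its own table, touched only when needed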
One con: you have at least a 12-byte header for the compression.
@Breck: No, that part does not apply to compressed columns. With a compressed column, the character set conversion is performed on the uncompressed data, so you can use VARCHAR or BINARY. That's referring to storing a value that has been compressed externally (or through the COMPRESS function).
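In other words, the caveat applies to data compressed outside the column-compression feature; a sketch of that scenario (the table name is made up), where the already-compressed bytes have to be stored as binary and decompressed explicitly:

CREATE TABLE pre_compressed (
   pkey INTEGER NOT NULL DEFAULT AUTOINCREMENT PRIMARY KEY,
   ccc  LONG BINARY NULL );   -- binary, so no character set conversion is attempted on the compressed bytes

INSERT pre_compressed ( ccc )
   VALUES ( COMPRESS ( 'some long string value', 'zip' ) );

SELECT CAST ( DECOMPRESS ( ccc, 'zip' ) AS LONG VARCHAR )
  FROM pre_compressed;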
Here are the results of two tests of the INSERT side of a high-throughput application, before and after COMPRESSED INLINE 0 PREFIX 0 was specified for one single LONG VARCHAR column:
          Rows        Disk        Disk        Average        Average       Core2 Quad   Transaction
                      Space       Space       Uncompressed   Compression   CPU Usage    Time
                      for Table   per Row     Column         Ratio         for INSERT   for INSERT
                                              Length         for Column    Process      Process
          ---------   ---------   ---------   ------------   -----------   ----------   -----------
Before    9,703,524   5.9G        656 bytes   152 bytes      0             35%          1 sec
After     28,776      18M         657 bytes   158 bytes      12.6%         80%          30 sec
Some comments and observations...
1. The two tests aren't exactly the same: the "Before" test ran a lot longer, and the uncompressed column lengths turned out to be slightly different. Otherwise, the only difference is that the "After" test used COMPRESSED INLINE 0 PREFIX 0 on one single LONG VARCHAR column.

2. The "Disk Space" columns are based on the total number of pages allocated to the table. This includes the data, extension and index pages as well as any free space contained therein... in other words, real disk space usage. (One way to check this directly from SQL is sketched after this list.)

3. The compression ratio was fairly low. That was a disappointment.

4. Approximately 5 bytes per row were saved, hardly worth it (the uncompressed column value grew by 6 bytes per row, but disk space grew by only 1 byte per row). The compression ratio indicates 20 bytes per row were saved; perhaps the missing 15 bytes went towards the overhead of creating extension pages.

5. It is possible that the "After" test was too short to yield accurate Disk Space values, making point 4 meaningless. But that's moot, read on...

6. All four processors were pegged at 100% in the "After" test, with 80% of that going to the INSERT Process... that was a big surprise.

7. The Transaction Time was an even bigger surprise. From the end-user point of view, the application was crushed. That's the reason the "After" test was cut short: the patient was dying, time to stop the experiment.

8. Clearly, at least in this case, COMPRESSED INLINE 0 PREFIX 0 was a really bad idea... and perhaps it is a bad idea in all cases.

9. Perhaps there is a good reason the defaults are INLINE 256 PREFIX 8.

10. And perhaps the suggestion "don't use COMPRESSED on short columns" is a good one.
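As mentioned in point 2, the "Disk Space" numbers are based on total page allocation. For anyone who wants to check something similar directly from SQL, a rough sketch, assuming the sa_table_page_usage system procedure (the exact result columns vary by version):

SELECT *
  FROM sa_table_page_usage ( );
-- find the row for the table in question to see its page-usage figures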