on 2014 Sep 26 8:54 AM
Update: Nick's comment is appreciated... is it the official response from Engineering?
What is the Sybase Central 16 "Fragmentation" tab showing?
The following screen shot from Sybase Central in SQL Anywhere 16.0.0.1915 says the rroad_group_2_property_pivot table has "1,224 table fragments" (see the bottom right corner).
What does that mean, and where does it come from?
In other words, does it affect query performance, and is there a system procedure I can call to get that number, plus the data used to display the pretty picture?
Testing indicates it does not have anything to do with "file fragmentation":
SELECT DB_PROPERTY ( 'DBFileFragments' ); DB_PROPERTY('DBFileFragments') '2'
Testing also indicates it does not have anything to do with "table fragmentation":
SELECT * FROM sa_table_fragmentation ( 'rroad_group_2_property_pivot', 'DBA' ); TableName, rows, row_segments, segs_per_row 'rroad_group_2_property_pivot', 1110934, 1110934, 1.0
Is "1,224 table fragments" saying that the table data for rroad_group_2_property_pivot is being stored in 1,224 non-contiguous blocks of pages within the database file (they are non-contiguous with respect to pages containing data for other tables, NOT that they are non-contiguous with respect to the disk drive... DBFileFragments = 2)?
If so, why should I care about that?
If I should care about it, how can I change the number? The REORGANIZE TABLE statement does not appear to directly affect that number; in fact, REORGANIZE TABLE can make Sybase Central's "table fragments" go UP as well as down.
If NOT, then be aware that the Sybase Central Fragmentation display is frightening the smallfolk... it is frightening ME 🙂
Here are my responses to your questions:
Q: What is the Sybase Central 16 "Fragmentation" tab showing?
It’s showing a page map of the dbspace containing the selected table or index. Basically, it shows which pages are used for the selected object, and provides a visual display of the level of page fragmentation.
Selecting a table in the list shows its table and extension pages in the page map. In addition, it shows index pages for any indexes on the table that reside in the same dbspace as the table. Selecting an index in the list shows its index pages in the page map.
Q: The following screen shot from Sybase Central in SQL Anywhere 16.0.0.1915 says the rroad_group_2_property_pivot table has "1,224 table fragments" (see the bottom right corner). What does that mean, and where does it come from?
It means that the pages for the table’s data are spread across 1,224 non-contiguous blocks within the dbspace. In contrast, if all pages for the table’s data were contiguous within the dbspace, then there would be 1 table fragment.
This information comes from the table_page_list column in SYSTAB. Similarly, the extension fragments information comes from the ext_page_list column in SYSTAB. The index fragments information comes from SYSPHYSIDX.allocation_bitmap.
Q: In other words, does it affect query performance, and is there a system procedure I can call to get that number, plus the data used to display the pretty picture?
It can affect query performance. If there are a lot of small fragments (say, one or two pages in size) and the fragments are spread widely across the dbspace, then the server requires more 64K reads (and potentially more drive head movement) to do a sequential scan of the table.
There is no system procedure that returns this information. However, Sybase Central uses the following function to count fragments:
CREATE TEMPORARY FUNCTION count_fragments( bitmap LONG VARBIT ) RETURNS INTEGER BEGIN DECLARE num_fragments INTEGER; SELECT SUM( is_first_bit ) INTO num_fragments FROM ( SELECT bitnum, IF bitnum = MIN( bitnum ) OVER ( ORDER BY bitnum RANGE BETWEEN 1 PRECEDING AND CURRENT ROW ) THEN 1 ELSE 0 ENDIF AS is_first_bit FROM dbo.sa_get_bits( bitmap ) ) DT; RETURN num_fragments; END;
When a table is selected, Sybase Central executes the following SQL to obtain the page map and fragmentation information for the table (and any indexes on the table that reside in the same dbspace as the table):
WITH MergedIndexBitmaps AS ( SELECT T.table_id, BIT_OR( P.allocation_bitmap ) AS idxs_bitmap FROM SYS.SYSPHYSIDX P JOIN SYS.SYSIDX I ON I.table_id = P.table_id AND I.phys_index_id = P.phys_index_id JOIN SYS.SYSTAB T ON T.table_id = I.table_id AND T.dbspace_id = I.dbspace_id GROUP BY T.table_id ) SELECT LOCATE( T.tab_page_list, '1', 1 ) AS tab_first_page, LOCATE( T.tab_page_list, '1', -1 ) AS tab_last_page, COMPRESS( SUBSTR( T.tab_page_list, tab_first_page, tab_last_page-tab_first_page+1 ), 'gzip' ) AS tab_bitmap_zip, count_fragments( T.tab_page_list ), LOCATE( T.ext_page_list, '1', 1 ) AS ext_first_page, LOCATE( T.ext_page_list, '1', -1 ) AS ext_last_page, IF T.ext_page_count > 0 THEN COMPRESS( SUBSTR( T.ext_page_list, ext_first_page, ext_last_page-ext_first_page+1 ), 'gzip' ) ENDIF AS ext_bitmap_zip, count_fragments( T.ext_page_list ), LOCATE( M.idxs_bitmap, '1', 1 ) AS idxs_first_page, LOCATE( M.idxs_bitmap, '1', -1 ) AS idxs_last_page, IF COUNT_SET_BITS( M.idxs_bitmap ) > 0 THEN COMPRESS( SUBSTR( M.idxs_bitmap, idxs_first_page, idxs_last_page-idxs_first_page+1 ), 'gzip' ) ENDIF AS idxs_bitmap_zip FROM SYS.SYSTAB T JOIN SYS.SYSUSER U ON U.user_id = T.creator LEFT OUTER JOIN MergedIndexBitmaps M ON M.table_id = T.table_id WHERE T.table_name = '<table-name>' AND U.user_name = '<table-owner>';
When an index is selected, Sybase Central executes the following SQL to obtain the page map and fragmentation information for the index:
SELECT LOCATE( P.allocation_bitmap, '1', 1 ) AS idx_first_page, LOCATE( P.allocation_bitmap, '1', -1 ) AS idx_last_page, COMPRESS( SUBSTR( P.allocation_bitmap, idx_first_page, idx_last_page-idx_first_page+1 ), 'gzip' ) AS idx_bitmap_zip, count_fragments( P.allocation_bitmap ) FROM SYS.SYSPHYSIDX P JOIN SYS.SYSIDX I ON I.table_id = P.table_id AND I.phys_index_id = P.phys_index_id JOIN SYS.SYSTAB T ON T.table_id = I.table_id JOIN SYS.SYSUSER U ON U.user_id = T.creator WHERE I.index_name = '<index-name>' AND I.index_category = '<index-category>' AND T.table_name = '<table-name>' AND U.user_name = '<table-owner>';
where index-category is defined as 1 for primary keys, 2 for foreign keys, and 3 for secondary indexes (unique constraints and indexes).
Q: Testing indicates it does not have anything to do with "file fragmentation": SELECT DB_PROPERTY ( 'DBFileFragments' );
DB_PROPERTY('DBFileFragments') '2'
True. The page map doesn’t indicate the fragmentation of the dbspace file on disk, so for a given object, the fragment count within the dbspace is equivalent to the fragment count on the disk only if the dbspace file is contiguous on the disk.
Q: Testing also indicates it does not have anything to do with "table fragmentation": SELECT * FROM sa_table_fragmentation ( 'rroad_group_2_property_pivot', 'DBA' );
TableName, rows, row_segments, segs_per_row 'rroad_group_2_property_pivot', 1110934, 1110934, 1.0
True. This procedure displays information on row-splits. In this case rows and row_segments are equal if no rows are split across pages; otherwise, the number of row_segments will be greater than the number of rows. This info is unrelated to the number of table fragments.
Q: Is "1,224 table fragments" saying that the table data for rroad_group_2_property_pivot is being stored in 1,224 non-contiguous blocks of pages within the database file (they are non-contiguous with respect to pages containing data for other tables, NOT that they are non-contiguous with respect to the disk drive... DBFileFragments = 2)?
Exactly.
Q: If so, why should I care about that?
The number of table fragments is only useful if you also take into account the number of table pages. Consider the ratio of fragments/pages: If it is high (say, approaching 1.0) and the fragments are widely spaced across the dbspace, then the server requires more reads when doing a sequential scan.
Q: If I should care about it, how can I change the number? The REORGANIZE TABLE statement does not appear to directly affect that number; in fact, REORGANIZE TABLE can make Sybase Central's "table fragments" go UP as well as down.
If a table is suffering from excessive fragmentation within the dbspace, then REORGANIZE TABLE will remedy that (and also reorder the rows by the table’s primary key or clustered index). But REORGANIZE TABLE won’t reduce the number of fragments if there is little or no improvement to be gained. The number of fragments can increase by a relatively small amount as a result of a reorganization, but that isn’t a cause for concern if the fragments are closely spaced together (because the number of 64K reads won’t change).
You really need to take into account the number of table pages, the number of table fragments, and how closely the fragments are to each other when determining whether to do a REORGANIZE TABLE.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
The above answers by Nick and Rich are accurate. To restate:
The grouping of table pages can affect the performance for sequential scans of the table. The following is a sketch of how you can estimate the cost of scanning a table sequentially with the current layout of pages or after all gaps are removed:
create temporary function F_CostDTT(in @dist int) returns int begin set F_CostDTT = case when @dist <= 1 then 122 when @dist <= 40 then 209 when @dist <= 80 then 263 when @dist <= 200 then 582 when @dist <= 400 then 697 when @dist <= 800 then 701 else 3876 end ; end; create temporary procedure P_CostSequentialScan(in @bits long varbit) result(est_seconds_if_all_gaps_removed double, est_seconds_current_layout double) begin select sum(F_CostDTT(1))/1e6 as est_seconds_if_all_gaps_removed, sum(F_CostDTT(dist))/1e6 as est_seconds_current_layout from ( select bitnum, coalesce(bitnum - min(bitnum) over ( order by bitnum rows between 1 preceding and 1 preceding ), 1 ) as dist from sa_get_bits(@bits) ) D; end;
The F_CostDTT() approximates the disk transfer time using the defaults for a hard drive. You could update the constants using the sa_get_dtt() procedure or you could extend the function to do that properly. The real DTT linearly approximates between band positions but this sketch doesn't take that into account.
The P_CostSequentialScan() takes a table bitmap and estimates the cost of scanning it if all the pages are perfectly contiguous and with the current layout. An example of using the procedure:
select T.table_name, C.* from sys.systab T cross apply P_CostSequentialScan(tab_page_list) C order by T."Count" desc
For example, one of my tables has a number of gaps between pages and the current estimated cost is 155ms and if it could be completely contiguous this would be 24ms.
There is another important benefit to REORGANIZE TABLE that is not currently captured by the Fragmentation tab: how closely the base table row order matches the clustered index for the table. The catalog contains estimates for each index of how closely the index matches the table order. You can use a query such as the following to report the "percent clustered" for each index:
select table_name, index_name, TCard, percent_clustered from ( select T.table_name, I.index_name , case when T.clustered_index_id = I.index_id then 'Y' else 'N' end case as Clustered , T."count" as TCard, T.table_page_count, T.ext_page_count , PI.key_value_count, PI.leaf_page_count, PI.depth , (100.*(1-PI.rand_transitions/greater(TCard,1))) as percent_clustered , PI.seq_transitions, PI.rand_transitions, (100.*PI.rand_distance/greater(rand_transitions,1)) avg_rand_distance from sys.systab T join sys.sysidx I on T.table_id = I.table_id join sys.sysphysidx PI on I.table_id = PI.table_id and I.phys_index_id = PI.phys_index_id where clustered='Y' ) D
Here the percent_clustered column shows what happens when scanning the index from beginning to end. The "percent_clustered" rows are either on the same page as the previous row or the immediately following page; the remainder of rows are some larger distance apart (2 or more pages).
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
As to row splits:
In case a row split is a necessity because a row is that large that it simply won't fit on a single page (say, during its insert it has more than 4K (minus overhead) data in a database with 4K pages), will it occupy two (or more) continuous pages (still one base page and further extension pages) and therefore will be stored in an optimal fashion?
If my assumtion is correct: Is there any means to distinguish such necessary row splits from those that are caused by inserting "small" data and then later "fill the real values" by updates? - I guess sa_table_fragmentation() would simply return a higher segs_per_row value in both cases...
Volker, I must say I miss your presence at the summit. It is good to have your counterpoints in the discussion.
The sa_table_fragmentation() can tell you how many rows per segment but not WHY there is a split. If it is due to rows longer than a page, then reorganizing the table will not improve matters. If it is due to inserting short rows and later updating to larger values, then it would be improved by reorganization.
You can use values in SYS.SYSTAB to try to distinguish these two cases, but these values are only describing the entire table and won't really help if you have a lot of nicely behaved data and a small amount of different data.
Consider: select table_name, "count", "count"/table_page_count as rows_per_table_page, table_page_count, ext_page_count from sys.systab where table_name in ('TRowFragments','TLongRows');
Here we estimate the number of rows per base table page. If the rows were large when inserted (large with respect to page size), the rows per page will be low, perhaps as low as one if most of the table rows are large. On the other hand, with rows that grow after insertion, the rows per page will be higher (perhaps 30 or so). The sa_table_fragmentation() tells you how many row segments need to be read to completely read a row.
User | Count |
---|---|
66 | |
10 | |
10 | |
10 | |
10 | |
8 | |
6 | |
5 | |
4 | |
4 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.