on 2014 Feb 27 7:28 PM
We currently use SQL Anywhere 12 in our consumer product.
With the absence of a "auto-shrink" feature in SQL Anywhere 12, in order to keep the databases manageable we automatically rebuild them (using dbunload via dbtools). However, we only do this if there appears to be a great amount of free/dirty pages in the database.
For example, if DB_PROPERTY('FreePages') + (DB_PROPERTY('CleanablePagesAdded') - DB_PROPERTY('CleanablePagesCleaned')) is above some amount on unload, then we will rebuild the database.
The problem is, the aforementioned properties don't seem to be accurate when we go to unload the database -- even after running the database cleaner. However, if the database is reloaded then these properties reflect the actual number of free pages.
To reproduce this (on a much smaller scale):
import sqlanydb import os def get_page_count(connection😞 """ Gets the number of free and dirty pages. """ cursor = connection.cursor() cursor.execute("SELECT CAST(DB_PROPERTY('FreePages') AS INT), CAST((DB_PROPERTY('CleanablePagesAdded')-DB_PROPERTY('CleanablePagesCleaned')) AS INT);") free, dirty = cursor.fetchone() cursor.close() return free, dirty def run_cleaner(connection😞 """ Forces the database cleaner to run """ cursor = connection.cursor() cursor.execute('CALL sa_clean_database();') cursor.close() def fill_with_data(connection😞 """ Adds a bunch of data to the pad table. """ cursor = connection.cursor() cursor.execute('BEGIN TRANSACTION;') cursor.execute('CREATE TABLE "Pad" ("foo" BINARY(1024), "bar" BINARY(1024));') for x in xrange(0, 1024😞 cursor.execute("INSERT INTO Pad (foo, bar) VALUES(CAST(newid()+newid()+newid()+newid() AS BINARY(1024)), CAST(newid()+newid()+newid()+newid() AS BINARY(1024)));") cursor.execute('COMMIT;') cursor.close() def empty_data(connection😞 """ Empty the data from the pad table. """ cursor = connection.cursor() cursor.execute('BEGIN TRANSACTION;') cursor.execute('DELETE FROM Pad;') cursor.execute('COMMIT;') cursor.close() def make_db(): """ Create the test database. """ path = os.path.join(os.path.abspath(os.path.curdir), 'example.db') if os.path.exists(path😞 os.remove(path) connection = sqlanydb.connect(uid='DBA', pwd='sql', dbn='utility_db') cursor = connection.cursor() cursor.execute("CREATE DATABASE '%s';" % path) connection.close() return path def main(): path = make_db() connection = sqlanydb.connect(uid='DBA', pwd='sql', dbf=path) # Add a bunch of data fill_with_data(connection) print('After add: %d free pages, %d dirty pages' % get_page_count(connection)) # Remove it empty_data(connection) print('After remove: %d free pages, %d dirty pages' % get_page_count(connection)) # Run the cleaner run_cleaner(connection) print('After cleaner: %d free pages, %d dirty pages' % get_page_count(connection)) connection.close() # Load the same database again connection2 = sqlanydb.connect(uid='DBA', pwd='sql', dbf=path) print('After reload: %d free pages, %d dirty pages' % get_page_count(connection2)) connection2.close() if __name__ == '__main__': main()
Which outputs:
$ python freepages-unload.py
After add: 4 free pages, 38 dirty pages
After remove: 4 free pages, 124 dirty pages
After cleaner: 90 free pages, 0 dirty pages
After reload: 97 free pages, 0 dirty pages
So the question is, why is the number of free pages different after a reload? Is it the cache? And is there any way to get an accurate reading? Some of these questions are also raised on the (very good) SQL Anywhere blog
Request clarification before answering.
I believe that these properties are not updated in the system catalog until a checkpoint of the database. Try issuing an explicit CHECKPOINT before querying the properties.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
What makes you think the values are "stale"? A reload (a.k.a. reorg) absolutely positively changes everything about how rows are stored in pages, as opposed to the documentation about the cleaner which is vague about when it runs, what it actually does, and how thoroughly it does its job (well, the docs are fine, the cleaner's behavior is supposed to be vague 🙂
AFAIK the cleaner doesn't do everything a reorg does, otherwise it would be called "reorg" instead of "cleaner"... and a reorg definitely changes the number of free pages, it's why people run a reorg (or one of the reasons).
Note that two rules apply to rows: First, a row is not moved from the page it is originally inserted on. Second, a page can only hold rows from one table, not multiple tables. A reorg breaks the first rule, of course (that's why it's called "reorg")... and I am not sure if the cleaner can break the rule as well (hence my question above).
A re-reading of the cleaner docs reveals this phrase "reorganizing database pages for more efficient access"... does that apply to data pages, or index pages, or both? I am guessing that your testing proves the cleaner does NOT do everything a reorg (reload) does... perhaps you could call sa_clean_database() and then CHECKPOINT to see if forcing a long cleanup period causes it to match the effects of a reorg.
Example: Two pages contain one non-deleted row each, with lots of empty space on each page. At a minimum, a reorg will move one of those rows to a different page, thus creating one free page. Will the cleaner do that? Or does it just create a free page if all the rows on that page have been deleted?
FWIW suggestions to "run a CHECKPOINT" are often made because some statistical columns in the system catalog tables can become out-of-date and a CHECKPOINT will force them to be updated; e.g., the number of rows in SYSTAB.count. I believe the motivation behind letting these statistics become out of date is "nobody wants to spend the money updating the system catalog tables every single time a single row is inserted or deleted or etcetera"... which is true as far as I'm concerned 🙂
The primary purpose of the cleaner is to 'clean' pages of rows which have been deleted. I.e. when a row is deleted it may not actually be removed from the table, but rather just marked as deleted... and it is the cleaner's role to find those rows (we keep track of where they are) and physically removed them from the table pages.
The cleaner never moves row "heads" (the piece that starts a row). AFAIK the reference to "reorganizing database pages for more efficient access" refers to moving data around within a database page, not between (from one to another) database pages.
Growth in the number of free pages versus total pages is a first-order symptom of database file bloat, and it's easy to measure (FreePages versus FileSize).
Growth in the number of sparsely-filled pages is a second order symptom, harder to measure (see this answer describing sa_index_levels, sa_index_density, sa_table_fragmentation and sa_table_stats).
Growth in number of cleanable pages (which BTW is not the same as dirty pages) is probably a third- or fourth-order symptom, not worth the effort.
Suggestion: Just measure free pages versus total pages, see if it works well over time.
If it doesn't, then try adding off-peak REORGANIZE statements ahead of the free page versus total page check... the REORGANIZE operations will increase the number of free pages.
If you discover that measuring the number of cleanable pages is important, I will eat Surströmming 🙂
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Thanks for the reply. I agree that the "larger than expected" file sizes are symptoms that directly relate to how we're using SQL Anywhere, e.g. we currently don't do what's recommended on http://dcx.sybase.com/index.html#1201/en/dbadmin/da-unexpected.html
Forgive my re-definition of dirty pages (cleanable pages, rather than those that need to be flushed). REORGANIZE isn't a cheap operation? The stat we're after is "how big will the database be after a rebuild," something that FreePages vs FileSize doesn't seem to tell us (even with a CHECKPOINT). And a reload certainly doesn't do a REORGANIZE?
User | Count |
---|---|
60 | |
8 | |
7 | |
6 | |
6 | |
5 | |
5 | |
4 | |
4 | |
4 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.