cancel
Showing results for 
Search instead for 
Did you mean: 

FreePage different after database reload

Former Member
3,504

We currently use SQL Anywhere 12 in our consumer product.

With the absence of a "auto-shrink" feature in SQL Anywhere 12, in order to keep the databases manageable we automatically rebuild them (using dbunload via dbtools). However, we only do this if there appears to be a great amount of free/dirty pages in the database.

For example, if DB_PROPERTY('FreePages') + (DB_PROPERTY('CleanablePagesAdded') - DB_PROPERTY('CleanablePagesCleaned')) is above some amount on unload, then we will rebuild the database.

The problem is, the aforementioned properties don't seem to be accurate when we go to unload the database -- even after running the database cleaner. However, if the database is reloaded then these properties reflect the actual number of free pages.

To reproduce this (on a much smaller scale):

import sqlanydb
import os

def get_page_count(connection😞
    """
    Gets the number of free and dirty pages.
    """
    cursor = connection.cursor()
    cursor.execute("SELECT CAST(DB_PROPERTY('FreePages') AS INT), CAST((DB_PROPERTY('CleanablePagesAdded')-DB_PROPERTY('CleanablePagesCleaned')) AS INT);")
    free, dirty = cursor.fetchone()
    cursor.close()

    return free, dirty

def run_cleaner(connection😞
    """
    Forces the database cleaner to run
    """
    cursor = connection.cursor()
    cursor.execute('CALL sa_clean_database();')
    cursor.close()

def fill_with_data(connection😞
    """
    Adds a bunch of data to the pad table.
    """
    cursor = connection.cursor()
    cursor.execute('BEGIN TRANSACTION;')
    cursor.execute('CREATE TABLE "Pad" ("foo" BINARY(1024), "bar" BINARY(1024));')

    for x in xrange(0, 1024😞
        cursor.execute("INSERT INTO Pad (foo, bar) VALUES(CAST(newid()+newid()+newid()+newid() AS BINARY(1024)), CAST(newid()+newid()+newid()+newid() AS BINARY(1024)));")

    cursor.execute('COMMIT;')
    cursor.close()

def empty_data(connection😞
    """
    Empty the data from the pad table.
    """
    cursor = connection.cursor()
    cursor.execute('BEGIN TRANSACTION;')
    cursor.execute('DELETE FROM Pad;')
    cursor.execute('COMMIT;')
    cursor.close()

def make_db():
    """
    Create the test database.
    """
    path = os.path.join(os.path.abspath(os.path.curdir), 'example.db')

    if os.path.exists(path😞
        os.remove(path)

    connection = sqlanydb.connect(uid='DBA', pwd='sql', dbn='utility_db')

    cursor = connection.cursor()
    cursor.execute("CREATE DATABASE '%s';" % path)
    connection.close()

    return path

def main():
    path = make_db()
    connection = sqlanydb.connect(uid='DBA', pwd='sql', dbf=path)

    # Add a bunch of data
    fill_with_data(connection)
    print('After add: %d free pages, %d dirty pages' % get_page_count(connection))

    # Remove it
    empty_data(connection)
    print('After remove: %d free pages, %d dirty pages' % get_page_count(connection))

    # Run the cleaner
    run_cleaner(connection)
    print('After cleaner: %d free pages, %d dirty pages' % get_page_count(connection))

    connection.close()

    # Load the same database again
    connection2 = sqlanydb.connect(uid='DBA', pwd='sql', dbf=path)
    print('After reload: %d free pages, %d dirty pages' % get_page_count(connection2))
    connection2.close()

if __name__ == '__main__':
    main()

Which outputs:

$ python freepages-unload.py
After add: 4 free pages, 38 dirty pages
After remove: 4 free pages, 124 dirty pages
After cleaner: 90 free pages, 0 dirty pages
After reload: 97 free pages, 0 dirty pages

So the question is, why is the number of free pages different after a reload? Is it the cache? And is there any way to get an accurate reading? Some of these questions are also raised on the (very good) SQL Anywhere blog

Accepted Solutions (1)

Accepted Solutions (1)

MarkCulp
Participant

I believe that these properties are not updated in the system catalog until a checkpoint of the database. Try issuing an explicit CHECKPOINT before querying the properties.

Former Member
0 Kudos

That works -- thanks for the speedy reply.

New output:

$ python freepages-unload.py
After add: 12 free pages, 0 dirty pages
After remove: 12 free pages, 86 dirty pages
After cleaner: 98 free pages, 0 dirty pages
After reload: 97 free pages, 0 dirty pages
Breck_Carter
Participant
0 Kudos

Does the cleaner break the rule that row positions are immutable? i.e., the page on which a row resides remains unchanged after insert.

Former Member
0 Kudos

Is the database cleaner run as part of a CHECKPOINT?

It seems the CHECKPOINT helps in some cases but if (most) of a database is deleted and CHECKPOINT'd then the values still seem to be stale until the next reload?

Breck_Carter
Participant

What makes you think the values are "stale"? A reload (a.k.a. reorg) absolutely positively changes everything about how rows are stored in pages, as opposed to the documentation about the cleaner which is vague about when it runs, what it actually does, and how thoroughly it does its job (well, the docs are fine, the cleaner's behavior is supposed to be vague 🙂

AFAIK the cleaner doesn't do everything a reorg does, otherwise it would be called "reorg" instead of "cleaner"... and a reorg definitely changes the number of free pages, it's why people run a reorg (or one of the reasons).

Note that two rules apply to rows: First, a row is not moved from the page it is originally inserted on. Second, a page can only hold rows from one table, not multiple tables. A reorg breaks the first rule, of course (that's why it's called "reorg")... and I am not sure if the cleaner can break the rule as well (hence my question above).

A re-reading of the cleaner docs reveals this phrase "reorganizing database pages for more efficient access"... does that apply to data pages, or index pages, or both? I am guessing that your testing proves the cleaner does NOT do everything a reorg (reload) does... perhaps you could call sa_clean_database() and then CHECKPOINT to see if forcing a long cleanup period causes it to match the effects of a reorg.

Example: Two pages contain one non-deleted row each, with lots of empty space on each page. At a minimum, a reorg will move one of those rows to a different page, thus creating one free page. Will the cleaner do that? Or does it just create a free page if all the rows on that page have been deleted?

FWIW suggestions to "run a CHECKPOINT" are often made because some statistical columns in the system catalog tables can become out-of-date and a CHECKPOINT will force them to be updated; e.g., the number of rows in SYSTAB.count. I believe the motivation behind letting these statistics become out of date is "nobody wants to spend the money updating the system catalog tables every single time a single row is inserted or deleted or etcetera"... which is true as far as I'm concerned 🙂

MarkCulp
Participant

The primary purpose of the cleaner is to 'clean' pages of rows which have been deleted. I.e. when a row is deleted it may not actually be removed from the table, but rather just marked as deleted... and it is the cleaner's role to find those rows (we keep track of where they are) and physically removed them from the table pages.

The cleaner never moves row "heads" (the piece that starts a row). AFAIK the reference to "reorganizing database pages for more efficient access" refers to moving data around within a database page, not between (from one to another) database pages.

Answers (1)

Answers (1)

Breck_Carter
Participant

Growth in the number of free pages versus total pages is a first-order symptom of database file bloat, and it's easy to measure (FreePages versus FileSize).

Growth in the number of sparsely-filled pages is a second order symptom, harder to measure (see this answer describing sa_index_levels, sa_index_density, sa_table_fragmentation and sa_table_stats).

Growth in number of cleanable pages (which BTW is not the same as dirty pages) is probably a third- or fourth-order symptom, not worth the effort.

Suggestion: Just measure free pages versus total pages, see if it works well over time.

If it doesn't, then try adding off-peak REORGANIZE statements ahead of the free page versus total page check... the REORGANIZE operations will increase the number of free pages.

If you discover that measuring the number of cleanable pages is important, I will eat Surströmming 🙂

alt text

Former Member
0 Kudos

Thanks for the reply. I agree that the "larger than expected" file sizes are symptoms that directly relate to how we're using SQL Anywhere, e.g. we currently don't do what's recommended on http://dcx.sybase.com/index.html#1201/en/dbadmin/da-unexpected.html

Forgive my re-definition of dirty pages (cleanable pages, rather than those that need to be flushed). REORGANIZE isn't a cheap operation? The stat we're after is "how big will the database be after a rebuild," something that FreePages vs FileSize doesn't seem to tell us (even with a CHECKPOINT). And a reload certainly doesn't do a REORGANIZE?