cancel
Showing results for 
Search instead for 
Did you mean: 

How can a valid database file turn into a bad one?

VolkerBarth
Contributor
6,339

On one test system with 12.0.1.3356 on Win32, I cannot start the database after a regular machine shutdown. When starting the database server, it issues error SQLCODE -1006 (SQLE_FILE_BAD_DB, "Unable to start specified database: '%1' is not a valid database file"). Rather surprising, as the shutdown was a regular one.

Well, of course I have a backup, and the file has been validated successfully just before the backup was done (yep, I'm aware that validating the backup is even smarter...). Unfortunately, the backup file just issues the same problem.

FWIW: DBLOG doesn't work against the database file (it issues the same error), whereas the log can be translated successfully.

Are there any hints what might be the cause for the bad file? I do not have hints for disk problems, and the SQL Anywhere 12 installation seems o.k. - at least I can start other databases...

As said, it's a test system so there's no problem with lost data here, but I'd like to get to know what could have happened.

Accepted Solutions (1)

Accepted Solutions (1)

Former Member

Sorry folks, I have not been very active on the forum lately and am therefore very late with my reply.

As Volker has already indicated in a different thread, the documentation states very clearly that a database should be backed up prior to upgrading and should be restarted immediately upon a successful upgrade. What the documentation does not state very clearly is that a failed upgrade WILL ALWAYS lead to an invalid database file. The reason is quite simple. At the start of the upgrade process, the server hammers the set of bytes in the database file that indicate whether or not this database is valid. These bytes are then restored at the end of the upgrade process provided the upgrade was successful. The whole idea is that if the upgrade is unsuccessful for whatever reason, then since we cannot be certain what state the database is in with respect to the partially updated catalog tables, procedures and options, we make it impossible for the user to continue using the database. Instead, we expect that the user will be able to revert to their backup copy.

Now, with respect to your failed upgrade Volker, I have reproduced the problem that you are seeing and have opened a support case to have the problem fixed.

VolkerBarth
Contributor
0 Kudos

Well, as I pointed out in the other question, a failed upgrade should primarily make it more clear that it had failed at all...

As stated, a second attempt to upgrade that database seemed to be successful as it gave no error message, and the expected SYSHISTORY upgrade entry was added. (Yes, I did not restart the database in-between. I should.)

Karim, I'm glad the issue is reproducable and is about to be fixed. Could you supply the CR number as soon as it is available?

Former Member
0 Kudos

Volker, the CR number is 675790.

VolkerBarth
Contributor
0 Kudos

Thanks, so I'm gonna have a look now and then...

VolkerBarth
Contributor
0 Kudos

@Karim: Just to get this closed: Any news on this topic?

FWIW, the public documentation for this CR number is not that verbose:).

Answers (4)

Answers (4)

Breck_Carter
Participant

Run a test to check for bad RAM.

Bad RAM can mimic, and then permanently cause, disk errors, so it is a huge deal. If you suspect bad RAM, and you care at all about the contents of your hard drives, shut your computer off immediately because ANYTHING you do can result in permanent damage.

Then get/build a bootable RAM checker like http://www.memtest86.com/

VolkerBarth
Contributor
0 Kudos

That's a further possibility, for sure. I'm gonna take care.

graeme_perrow
Advisor
Advisor
0 Kudos

My old laptop had bad RAM and I didn't know it. For two years I used this machine, which sometimes worked fine and other times was unbearably slow. Eventually I found the Windows Memory Diagnostics utility (mdsched.exe) which found the problem. Got a new laptop, identical to the old one but with good RAM, and it's like a brand new machine.

VolkerBarth
Contributor
0 Kudos

Breck, thanks for the pointer.

Fortunately/unfortunately(?), several passes of Memtest-86 have not revealed any errors. So I guess this won't be the cause...

Breck_Carter
Participant
0 Kudos

Wow... it took two years to convince Mordac The Preventer Of Information Services that you had a problem... think of the money he saved the company by deferring the issue 🙂

graeme_perrow
Advisor
Advisor
0 Kudos

No, it was my fault. I complained locally but never told IT. I ran every test I could think of over that time but could never find a problem. Didn't think to test RAM. Once I did find the problem and told IT about it, I had a new laptop the next day.

jeff_albion
Advisor
Advisor

Mark's idea is a good path to go down: if you type: "dbinit new.db" and compare the first 4096 bytes to the original database's 4096 bytes, are there any major areas on the original page that look like they might be missing? Opening a technical support case might be another good way to go, so that we can bring the file in and take a look.

VolkerBarth
Contributor
0 Kudos

Jeff, another good starting point, thanks:)

Comparing with a new db was not that helpful, as the invalid db states 12.0.1.3324 as its version number whereas new ones are created with 12.0.1.3356. Obviously, this version is written several times in the header page. However, I noted that the header page for strongly encrypted databases seems quite similar to those of not unencrypted ones. (Note, the db in question is strongly encrypted.)

So I compared the header page of a production database and the test database, and both name the same version (though the latter has been upgraded to 12.0.1.3356 recently).

Diff'ing both headers revealed several occurences of 0x3F in the failing db where the working db has 0xAD - typically in the "Sybase Inc., Copyright" strings. Besides that, both headers seem identical, except the last 4 bytes (0xB5B3EFD0 vs. 0x8362BF7A).

Could the upgrade lead to this problem? The failing start was the first database start after I had issued

ALTER DATABASE UPGRADE PROCEDURE ON

two days before...

MarkCulp
Participant

Page 0 is not encrypted in encrypted DBs' headers (after all we still need to determine that the file is a database and that it is encrypted) so it is not surprising that you have seen much of a difference.

If you are seeing 0x3F or 0xAD bytes within the "Sybase Inc., Copyright..." area (i.e. trailing bytes) of page 0 then I question how did you view the page? ... because this area of the page should not have any such bytes. (at least in my testing I do not see any).

So, how did you extract/view/compare the bytes within page 0 of each database file?

VolkerBarth
Contributor
0 Kudos

I used VisualStudio's binary editor to look at the headers.

However, as the database file is to big to be view directly, I used a small SQL block (what else?!) to read the file in a long varbinary variable and write the first 4096 bytes with

unload select ... format text escapes off quotes off hexadecimal off.

Tomorrow I'm gonna check whether a direct opening of smaller databases will reveal the same bytes...

VolkerBarth
Contributor
0 Kudos

Well, I just tested from a different box and used a long binary (intead of long varchar) variable - possibly a relevant change...

On this box, the header doesn't contain 0xAD and just one 0x3F in the first some bytes - and the copyright string is correct.

More tests tomorrow...

johnsmirnios
Participant

fc /b file1 file2 will show you bytes that have changed. The important parts are in the area prior to the copyright strings which are just there as informational filler. Usually, "not a database file" means that the signature is wrong. What are the bytes at offset 0x14? They should be 5E BA 7A DA (or little endian for "DA7ABA5E" (DATABASE)).

MCMartin
Participant

Winhex is a good tool to look at files and also at harddisks directly. If you use the evaluation version you can view but not change, anyway not too expensive...

VolkerBarth
Contributor
0 Kudos

Ah, WinDiff hasn't revealed differences in those leading bytes, but there are: The corrupt database has 75 BD 9A DA at offset 0x14.

Comparing a valid production db file with the corrupt one reveals 35 different bytes in the header page. I do not know if they should be identical, as the test db is an older "fork" of the production db.

I can present a WinHex diff report, if that's helpful for anyone interested...

Would it be of any worth to try to "fix" these differences and check whether the database is running again? (I guess the answer will be "no", as the chance will be very high that other pages are somewhat invalid, too...)

VolkerBarth
Contributor
0 Kudos

Great tip, I tried it immediately - thanks!

MarkCulp
Participant

Ah, so we now have the answer: The signature that you see (75 BD 9A DA) indicates that an UPGRADE of the database has failed, so the database is in a "bad" (unknown) state and cannot be used. Is it possible that an upgrade was in progress when you shutdown the database server/computer?

VolkerBarth
Contributor
0 Kudos

No, but (as stated in my first comment on this answer) I did an upgrade when I had started the database, and I remember that the first run issued a warning (which I forget to write down). A further attempt seemed to work, and so I thought everything went fine. In syshistory, the UPGRADE from 12.0.1.3324 to 3356 was noted as expected.

(FWIW, I tried to study the new SQL Remote HTTP stuff...).

Then I used the database to do some tests with OLAP and spatial data (which led to further questions here...). Finally, the server was restarted, and after that the database became invalid...

So it seems the UPGRADE failed somewhat. I csn supply the translated log for that part, if that would help...

johnsmirnios
Participant
0 Kudos

It would be most useful if you had a copy of the pre-upgrade database (you made one, right?) and can reproduce an upgrade failure with it.

VolkerBarth
Contributor
0 Kudos

Not exactly - it's a test database, and the automated backup event has backed up the upgraded database as well (as it was validated successfully).

But I've just taken a copy of a production database and tried to upgrade that - and it returns the same error (translated from German):

"Error in line 1 - procedure 'sa_oledb_catalogs' not found."

In the dbisqlc message pane there are several errors shown:

Error in file sybprocs.sql in line 5379 with SQLCODE -110
...
Error in file oleschem.sql in line 64 with SQLCODE -265

Now, syshistory doesn't have an UPGRADE entry and ... after the database is shutdown, it won't start.

So I have a reproducable:)

VolkerBarth
Contributor
0 Kudos

And it has the 75 BD 9A DA signature. Anyone interested in the translog? - I started a fresh one just before the upgrade, so it starts with

set option PUBLIC.upgrade_capability = '';

VolkerBarth
Contributor
0 Kudos

...whereas a working upgrade (see my own answer) continues with another transaction starting with

drop procedure dbo.sa_oledb_catalogs
go
create procedure dbo.sa_oledb_catalogs(...

statements...

Former Member
0 Kudos

FYI, there are many Hex editors. I like frhed.

MarkCulp
Participant

Its hard to tell why/how your database file has become corrupted without looking at the file itself. Since you are getting the "... is not a valid database file" error message it would imply that page zero has become corrupted.

You may want to read the SQL Anywhere I/O Requirements for Windows and Linux whitepaper since a misconfigured computer can lead to this type of corruption.

VolkerBarth
Contributor
0 Kudos

Well, I could give the default "but I don't have changed anything recently" response...

It's a low-cost machine that has run SQL Anywhere test systems (v8-v12) for years on the same hardware with no issues at all, with a classic HD drive used with NTFS. CHKDSK /F doesn't reveal any errors, nor does the event viewer.

The one thing I do have changed is a second HD drive that has been added a few days ago but that has not been used by the database system. I would not expect that this has influence on the flushing/write through behaviour.

Besides that, the backup has been written to the local disk and then copied to a net share - both copies do show the error.

What else might I have to check?

MCMartin
Participant
0 Kudos

Just a simple idea: maybe a virus which is adding or changing some bytes in the .db file?

VolkerBarth
Contributor
0 Kudos

No, according to the AV software... and by accident, I do not have excluded the database directory from scanning:)

VolkerBarth
Contributor
0 Kudos

By trying to upgrade from 12.0.1.3324 to 12.0.1.3356 - at least in my case:(

Cf. the bunch of comments on Jeff's answer...

FWIW, I've attached the translated log from the

ALTER DATABASE UPGRADE PROCEDURE ON

statement.

VolkerBarth
Contributor
0 Kudos

FWIW, I tested again, now with Sybase Central's Upgrade wizard. That way I could only issue an

ALTER DATABASE UPGRADE JCONNECT OFF

statement, i.e. the PROCEDURE ON clause was not available.

Interestingly enough, this upgrade has worked, the database can be restarted, SYSHISTORY claims an 3356 upgrade, and the SQL Remote HTTP stuff seems to be added, too (at least there are these sr_xxx procedures and functions).

So I'm about to ask:

Is the note in the sqlremote1201.pdf doc file (p. 109) wrong that I have to apply the upgrade with PROCEDURE ON to get the desired functionality?