How can I show if an assertion failure is caused by hardware?

Former Member

After 3 occurrences of " ERROR Assertion failed: 101412[LIVE] (11.0.1.2713) Page number on page does not match page requested" with 3 subsequent database unload/reloads (all of this in 2 weeks), I am convinced I'm staring down the barrel of a hardware problem on the database server.

Any ideas on how to go about testing for this? Or proving it to the IT department whose virtual server my database is running on?

Accepted Solutions (0)

Answers (3)

jeff_albion
Product and Topic Expert

If you have a support plan, you can open a technical support case and submit the database for review. If the page in the database is genuinely corrupted, the "type" of corruption that happened to the page can sometimes give clues as to what might have happened. Some real-life past examples from other customers:

  1. Looking at the page in question in a hex editor found a bunch of "mso-font-alt" and xmlns:o="urn:schemas-microsoft-com:office:office" HTML tags in the page, in the middle of the file - this indicates an OS file system issue, or possibly a hardware issue, resulting from a file block overwrite.
  2. The page(s) in question are just zeroed, in the middle of the file, in a perfect 64K block. This correlates more strongly with a hardware problem, due to the block size.
  3. The page is at the end of the file, and the file length in the database header does not match the file size on the disk - this indicates an OS file metadata flushing issue upon shutdown, which is caused by your disk driver.
  4. Upon looking at the actual bytes in the file, we can see this was a database software issue and recommend a CR fix number that is associated with the problem (or create a CR fix to solve the problem and release it in an EBF).
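
For a rough first look before opening a case, the patterns in examples 1 and 2 can be searched for with a short script. This is only a minimal sketch, not a support tool: the function names are made up for illustration, the 64K block size and the two marker strings are taken from the examples above, and it assumes you run it against a copy of the database file while the server is stopped. An all-zero block may be perfectly legitimate, so a hit is something to look at more closely, not proof of corruption.

```python
BLOCK_SIZE = 64 * 1024  # 64K, the block size mentioned in example 2

# Byte strings that have no business inside a database file. These two come
# from example 1 and are purely illustrative; substitute your own.
FOREIGN_MARKERS = (b"mso-font-alt", b"urn:schemas-microsoft-com:office")


def find_zeroed_blocks(path, block_size=BLOCK_SIZE):
    """Return file offsets of aligned, full-size blocks that are all zero bytes."""
    offsets = []
    zero_block = bytes(block_size)
    with open(path, "rb") as f:
        offset = 0
        while True:
            block = f.read(block_size)
            if not block:
                break
            # A short trailing block is ignored; only full blocks are compared.
            if block == zero_block:
                offsets.append(offset)
            offset += len(block)
    return offsets


def find_foreign_markers(path, markers=FOREIGN_MARKERS, block_size=BLOCK_SIZE):
    """Return sorted (offset, marker) pairs for every marker found in the file."""
    hits = set()  # a set, because the overlap region is searched twice
    overlap = max(len(m) for m in markers) - 1
    with open(path, "rb") as f:
        base = 0   # file offset of the first byte of `window`
        tail = b""  # end of the previous window, kept to catch split markers
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            window = tail + chunk
            for marker in markers:
                i = window.find(marker)
                while i != -1:
                    hits.add((base + i, marker))
                    i = window.find(marker, i + 1)
            tail = window[-overlap:] if overlap else b""
            base += len(window) - len(tail)
    return sorted(hits)
```

Calling `find_zeroed_blocks("copy_of_database.db")` (a hypothetical file name) returns a list of offsets; dividing an offset by the database page size gives a rough idea of which page to inspect in a hex editor.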

Having the database reviewed for the type of corruption under a technical support plan is your best option for trying to determine "who is to blame."

MarkCulp
Participant

Showing that you have a hardware problem is difficult. I would recommend that you consult your hardware manufacturer's documentation to see if they have any hardware diagnostic tools. You may also find some generic tools - memory tests, disk tests, etc. - online by doing some googling.

Your issue could also be caused by software errors or a misconfigured system. Ensure that your disk, file system and operating system are configured to not cache disk pages. For example, if you are using Linux you must ensure that the file system has write caching turned off.
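
Alongside the diagnostic tools mentioned above, one low-tech way to collect evidence is to record a checksum of the database file right after a clean shutdown and compare it again just before the server is restarted. If the file changed while nothing should have been writing to it, the change happened somewhere below the database server. The following is only a sketch under those assumptions: the helper names are hypothetical, and a re-read may be served from the OS cache rather than the disk, so a match does not prove the storage is healthy.

```python
import hashlib


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_while_offline(path, recorded_digest):
    """Return True if the file no longer matches the digest recorded at shutdown."""
    return file_sha256(path) != recorded_digest
```

Store the digest returned by `file_sha256` somewhere off the server in question, together with the time it was taken, so that a later mismatch can be shown to the people who run the host.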

Breck_Carter
Participant

Liam used the term "IT department" in a sentence, whereas your answer seems to assume a different sort of environment, you know, one where everyone pulls together in common cause 🙂

[Image: cartoon]

Former Member

Breck has a point. When working with one's own IT team, resolution tends to be a smoother road. When dealing with a client, one always has to be careful how one approaches it. IT guys (as per your cartoon) can be quite defensive if a service provider even suggests there may be a problem on their hardware 😉

Former Member

Just some feedback: After adding the -u option (http://dcx.sybase.com/index.html#1101/en/dbadmin_en11/u-database-dbengine.html) to disable direct I/O (thanks Eric and Jeff), the client's database has been running smoothly for 3 weeks now.

The IT department still disputes the contention that it's a hardware problem though 😉

Breck_Carter
Participant

They may be correct, since a VM is software 🙂

The Help topic doesn't say dbsrv11 -u explicitly "disables" anything; it says the option changes behavior so that I/O goes through the OS cache system, and it seems to imply you only need to do this if the database is running on an overcrowded computer... like a VM... which IMO is a bad thing to do with a busy database.

What else is running on the same computer? Does IT understand that you don't get something for nothing, not even with a VM?