
HANA indexserver crashing during backups on RHEL9

perryt11

Hello,

I have a HANA database on revision 2.00.075.00 running on Red Hat Enterprise Linux 9.4. During some backups, database performance degrades severely and the OS logs numerous I/O errors. The errors continue until the HANA directory is unmounted from the system, which causes HANA to crash. This has happened at least three times, and all three crash dumps contain this call stack:


[CRASH_STACK] Stacktrace of crash: (2024-10-08 05:29:41 846 Local)
----> Pending exceptions (possible root cause) <----
--- Pending assertion:
exception 1: no.2000008 (Basis/IO/FileAccess/impl/LocalFileCompletionThread.cpp:338) TID: 3245
Error during asynchronous file transfer (async_write), rc=5: Input/output error
$fileCallback$=
[W] , buffer= 0x00007e31f5473000, offset= 1264310312960, size= 0/4096/4096, file= ('<root>/datavolume_0000.dat' ((open, mode= RW, file_access= rw-rw-r--, flags= ASYNC|DIRECT|MUST_EXIST|LOCK, size= --- <exception 1: no.2001004 (Basis/IO/FileAccess/impl/LocalFile.cpp:170) TID: 3245
getSize(path= '/hana/data/HEP/mnt00001/hdb00003.00003/datavolume_0000.dat', RESOLVELINKS= true): IO error (rc= 5, 'Input/output error')
>), factory= (root= '/hana/data/HEP/mnt00001/hdb00003.00003/' (root_access= <error>, flags= AUTOCREATE_PATH|CREATE_HINT_FILE, usage= DATA, fs= xfs, config= (async_write_submit_active=on,async_write_submit_blocks=all,async_read_submit=on,num_submit_queues=1,num_completion_queues=1,size_kernel_io_queue=512,max_parallel_io_requests=64,min_submit_batch_size=16,max_submit_batch_size=64))) {shortRetries= 0, fullRetries= 0 (10/10)}
$res$=LogicalPageCtrlBlock@0x00007e31f3fc4c00 [0x9a810L@DefaultPageAccess, , FlushPPageNo=0x60001265ec46P [0x1265ec46000,4k,0], ConvPPageNo=0x60001265ec46P [0x1265ec46000,4k,0]][Managed=1/ExternalCached=0/Disp=Shortterm/InternalShortInRecovery=0/Mod=0/Del=0/isToBeDel=0/Loading=0/needsBarrierForFlush=0/useBarrierForFlush=0/isInFlushQueue=0/isInIO=1/Ref=0/IterRef=0/Check=2], Page= varsize={doublelink={logical={page={savepoint=39899,version=12,page_type=TableContainerPage,checksum_algo=CRC32,delete_mark=0,converter_type=Default,size=4k,checksum=2495033314,flush_counter=0,reserved=0},owner=0xfe00102a6f,pageno=0x9a810L,nextpageno=[invalid]},prevpageno=0x136df5L},index_size=60,data_end=480,total_free_count=59,free_count_without_data=0,skip_flag=0,without_free_space_handling=0,active_flag=1,static_alloc_done=0,delete_without_refcount=0,counter_missmatch_occured=0,flag1..2=0,LoadIDForCheckDeprecated=0,external_page_id=18446744073709551615,reserved_or_deleted_count=0,reserved0=0,reserved1=0}
exception throw location:
0: 0x00007f4d8a6d5ecc in FileAccess::SingleFileIOControlBlock::reportError(char const*, unsigned long, char const*, int)+0x38 at Basis/IO/FileAccess/impl/SingleFileCallback.cpp:142 (libhdbbasis.so)
1: 0x00007f4d8a677fa9 in FileAccess::LocalFileCompletionThread::run(Execution::ThreadRC&)+0xa45 at Basis/IO/FileAccess/impl/LocalFileCallback.cpp:172 (libhdbbasis.so)
2: 0x00007f4d8a621fd4 in Execution::Thread::staticMainImp(Execution::Thread*)+0x610 at Basis/Execution/impl/Thread.cpp:612 (libhdbbasis.so)
3: 0x00007f4d8a6285bf in Execution::pthreadFunctionWrapper(Execution::Thread*)+0x1eb at Basis/Execution/impl/ThreadInterposition.cpp:684 (libhdbbasis.so)
4: 0x00007f4d89689c02 in start_thread+0x2d0 (libc.so.6)
5: 0x00007f4d8970ec40 in __clone3+0x30 (libc.so.6)

exception type information:
- 0: public ltt::exception@0x00007f4d89ee1290
----> Symbolic stack backtrace <----
0: __pthread_kill_implementation + 0x11c
SFrame: IP: 0x00007f4d8968b94c (0x00007f4d8968b830+0x11c) FP: 0x00007f4b5dc46d40 SP: 0x00007f4b5dc46c80 RP: 0x00007f4d8963e646
Params: 0xc6a, 0xcad, 0x6, 0x7f4d8968b94c, 0x7f4b5dc46d50, 0x1
Regs: rax=0x0, rdx=0x6, rcx=0x7f4d8968b94c, rbx=0x7f4b5dc47230, rsi=0xcad, rdi=0xc6a, rbp=0xcad, r8=0x7f4b5dc46d50, r9=0x1, r10=0x8, r11=0x246, r12=0x6, r13=0x0, r14=0x7f4b8def10a0, r15=0x7f4b5dc46e20
Module: /lib64/libc.so.6
-----------------------------------------
1: __GI_raise + 0x16
SFrame: IP: 0x00007f4d8963e646 (0x00007f4d8963e630+0x16) FP: 0x00007f4b5dc46d50 SP: 0x00007f4b5dc46d40 RP: 0x00007f4d8a9ab99b
Regs: rbx=0x7f4b5dc47230, rbp=0x6, r12=0x60, r13=0x0, r14=0x7f4b8def10a0, r15=0x7f4b5dc46e20
Module: /lib64/libc.so.6
-----------------------------------------
2: raiseSIGABRT_SEGV_forCrash() + 0x5b
Symbol: _ZL26raiseSIGABRT_SEGV_forCrashv
SFrame: IP: 0x00007f4d8a9ab99b (0x00007f4d8a9ab940+0x5b) FP: 0x00007f4b5dc46de0 SP: 0x00007f4b5dc46d50 RP: 0x00007f4d8a9b6b22
Regs: rbx=0x7f4b5dc47230, rbp=0x7f4b5dc46d50, r12=0x60, r13=0x0, r14=0x7f4b8def10a0, r15=0x7f4b5dc46e20
Source: Basis/impl/Crash.cpp:738
Module: /usr/sap/HEP/HDB00/exe/libhdbbasis.so
-----------------------------------------
3: f_crashImpl(char const*, int, char const*, ltt::const_exception_ptr) + 0x2b2
Symbol: _ZL11f_crashImplPKciS0_N3ltt19const_exception_ptrE
SFrame: IP: 0x00007f4d8a9b6b22 (0x00007f4d8a9b6870+0x2b2) FP: 0x00007f4b5dc471e0 SP: 0x00007f4b5dc46de0 RP: 0x00007f4d8a9b6ca0
Regs: rbx=0x7f4b5dc47230, rbp=0x7f4d89ee1290, r12=0x60, r13=0x0, r14=0x7f4b8def10a0, r15=0x7f4b5dc46e20
Source: Basis/impl/Crash.cpp:796
Module: /usr/sap/HEP/HDB00/exe/libhdbbasis.so
-----------------------------------------
4: Basis::crashImpl(char const*, int, ltt::exception const&) + 0x10
Symbol: _ZN5Basis9crashImplEPKciRKN3ltt9exceptionE
SFrame: IP: 0x00007f4d8a9b6ca0 (0x00007f4d8a9b6c90+0x10) FP: 0x00007f4b5dc471f0 SP: 0x00007f4b5dc471e0 RP: 0x00007f4d8a9b6cae
Regs: rbx=0x0, rbp=0x7e31f3fc4c00, r12=0x7f4b5dc47230, r13=0x0, r14=0x7f4b8e1c3100, r15=0x7f4d8aad3490
Source: Basis/impl/Crash.cpp:843
Module: /usr/sap/HEP/HDB00/exe/libhdbbasis.so
-----------------------------------------
5: Basis::crashImpl(char const*, int, char const*, ltt::exception const&)
Symbol: _ZN5Basis9crashImplEPKciS1_RKN3ltt9exceptionE
Source: Basis/impl/Crash.cpp:885
NOTE: Inlined Function
-----------------------------------------
6: Basis::crashImpl(char const*, int, ltt::exception const&) + 0xe
Symbol: _ZN5Basis9crashImplEPKciRKN3ltt9exceptionE
SFrame: IP: 0x00007f4d8a9b6cae (0x00007f4d8a9b6ca0+0xe) FP: 0x00007f4b5dc47200 SP: 0x00007f4b5dc471f0 RP: 0x00007f4d8a9b8bc9
Regs: rbx=0x0, rbp=0x7e31f3fc4c00, r12=0x7f4b5dc47230, r13=0x0, r14=0x7f4b8e1c3100, r15=0x7f4d8aad3490
Source: Basis/impl/Crash.cpp:844
Module: /usr/sap/HEP/HDB00/exe/libhdbbasis.so
-----------------------------------------
7: Basis::asyncCrashImpl(char const*, int, ltt::exception const&) + 0x9
Symbol: _ZN5Basis14asyncCrashImplEPKciRKN3ltt9exceptionE
SFrame: IP: 0x00007f4d8a9b8bc9 (0x00007f4d8a9b8bc0+0x9) FP: 0x00007f4b5dc47210 SP: 0x00007f4b5dc47200 RP: 0x00007f4d8cf66112
Regs: rbx=0x0, rbp=0x7e31f3fc4c00, r12=0x7f4b5dc47230, r13=0x0, r14=0x7f4b8e1c3100, r15=0x7f4d8aad3490
Source: Basis/impl/Crash.cpp:907
Module: /usr/sap/HEP/HDB00/exe/libhdbbasis.so
-----------------------------------------
8: PageAccess::PageFlushCallback::transferError(PageAccess::PageIOCallback::Status const&, ltt::exception const&) + 0x42
Symbol: _ZN10PageAccess17PageFlushCallback13transferErrorERKNS_14PageIOCallback6StatusERKN3ltt9exceptionE
SFrame: IP: 0x00007f4d8cf66112 (0x00007f4d8cf660d0+0x42) FP: 0x00007f4b5dc472a0 SP: 0x00007f4b5dc47210 RP: 0x00007f4d8cf75880
Regs: rbx=0x0, rbp=0x7e31f3fc4c00, r12=0x7f4b5dc47230, r13=0x0, r14=0x7f4b8e1c3100, r15=0x7f4d8aad3490
Source: DataAccess/PageAccess/impl/PageFlushCallback.cpp:96
Module: /usr/sap/HEP/HDB00/exe/libhdbdataaccess.so
-----------------------------------------
9: PageAccess::PageIOCallback::transferError(FileAccess::FileCallback::Status const&, ltt::exception const&) + 0x30
Symbol: _ZN10PageAccess14PageIOCallback13transferErrorERKN10FileAccess12FileCallback6StatusERKN3ltt9exceptionE
SFrame: IP: 0x00007f4d8cf75880 (0x00007f4d8cf75850+0x30) FP: 0x00007f4b5dc472c0 SP: 0x00007f4b5dc472a0 RP: 0x00007f4d8a6d5f4f
Regs: rbx=0x0, rbp=0x7e31f3fc4dd0, r12=0x7f4b5dc472f0, r13=0x0, r14=0x7f4b8e1c3100, r15=0x7f4d8aad3490
Source: DataAccess/PageAccess/impl/PageIOImpl.cpp:200
Module: /usr/sap/HEP/HDB00/exe/libhdbdataaccess.so
-----------------------------------------
10: FileAccess::SingleFileIOControlBlock::reportError(char const*, unsigned long, char const*, int) + 0xbf
Symbol: _ZN10FileAccess24SingleFileIOControlBlock11reportErrorEPKcmS2_i
SFrame: IP: 0x00007f4d8a6d5f4f (0x00007f4d8a6d5e90+0xbf) FP: 0x00007f4b5dc47380 SP: 0x00007f4b5dc472c0 RP: 0x00007f4d8a677fa9
Regs: rbx=0x101, rbp=0x7e31f3fc4e10, r12=0x7f4b5dc472f0, r13=0x0, r14=0x7f4b8e1c3100, r15=0x7f4d8aad3490
Source: Basis/IO/FileAccess/impl/SingleFileCallback.cpp:161
Module: /usr/sap/HEP/HDB00/exe/libhdbbasis.so


The __pthread_kill_implementation frame at the top of the symbolic backtrace led me to this note: 3318049 - Indexserver Crashes at __pthread_kill_implementation When Running on RHEL 9.x (https://me.sap.com/notes/3318049/). The symptoms it describes match what our systems are experiencing exactly, but its solution is to upgrade to Revision 070.00 (SPS07) or higher, and we are already on 2.00.075.

Support maintains that the problem lies in our underlying hardware or OS, but because our crash dump is almost identical to the one in the note, I am beginning to doubt that. Our hardware and OS vendors have not found any discrepancies in our systems. The crashes have been occurring intermittently for over two months on production databases, making our backups unreliable, and this note is the closest thing I have found to addressing the problem.
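The pending exception in the dump records exactly which write failed: an asynchronous write to datavolume_0000.dat that returned rc=5 ("Input/output error"). Below is a minimal sketch, assuming the field layout shown in this particular dump, that pulls out the file, the byte offset and the page's flush position so they can be lined up against OS or storage logs from the same timestamp. `decode_failed_write` is a hypothetical helper, not part of any SAP tool, and other HANA revisions may format these fields differently. In this dump, the hex value in the FlushPPageNo bracket (0x1265ec46000) equals offset= (1264310312960) exactly, and the sketch checks for that match.

```python
import re

# Field layout copied from the "Pending exceptions" block of the crash dump in
# this post; other HANA revisions may format these fields differently.
WRITE_RE = re.compile(
    r"offset=\s*(\d+),\s*size=\s*(\d+)/(\d+)/(\d+),\s*file=\s*\('([^']+)'"
)
# In this dump the bracket after FlushPPageNo holds the page's byte offset in hex.
FLUSH_RE = re.compile(r"FlushPPageNo=0x[0-9a-fA-F]+P \[0x([0-9a-fA-F]+),")

PAGE_SIZE = 4096  # the dump reports size=4k for the failing page


def decode_failed_write(dump_text):
    """Extract the failing async write from a pending-exception block."""
    write = WRITE_RE.search(dump_text)
    flush = FLUSH_RE.search(dump_text)
    if write is None or flush is None:
        raise ValueError("pending-exception fields not found")
    offset = int(write.group(1))
    return {
        "file": write.group(5),
        "offset": offset,
        # Raw size counters as printed ("0/4096/4096"); their exact meaning is
        # not spelled out in the dump, so they are returned uninterpreted.
        "size_fields": tuple(int(write.group(i)) for i in (2, 3, 4)),
        "flush_offset": int(flush.group(1), 16),
        "page_aligned": offset % PAGE_SIZE == 0,
    }


if __name__ == "__main__":
    # Excerpt copied from the crash dump above.
    excerpt = (
        "[W] , buffer= 0x00007e31f5473000, offset= 1264310312960, "
        "size= 0/4096/4096, file= ('<root>/datavolume_0000.dat' "
        "FlushPPageNo=0x60001265ec46P [0x1265ec46000,4k,0]"
    )
    info = decode_failed_write(excerpt)
    print(info)
    print(f"offset is {info['offset'] / 2**30:.2f} GiB into the data volume")
    print("offset matches FlushPPageNo:", info["offset"] == info["flush_offset"])
```

Running the same helper over all three dumps would show whether the failures cluster in one region of the data volume or are spread across it.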

Has anyone else who has run HANA on RHEL 9 experienced this issue or something similar?
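Anyone comparing their own crash dump with this one (or with the stack in the note) can extract and compare the symbolic frame names mechanically, ignoring the addresses and register values, which typically differ between process runs. Below is a minimal sketch, assuming the frame-header format shown in the dump above ("N: name + 0xOFFSET", with inlined frames lacking the offset). `frame_names` and `same_top_frames` are hypothetical helper names. Because a full dump contains further sections after the backtrace, the sketch compares only a fixed number of top frames.

```python
import re

# Marker and frame-header format copied from the crash dump in this post.
MARKER = "----> Symbolic stack backtrace <----"
# A frame header is "N: name + 0xOFFSET"; inlined frames omit the offset.
FRAME_RE = re.compile(r"^(\d+): (.+?)(?: \+ 0x[0-9a-fA-F]+)?$")


def frame_names(dump_text):
    """Return symbolic-backtrace frame names in order, top of stack first."""
    _, found, backtrace = dump_text.partition(MARKER)
    if not found:
        return []
    names = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(2))
    return names


def same_top_frames(dump_a, dump_b, depth=10):
    """True if both dumps have at least `depth` frames and the first `depth` names match."""
    a, b = frame_names(dump_a), frame_names(dump_b)
    return len(a) >= depth and len(b) >= depth and a[:depth] == b[:depth]
```

Usage would be along the lines of `same_top_frames(open(path1).read(), open(path2).read())` for two crash dump files; lowering `depth` tolerates differences deeper in the stack.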
