cancel
Showing results for 
Search instead for 
Did you mean: 
Read only

Error starting database / nameserver

Former Member
0 Likes
11,128

Hi gurus,

After unexpected server shutdown HANA is not starting. Getting a little more deeper in the logs I´ve found the error listet bellow, it happens in nameserver startup process. Any hint?

hana001:/usr/sap/HDB/SYS/exe/hdb> ./hdbnameserver

service startup...

accepting requests at 127.0.0.1:30201; 127.0.0.2:30201

searching for master nameserver hana001:30201 ...

assign as master nameserver. assign to volume 1 started

checking for recovery request ...

loading topology ...

opening persistence ...

assign failed with persistence startup error. exception  1: no.3020046  (DataAccess/PageAccess/impl/PageImpl.cpp:399)

    Wrong savepoint version: Expected 8373 but found 8361.

exception throw location:

1: 0x00007fb55f11bbbc in PageAccess::Page::verifyHeader(PageAccess::SizeClass, DataAccess::SavepointVersion const&, bool) const+0x3d8 at PageImpl.cpp:399 (libhdbdataaccess.so)

2: 0x00007fb55ee80da7 in DataAccess::SavepointImpl::loadRestartPage(PageAccess::PageIO&, unsigned long, bool, bool)+0x693 at Page.hpp:277 (libhdbdataaccess.so)

3: 0x00007fb55ee0205f in DataAccess::PersistenceManagerImpl::prepareOpen(unsigned long, bool, bool)+0x4b at PersistenceManagerImpl.cpp:5385 (libhdbdataaccess.so)

4: 0x00007fb55ee0223c in DataAccess::PersistenceManager::open(ltt::refcounted_handle<DataAccess::PersistenceConfiguration> const&, bool)+0x88 at PersistenceManagerImpl.cpp:2461 (libhdbdataaccess.so)

5: 0x00007fb56c510771 in PersistenceLayer::PersistenceSystem::initialize(NameServer::ServiceStartInfo const&, bool, PersistenceLayer::PERSISTENCE_MODE)+0x4a0 at PersistenceSystem.cpp:392 (libhdbpersistence.so)

6: 0x00007fb56c5437bc in PersistenceLayer::PersistenceFactory::initPersistence(PersistenceLayer::PERSISTENCE_MODE, ltt::releasable_handle<DataAccess::LoggerFactory>&, DataAccess::TransactionCallback*, NameServer::ServiceStartInfo&, ltt::refcounted_handle<TransactionManager::TransactionControlBlockFactory>&, bool)+0x158 at PersistenceFactory.cpp:394 (libhdbpersistence.so)

7: 0x00007fb56c43375a in PersistenceController::startup(PersistenceLayer::PERSISTENCE_MODE, NameServer::ServiceStartInfo*, bool, DataAccess::TablePreloadWriteCallback*, DataAccess::TablePreloadReadCallback*, Backup::RecoverCbc_Federation*)+0x1256 at PersistenceController.cpp:559 (libhdblogger.so)

8: 0x00007fb5751eedd5 in NameServer::Topology::initPersistence(NameServer::ServiceStartInfo&, bool, bool, TREX_ERROR::TRexError*, bool, ltt_adp::basic_string<char, ltt::char_traits<char>, ltt::integral_constant<bool, true> >, NameServer::ServiceStartInfo::RequestAction)+0x421 at Topology.cpp:285 (libhdbns.so)

9: 0x00007fb5752fb0c4 in NameServer::TREXNameServer::loadTopology(NameServer::LoadTopologyMode, NameServer::ServiceStartInfo&, Backup::Backup_ExtendedRecoveryInformation*)+0x5b0 at TREXNameServer.cpp:18396 (libhdbns.so)

10: 0x00007fb575305c23 in NameServer::TREXNameServer::assign(NameServer::ServiceStartInfo&)+0x2790 at TREXNameServer.cpp:2123 (libhdbns.so)

11: 0x00007fb5762d4f17 in TRexAPI::TREXIndexServer::assign(NameServer::ServiceStartInfo&, bool, TREX_ERROR::TRexError&)+0x93 at TREXIndexServer.cpp:992 (hdbnameserver)

12: 0x00007fb57630c129 in TRexAPI::AssignThread::run(void*)+0x35 at TREXIndexServer.cpp:543 (hdbnameserver)

13: 0x00007fb56cd03db5 in TrexThreads::PoolThread::run()+0x831 at PoolThread.cpp:389 (libhdbbasement.so)

14: 0x00007fb56cd056f0 in TrexThreads::PoolThread::run(void*&)+0x10 at PoolThread.cpp:165 (libhdbbasement.so)

15: 0x00007fb55471a9f0 in Execution::Thread::staticMainImp(void**)+0x700 at Thread.cpp:461 (libhdbbasis.so)

16: 0x00007fb55471bfc8 in Execution::Thread::staticMain(void*)+0x34 at ThreadMain.cpp:26 (libhdbbasis.so)

stopping service...

persistence initialization failed -> stopping instance ...

cannot send signal. (2, No such file or directory)

prepare for shutting service down...

stop ClockMonitor thread...

stop MasterTokenLockWriter /usr/sap/HDB/SYS/global//hdb/nameserver.lck thread...

setInactive(nameserver@hana001:30201)

hana001:/usr/sap/HDB/SYS/exe/hdb>

Also bellow HDB start log:

hana001:/usr/sap/HDB/HDB02> ./HDB start

StartService

Impromptu CCC initialization by 'rscpCInit'.

  See SAP note 1266393.

OK

OK

Starting instance using: /usr/sap/HDB/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 02 -function StartWait 2700 2

08.06.2016 03:44:24

Start

OK

08.06.2016 03:44:46

StartWait

FAIL: process hdbdaemon HDB Daemon not running

Final part from Daemon trace:


[17502]{-1}[-1/-1] 2016-06-08 03:44:28.545057 i Daemon       TrexDaemon.cpp(03513) : stdout = /usr/sap/HDB/HDB02/hana001/trace/xsuaaserver.out
[17502]{-1}[-1/-1] 2016-06-08 03:44:28.545062 i Daemon       TrexDaemon.cpp(03513) : stderr =
[17502]{-1}[-1/-1] 2016-06-08 03:44:28.545066 i Daemon       TrexDaemon.cpp(03513) : maxstdfiles = 2
[17502]{-1}[-1/-1] 2016-06-08 03:44:28.545071 i Daemon       TrexDaemon.cpp(03513) : runlevel = 4
[17502]{-1}[-1/-1] 2016-06-08 03:44:28.545075 i Daemon       TrexDaemon.cpp(03513) : flags =
[17502]{-1}[-1/-1] 2016-06-08 03:44:28.545080 i Daemon       TrexDaemon.cpp(03513) : window =
[17502]{-1}[-1/-1] 2016-06-08 03:44:28.545084 i Daemon       TrexDaemon.cpp(03513) : isolation = 0
[17502]{-1}[-1/-1] 2016-06-08 03:44:28.545088 i Daemon       TrexDaemon.cpp(03513) : userId = 4294967295
[17502]{-1}[-1/-1] 2016-06-08 03:44:28.545093 i Daemon       TrexDaemon.cpp(03513) : groupId = 4294967295
[17502]{-1}[-1/-1] 2016-06-08 03:44:28.545104 i Daemon       Daemon.cpp(00666) : runlevel 0 completely started
[17502]{-1}[-1/-1] 2016-06-08 03:44:28.545115 i Daemon       Program.cpp(00173) : line up of program group mdcdispatcher to instances <none>.
[17502]{-1}[-1/-1] 2016-06-08 03:44:28.545126 i Daemon       Program.cpp(00173) : line up of program group nameserver to instances 0.
[17502]{-1}[-1/-1] 2016-06-08 03:44:28.545460 i Daemon       TrexDaemon.cpp(02078) : starting program at '/usr/sap/HDB/HDB02/exe/hdbnameserver' with args ''
[17502]{-1}[-1/-1] 2016-06-08 03:44:28.569590 i Daemon       RunningInstance.cpp(00191) : start 'hdbnameserver' as process 17518
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.313937 i Daemon       SignalsUNIX.cpp(00647) : signo 3 SIGQUIT from user errno 0 code 0
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.314034 i Daemon       SignalsUNIX.cpp(00647) : sender pid 17518 real user id 1000 executable '/hana/shared/HDB/exe/linuxx86_64/HDB_1.00.112.02.1459851171_2840326/hdbnameserver'
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.314167 i Daemon       TrexDaemon.cpp(03558) : got shutdown event (stop)
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.314228 i Daemon       Daemon.cpp(00752) : comment file contains: nameserver: persistence initialization failed
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.314836 i Daemon       RunningInstance.cpp(00214) : stop process hdbnameserver with pid 17518
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.315143 i Daemon       TrexDaemon.cpp(02598) : stopped child with pid 17518 (17518)
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.401872 i Daemon       TrexDaemon.cpp(03778) : process hdbnameserver with pid 17518 exited normally with status 1
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.401939 i Daemon       TrexDaemon.cpp(03850) : all instances in runlevel 1 stopped
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.401952 i Daemon       Program.cpp(00173) : line up of program group dummyserver to instances <none>.
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.402492 i Daemon       ProgramStarter.cpp(00229) : writing started programs file /usr/sap/HDB/HDB02/hana001/lock/started_programs.txt:
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.528876 i Daemon       TrexDaemon.cpp(01616) : cleaning all
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.528931 i Daemon       TrexDaemon.cpp(02681) : found 8 segments in use with maximum id 8
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.529017 i Daemon       TrexDaemon.cpp(02705) : segment 4, key 0x9f08cf81, id 3506180, sequence 107, owner 1000/1001, size 117, atime 2016-06-08 03:44:32, dtime 2016-06-08 03:44:35, ctime 2016-06-08 03:18:33, created by pid 17077, last changed by pid 17077, nattach 0
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.529056 i Daemon       TrexDaemon.cpp(02705) : segment 5, key 0x00000000, id 3538949, sequence 108, owner 1000/1001, size 1060911, atime 2016-06-08 03:44:35, dtime 2016-06-08 03:44:35, ctime 2016-06-08 03:18:33, created by pid 17077, last changed by pid 17077, nattach 1 (deleted)
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.529069 i Daemon       TrexDaemon.cpp(02705) : segment 6, key 0x00000000, id 3571718, sequence 109, owner 1000/113, size 1024, atime 2016-06-08 03:44:26, dtime 2016-06-08 03:44:26, ctime 2016-06-08 03:44:25, created by pid 17494, last changed by pid 17494, nattach 1
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.529157 i Daemon       Main.cpp(01530) : exit hdbdaemon
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.529171 i Daemon       Main.cpp(01541) : Success: /sapmnt/ld7272/a/HDB/jenkins_prod/workspace/FA_CO_LIN64GCC48HAPPY_rel_fa~newdb100_rel/sys/src/TrexDaemon/Main.cpp:1541 in "int main(int, char**)". Leaving the "main"
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.530121 i Network      NetworkListener.cpp(00805) : closing listen socket 7 bound to 127.0.0.1
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.530283 i Network      NetworkListener.cpp(00805) : closing listen socket 8 bound to 127.0.0.2
View Entire Topic
n_vorstermans2
Explorer
0 Likes

Have you first checked if any PID's are still running? Looking at the screenshots I get the idea your system still has processes running in the background (especially on the last message). As this is not Production or is it live, you can run "HDB kill" on the OS to make sure everything is down before trying again.

The lock files was based on your earlier screenshot where it had an ID mismatch. They are generally found in /sapmnt/<SID>/HDB<Instance#>/<host>/lock

Former Member
0 Likes

Nick,

I´ve moved /hana/shared/HDB/HDB02/backup/log to somewhere else, after this full backup restore worked ok. The environment is back. For my current understanding is weird, but ok, gonna move forward and check it later.

Thanks for your help!