Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
This post provides information on the key troubleshooting issues you might encounter while using the Fault Manager, and the various diagnostic and monitoring tools you can use to fix them. It also details recommendations on configuring your Fault Manager and SAP Host Agent. The post includes the following:
— Troubleshooting HADR System/Fault Manager Issues
— Miscellaneous Issues
— Recommendations


Troubleshooting HADR System/Fault Manager Issues


When the Root Partition is Full
On one of the hosts running the primary or companion servers, the Fault Manager heartbeat log file (dev_hbeat) may grow very large. As a result, the host's root partition fills up and the asehostctrl command fails.
Resolution: Use the following command to check the size of the dev_hbeat file to determine if the increased file size is causing the failure:
sudo du -sh /usr/sap/hostctrl/work/dev_hbeat
16G /usr/sap/hostctrl/work/dev_hbeat

To resolve this issue, delete the dev_hbeat file. If the dev_hbeat file does not consume much space, you might want to check other files on the partition.
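The size check above can be wrapped in a small script that flags the file once it passes a threshold. This is only a sketch: the 1 GiB limit and the function names are illustrative assumptions, not SAP-documented values. Only the dev_hbeat path comes from the command shown above.

```shell
# Sketch: report whether a log file has grown past a size limit.
# The limit and the function names are assumptions chosen for illustration.

# Print the size of a file in bytes (0 if it is missing or unreadable).
file_size_bytes() {
    if [ -r "$1" ]; then
        wc -c < "$1" | tr -d ' '
    else
        echo 0
    fi
}

# Succeed (exit status 0) when the file is larger than the limit in bytes.
is_oversized() {
    [ "$(file_size_bytes "$1")" -gt "$2" ]
}

HBEAT_LOG=/usr/sap/hostctrl/work/dev_hbeat   # path used in this post
LIMIT_BYTES=$((1024 * 1024 * 1024))          # 1 GiB, an arbitrary threshold

if is_oversized "$HBEAT_LOG" "$LIMIT_BYTES"; then
    echo "dev_hbeat exceeds the limit: $(file_size_bytes "$HBEAT_LOG") bytes"
fi
df -k /    # show how full the root partition is
```

Run it as a user that can read the work directory; otherwise the size is reported as 0.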

When the ASE Cockpit Frequently Displays Timeout Messages
This indicates that the sapdbctrl calls from the Fault Manager are timing out.
Resolution: Increase the timeout period for sapdbctrl by raising the value of the ha/syb/dbctrl_timeout parameter in the Fault Manager profile file. The default value of the parameter is 30 seconds. After you have made the change, restart the Fault Manager using the restart command:
$SYBASE/FaultManager/bin/sybdbfm restart
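If you prefer to script the profile change, something along these lines may help. It is a sketch, not an SAP-provided tool: set_profile_param is a hypothetical helper, and it assumes the profile uses "key = value" lines, as in the ha/syb/trace example later in this post.

```shell
# Sketch: set "key = value" in a profile file.
# Replaces an existing line for the key, or appends one if none exists.
# set_profile_param is a hypothetical helper name.
set_profile_param() {
    spp_file=$1; spp_key=$2; spp_value=$3
    spp_tmp=$(mktemp) || return 1
    awk -v k="$spp_key" -v v="$spp_value" '
        # Strip leading blanks so indented entries are matched as well.
        { line = $0; sub(/^[ \t]+/, "", line) }
        # A line matches when it starts with the key followed by "=".
        index(line, k) == 1 && substr(line, length(k) + 1) ~ /^[ \t]*=/ {
            print k " = " v; found = 1; next
        }
        { print }
        END { if (!found) print k " = " v }
    ' "$spp_file" > "$spp_tmp" && cat "$spp_tmp" > "$spp_file"
    spp_rc=$?
    rm -f "$spp_tmp"
    return $spp_rc
}
```

For example, `set_profile_param SYBHA.PFL ha/syb/dbctrl_timeout 60` would raise the timeout to 60 seconds (an illustrative value, not a recommendation). Back up the profile first, and restart the Fault Manager afterwards as described above.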

When Fault Manager Calls to the SAP Host Control Fail
Resolution: Refer to the following logs and search for the errors:
— Fault Manager log (<installation_directory>/log/FaultManager.log)
— SAP Host Agent log (/usr/sap/hostctrl/work/dev_sapdbctrl file)
Generally, start with the Fault Manager log and check for the command that has failed. For example, if you suspect that the error is caused by a system heartbeat failure, search the Fault Manager log for TASK = HEARTBEAT_CHECK. Then search for the text HEARTBEAT_CHECK in the SAP Host Agent log at the same timestamp. For a correct diagnosis, ensure that the system clocks of the Fault Manager host and the SAP Host Agent host are in sync. We recommend using trace level 3 (maximum verbose output) while debugging SAP Host Agent issues.
The SAP Host Agent is a software component that can accomplish many lifecycle management tasks, such as operating system monitoring, database monitoring, system instance control and so on. It contains several sub-modules, including the SAP Host Control. The SAP Host Control runs within the SAP Host Agent under the sapadm user. For more information, refer to the SAP Host Agent architectural overview.
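The log search described above can be scripted roughly as follows. This is a sketch under assumptions: the function names are made up for illustration, and the pattern accepts the task with or without spaces around "=" because logs may write it either way.

```shell
# Sketch: find the lines for one Fault Manager task in both logs.
# Function names are illustrative; the search strings come from this post.

# Fault Manager log: match "TASK = NAME" with or without spaces around "=".
fm_task_lines() {
    grep -n -E "TASK *= *$2([^A-Z_]|\$)" "$1"
}

# SAP Host Agent log: match the bare task name.
agent_task_lines() {
    grep -n -F -- "$2" "$1"
}
```

A typical call would be `fm_task_lines <installation_directory>/log/FaultManager.log HEARTBEAT_CHECK`, followed by `agent_task_lines /usr/sap/hostctrl/work/dev_sapdbctrl HEARTBEAT_CHECK`. Compare the timestamps of the matching lines by hand, since the two logs do not necessarily use the same timestamp format.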

Error While Stopping the Fault Manager
While using the stop command to shut down the Fault Manager, you see this message:
fault manager did not change to mode UNKNOWN within 60 seconds. fault manager running, pid = 15922, fault manager overall status = OK, currently executing in mode DIAGNOSE
Resolution: Re-execute the stop command. Don’t stop the Fault Manager using the kill -9 operating system command.
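Re-executing the stop command can be automated with a small retry loop. The sketch below is generic shell, not part of the Fault Manager; the retry helper, the number of attempts, and the delay are all assumptions.

```shell
# Sketch: run a command up to N times, pausing between attempts.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS command [args...]
retry() {
    retry_max=$1; retry_delay=$2; shift 2
    retry_n=1
    while :; do
        "$@" && return 0                          # stop at the first success
        [ "$retry_n" -ge "$retry_max" ] && return 1
        retry_n=$((retry_n + 1))
        sleep "$retry_delay"
    done
}
```

For example, `retry 3 10 "$SYBASE/FaultManager/bin/sybdbfm" stop` would try the stop command up to three times, ten seconds apart, run from the Fault Manager working directory. If it still fails, investigate the logs rather than resorting to kill -9.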

The sybdbfm Utility Displays a "No Fault Manager Found" Message
When using the sybdbfm utility, you may see this message:
no fault manager found for current working directory error: stop failed.
Most likely, you are not running the sybdbfm command from the directory where the profile file and other Fault Manager-generated files (such as sp_sybdbfm and stat_sybdbfm) are located.
Resolution: Re-execute the sybdbfm command from the directory where these files are located.
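Before running sybdbfm, you can check whether the current directory looks like the Fault Manager working directory. This is a sketch: fm_dir_ok is a hypothetical helper, the file names are the ones mentioned in this post, and it assumes all three are present once the Fault Manager has been started from that directory.

```shell
# Sketch: check that a directory contains the Fault Manager profile
# and the generated files named in this post.
fm_dir_ok() {
    for fm_f in SYBHA.PFL sp_sybdbfm stat_sybdbfm; do
        if [ ! -e "$1/$fm_f" ]; then
            echo "missing: $1/$fm_f" >&2
            return 1
        fi
    done
    return 0
}
```

Calling `fm_dir_ok .` before `sybdbfm stop` or `sybdbfm restart` shows which file is missing when you are in the wrong directory.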

Replication Status Messages
Even though the primary and companion HADR nodes are healthy (db host status and db status are both OK), the sanity report still displays the replication status as one of the following:
DEAD
SUSPENDED
UNKNOWN
ASYNC_OK
Resolution: Refer to the Replication Server error logs for information.
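To watch for these states, you can pull the most recent replication status out of the Fault Manager log. The sketch below assumes the sanity report writes a line of the form "replication status: UNKNOWN.", as in the log excerpt in the comments on this post; the function names are illustrative.

```shell
# Sketch: extract the latest replication status from a Fault Manager log.
# Assumes sanity report lines such as "... replication status: UNKNOWN."
last_repl_status() {
    sed -n 's/.*replication status: *\([A-Z_]*\).*/\1/p' "$1" | tail -n 1
}

# Succeed when the status is one of the four states listed in this post.
repl_needs_attention() {
    case "$1" in
        DEAD|SUSPENDED|UNKNOWN|ASYNC_OK) return 0 ;;
        *) return 1 ;;
    esac
}
```

If `repl_needs_attention "$(last_repl_status <log file>)"` succeeds, continue with the Replication Server error logs as described above.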

Fault Manager Could Not Create a Connection to the Host Agent
The Fault Manager error log indicates (as shown below) that the Fault Manager could not create a connection to the Host Agent.
***LOG Q0I=> NiPConnect2: 10.172.162.61:1128: connect (111: Connection refused)
[/bas/CGK_MAKE/src/base/ni/nixxi.cpp 3324]
*** ERROR => NiPConnect2: SiPeekPendConn failed for hdl 6/sock 6
(SI_ECONN_REFUSE/111; I4; ST; 10.172.162.61:1128) [nixxi.cpp 3324]

Resolution: Check if the sapstartsrv process is running by executing the following command:
ps -aef | grep sapstartsrv
Normally, the sapstartsrv process starts automatically when the SAP Host Agent is started. If the sapstartsrv process is not running, start it, then restart the SAP Host Agent.
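The ps and grep check above also matches the grep process itself, which can be misleading in scripts. A minimal alternative, assuming a POSIX-style ps and that the process name is exactly sapstartsrv, is to compare command names directly; is_running is a hypothetical helper name.

```shell
# Sketch: succeed when a process with exactly this command name exists.
# "ps -e -o comm=" lists only the command names, so grep never sees itself.
is_running() {
    ps -e -o comm= | grep -q -x -F -- "$1"
}

if is_running sapstartsrv; then
    echo "sapstartsrv is running"
else
    echo "sapstartsrv is not running" >&2
fi
```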

Miscellaneous Issues



  • Ensure that you have write permissions for the SAP ASE installation directory, the Fault Manager installation and execution directories, and the /tmp directory. The Fault Manager creates temporary directories under /tmp, and adds temporary files. In the absence of appropriate permissions, SAP Host Agent calls fail. Also, it’s important to prevent the /tmp directory from becoming full. If /tmp is full, the Fault Manager cannot create temporary files. Check the status of /tmp by executing the df -k /tmp command. If this command shows 100 percent usage, make room in /tmp.

  • Verify that the GLIBC (GNU C Library) version is 2.7 or later. The Fault Manager is built with GLIBC version 2.7, therefore the hosts running it must use GLIBC version 2.7 or later. Use the following command to check the GLIBC version:
    ldd --version

  • Make sure you enter the correct passwords for sa, DR_admin, and sapadm.

  • Set the appropriate value for file descriptors: A file descriptor is an integer number that uniquely represents an opened file in the operating system. Verify that the user limit value (file descriptor) for open files is set to an adequate number (4096 or more) before you configure the HADR system for large databases.
    To determine the number of file descriptors to which your system is set, enter the following command:

    • For C-shell: limit descriptors

    • For Bourne shell: ulimit -n


    To change the value for the file descriptor (for instance, 4096), enter:

    • For C-shell: limit descriptors 4096

    • For Bourne shell: ulimit -n 4096
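The checks in this list can be combined into a single pre-flight script. The sketch below is an assumption-laden convenience, not an SAP tool: the function names are made up, the version comparison relies on GNU sort -V, and the thresholds (2.7 and 4096) are the ones given above. Note that versions must be compared component by component, since 2.11 is newer than 2.7.

```shell
# Sketch: pre-flight checks for /tmp, glibc, and the open-file limit.

# Percentage in use of the file system holding a path (digits only).
usage_pct() {
    df -Pk "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when dotted version $1 is >= $2 (relies on GNU "sort -V").
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# glibc version taken from the first line of "ldd --version".
glibc_version() {
    ldd --version 2>/dev/null | awk 'NR == 1 { print $NF }'
}

# Run all checks; print each problem and return 1 if any check fails.
preflight() {
    pf_rc=0
    [ -w /tmp ] || { echo "/tmp is not writable" >&2; pf_rc=1; }
    [ "$(usage_pct /tmp)" -lt 100 ] || { echo "/tmp is full" >&2; pf_rc=1; }
    version_ge "$(glibc_version)" 2.7 ||
        { echo "glibc is older than 2.7 or could not be detected" >&2; pf_rc=1; }
    pf_fd=$(ulimit -n)
    if [ "$pf_fd" != unlimited ] && [ "$pf_fd" -lt 4096 ]; then
        echo "open-file limit $pf_fd is below 4096" >&2; pf_rc=1
    fi
    return $pf_rc
}
```

Call `preflight` as the user that runs the Fault Manager, on each host involved. It does not check the passwords or the permissions on the installation directories, which still need to be verified by hand.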


Recommendations


Increase the Trace Level for Troubleshooting
Set the trace level (essentially, the level of detail in the error log) to its highest level on the SAP Host Agent and the Fault Manager so your error log output is as detailed as possible.

  • For the Fault Manager: Set the ha/syb/trace parameter in the profile file (SYBHA.PFL), then restart the Fault Manager (using the $SYBASE/FaultManager/bin/sybdbfm restart command). For example, to get the maximum verbose output, set the trace level to 3 by adding the line ha/syb/trace = 3 to the SYBHA.PFL file. The SYBHA.PFL file is located in the installation directory of the Fault Manager on all platforms. Increasing the trace level increases the number of log entries, and may increase the file size. You may choose one of the following values for the ha/syb/trace parameter:

    • 1 – Basic verbose output

    • 2 – Medium verbose output

    • 3 – Maximum verbose output


  • For the SAP Host Agent: Set the trace level in the profile file, and restart the SAP Host Agent using the saphostexec program. For example, to get the maximum verbose output, add the line service/trace = 3 to the host profile. The profile file is located in:

    • (UNIX): /usr/sap/hostctrl/exe/host_profile

    • (Windows): %ProgramFiles%\SAP\hostctrl\exe\host_profile
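After editing either profile, it can be useful to read the value back before restarting. The following sketch assumes "key = value" lines as in the examples above; get_profile_param and valid_trace_level are hypothetical helper names.

```shell
# Sketch: print the value of a "key = value" entry from a profile file.
# If the key appears more than once, the last occurrence is printed.
get_profile_param() {
    awk -v k="$2" '
        { line = $0; sub(/^[ \t]+/, "", line) }
        index(line, k) == 1 {
            rest = substr(line, length(k) + 1)
            if (rest ~ /^[ \t]*=/) {
                sub(/^[ \t]*=[ \t]*/, "", rest)   # drop "=" and blanks
                sub(/[ \t]+$/, "", rest)          # drop trailing blanks
                value = rest
            }
        }
        END { if (value != "") print value }
    ' "$1"
}

# Succeed when the argument is one of the trace levels listed above.
valid_trace_level() {
    case "$1" in
        1|2|3) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `get_profile_param /usr/sap/hostctrl/exe/host_profile service/trace` prints the current SAP Host Agent trace level, and `get_profile_param SYBHA.PFL ha/syb/trace` does the same for the Fault Manager. Nothing is printed if the parameter is not set.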





8 Comments
fernandofpardo

Hi,

 

I am having an error installing Fault Manager:

 

- Root user
- ldd version 2.11
- Linux SUSE 12
- hostagent 721 patch23
- Installing on a different host than ERP1 and ERP2 (primary and standby SAP servers)
- ulimit 4096

ERROR

2017 01/27 17:37:54.824 (11876) loading executable /usr/sap/SYB/SYS/exe/run/sybdbfm for heartbeat to SAPHostAgent tools.
2017 01/27 17:37:54.824 (11876) upload executable /usr/sap/SYB/SYS/exe/run/sybdbfm.
2017 01/27 17:37:54.824 (11876) ERROR: cannot open file /usr/sap/SYB/SYS/exe/run/sybdbfm for read.
2017 01/27 17:37:54.824 (11876) bootstrap failed.
2017 01/27 17:37:57.825 (11876) start bootstrap.

 

I don't understand why it is asking for an SAP directory. Also, the SID for ERP is PRD, not SYB.

Finally, I couldn't find fault_manager_responses.txt in the $SYBASE/log directory (secondary server).

Any clue?

crisnormand
Hello,

This blog refers to HADR for SAP ASE for custom applications, so it does not apply to HADR for SAP ASE for Business Suite.

Note that in a Business Suite environment, Fault Manager is currently not supported, as stated in SAP Note  1891560 - Disaster Recovery Setup with SAP Replication Server :




General Limitations for SAP Replication Server 15.7.1:

SAP Netweaver Business Warehouse (BW) or systems using SAP BW features like SAP SCM APO, SAP SEM, and SAP Solution Manager are currently not supported.

SAP Replication Server 15.7.1 SP200 and higher is not supported for SAP ASE 15.7.
SRS SP200 and higher requires SAP ASE 16.0 as a minimum. The versions that are supported for SAP ASE 16.0 are specified below.

Important: Fault Manager is not supported for HADR for Business Suite environments.




Regards,

Cris

 

 

 
fernandofpardo
Hi Cris,

 

I didn't notice this limitation when I checked the note.

 

I have these versions:

ASE: 16.0 SP02 PL05 HF1
SRS: 15.7.1 SP305 (supported)

 

In ASE Cockpit I can see both servers with their status, green (primary) and grey (standby), and replication works fine. But if Fault Manager is not supported, what tool should I use? What's next?

 

Regards.

 

 
0 Kudos
Great blog!
crisnormand
Hello Fernando,

There are still issues preventing Fault Manager from being supported for the Business Suite, even though ASE 16 SP02 PL05 HF1 is supported for HADR.

DBA Cockpit is the recommended tool when running SAP applications on SAP ASE, and its HADR options have been enhanced. ASE Cockpit has not been specifically designed for ASE for Business Suite, and usually customers running SAP applications on ASE are not even aware of its existence 🙂

The advantage of the Fault Manager is that it monitors the health of the components of an HADR environment (ASE, SRS, RMA) for you and takes action automatically depending on their health. Without it, you can still set up your HADR environment, monitor it, and take the actions needed yourself.

HTH

Regards,

Cris
Former Member

  • SAP Solution Manager & SAP Netweaver Business Warehouse (BW) or systems using SAP BW features are currently not supported.

  • SAP ASE 15.7 does not support SAP Replication Server 15.7.1 SP200.
    SAP ASE 16.0 is required as a minimum for SRS SP200.


Thanks

Former Member
Hi Fernando,

Can you please tell me which tool or software you use for automatic failover for ASE HADR for Business Suite? I have also configured ASE HADR for Business Suite and am looking for a mechanism to fail over automatically.

Thanks

Abu
0 Kudos
Hi,

I am having trouble getting the Fault Manager to work.

I have 3 Windows 2008 R2 VMs (HADR1, HADR2 and HADR3). ASE 16 SP03 PL03 is installed on HADR1 and HADR3, in HADR mode. sap_status path indicates that the path is active for all databases and that replication can occur.

I installed the Fault Manager on HADR2, but when I start it I get errors on HADR1 and HADR3.

On HADR1 - dev_sybdbfm

2018 01/31 11:54:22.726 (1520) start HeartBeatClient.
2018 01/31 11:54:22.726 (1520) sybdbfm exe directory is C:\Program Files\SAP\hostctrl\exe\ASE1
2018 01/31 11:54:22.726 (1520) check_create: 0
2018 01/31 11:54:22.726 (1520) HeartBeatClient started.
2018 01/31 11:54:22.726 (1520) starting heartbeat thread (client) for HADR2:13777.
2018 01/31 11:54:25.737 (1520) start H2HServer.
2018 01/31 11:54:25.737 (1520) starting heartbeat server at: HADR1:13797.
2018 01/31 11:54:25.737 (1520) starting heartbeat thread (server) for HADR1:13797.
2018 01/31 11:54:25.737 (1520) thread status 1 at 217EC7C.
2018 01/31 11:54:25.737 (1520) heartbeat server started.
2018 01/31 11:54:25.737 (1520) HeartBeatServer started.
2018 01/31 11:54:28.748 (1520) HeartBeatSanityCheck: start.
2018 01/31 11:54:28.748 (1520) dbctrl call cnt: 0 .
2018 01/31 11:54:28.748 (1520) executing: asehostctrl -function GetDatabaseStatus -dbname HA1 -dbtype syb -dbinstance ASE1 .
2018 01/31 11:54:28.748 (1520) starting control call.
2018 01/31 11:54:30.183 (1520) Error: Database not found

2018 01/31 11:54:30.183 (1520) dbctrl call cnt reset: 0 .
2018 01/31 11:54:30.183 (1520) control call ended.
2018 01/31 11:54:30.183 (1520) call_saphostctrl completed ok.
2018 01/31 11:54:30.183 (1520) check saphostctrl running (F00)....
2018 01/31 11:54:30.183 (1520) terminateThread (F00).
2018 01/31 11:54:30.183 (1520) ThrExitCode returned (0).
2018 01/31 11:54:30.183 (1520) call exited (exit code 1).
2018 01/31 11:54:30.183 (1520) terminateThread (F00) done.
2018 01/31 11:54:30.183 (1520) ThrDetach returned (5).
2018 01/31 11:54:30.183 (1520) terminate ctrl thread done.
2018 01/31 11:54:30.183 (1520) saphostctrl executed.
2018 01/31 11:54:30.183 (1520) dbctrl call cnt reset 2: 0 .
2018 01/31 11:54:30.183 (1520) database is UNKNOWN.

 

On HADR1 - dev_sapdbctrl

Wed Jan 31 11:32:52 2018
[PID 1820] ODBC driver for Sybase Adaptive Server is not installed.
[PID 1820] DBConfigPath is C:\SAP\sapdbctrl-config
[PID 1820] LiveUpdateOption Status
[PID 1820] ODBC driver for Sybase Adaptive Server is not installed.
[PID 1820] Wed Jan 31 11:32:52 2018 INTERNAL_ERROR sybServer.cpp:5016:SYB_Server::isExisting DESCRIPTION: Cfg file not found: C:\SAP\HA1.cfg LAST ERROR: (0) : The operation completed successfully.
[PID 1820] SAP ASE Server instance ASE1 does not exist.
[PID 1820] LiveUpdateOption retrieving db status failed
sapparam: sapargv(argc, argv) has not been called!
sapparam(1c): No Profile used.
sapparam: SAPSYSTEMNAME neither in Profile nor in Commandline

 

On HADR3 - dev_sybdbfm

2018 01/31 11:54:27.035 (1668) start HeartBeatClient.
2018 01/31 11:54:27.035 (1668) sybdbfm exe directory is C:\Program Files\SAP\hostctrl\exe\ASE1
2018 01/31 11:54:27.035 (1668) check_create: 0
2018 01/31 11:54:27.035 (1668) HeartBeatClient started.
2018 01/31 11:54:27.035 (1668) starting heartbeat thread (client) for HADR2:13787.
2018 01/31 11:54:30.046 (1668) start H2HClient.
2018 01/31 11:54:30.046 (1668) HeartBeatClient started.
2018 01/31 11:54:30.046 (1668) starting heartbeat thread (client) for HADR1:13797.
2018 01/31 11:54:33.103 (1668) HeartBeatSanityCheck: start.
2018 01/31 11:54:33.103 (1668) dbctrl call cnt: 0 .
2018 01/31 11:54:33.103 (1668) executing: asehostctrl -function GetDatabaseStatus -dbname HA1 -dbtype syb -dbinstance ASE1 .
2018 01/31 11:54:33.103 (1668) starting control call.
2018 01/31 11:54:34.492 (1668) Error: Database not found

2018 01/31 11:54:34.492 (1668) dbctrl call cnt reset: 0 .
2018 01/31 11:54:34.492 (1668) control call ended.
2018 01/31 11:54:34.492 (1668) call_saphostctrl completed ok.
2018 01/31 11:54:34.492 (1668) check saphostctrl running (FA0)....
2018 01/31 11:54:34.492 (1668) terminateThread (FA0).
2018 01/31 11:54:34.492 (1668) ThrExitCode returned (0).
2018 01/31 11:54:34.492 (1668) call exited (exit code 1).
2018 01/31 11:54:34.492 (1668) terminateThread (FA0) done.
2018 01/31 11:54:34.492 (1668) ThrDetach returned (5).
2018 01/31 11:54:34.492 (1668) terminate ctrl thread done.
2018 01/31 11:54:34.492 (1668) saphostctrl executed.
2018 01/31 11:54:34.492 (1668) dbctrl call cnt reset 2: 0 .
2018 01/31 11:54:34.492 (1668) database is UNKNOWN.

 

On HADR3 - dev_sapdbctrl

Wed Jan 31 11:54:26 2018
[PID 1616] DBConfigPath is C:\SAP\sapdbctrl-config
[PID 1616] LiveUpdateOption LUT_Start_Heartbeat
[PID 1616] Wed Jan 31 11:54:26 2018 INTERNAL_ERROR sybProcess.cpp:754:SybProcess::readInfo DESCRIPTION: OpenProcess failed for PID 4 LAST ERROR: (0) : The operation completed successfully.
[PID 1616] Wed Jan 31 11:54:26 2018 INTERNAL_ERROR sybProcess.cpp:754:SybProcess::readInfo DESCRIPTION: OpenProcess failed for PID 3848 LAST ERROR: (87) : The parameter is incorrect.
[PID 1616] heartbeat started.[PID 1616]
Wed Jan 31 11:54:29 2018
[PID 1616] Wed Jan 31 11:54:29 2018 INTERNAL_ERROR sybProcess.cpp:754:SybProcess::readInfo DESCRIPTION: OpenProcess failed for PID 4 LAST ERROR: (5) : Access is denied.
[PID 1616] LiveUpdateOption start Heartbeat ok.
[PID 2284]
Wed Jan 31 11:54:34 2018
[PID 2284] ODBC driver for Sybase Adaptive Server is not installed.
[PID 2284] Wed Jan 31 11:54:34 2018 INTERNAL_ERROR sybServer.cpp:5016:SYB_Server::isExisting DESCRIPTION: Cfg file not found: C:\SAP\HA1.cfg LAST ERROR: (0) : The operation completed successfully.
[PID 2284] SAP ASE Server instance ASE1 does not exist.
[PID 2284] *** ERROR => 'Get database status' failed: Database not found [sapdbctrl.cp 3690]
[PID 696]
Wed Jan 31 11:54:40 2018
[PID 696] lookup of secstore path failed.
[PID 696] ODBC driver for Sybase Adaptive Server is not installed.
[PID 696] DBConfigPath is C:\SAP\sapdbctrl-config
[PID 696] LiveUpdateOption Status
[PID 696] ODBC driver for Sybase Adaptive Server is not installed.
[PID 696] Wed Jan 31 11:54:40 2018 INTERNAL_ERROR sybServer.cpp:5016:SYB_Server::isExisting DESCRIPTION: Cfg file not found: C:\SAP\HA1.cfg LAST ERROR: (0) : The operation completed successfully.
[PID 696] SAP ASE Server instance ASE1 does not exist.
[PID 696] LiveUpdateOption retrieving db status failed
sapparam: sapargv(argc, argv) has not been called!
sapparam(1c): No Profile used.
sapparam: SAPSYSTEMNAME neither in Profile nor in Commandline

 

On HADR2 -  dev_sybdbfm

2018 01/31 11:54:25.074 (2708) read password from secstore rc (0)
2018 01/31 11:54:25.074 (2708) executing: asehostctrl -host HADR3 -user sapadm ******** -function LiveDatabaseUpdate -dbname HA1 -dbtype syb -dbinstance ASE1 -timeout 30 -updatemethod Execute -updateoption TASK=HEARTBEAT_STARTUP .
2018 01/31 11:54:25.074 (2708) starting control call.
2018 01/31 11:54:29.488 (2708) Webmethod returned successfully

2018 01/31 11:54:29.488 (2708) Operation ID: 000C29E6D6E41ED881CEA33293C7FC5B

2018 01/31 11:54:29.488 (2708) ----- Response data ----

2018 01/31 11:54:29.488 (2708) LogMsg/Text=Executing LiveDatabaseUpdate

2018 01/31 11:54:29.488 (2708) START_HEARTBEAT=ok

2018 01/31 11:54:29.488 (2708) LogMsg/Text=LiveDatabaseUpdate successfully executed

2018 01/31 11:54:29.488 (2708) ----- Log messages ----

2018 01/31 11:54:29.488 (2708) Info: saphostcontrol: Executing LiveDatabaseUpdate

2018 01/31 11:54:29.488 (2708) Info: saphostcontrol: LiveDatabaseUpdate successfully executed

2018 01/31 11:54:29.488 (2708) dbctrl call cnt reset: 0 .
2018 01/31 11:54:29.488 (2708) control call ended.
2018 01/31 11:54:29.488 (2708) call_saphostctrl completed ok.
2018 01/31 11:54:29.488 (2708) check saphostctrl running (9BC)....
2018 01/31 11:54:29.488 (2708) terminateThread (9BC).
2018 01/31 11:54:29.488 (2708) ThrExitCode returned (0).
2018 01/31 11:54:29.488 (2708) call exited (exit code 0).
2018 01/31 11:54:29.488 (2708) terminateThread (9BC) done.
2018 01/31 11:54:29.488 (2708) ThrDetach returned (5).
2018 01/31 11:54:29.488 (2708) terminate ctrl thread done.
2018 01/31 11:54:29.488 (2708) saphostctrl executed.
2018 01/31 11:54:29.488 (2708) dbctrl call cnt reset 2: 0 .
2018 01/31 11:54:29.488 (2708) heartbeat: success.
2018 01/31 11:54:29.488 (2708) heartbeat client started.
2018 01/31 11:54:29.488 (2708) SimpleFetch: select convert(integer, convert(varchar,@@version_number) + substring(convert(varchar,@@sbssav),6,2) + substring(convert(varchar,@@sbssav),9,2))
2018 01/31 11:54:29.551 (2708) SimpleFetch out: 160000303
2018 01/31 11:54:29.551 (2708) FM will acknowledge ASYNC request
2018 01/31 11:54:29.551 (2708) SimpleFetch: sp_configure 'FM Enabled',1
2018 01/31 11:54:29.660 (2708) SimpleFetch out: FM Enabled
2018 01/31 11:54:29.707 (2708) Config option 'FM Enabled' changed on Primary ASE to 1
2018 01/31 11:54:29.707 (2708) SimpleFetch: sp_configure 'FM Enabled',1
2018 01/31 11:54:34.730 (2708) SQLGetDiagRec 0
2018 01/31 11:54:34.730 (2708) ERROR in function SimpleFetch (1427) (SQLExecDirect failed): (30149) [HYT00] [SAP][ASE ODBC Driver]Th
2018 01/31 11:54:34.730 (2708) ERROR in function SimpleFetch (1427) (SQLExecDirect failed): (30086) [HY008] [SAP][ASE ODBC Driver]Operation Canceled.
2018 01/31 11:54:34.730 (2708) Failed to execute statement sp_configure 'FM Enabled',1 on Standby
2018 01/31 11:54:34.730 (2708) bootstrap finished.
2018 01/31 11:54:34.730 (2708) *** sanity check report (1)***.
2018 01/31 11:54:34.730 (2708) node 1: server HADR1, site SITE01.
2018 01/31 11:54:34.730 (2708) db host status: UNKNOWN.
2018 01/31 11:54:34.730 (2708) db status DB INDOUBT hadr status PRIMARY.
2018 01/31 11:54:34.730 (2708) node 2: server HADR3, site SITE02.
2018 01/31 11:54:34.730 (2708) db host status: UNKNOWN.
2018 01/31 11:54:34.730 (2708) db status DB INDOUBT hadr status STANDBY.
2018 01/31 11:54:34.730 (2708) replication status: UNKNOWN.
2018 01/31 11:54:34.730 (2708) insert status to fault manager status table.
2018 01/31 11:54:34.730 (2708) omitting insert into fault manager status table as db is not in status ok.
2018 01/31 11:54:34.730 (2708) omitting insert into fault manager status table as db is not in status ok.
2018 01/31 11:54:34.730 (2708) sybdbfm server mode.
2018 01/31 11:54:34.730 (2708) Virtual memory used by current process (bytes): 21737472

 

Any idea what the problem could be? From those error messages I have no clue what the underlying issue is.

 

Best regards,

 

Juan Vega