Technology Blogs by SAP
This post provides information on the key troubleshooting issues you might encounter while using the Fault Manager, and the various diagnostic and monitoring tools you can use to fix them. It also details recommendations on configuring your Fault Manager and SAP Host Agent. The post includes the following:
— Troubleshooting HADR System/Fault Manager Issues
— Miscellaneous Issues
— Recommendations


Troubleshooting HADR System/Fault Manager Issues


When the Root Partition is Full
On a host running the primary or companion server, the Fault Manager heartbeat log file (dev_hbeat) may grow very large. As a result, the host's root partition fills up and the asehostctrl command fails.
Resolution: Use the following command to check the size of the dev_hbeat file to determine if the increased file size is causing the failure:
sudo du -sh /usr/sap/hostctrl/work/dev_hbeat
16G /usr/sap/hostctrl/work/dev_hbeat

To resolve this issue, delete the dev_hbeat file. If the dev_hbeat file does not consume much space, you might want to check other files on the partition.
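
The size check above can be scripted. The following is a minimal sketch, not part of the SAP tooling: the function name file_exceeds_mb and the 1024 MB threshold in the example are illustrative assumptions. It compares the file's size in bytes against a threshold so the result can drive an alert or a cleanup job.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if FILE is larger than LIMIT_MB megabytes.
file_exceeds_mb() {
    file=$1
    limit_mb=$2
    # A missing file cannot exceed the limit
    [ -f "$file" ] || return 1
    # wc -c reports the size in bytes; $(( )) strips any padding some wc versions add
    size_bytes=$(( $(wc -c < "$file") ))
    [ "$size_bytes" -gt $((limit_mb * 1024 * 1024)) ]
}

# Example (path from this post; the 1024 MB threshold is an assumption):
# file_exceeds_mb /usr/sap/hostctrl/work/dev_hbeat 1024 && echo "dev_hbeat is over 1 GB"
```

If dev_hbeat turns out to be small, a command such as sudo du -xk / | sort -n | tail can help locate other large directories on the same partition.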

When the ASE Cockpit Frequently Displays Timeout Messages
This indicates that the sapdbctrl calls from the Fault Manager are timing out.
Resolution: Increase the timeout period for sapdbctrl by increasing the value for the ha/syb/dbctrl_timeout parameter in the Fault Manager Profile file. The default value of the parameter is 30 seconds. After you have made the necessary changes, restart the Fault Manager using the restart command:
$SYBASE/FaultManager/bin/sybdbfm restart
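
The profile change described above could be scripted along these lines. This is a sketch under assumptions: the helper name set_profile_param is hypothetical, the profile is assumed to use one "key = value" entry per line (the form this post shows for ha/syb/trace), and the value 60 in the example is only an illustration. Back up the profile before editing it.

```shell
#!/bin/sh
# Hypothetical helper: set KEY = VALUE in a profile file.
# Replaces an existing line for KEY, or appends a new line if KEY is absent.
set_profile_param() {
    profile=$1
    key=$2
    value=$3
    if grep -q "^[[:space:]]*${key}[[:space:]]*=" "$profile"; then
        # '|' is the sed delimiter because the parameter names contain '/'
        sed "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$profile" > "$profile.tmp" &&
            cat "$profile.tmp" > "$profile" &&   # overwrite in place to keep owner and mode
            rm -f "$profile.tmp"
    else
        # Assumes the file already ends with a newline
        printf '%s = %s\n' "$key" "$value" >> "$profile"
    fi
}

# Example (60 seconds is an assumed value, not a recommendation):
# set_profile_param SYBHA.PFL ha/syb/dbctrl_timeout 60
```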

When Fault Manager Calls to the SAP Host Control Fail
Resolution: Refer to the following logs and search for the errors:
  • Fault Manager log (<installation_directory>/log/FaultManager.log)
  • SAP Host Agent log (/usr/sap/hostctrl/work/dev_sapdbctrl file)
Generally, start with the Fault Manager log and look for the command that failed. For example, if you suspect that the error is caused by a system heartbeat failure, search the Fault Manager log for TASK = HEARTBEAT_CHECK. Then search for the text HEARTBEAT_CHECK in the SAP Host Agent log at the same timestamp. For an accurate diagnosis, ensure that the system clocks of the Fault Manager host and the SAP Host Agent host are in sync. It’s recommended to use trace level 3 (the most verbose output) while debugging SAP Host Agent issues.
The SAP Host Agent is a software component that can accomplish many lifecycle management tasks, such as operating system monitoring, database monitoring, system instance control and so on. It contains several sub-modules, including the SAP Host Control. The SAP Host Control runs within the SAP Host Agent under the sapadm user. For more information, refer to the SAP Host Agent architectural overview.
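
The log correlation described in this section can be sketched with a small wrapper around grep. The function name find_task_entries is a hypothetical helper, not an SAP utility. It prints every matching line prefixed with its file name and line number, so entries from both logs can be compared by timestamp.

```shell
#!/bin/sh
# Hypothetical helper: list every line that mentions TASK in the given log files.
# Output format is file:line_number:line.
find_task_entries() {
    task=$1
    shift
    # -F treats the task name as a fixed string, -n adds line numbers;
    # /dev/null as an extra operand makes grep print the file name
    # even when only one log file is given
    grep -n -F -- "$task" "$@" /dev/null
}

# Example (paths as given in this post; substitute your installation directory):
# find_task_entries HEARTBEAT_CHECK \
#     "<installation_directory>/log/FaultManager.log" \
#     /usr/sap/hostctrl/work/dev_sapdbctrl
```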

Error While Stopping the Fault Manager
While using the stop command to shut down the Fault Manager, you see this message:
fault manager did not change to mode UNKNOWN within 60 seconds. fault manager running, pid = 15922, fault manager overall status = OK, currently executing in mode DIAGNOSE
Resolution: Re-execute the stop command. Don’t stop the Fault Manager using the kill -9 operating system command.
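
Re-running the stop command can be wrapped in a simple retry loop. The sketch below assumes that the stop command returns a non-zero exit status when it fails, which this post does not confirm. The helper name retry and the attempt and delay values in the example are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to ATTEMPTS times, waiting DELAY seconds
# between tries. Succeeds as soon as the command succeeds.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while :; do
        "$@" && return 0
        # Give up once the last allowed attempt has failed
        [ "$i" -ge "$attempts" ] && return 1
        i=$((i + 1))
        sleep "$delay"
    done
}

# Example (3 attempts, 10 seconds apart; both values are assumptions):
# retry 3 10 "$SYBASE/FaultManager/bin/sybdbfm" stop
```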

The sybdbfm Utility Displays a "No Fault Manager Found" Message
When using the sybdbfm utility, you may see this message:
no fault manager found for current working directory error: stop failed.
Most likely, you are not running the sybdbfm command from the directory where the profile file and other Fault Manager-generated files (such as sp_sybdbfm and stat_sybdbfm) are located.
Resolution: Re-execute the sybdbfm command from the directory where these files are located.
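
One way to avoid this error is to always start the utility through a wrapper that first changes into the profile directory. The following is a sketch: run_from_profile_dir is a hypothetical helper, and it assumes the profile is named SYBHA.PFL, as described in the Recommendations section of this post.

```shell
#!/bin/sh
# Hypothetical helper: run a command from the directory that holds the
# Fault Manager profile, refusing to run if the profile is not there.
run_from_profile_dir() {
    dir=$1
    shift
    if [ ! -f "$dir/SYBHA.PFL" ]; then
        echo "no SYBHA.PFL found in $dir" >&2
        return 1
    fi
    # The subshell keeps the caller's working directory unchanged
    ( cd "$dir" && "$@" )
}

# Example (the profile directory shown here is an assumption):
# run_from_profile_dir "$SYBASE/FaultManager" "$SYBASE/FaultManager/bin/sybdbfm" stop
```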

Replication Status Messages
Even though the primary and companion HADR nodes are healthy (db host and db status are both OK), the sanity report still displays the replication status as one of the following:
DEAD
SUSPENDED
UNKNOWN
ASYNC_OK
Resolution: Refer to the Replication Server error logs for information.

Fault Manager Could Not Create a Connection to the Host Agent
The Fault Manager error log indicates (as shown below) that the Fault Manager could not create a connection to the Host Agent.
***LOG Q0I=> NiPConnect2: 10.172.162.61:1128: connect (111: Connection refused)
[/bas/CGK_MAKE/src/base/ni/nixxi.cpp 3324]
*** ERROR => NiPConnect2: SiPeekPendConn failed for hdl 6/sock 6
(SI_ECONN_REFUSE/111; I4; ST; 10.172.162.61:1128) [nixxi.cpp 3324]

Resolution: Check if the sapstartsrv process is running by executing the following command:
ps -aef | grep sapstartsrv
Normally, when the SAP Host Agent is started, the sapstartsrv process starts automatically with it. If the sapstartsrv process is not running, start it, then restart the SAP Host Agent.

Miscellaneous Issues



  • Ensure that you have write permissions for the SAP ASE installation directory, the Fault Manager installation and execution directories, and the /tmp directory. The Fault Manager creates temporary directories under /tmp, and adds temporary files. In the absence of appropriate permissions, SAP Host Agent calls fail. Also, it’s important to prevent the /tmp directory from becoming full. If /tmp is full, the Fault Manager cannot create temporary files. Check the status of /tmp by executing the df -k /tmp command. If this command shows 100 percent usage, make room in /tmp.

  • Verify that the GLIBC (GNU C Library) version is 2.7 or later. The Fault Manager is built with GLIBC version 2.7, therefore the hosts running it must use GLIBC version 2.7 or later. Use the following command to check the GLIBC version:
    ldd --version

  • Make sure you enter the correct passwords for sa, DR_admin, and sapadm.

  • Set the appropriate value for file descriptors: A file descriptor is an integer number that uniquely represents an opened file in the operating system. Verify that the user limit value (file descriptor) for open files is set to an adequate number (4096 or more) before you configure the HADR system for large databases.
    To determine the number of file descriptors to which your system is set, enter the following command:

    • For C-shell: limit descriptors

    • For Bourne shell: ulimit -n


    To change the value for the file descriptor (for instance, 4096), enter:

    • For C-shell: limit descriptors 4096

    • For Bourne shell: ulimit -n 4096
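
The checks in this list can be combined into a small pre-flight script. The sketch below is illustrative only: the function names are hypothetical, and the thresholds (GLIBC 2.7, 4096 descriptors) are the ones stated in this post. Note that version numbers are compared field by field, because a plain string or decimal comparison would rank 2.17 below 2.7.

```shell
#!/bin/sh
# Print the usage percentage of the file system that holds DIR.
dir_usage_percent() {
    # -P forces the single-line POSIX output format; column 5 is the capacity
    df -P -k "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed if dotted version $1 is greater than or equal to $2.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        # Compare each numeric field in turn; missing fields count as 0
        for (i = 1; i <= (n > m ? n : m); i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}

# Succeed if the soft limit for open files is at least MIN.
fd_limit_ok() {
    limit=$(ulimit -n)
    [ "$limit" = "unlimited" ] || [ "$limit" -ge "$1" ]
}

# Example usage (the last field of the first ldd --version line is the GLIBC version):
# [ -w /tmp ] || echo "/tmp is not writable"
# [ "$(dir_usage_percent /tmp)" -lt 100 ] || echo "/tmp is full"
# version_ge "$(ldd --version | awk 'NR == 1 { print $NF }')" 2.7 || echo "GLIBC too old"
# fd_limit_ok 4096 || echo "raise the open files limit"
```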


Recommendations


Increase the Trace Level for Troubleshooting
Set the trace level (essentially, the level of detail in the error log) to its highest level on the SAP Host Agent and the Fault Manager so your error log output is as detailed as possible.

    • For the Fault Manager: Set the trace level with the ha/syb/trace parameter in the profile file (SYBHA.PFL), then restart the Fault Manager (using the $SYBASE/FaultManager/bin/sybdbfm restart command). For example, to get the most verbose output, set the trace level to 3 by adding the line ha/syb/trace = 3 to the SYBHA.PFL file. The SYBHA.PFL file is located in the installation directory of the Fault Manager on all platforms. Increasing the trace level increases the number of log entries and may increase the log file size. You may choose one of the following values for the ha/syb/trace parameter:

      • 1 – Basic verbose output

      • 2 – Medium verbose output

      • 3 – Maximum verbose output



    • For the SAP Host Agent: Set the trace level in the profile file, and restart the SAP Host Agent using the saphostexec program. For example, to get the maximum verbose output, add the line service/trace = 3 to the host profile (/usr/sap/hostctrl/exe/host_profile). The profile file is located in:

      • (UNIX): /usr/sap/hostctrl/exe/host_profile

      • (Windows): %ProgramFiles%\SAP\hostctrl\exe\host_profile
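
Before restarting either component, it can help to confirm which trace level a profile currently sets. The following sketch reads a parameter from a profile file. The helper name get_profile_param is hypothetical, and it assumes one "key = value" entry per line, with or without spaces around the equals sign.

```shell
#!/bin/sh
# Hypothetical helper: print the value of KEY from a profile file.
# Fails (exit 1) if the key is not present. If the key appears more
# than once, the last occurrence is printed.
get_profile_param() {
    profile=$1
    key=$2
    awk -F= -v key="$key" '
        # Trim surrounding blanks from the name before comparing
        { k = $1; gsub(/^[ \t]+|[ \t]+$/, "", k) }
        k == key {
            # Only the text up to the next "=" is read, which is enough for numeric levels
            v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v)
            val = v; found = 1
        }
        END { if (found) print val; else exit 1 }
    ' "$profile"
}

# Examples (parameter names and the UNIX path are from this post):
# get_profile_param SYBHA.PFL ha/syb/trace
# get_profile_param /usr/sap/hostctrl/exe/host_profile service/trace
```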




