Autonomous Self-Analysis System II: Notify in Sla...
Technology Blogs by SAP
This is the second post in the Autonomous Self-Analysis System blog series.
"Autonomous Self-Analysis System I" is the first post in the series. It explains an autonomous self-analysis solution that automates email notification, including an issue analysis report, for a HANA issue. While setting up that solution, I was thinking: can I have something a little bit cooler? For example, can I have a more interactive assistant that can work on a HANA problem and offers more functionality than analyzing a single issue?
What about a Chat Bot assistant who can talk and work on HANA issues?
This blog explains how to build such an autonomous self-analysis system: a chat bot in a messaging app notifies you when a HANA issue happens. With this solution, the user is notified proactively and receives the analysis and recommendation directly.
In this post, we use Slack as the messaging app, Hubot as the chat bot, SAP HANA dump analyzer to automate issue analysis from a SAP HANA runtime dump, and SAP HANASitter to automate HANA runtime dump generation when the HANA system has an issue.
The autonomous self-analysis system in this blog covers the scenario shown in Fig.1. Please note that this blog uses only a simple example to explain how the solution works: the system sends a message to a Slack channel when the defined issue is detected, which notifies the channel members. If you want to run it on your production system, you need to complete the implementation based on the requirements of your real-life scenario.
This blog provides code snippets to make it easier to get started. If you are going to implement the solution in a production environment, you need to test it fully before running it productively, as you will run it “at your own risk”.
Fig.1 Example Scenario of the Autonomous Self-Analysis System
Get Your Chat Bot Running Locally in Slack
Slack is a messaging app for teams. Messaging is categorized in channels that everyone is free to follow or not. It enables us to easily communicate with each other.
Hubot is a chat bot by GitHub. It is open source and written in CoffeeScript and Node.js. You can automate processes with Hubot through scripts, or create a customized robot assistant for your team communication. In this post, our chat bot is called ‘Janix’. She notifies me if a critical issue has occurred on the HANA systems, and tells us what happened and what actions should be taken. I can also ask her to do things for me, like checking historical issues on the HANA systems.
The Hubot page describes the steps for starting a Hubot in Slack. After following those steps and setting up the Hubot in Slack, you can start your chat bot with the following command (in this example, the chat bot is started from C:\Janix on my Windows laptop):
cd C:\Janix
bin\hubot --adapter slack
After your chat bot is started, you can interact with her in Slack using the native commands.
Listen to HANA Runtime Dump Created Event in Chat Bot
In order to react to a HANA issue in Slack, you have to set up an endpoint in your chat bot that reacts once a HANA runtime dump analyzed event is received. In your text editor (I’m using GVim on Windows), create a new CoffeeScript script called hda.coffee inside your bot's scripts directory. Its building blocks are as follows:
The export function is a requirement and is part of the anatomy of a Hubot script. The robot parameter is an instance of your bot.
router is Hubot's built-in HTTP router. By adding .post you create a route for POST requests. Naturally, if you want to create a route for GET requests, you simply call .get.
The string '/hubot/Janix/:room' is the URL you set up, where :room is a variable you have to define. The callback (req, res) is called when a POST request comes in. Save the room name and the JSON object that contains all the data in two variables, room and data.
Hubot has built-in event methods, such as emit. With the emit method, you can create your own event that Hubot will listen to. To listen to a HANA runtime dump analyzed event, you can create a runtimedump_analyzed event.
Finally, you want to tell the request sender that you’ve successfully processed the request. You can do that by sending a success response, i.e. via send 'OK'.
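To check the route from the sender's side, the following minimal Python sketch builds the POST request that the endpoint described above expects. The route '/hubot/Janix/:room' and the JSON keys url and reason come from this post; the function names, the host and the example values are assumptions for illustration only.

```python
import json
import urllib.request
from urllib.parse import quote


def build_hubot_request(base_url, room, report_url, reason):
    """Build (but do not send) the POST request for the chat bot route."""
    # JSON body with the two fields used in this post: url and reason
    payload = json.dumps({"url": report_url, "reason": reason}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/hubot/Janix/{quote(room)}",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_hubot_request(request, timeout=10):
    """Send the request and return the HTTP status and the response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

Calling send_hubot_request(build_hubot_request("http://localhost:8080", "general", report_url, reason)) against a running bot should return the success response sent by the route, assuming the bot listens on that (hypothetical) address.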
Generate the Analysis Report Using SAP HANA dump analyzer Command Line
The SAP HANA dump analyzer can be executed via the command line to analyze an issue from a provided HANA runtime dump and return the analysis report. The help page of the SAP HANA dump analyzer command line is available via:
java -jar HANADumpAnalyzer.jar -help
The SAP HANA dump analyzer provides different options to use the command line in different scenarios.
-v, -version: output the version information of SAP HANA dump analyzer
-s: HANA runtime dump file to be parsed. The command line mode currently supports analyzing only one runtime dump, or its zip file, at a time
-d: output directory to write the analysis report. If no value is provided for -d option, the analysis report will be opened via browser automatically if there is a browser available
The typical examples are:
Analyze rtedump1.trc, open the analysis report via the browser (on the same host where the command line is executed) directly:
java -jar HANADumpAnalyzer.jar -s rtedump1.trc
Analyze rtedump1.trc, write the analysis report to directory dir:
java -jar HANADumpAnalyzer.jar -s rtedump1.trc -d dir
In this post, you need to write the analysis report to a pre-defined directory.
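If the analyzer is called from a Python script, the command line above can be assembled as an argument list instead of a single string, which avoids quoting problems with file names. This is a minimal sketch: only the -s and -d options described above are used, and the function names and the default jar location are assumptions.

```python
import subprocess


def build_hda_command(dump_file, report_dir=None, jar="HANADumpAnalyzer.jar"):
    """Return the dump analyzer command line as a list of arguments."""
    command = ["java", "-jar", str(jar), "-s", str(dump_file)]
    if report_dir is not None:
        # -d: output directory for the analysis report
        command += ["-d", str(report_dir)]
    return command


def run_hda(dump_file, report_dir, jar="HANADumpAnalyzer.jar"):
    """Run the analyzer and return its exit code (requires java on the PATH)."""
    completed = subprocess.run(build_hda_command(dump_file, report_dir, jar))
    return completed.returncode
```

For example, build_hda_command("rtedump1.trc", "dir") yields the same call as the second example above.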
Create the URL to the Analysis Report
A web server (e.g. Tomcat Server in our example) can be started and used to display the analysis report. The steps are:
In the Tomcat manager app, go to “Deploy directory or WAR file located on server” in the Deploy section, then provide the target directory in the “Context path” and press “Deploy”.
Once the analysis report is created, you can copy the analysis report to the report directory. The URL to the analysis report can be provided to check the HANA issue in detail.
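The copy step and the URL can be sketched in Python as follows. The sketch assumes that the analyzer wrote one report folder per runtime dump and that the folder contains analysis.html, which matches the URL pattern used later in this post; the function name and the directory layout are hypothetical. The dirs_exist_ok argument needs Python 3.8 or later.

```python
import shutil
from pathlib import Path
from urllib.parse import quote


def publish_report(report_src, webapp_dir, base_url):
    """Copy a report folder into the deployed directory and return its URL."""
    source = Path(report_src)
    target = Path(webapp_dir) / source.name
    # Copy the whole report folder; overwrite files of an earlier copy
    shutil.copytree(source, target, dirs_exist_ok=True)
    return f"{base_url.rstrip('/')}/{quote(source.name)}/analysis.html"
```

The returned URL is what the chat bot can post to the channel so that members can open the report in detail.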
Automatically Send a Message to Your Slack Channel on a HANA Issue
HANASitter is implemented as a Python script. It can be used to configure reaction methods (e.g. creation of traces and dumps, or collection of performance histories) when specific conditions (like high load) are met. In this blog, HANASitter is configured to:
Check the HANA system every 10 minutes.
Generate one HANA runtime dump if the HANA system does not respond to a “select * from dummy” request within 3 minutes.
We aim to enhance it in the following way:
After the HANA runtime dump is generated, call the HANA dump analyzer to generate the analysis report
Copy the analysis report to the report directory on the Tomcat server
Send the message notification to the Slack channel
The steps are:
Start Tomcat server and deploy the report directory (i.e. to place the analysis reports) via Tomcat manager app page.
Create analysis report and notify Slack in HANASitter
import os

def notify_hubot(rte):
    # Get the HANA runtime dump name without the ".trc" extension
    rte_name = rte[:-4]
    # Command string to generate the analysis report from the HANA runtime dump
    hda_str = "java -jar HANADumpAnalyzer.jar -s " + rte + " -d <report folder>"
    # Create the parameters used for the Slack notification
    url = "http://<tomcat server:8080>/reports/HANADumpAnalyzer/" + rte_name + "/analysis.html"
    reason = "<SID> is not responding!"
    data = '"{\\"url\\": \\"' + url + '\\", \\"reason\\": \\"' + reason + '\\"}"'
    # Command string to send the message to Slack
    curl_str = '/usr/bin/curl --noproxy \'*\' --header "Content-Type: application/json" --request POST --data ' + data + ' http://<IP address of the Hubot server>:8080/hubot/Janix/general'
    # Run SAP HANA dump analyzer on the HANA runtime dump
    os.system(hda_str)
    # Send the message to Slack
    os.system(curl_str)
    return True
P.S.: Please note that the above sample code only explains the key steps. There are further aspects you need to take care of, e.g. user permissions and error handling. For example, you will most likely run HANASitter as the sidadm user. SAP HANA dump analyzer is called within HANASitter to generate the analysis report, while the Tomcat server uses a different user to display the analysis report. If this is the case, the following prerequisites must be fulfilled:
The sidadm should have the authorization to call SAP HANA dump analyzer.
The sidadm should have the authorization to write the analysis report to the target folder.
The analysis report file permission setting should allow the Tomcat server to display it.
As mentioned in the beginning, you will need to complete your own code/set up to make it fully working.
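Two of these prerequisites can be checked or prepared from Python. This is a rough sketch for a POSIX file system: it tests whether the current user may write to the target folder, and adds read access for other users to a finished report so that a web server running as a different user can serve it. The function names are hypothetical, and granting access via a shared group instead may suit your security requirements better.

```python
import os
import stat
from pathlib import Path


def can_write_reports(target_dir):
    """Return True if the current user may create files in target_dir."""
    return os.path.isdir(target_dir) and os.access(target_dir, os.W_OK | os.X_OK)


def make_world_readable(report_dir):
    """Add read (and, for directories, search) permission for other users."""
    root = Path(report_dir)
    for path in [root, *root.rglob("*")]:
        mode = path.stat().st_mode
        if path.is_dir():
            # Directories also need the execute bit to be traversed
            path.chmod(mode | stat.S_IROTH | stat.S_IXOTH)
        else:
            path.chmod(mode | stat.S_IROTH)
```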
Trigger the Slack message when the HANA runtime dump is generated by the HANASitter
The example code calls the notification function notify_hubot(rte) after the runtime dump has been created in the function record_rtedump(rtedumps_interval, hdbcons, comman):
def record_rtedump(rtedumps_interval, hdbcons, comman):
    total_printout = ""
    for hdbcon_string, host in zip(hdbcons.hdbcons_strings, hdbcons.hosts):
        tenantDBString = hdbcons.tenantDBName+"_" if hdbcons.is_tenant else ""
        start_time = datetime.now()
        if hdbcons.rte_mode == 0: # normal rte dump
            rte = "rtedump_normal_"+host+"_"+hdbcons.SID+"_"+tenantDBString+datetime.now().strftime("%Y-%m-%d_%H-%M-%S")+".trc"
            filename = (comman.out_dir.replace(".","_")+rte)
            os.system(hdbcon_string+'runtimedump dump -c" > '+filename) # have to dump to std with -c and then to a file with > since in case of scale-out -f does not work
            print('write rtedump: ' + filename)
            # Start of inserted code
            # Trigger the analysis and the Slack notification
            time.sleep(5)
            print('start of notification')
            notify_hubot(rte)
            # End of inserted code
...
Define the condition for HANASitter and how it should react in the config file.
In this example, we simply set the following rule: if HANA does not respond to a “select * from dummy” request within 3 minutes, generate one HANA runtime dump.
With the above condition, the HANASitter.config file looks like:
# if no response after 180s
-pt 180
# create 1 runtime dump
-nr 1
# check every 600 seconds
-ci 600
# using key
-k <HANASITTERKEY>
-od <target directory to place the HANA runtime dumps>
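Before starting HANASitter, it can help to read the file back and confirm that the values express the intended rule. The following Python sketch is not HANASitter's own parser; it is a hypothetical helper that collects the flag and value pairs shown above and summarizes them in minutes, relying on the flag meanings given in the comments of the config file.

```python
def parse_flag_file(text):
    """Collect '-flag value' lines into a dict, skipping comments and blanks."""
    options = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        flag, _, value = line.partition(" ")
        options[flag] = value.strip()
    return options


def summarize(options):
    """Describe the check interval, timeout and dump count in plain words."""
    return (
        f"check every {int(options['-ci']) // 60} min, "
        f"after {int(options['-pt']) // 60} min without response "
        f"create {options['-nr']} runtime dump(s)"
    )


SAMPLE = """\
# if no response after 180s
-pt 180
# create 1 runtime dump
-nr 1
# check every 600 seconds
-ci 600
"""

print(summarize(parse_flag_file(SAMPLE)))
# prints: check every 10 min, after 3 min without response create 1 runtime dump(s)
```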
After completing the above 4 steps, your chat bot will automatically notify you, and tell you what happened, whenever your HANA system has not responded for more than 3 minutes.
Here is a demo for the 'Janix' chat bot in Slack:
Now you are ready to Show Error History for a HANA System by Chat Bot in Slack!