Disclaimer
Material in this blog post is provided for informational and technical demonstration purposes only. Although the described techniques are based on standard features and capabilities of SAP JVM, the approach used to identify and retrieve the required logged information relies on CPI internal functionality. Please use the technique described below at your own risk.
Intro
When executing load tests and analyzing test results, there are certain metrics that are essential to collect and examine in order to ensure that the system under test is capable of handling the generated volume efficiently. For Java based applications (or, to be more precise, for applications developed in JVM languages - Java, Groovy, Kotlin, etc.), one such critical group of metrics relates to memory consumption and garbage collection. Excessive memory consumption and/or inefficient memory deallocation can lead to memory exhaustion and, in extreme cases, can cause out of memory errors in the JVM and, consequently, abnormal termination of operations performed by running applications.
Java based systems deployed on premise (such as SAP PI/PO, SAP EP) benefit from a plethora of techniques and tools that customers can use in various use cases to collect relevant information on memory consumption and garbage collector efficiency metrics, and analyze it in textual and graphical (visualized) form. To name a few that are common in SAP landscapes:
- Java bytecode instrumentation with an agent that collects metrics required for further live or retrospective (historic) analysis - for example, Wily Introscope.
- Additional verbose logging of SAP JVM metrics and usage of non-SAP generic and SAP specific complementary tools for such logs’ retrospective analysis.
- Sampling or profiling of a running SAP JVM, which can only be used for a short period of time (due to the amount of collected data and the potential negative performance impact on the profiled JVM).
Given that SAP CPI is based on Java, and message processing for deployed iFlows takes place in the context of the SAP JVM used by a CPI runtime node, a natural question arises: can we apply already familiar techniques and extend the use of known tools from on premise systems to cloud services - for example, employ a common toolset when analyzing JVM memory related metrics for load tests executed in PI/PO and in CPI systems? The answer is: yes, to some extent.
Customers cannot instrument CPI tenants with agents on their own - hence, we cannot use infrastructure such as Wily Introscope or its equivalent to monitor a CPI landscape. This rules out group #1 from the list above for the rest of this blog post.
SAP JVM Profiler requires a connection to a specific dedicated port (a JVM debugging port or a port of the jvmmond service), which is not directly accessible from public networks for JVMs of CPI runtime nodes. This rules out group #3, too.
What we can use are the SAP JVM verbose logs, which include garbage collector logs and are already produced by the SAP JVM of a CPI runtime node - hence, we will focus on group #2 further in this blog post. Firstly, since these logs are already produced, we don't need to activate any additional logging and thus cause no additional overhead to the running JVM. Secondly, garbage collector logs contain useful information about the memory allocation rate and deallocation (garbage collection) efficiency and performance. In particular, we are going to inspect two types of logs:
- Garbage collector events log. As its name suggests, this log contains information about garbage collector events that are triggered during garbage collection cycles and some of their most important characteristics, such as the duration of specific garbage collection phases and their efficiency. This log is a good starting point for analysis, as it provides a quick outlook on garbage collection. The log file contains plain text, so it can be analyzed in textual form, but it is worth visualizing its content in order to simplify detection of potential issues. There are many standalone and online tools that can parse this log and present it in a visual form - in this blog post, I will use one of them, the IBM Pattern Modeling and Analysis Tool for Java Garbage Collector (IBM PMAT).
- Garbage collection history log. This log contains more advanced and detailed information about garbage collection: a consolidated view of the memory configuration and garbage collection parameterization of a JVM, and invaluable insight into various aspects of garbage collection cycles. The log file contains binary content and can be parsed and visualized using the SAP JVM Supportability Tools, which are distributed as an Eclipse plugin (refer to SAP Note 1622369 - "SAP JVM Supportability Tools - Installation and Update" - for installation details).
It is also possible to obtain high level information on some memory and garbage collector metrics by alternative means, such as:
- Using several standard MX beans - MemoryMXBean and the collections of MemoryPoolMXBean and GarbageCollectorMXBean beans. Corresponding methods of these MX beans can be called from within a Groovy script step of a custom iFlow; such an iFlow can be used to collect relevant information and output it afterwards.
- Utilizing the CPI internal Operations API. The Operations API is HTTP-based - hence, corresponding operations can be invoked by an external HTTP client / application. It shall be noted that the Operations API is not a public API provisioned by CPI, but an API primarily intended for internal purposes and for the (by now deprecated) HCI/CPI plugin for Eclipse - hence, usage of this API in customer scenarios is not supported by SAP. Should you want to read about the Operations API in more detail, please check the blog post written by ariel.bravoayala3.
The above mentioned options provide high level information that is not sufficient for more sophisticated analysis, and they introduce additional overhead on a CPI runtime node, as the respective calls (of MX beans from within a Groovy script, or of the Operations API) have to be triggered frequently enough to provide representative information. So we are not going to discuss these approaches in this blog post and will focus on the already collected garbage collector logs.
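For completeness, the MX bean option can be sketched in plain Java - the same calls work from a Groovy script step of an iFlow. The class name and output format below are my own, for illustration only:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class GcStats {
    // Collects a one-off snapshot of heap usage and per-collector statistics.
    public static String snapshot() {
        StringBuilder sb = new StringBuilder();
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        sb.append("Heap used/committed/max: ")
          .append(heap.getUsed()).append('/')
          .append(heap.getCommitted()).append('/')
          .append(heap.getMax()).append('\n');
        // One MX bean per garbage collector (names depend on the JVM and GC in use)
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", time=").append(gc.getCollectionTime()).append(" ms\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(snapshot());
    }
}
```

Note that such a snapshot only reflects the moment of the call - which is exactly why this approach would require frequent polling to become representative.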
Prerequisites
Before reading this blog post further, please ensure that you are comfortable with the topics listed below. They are not the subject of this blog post and will not be covered here, but the corresponding knowledge is a prerequisite for making value-added practical use of the material described later on:
- Memory areas in a JVM. Our analysis will be primarily focused on heap memory, so awareness of heap memory areas in particular is essential.
- Memory allocation in a JVM.
- Memory deallocation in a JVM and the working principles of the garbage collector. Considering that SAP JVM currently uses the Concurrent Mark Sweep (CMS) garbage collector, awareness of the phases of a CMS garbage collection cycle is beneficial.
- Logs relevant to the garbage collector in a SAP JVM. Awareness of the collection and analysis of the verbose garbage collector events log and the garbage collection history is essential.
- Tools that can be used for parsing and visualization of the corresponding logs mentioned earlier.
Start - retrieval of JVM startup arguments
The very first step is to get familiar with the logging configuration that has been activated for the JVM of a CPI runtime node. This can be achieved by checking the arguments (options) that were used during JVM startup. To list JVM arguments, we can use one of the standard MX beans - RuntimeMXBean - which provides an applicable method, RuntimeMXBean.getInputArguments(). This approach and an example of RuntimeMXBean usage have been described in more detail in my earlier blog post. There are various options for how the code snippet can be wrapped into a Groovy script step in an iFlow and how its output can be delivered, but they are all fundamentally based on invoking the method getInputArguments() on an instance of RuntimeMXBean from a Groovy script step within an iFlow. Here, I will use a simplistic approach: executing an iFlow that consists of a single step - a Groovy script step - which in its turn sends the list of JVM arguments in a response message. The Groovy script's code snippet is provided below:
import com.sap.gateway.ip.core.customdev.util.Message
import java.lang.management.ManagementFactory
import java.lang.management.RuntimeMXBean

Message processData(Message message) {
    StringBuilder builder = new StringBuilder()
    RuntimeMXBean runtimeMXBean = ManagementFactory.runtimeMXBean
    runtimeMXBean.inputArguments.each { builder << "${it}\n" }
    message.body = builder.toString()
    return message
}
The outcome of executing such an iFlow is as follows (several arguments have been highlighted, as we are going to focus on them further in this blog post):
Location of JVM garbage collector events log
After the list of JVM arguments of a CPI runtime node has been obtained, we will make sure that garbage collector events logging is enabled. In this regard, when looking through the obtained JVM arguments, we are going to search for two arguments - -verbose:gc and -Xloggc:
- The argument -verbose:gc enables simple garbage collector events logging. Logged output is sent to standard output. The argument can be complemented by additional arguments.
- The argument -Xloggc enables more advanced garbage collector events logging. It also allows customization of the location of the file to which logged output is sent.
If both arguments -verbose:gc and -Xloggc are specified during JVM startup, -Xloggc takes precedence.
While inspecting the retrieved list of JVM arguments, we can observe the following parameterization:
-Xloggc:./log/gc.log
-verbose:gc
This observation indicates that the garbage collector events log can be found in the file ./log/gc.log.
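As a taste of what gc.log contains, each event line typically records heap occupancy before and after the collection and the pause duration. The sketch below extracts those figures from a single line; the sample line shape is an assumption based on typical Java 8 -Xloggc output and may differ in details on an actual tenant:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcLogLine {
    // Matches the "before->after(total), pause" part of a GC event line, e.g.
    // "1.234: [GC (Allocation Failure) 65536K->1024K(251392K), 0.0051230 secs]"
    private static final Pattern EVENT = Pattern.compile(
        "(\\d+)K->(\\d+)K\\((\\d+)K\\), ([\\d.]+) secs");

    // Returns a short summary of the reclaimed memory and pause, or null if no match.
    public static String summarize(String line) {
        Matcher m = EVENT.matcher(line);
        if (!m.find()) return null;
        long before = Long.parseLong(m.group(1));
        long after = Long.parseLong(m.group(2));
        double pause = Double.parseDouble(m.group(4));
        return (before - after) + "K freed in " + pause + " s";
    }

    public static void main(String[] args) {
        System.out.println(summarize(
            "1.234: [GC (Allocation Failure) 65536K->1024K(251392K), 0.0051230 secs]"));
        // → 64512K freed in 0.005123 s
    }
}
```

In practice there is no need to hand-roll such parsing - tools like IBM PMAT do exactly this at scale - but it helps to know what the tools are reading.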
Note that in Java 9, the argument -Xloggc and several other arguments related to garbage collector logging were deprecated and superseded by the argument -Xlog, which was introduced as a part of the unified JVM logging framework (for more details, refer to JEP 158: Unified JVM Logging and JEP 271: Unified GC Logging). At the time of writing this blog post, SAP CPI used Java 8, so the classic arguments are used and will be found in the output, and we are not yet looking for the argument -Xlog - though take a note of it for the future.
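For reference, under unified logging the observed parameterization would map to roughly the following (an illustrative translation per JEP 271, not something you will find on a current CPI tenant):

```
-Xlog:gc:./log/gc.log
```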
Location of JVM garbage collection history
Next, we are going to check whether garbage collection history is enabled and figure out the location of the corresponding file. In this regard, when looking through the obtained JVM arguments, we are going to search for two arguments - -XX:{+-}GCHistory and -XX:GCHistoryFilename:
- The argument -XX:{+-}GCHistory enables or disables garbage collection history. By default, logged output is sent to the file sapjvm_gc.prf.
- The argument -XX:GCHistoryFilename is used to customize the location of the file to which logged output is sent.
Several other arguments -XX:GCHistory* are used to fine-tune logging of garbage collection history - for example, to define the maximum size of each garbage collection history file, the maximum number of rotated files and the compression level.
While inspecting the retrieved list of JVM arguments, we can observe the following parameterization:
-XX:+GCHistory
This observation indicates that the garbage collection history can be found in the file ./sapjvm_gc.prf.
Note that while the garbage collector events log file gc.log is updated instantly, this is not the case for the garbage collection history file sapjvm_gc.prf - if the corresponding JVM is still running, it is required to explicitly trigger a dedicated command (retrieve complete GC history) in order to copy consistent garbage collection history content to a target file. This can be done using tools such as jvmmon (a part of the SAP JVM distribution) or SAP JVM Profiler. This step is mentioned in SAP JVM documentation - for example, in SAP Notes 1995856 - "How to retrieve the complete GC history" - and 2724902 - "How to fetch Garbage Collection (GC) History using the SAP JVM Eclipse plug-in". Both tools require connectivity to a running SAP JVM, which cannot be achieved for the JVM of a CPI runtime node when accessing a CPI tenant from public networks. Because of that, the content of sapjvm_gc.prf might not be up to date, and it may be necessary to wait a few hours before it contains information for the needed time period.
Retrieval of identified log files
After the locations of the corresponding log files have been identified, it is now time to retrieve those files from a CPI runtime node. To do this, we can use another iFlow and retrieve the content of the files by their location. An example of how this can be achieved in practice has been provided in the earlier mentioned blog post, and I also highly recommend checking the blog post written by ariel.bravoayala3, where he addresses the use case of browsing directories and accessing files located on a CPI runtime node and provides a more user-friendly technical implementation in a custom developed iFlow.
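Stripped of CPI specifics, the core of such a retrieval iFlow is simply reading a file by its path and returning the bytes. A plain-Java sketch of that core (class and method names are mine, for illustration only; in CPI the equivalent logic would sit in a Groovy script step that sets the bytes as the message body):

```java
import java.nio.file.Files;
import java.nio.file.Paths;

public class LogFetcher {
    // Reads the file at the given location, resolved against the JVM's working directory -
    // e.g. "./log/gc.log" or "./sapjvm_gc.prf" as identified from the startup arguments.
    public static byte[] fetch(String location) throws Exception {
        return Files.readAllBytes(Paths.get(location));
    }
}
```

Since sapjvm_gc.prf is binary, the content should be returned as raw bytes (or Base64-encoded), not converted to a string.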
Finale - logs analysis
Required log files have been identified and obtained - now is the time to feed their content to the relevant tools and dive into analysis. Subsequent steps - such as analysis technique and strategy - don't differ from what we already know from analyzing the corresponding logs collected from on premise systems / SAP JVMs, so I will not go into details, but will only conclude this blog post with a few examples of parsed and visualized log data.
For the sake of illustration, several large messages were sent to the CPI tenant under test, so that we produce some volume and can see how memory consumption and the garbage collector respond to it.
We start with the garbage collector events log - its original content in plain text, followed by a graph view in IBM PMAT:
After that, we get to the garbage collection history. Firstly, we take a quick look at the summary, the memory and garbage collector related configuration, and garbage collection statistics:
Then we proceed to a more detailed analysis of relevant metrics - for example, general telemetry (CPU and memory consumption, garbage collection), followed by a closer inspection of heap memory consumption and garbage collection efficiency:
Finally, we can zoom in and focus on analysis of a specific time period - both in graphical (overview) and verbose textual (for each garbage collector run) representations:
And - let's face it - although this part of the blog post is called "Finale" and concludes it, it is truly the beginning of the subsequent analysis process, which is commonly more challenging and time-consuming than the retrieval of log data.