Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
cancel
Showing results for 
Search instead for 
Did you mean: 
TimGerlach
Advisor
Advisor
7,263
In part 1 of this series we talked about Java memory management and how it is different in containers. In part 2 we will focus on different out of memory errors that may occur with Java apps in containerized environments and how to mitigate them.

Out Of Memory Errors


Generally, we distinguish between two different out of memory situations:

  • Firstly, the aforementioned container out of memory error, which happens when the container consumes more memory than allowed by its quota.

  • Secondly, the Java Virtual Machine can still run into the ordinary java.lang.OutOfMemoryError situation, where there's just not enough memory available to hold the required data into one of its memory areas, the heap being one of them.


Since troubleshooting for both situations is significantly different, it's crucial to understand which out of memory error you are facing to apply the correct countermeasures.

Container Out Of Memory


As mentioned before, a container out of memory situation occurs when your JVM tries to use more memory than your container quota allows. This can happen when the JVM was given memory flags that configure it to use more memory than is available in the container. Another situation we have observed is that a JVM was scheduling more threads than memory was reserved for via the -Xss flag. By default, the Java buildpack expects and reserves enough memory for 250 * 1M (default value for -Xss).
Symptom

The easiest way to identify a container out of memory situation is via the Cloud Foundry command line tool. When calling cf logs APP_NAME --recent, you see something like this:

2021-05-03T15:54:37.33+0200 [APP/PROC/WEB/0] OUT Exit status 137 (out of memory)


and cf events APP_NAME shows you:

Getting events for app spring-music-oom in org cp-performance / space demo as tim.gerlach@sap.com...
time event actor description
2021-05-03T15:54:42.00+0200 app.crash spring-music-oom index: 0, reason: CRASHED, cell_id: 3663701e-98b5-419d-bc51-e4895b767361, instance: 6d36618b-941e-4a03-42f5-a906, exit_description: APP/PROC/WEB: Exited with status 137 (out of memory)
2021-05-03T15:54:42.00+0200 audit.app.process.crash web index: 0, reason: CRASHED, cell_id: 3663701e-98b5-419d-bc51-e4895b767361, instance: 6d36618b-941e-4a03-42f5-a906, exit_description: APP/PROC/WEB: Exited with status 137 (out of memory)


Note the (out of memory) at the end, which indicates that the container itself ran out of memory (and not the JVM).
Mitigation

Cause 1 - Inappropriate memory flags: Are there any JVM memory flags provided to the Java application which are incompatible with the container? Recommendation: If memory flags were provided manually to the JVM, you should review and evaluate them. For example, it doesn't make sense to explicitly request a 1G heap if the overall container has only 1G of size.

Cause 2 - High thread count: A very high thread count can lead to memory errors at the container level when the sum of memory consumed by every thread (via -Xss) exceeds the available memory of the container. Especially under a high load, a high number of threads might be processing requests. Recommendation: Verify the maximum number of threads used by your JVM (as explained above) and provide a hint to the Java buildpack with the number of threads to be expected. You can add this hint to the SAP Java buildpack via the JBP_CONFIG_JAVA_OPTS environment variable and the stack_threads property as documented here.

Cause 3 - High native memory consumption: In some cases the JVM requires more native memory than the memory calculation of the buildpack accounted for. You can excplicitly request more memory headroom to be taken into account by adding a hint to the SAP Java buildpack via the JBP_CONFIG_JAVA_OPTS environment variable and the headroom property as documented here. The headroom describes the percentage of available container memory that should be reserved and not assigned to the JVM.

Java Virtual Machine Out Of Memory


When you've properly configured the Java Virtual Machine, it knows exactly how much memory it is allowed to allocate for each memory area within the boundaries of the overall container quota. If it is unable to allocate more memory for any of its memory areas, as configured via the memory flags, it crashes with a java.lang.OutOfMemoryError: <space>.
Symptom

When checking the events of the app via cf events <APP_NAME>, this shows up as app.crash event exit status 137:

$ cf events APP_NAME
Getting events for app spring-music in org cp-performance / space demo as tim.gerlach@sap.com...
time event actor description
2021-04-30T17:37:47.00+0200 app.crash spring-music-oom index: 0, reason: CRASHED, cell_id: 44f68af5-c10d-4c4a-81c9-8d4027dd04fe, instance: 4fcde600-b711-4499-6092-edef, exit_description: APP/PROC/WEB: Exited with status 137
2021-04-30T17:37:47.00+0200 audit.app.process.crash web index: 0, reason: CRASHED, cell_id: 44f68af5-c10d-4c4a-81c9-8d4027dd04fe, instance: 4fcde600-b711-4499-6092-edef, exit_description: APP/PROC/WEB: Exited with status 137


In order to understand which memory area caused the memory exhaustion, check the logs via cf logs APP_NAME --recent. At the top, the log contains information about the memory area which was exhausted:

$ cf logs APP_NAME --recent
2021-04-30T17:37:35.69+0200 [APP/PROC/WEB/0] ERR Resource exhaustion event: the JVM was unable to allocate memory from the heap.


In addition, the log output contains a heap histogram which allows you to find out which object types were found in the heap at the point in time the application crashed:

   2021-04-30T17:37:37.46+0200 [APP/PROC/WEB/0] OUT | Instance Count | Total Bytes | Class Name                                                                                                       |
2021-04-30T17:37:37.46+0200 [APP/PROC/WEB/0] OUT | 6102 | 125167864 | [Ljava/lang/Object; |
2021-04-30T17:37:37.46+0200 [APP/PROC/WEB/0] OUT | 50985 | 5097648 | [C |
2021-04-30T17:37:37.46+0200 [APP/PROC/WEB/0] OUT | 13288 | 1574216 | Ljava/lang/Class; |
2021-04-30T17:37:37.46+0200 [APP/PROC/WEB/0] OUT | 48268 | 1544576 | Ljava/util/concurrent/ConcurrentHashMap$Node; |
2021-04-30T17:37:37.47+0200 [APP/PROC/WEB/0] OUT | 50709 | 1217016 | Ljava/lang/String; |


You also find a detailed dump of the memory usage per memory pool:

   2021-04-30T17:37:37.57+0200 [APP/PROC/WEB/0] OUT Memory usage:
2021-04-30T17:37:37.57+0200 [APP/PROC/WEB/0] OUT Heap memory: init 12582912, used 146688144, committed 381157376, max 381157376
2021-04-30T17:37:37.57+0200 [APP/PROC/WEB/0] OUT Hint: Heap memory is over 95% full. To increase it, increase the container size.
2021-04-30T17:37:37.57+0200 [APP/PROC/WEB/0] OUT Non-heap memory: init 7667712, used 95216776, committed 99704832, max 554958848
2021-04-30T17:37:37.58+0200 [APP/PROC/WEB/0] OUT Memory pool usage:
2021-04-30T17:37:37.58+0200 [APP/PROC/WEB/0] OUT CodeHeap 'non-nmethods': init 2555904, used 1402752, committed 2555904, max 6291456
2021-04-30T17:37:37.58+0200 [APP/PROC/WEB/0] OUT CodeHeap 'profiled nmethods': init 2555904, used 8860160, committed 8912896, max 121634816
2021-04-30T17:37:37.58+0200 [APP/PROC/WEB/0] OUT CodeHeap 'non-profiled nmethods': init 2555904, used 2087936, committed 2555904, max 123731968
2021-04-30T17:37:37.58+0200 [APP/PROC/WEB/0] OUT Metaspace: init 0, used 73458568, committed 75587584, max 155844608
2021-04-30T17:37:37.58+0200 [APP/PROC/WEB/0] OUT Compressed Class Space: init 0, used 9407360, committed 10092544, max 147456000
2021-04-30T17:37:37.58+0200 [APP/PROC/WEB/0] OUT Eden Space: init 4390912, used 2258512, committed 105185280, max 105185280
2021-04-30T17:37:37.58+0200 [APP/PROC/WEB/0] OUT Survivor Space: init 524288, used 0, committed 13107200, max 13107200
2021-04-30T17:37:37.58+0200 [APP/PROC/WEB/0] OUT Tenured Gen: init 7143424, used 144429632, committed 262864896, max 262864896


Mitigation

If you find your Java application in this situation, it's often an indicator of a problem in application design where you’re trying to hold more data in memory than your JVM is capable of. A typical example includes situations where a large set of objects are queried from the database without chunking or streaming. While it sounds appealing to just increase the allocated memory to the Java application, it's important to understand that scaling up vertically (more memory) doesn't solve an architectural problem. Another issue might be a memory leak in the application. If the heap histogram is not sufficient to find the memory leak, get a full heap dump of the Java process (part 3 of this blog series).

Next


In the next blog post, part 3, we will show you how heap dumps and thread dumps can be created from running Java applications on SAP BTP Cloud Foundry.
6 Comments
yogesh_beria
Advisor
Advisor
0 Kudos
Hi Tim,

 

When looking at the recent logs after app crashes due to OOM, I am unable to see the logs related to heap histograms or memory usage.

 

Does it need any parameters or log levels to be changed for these to be visible.

 

Thanks!
TimGerlach
Advisor
Advisor
0 Kudos

Dear Yogesh,

thank you for your comment. Have you verified that the app crash is indeed a JVM out of memory error and not a container out of memory error? The heap histogram is only written for JVM out of memory errors and should show up in your logs, but only for a short time as the buffer for the --recent logs is rather low.

Hope this helps.

 

wei_wu05
Explorer
0 Kudos

Hi Tim,

 

Thanks a lot for the informative blog

 

We observe a different type of Exit status 137 for our service as the log snippet below showing.

It's preceded by an instance replacement request from some cell and a Exit status 143.

 

Could you help with the following questions?

 

Is it the Exit Status 137 also caused by out of memory?

What is the cell requesting instance replacement?

What causes the instance replacement?

 

Thank you in advance,

Wei

 

Apr 20, 2023 @ 15:31:16.219  Cell 39301f44-dfa2-482e-97e8-508d730d5a29 successfully destroyed container for instance 055109f6-fb32-408d-4bcc-fb88

Apr 20, 2023 @ 15:29:50.377 Exit status 137

Apr 20, 2023 @ 15:29:50.221 Cell 39301f44-dfa2-482e-97e8-508d730d5a29 destroying container for instance 055109f6-fb32-408d-4bcc-fb88

Apr 20, 2023 @ 15:29:50.221 Exit status 143

Apr 20, 2023 @ 15:29:42.725 Cell 39301f44-dfa2-482e-97e8-508d730d5a29 stopping instance 055109f6-fb32-408d-4bcc-fb88

Apr 20, 2023 @ 15:24:32.131 Container became healthy

....

Apr 20, 2023 @ 15:21:52.484 Cell c05bccca-6c1b-491b-b5c6-e651c7087a85 creating container for instance 1c28821c-19ac-4662-40e2-83e5

Apr 20, 2023 @ 15:21:46.209 Cell 39301f44-dfa2-482e-97e8-508d730d5a29 requesting replacement for instance 055109f6-fb32-408d-4bcc-fb88

 

 

TimGerlach
Advisor
Advisor
0 Kudos
Hi Wei,

I guess it'd be worth to check cf events APP_NAME to see if there are some more details on the events that were logged with exist status 137 and 143 during that time.

 
wei_wu05
Explorer
Thank you very much, Tim

 

I tried with cf events but am not sure how to interpret the results.

I documented the details in the wiki Exit Status 137 in Production Space

 

The results are not consistent to me. For example:

We had Exit status 137 four times between Apr 20 and Apr 22.

They look the same on Cloud Logging.

However, there are only two crashes of one of them on the output of the cf events command.

And even these two seem like to be caused by the recreation of the instances.

 

What do you think?

 

Thanks again,

Wei

 

 
abiskop
Advisor
Advisor
0 Kudos

Thank you for this great article, Tim. We in fact just experienced the exact problem you described, and were able to fix it by applying your recommendations.

One quick note/correction: in order to configure the stack_threads, class_count and headroom hints for the SAP Java buildpack, one has to set the JBP_CONFIG_SAPJVM env variable (as opposed to JBP_CONFIG_JAVA_OPTS), like so:

JBP_CONFIG_SAPJVM: "[memory_calculator_v2: {stack_threads: 266, class_count: 1001, headroom: 5}]"

This is also what the "Memory Calculator V2" docs suggest. Specifying JBP_CONFIG_JAVA_OPTS with the respective memory settings directly is an alternative way of doing things if I understand correctly.

Thanks again, Alex