Full GC on java server nodes

Read only

4,474

SAP Managed Tags
NW AS Java Administrator (NWA)

Hi guys,

My portal system is on SAP NW 7.0 EHP2 and backend on ECC 6 EHP6. Recently, on the portal system I'm getting lot of Full GCs and concurrent mode failure. I have two server nodes and heap size set to 3gb for both (It was 2gb initially). When the Full GCs and concurrent mode failure are occurring, users are not able to work on the portal application and it is timing out. It is only becoming operational again long after a heap dump occurs or I restart the instance manually. Please suggest how I can resolve this issue.

Additional info: The portal is a Tax Officer Portal used to view, input and amend tax returns. Since the portal is java stack, my ECC system uses this java stack for ADS (z-reports displayed in PDF).

So the issue is that I'm unable to know what is causing the frequent Full GC situation and the OOM heap dump. I tried to analyze using Memory Analyzer but it's not clear enough. Please advise.

Regards.

You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.

Comment

To answer on your question you need to provide logs for analysis. Can you do this?

You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.

Comment

Hi all,

In the morning it was working ok but just some minutes ago, server0 started giving this (std_server0.out):

58103.556: [GC 58103.556: [ParNew: 451883K->36605K(512000K), 0.1618678 secs] 2472347K->2058569K(3043328K), 0.1623421 secs] [Times: user=2.29 sys=0.01,

real=0.16 secs]

58107.064: [GC 58107.065: [ParNew: 446205K->36078K(512000K), 0.1622301 secs] 2468169K->2060564K(3043328K), 0.1629539 secs] [Times: user=2.27 sys=0.01,

real=0.16 secs]

58110.573: [GC 58110.573: [ParNew: 445678K->36381K(512000K), 0.1551900 secs] 2470164K->2062137K(3043328K), 0.1556522 secs] [Times: user=2.10 sys=0.01,

real=0.16 secs]

58110.732: [GC [1 CMS-initial-mark: 2025756K(2531328K)] 2069865K(3043328K), 0.0710652 secs] [Times: user=0.07 sys=0.00, real=0.07 secs]

58110.804: [CMS-concurrent-mark-start]

58114.084: [GC 58114.085: [ParNew: 445981K->35487K(512000K), 0.1408452 secs] 2471737K->2062783K(3043328K), 0.1414961 secs] [Times: user=1.93 sys=0.01,

real=0.14 secs]

58117.622: [GC 58117.622: [ParNew: 445087K->42786K(512000K), 0.1572854 secs] 2472383K->2071419K(3043328K), 0.1577878 secs] [Times: user=2.12 sys=0.01,

real=0.16 secs]

58120.534: [CMS-concurrent-mark: 9.319/9.730 secs] [Times: user=24.35 sys=0.22, real=9.73 secs]

58120.535: [CMS-concurrent-preclean-start]

58120.576: [CMS-concurrent-preclean: 0.039/0.041 secs] [Times: user=0.09 sys=0.00, real=0.04 secs]

58120.577: [CMS-concurrent-abortable-preclean-start]

58121.073: [GC 58121.073: [ParNew: 452386K->38914K(512000K), 0.1523676 secs] 2481019K->2069226K(3043328K), 0.1528477 secs] [Times: user=2.06 sys=0.01,

real=0.15 secs]

58123.710: [CMS-concurrent-abortable-preclean: 2.938/3.133 secs] [Times: user=9.03 sys=0.11, real=3.13 secs]

58123.716: [GC[YG occupancy: 368254 K (512000 K)]58123.717: [Rescan (parallel) , 0.2551403 secs]58123.972: [weak refs processing, 0.0149601 secs]58123

.987: [class unloading, 0.3664999 secs]58124.354: [scrub symbol table, 0.2478379 secs]58124.603: [scrub string table, 0.0268882 secs] [1 CMS-remark: 2

030311K(2531328K)] 2398565K(3043328K), 1.1023869 secs] [Times: user=5.66 sys=0.11, real=1.10 secs]

58124.819: [CMS-concurrent-sweep-start]

58125.523: [GC 58125.523: [ParNew: 448514K->38014K(512000K), 0.1271674 secs] 2472868K->2063735K(3043328K), 0.1276991 secs] [Times: user=1.57 sys=0.01,

real=0.13 secs]

58126.907: [CMS-concurrent-sweep: 1.937/2.088 secs] [Times: user=5.80 sys=0.07, real=2.09 secs]

58126.909: [CMS-concurrent-reset-start]

58126.951: [CMS-concurrent-reset: 0.042/0.042 secs] [Times: user=0.04 sys=0.00, real=0.04 secs]

58128.948: [GC 58128.949: [ParNew: 447614K->46857K(512000K), 0.1323817 secs] 2463404K->2063979K(3043328K), 0.1328720 secs] [Times: user=1.54 sys=0.01,

real=0.13 secs]

58132.450: [GC 58132.451: [ParNew: 456457K->40349K(512000K), 0.1694101 secs] 2473579K->2059096K(3043328K), 0.1699432 secs] [Times: user=2.40 sys=0.01,

real=0.17 secs]

58136.153: [GC 58136.154: [ParNew: 449949K->39679K(512000K), 0.1694896 secs] 2468696K->2060115K(3043328K), 0.1701838 secs] [Times: user=2.28 sys=0.01,

real=0.17 secs]

58139.679: [GC 58139.679: [ParNew: 449279K->37786K(512000K), 0.1319183 secs] 2469715K->2059800K(3043328K), 0.1324555 secs] [Times: user=1.57 sys=0.01,

real=0.13 secs]

58143.143: [GC 58143.144: [ParNew: 447386K->45807K(512000K), 0.1184929 secs] 2469400K->2069109K(3043328K), 0.1192081 secs] [Times: user=1.32 sys=0.01,

real=0.12 secs]

58146.689: [GC 58146.689: [ParNew: 455407K->40078K(512000K), 0.1266728 secs] 2478709K->2064879K(3043328K), 0.1271572 secs] [Times: user=1.51 sys=0.01,

real=0.13 secs]

58149.653: [GC 58149.654: [ParNew: 449678K->39412K(512000K), 0.1464457 secs] 2474479K->2065844K(3043328K), 0.1470831 secs] [Times: user=2.15 sys=0.01,

real=0.15 secs]

58149.804: [GC [1 CMS-initial-mark: 2026432K(2531328K)] 2073143K(3043328K), 0.0816621 secs] [Times: user=0.08 sys=0.00, real=0.08 secs]

58149.887: [CMS-concurrent-mark-start]

58153.255: [GC 58153.256: [ParNew: 449012K->46885K(512000K), 0.1606395 secs] 2475444K->2074547K(3043328K), 0.1613918 secs] [Times: user=2.08 sys=0.01,

real=0.16 secs]

58155.721: [CMS-concurrent-mark: 5.643/5.833 secs] [Times: user=14.00 sys=0.53, real=5.83 secs]

58155.722: [CMS-concurrent-preclean-start]

58155.792: [CMS-concurrent-preclean: 0.068/0.071 secs] [Times: user=0.07 sys=0.00, real=0.07 secs]

58155.793: [CMS-concurrent-abortable-preclean-start]

I got the heap dump also, but users are able to access it, though a bit slower.

Any suggestions....

regards.

Please attach full log file to message.

Hi,

What is the JVM that you are using. How much memory do you have on the server . How many server nodes do you have in your landscape.

SAP JVM or HP JVM.

Note 1506939 - Recommendations of Concurrent Mark & Sweep for HPJVM

Thanks

Rishi Abrol

Hi all,

I use SAP JVM 4.1

As mentioned earlier, I have 2 server nodes on the portal server. Also the portal server is a solaris container. It uses the memory from the global zone (main server) which has 256 GB. The main server has other containers which shares the same memory.

I'm attaching an extract of DefaultTrace0.log at time of Full GC.

regards.

Please attach std_server0.out log to message not defaultTraceX.log.

Hi,

Try to add one more server node and see if it takes care of the load.

You can go up to 4 server node to get the best result.

I think that the portal has more load than what it is sized too.

Thanks

Rishi abrol

Hi all,

I'm attaching the std_server0.out file.

Maybe I need to add one additional server node....because as I said earlier, on the ECC side I have some reports that use the ADS of the portal server to display as PDF. I have some 600 users. So maybe when several users are running these PDF reports, it's putting extra strain on the Portal....

Pls give your suggestions.

regards.

Thanks. In addition please attach also dev_server0 log file.

Hi,

I've attached the dev_server0 file also.

regards.

Hi,

Just got the visual view of the Std_server.out and found that the memory suddenly peaks.

[160] 12:16:34

***Warning: Potential spurious wakeup occurred at object <0xfffffffe7c6b3208> (a java.lang.Object), awaking thread "SAPEngine_System_Thread[impl:5]_65" tid=0x0000000105f55000.

"SAPEngine_System_Thread[impl:5]_65" cpu=4246.35 [reset 4246.35] ms allocated=30582056 B (29.17 MB) [reset 30582056 B (29.17 MB)] defined_classes=37

io= file i/o: 93991/14346 B, net i/o: 15690/1193 B, files opened:0, socks opened:0 [reset file i/o: 93991/14346 B, net i/o: 15690/1193 B, files opened:0, socks opened:0 ]

prio=3 tid=0x0000000105f55000 nid=0xa0 / 160 lwp-id=160 runnable [_thread_in_vm (_running), stack(0xfffffffe4a300000,0xfffffffe4a500000)] [0xfffffffe4a4ff000]

at java.lang.Object.wait(J)V(Native Method)

- waiting on <0xfffffffe7c6b3208> (a java.lang.Object)

at java.lang.Object.wait()V(Object.java:429)

at com.sap.jms.server.destinationcontainer.AgentThreadSystem.run()V(AgentThreadSystem.java:148)

- locked <0xfffffffe7c6b3208> (a java.lang.Object)

at com.sap.engine.frame.core.thread.Task.run()Ljava/lang/Object;(Task.java:64)

at com.sap.engine.core.thread.impl5.SingleThread.execute()V(SingleThread.java:83)

at com.sap.engine.core.thread.impl5.SingleThread.run()V(SingleThread.java:156)

Can you please check what is the size of the PDF that the user downloads.

Below is the graph for heap usage after GC collection.

These lines are interesting.

allocated=30582056 B (29.17 MB) [reset 30582056 B (29.17 MB)] defined_classes=37

io= file i/o: 93991/14346 B, net i/o: 15690/1193 B, files opened:0, socks opened:0 [reset file i/o: 93991/14346 B, net i/o: 15690/1193 B

Thanks

Rishi Abrol

Hi Rishi,

Thanks for the analysis. What is the tool you have used to display the graphs etc?

We have many PDF forms that the users are running. There is one when the user saves it, it is around 500KB. I guess if several users are running this report, then may be it can be a cause for concern.

I need to check the reports individually....

But is it normal to show something like 29.17MB memory consumed? What may be causing this?

regards.

Hi,

This is one of the tools called as jmeter.

Second thing i think PDF size 500 kb is quite large . What is this report.

So for example as you said that you have 600 user i take 10% as the concurrent users so that is 60 users concurrent.

If i think that there are 5000 records in the pdf and it take 50MB so it will take 256 mb memory.

256*60= 15 gb memory is required.

This is rough figure just to give an idea.

Thanks

Rishi Abrol

Hi Rishi,

Is it Apache Jmeter? Is it installed on windows? It seems very useful.

Regarding the issue, I'm also thinking these PDF reports might be causing this. The report consists of many fields and text and can go over 5 pages. Don't know whether the developers can further optimise it.

Well I'm thinking of 2 solutions:

1) On the ECC Side, I point the ADS RFC to another java server instead of the live portal server. In this way, portal users won't face performance degradation issues.

2) I create a third server node with max heap set to 3GB also.

What do you think?

regards.

Hi,

No its an just Jmeter tool. You might not find it .

Please go ahead and try to add node.

But another thing i would first check with developer that how much time does it take to generate the pdf and download it and if some thing can be done to reduce the size .

Another thing that i fould was an old note for the related issue in the std_server.out.

Note 804048 - Unnecessary high CPU load and JMS exceptions

This is old and not for your release but you can try to raise message for sap and see what help they can provide.

Thanks

Rishi Abrol

Can you attach OOM heap dump also?

Hi all,

Roman,

The heap dump is around 3 gb so would be difficult to attach.

Is there any other info I can provide?

regards.

Ok. What is amount of RAM available to host? Are both server nodes on the same host? Is DBMS on the same host? If yes what is amount of RAM allocated to it? Can you attach dev_server1 log file also?

Hi all,

Roman,

I mentioned earlier, the portal server is a solaris container that uses the memory from the global zone which has 256 GB memory. DBMS is on the same host (container).

I've attached the dev_server1 file.

I haven't yet created the 3rd server node. I'm investigating whether developers can reduce the pdf forms or can enable some kind of caching.

In the case I need to add the 3rd server node, can someone please outline the steps to do it (so that I don't miss anything)

By the way, yesterday I got the issue again (OOM heap dump) and had to restart the instance manually. I'm attaching the std_server0 and std_server1 files.

regards.

Hi,

Just try to add the server node in the config tool.

You might have to revisit all the parameters in the new server node as the server node will be created with the default parameter based on the bootstap. So check that all the parameters are correct. Please also check if you have done additional customizing related to logoff url or thing with in your node. Please alos check that.

Thanks

Rishi abrol

Ok Rishi, thanks.

In fact, I'll create the 3rd node same as the 2 other nodes, i.e with 3GB heap size.

Can you please tell what you mean by 'done additional customizing related to logoff url or thing with in your node' - do you mean changing the logoff pages etc....

regards

Without OOM heap dump it's hard to solve the problem and impossible to find the root cause of issue.But try the following: increase -Xms and -Xmx to 4096M. You can reset back:

-XX:MaxPermSize=512M

-XX:PermSize=512M

Make sure that you have enough free main memory to avoid paging. After some time attach std_server0 and std_server1 log files again for analysis.

Hi,

What i really mean is that any setting that you have made in the node should be check in the new node.

Sorry about the logoff page as it is in the global accross the server node . I was talking about the configuration related to specific node like Configuring the Portal Start Page

Visual Administrator -> Cluster -> Server_X ->Services -> HTTP Provider

Thanks

Rishi Abrol

Ask a Question

Top Q&A Solution Author

User

Count

By Category

Related Content

Activity Groups

Industry Groups

Influence and Feedback Groups

Interest Groups

Location Groups

Customer Only Groups

Forums

Related Resources

Products

Learning and Support

About

My SAP Profile

My SAP Profile

Full GC on java server nodes

Know the answer?

Need more details?