Problem
IQ sometimes aborts silently on Linux, leaving nothing in its own logs.
In this case there are no stack traces and no error messages.
Cause
The Linux OOM Killer (Out-Of-Memory Killer) kills a process when the system has no free memory left.
If the IQ process is killed by the OOM Killer, IQ terminates without writing any error or stack trace.
The kill does show up as an "Out of memory" entry in the system message log.
[message log]
Nov 12 12:48:00 BDDBBETLMV01 kernel: iqsrv16 invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Nov 12 12:48:00 BDDBBETLMV01 kernel: iqsrv16 cpuset=/ mems_allowed=0
Nov 12 12:48:00 BDDBBETLMV01 kernel: Pid: 29293, comm: iqsrv16 Not tainted 2.6.32-279.el6.x86_64 #1
Nov 12 12:48:00 BDDBBETLMV01 kernel: Call Trace:
Nov 12 12:48:00 BDDBBETLMV01 kernel: [<ffffffff810c4971>] ? cpuset_print_task_mems_allowed+0x91/0xb0
Nov 12 12:48:00 BDDBBETLMV01 kernel: [<ffffffff811170e0>] ? dump_header+0x90/0x1b0
Nov 12 12:48:00 BDDBBETLMV01 kernel: [<ffffffff812146fc>] ? security_real_capable_noaudit+0x3c/0x70
Nov 12 12:48:00 BDDBBETLMV01 kernel: [<ffffffff81117562>] ? oom_kill_process+0x82/0x2a0
Nov 12 12:48:00 BDDBBETLMV01 kernel: [<ffffffff8111745e>] ? select_bad_process+0x9e/0x120
Nov 12 12:48:00 BDDBBETLMV01 kernel: [<ffffffff811179a0>] ? out_of_memory+0x220/0x3c0
...
...
Nov 12 12:48:00 BDDBBETLMV01 kernel: Out of memory: Kill process 29081 (iqsrv16) score 950 or sacrifice child
Nov 12 12:48:00 BDDBBETLMV01 kernel: Killed process 29081, UID 501, (iqsrv16) total-vm:24941028kB, anon-rss:3586296kB, file-rss:1332kB
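To check quickly whether an unexplained IQ abort matches this pattern, the "Killed process" lines can be picked out of the message log with a short script. The following is only a minimal sketch: the regular expression is modelled on the single log format shown above (other kernel versions may word the line differently), and the function name find_oom_kills and the default process name are illustrative choices, not part of IQ or of any Linux tool.

```python
import re

# Modelled on the "Killed process" line in the excerpt above (assumption:
# other kernel versions may format this line differently).
KILL_RE = re.compile(
    r"Killed process (?P<pid>\d+),? .*?\((?P<comm>[^)]+)\)"
    r".*?total-vm:(?P<total_vm_kb>\d+)kB, anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def find_oom_kills(lines, process_name="iqsrv16"):
    """Return one dict per OOM kill of `process_name` found in `lines`."""
    kills = []
    for line in lines:
        m = KILL_RE.search(line)
        if m and m.group("comm") == process_name:
            kills.append({
                "pid": int(m.group("pid")),
                "comm": m.group("comm"),
                "total_vm_kb": int(m.group("total_vm_kb")),
                "anon_rss_kb": int(m.group("anon_rss_kb")),
            })
    return kills


# Sample input: the last two lines of the log excerpt above.
sample = [
    "Nov 12 12:48:00 BDDBBETLMV01 kernel: Out of memory: Kill process 29081 "
    "(iqsrv16) score 950 or sacrifice child",
    "Nov 12 12:48:00 BDDBBETLMV01 kernel: Killed process 29081, UID 501, "
    "(iqsrv16) total-vm:24941028kB, anon-rss:3586296kB, file-rss:1332kB",
]
for kill in find_oom_kills(sample):
    print(kill["pid"], kill["comm"], kill["anon_rss_kb"], "kB resident")
```

In practice the lines would come from the system message log file rather than from a hard-coded list; its location depends on the distribution and syslog configuration.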
Resolution
This is not an IQ issue; it is a resource issue.
There are a couple of ways to address it.
1) Increase the physical memory, or decrease the values of the -iqmc/-iqtc/-ch/-cl server options
2) Decrease the number of concurrent jobs
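As a rough sanity check for option 1, the configured cache sizes can be compared with the physical memory reported by the kernel. The sketch below makes several assumptions that are not stated above: the cache sizes are passed in MB, the 25% headroom left for the operating system and for IQ memory outside the caches is an arbitrary illustrative figure, and the helper names parse_mem_total_kb and caches_fit are hypothetical. It is a starting point for sizing, not a sizing rule.

```python
import re


def parse_mem_total_kb(meminfo_text):
    """Return the MemTotal value (in kB) from the text of /proc/meminfo."""
    m = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if m is None:
        raise ValueError("MemTotal not found in meminfo text")
    return int(m.group(1))


def caches_fit(mem_total_kb, cache_sizes_mb, headroom_fraction=0.25):
    """True if the summed caches fit in physical memory minus the headroom.

    headroom_fraction is an assumed safety margin, not a documented value.
    """
    budget_kb = mem_total_kb * (1.0 - headroom_fraction)
    return sum(cache_sizes_mb) * 1024 <= budget_kb


# Example with made-up numbers: a 16 GB host and two candidate cache settings.
example_meminfo = "MemTotal:       16331712 kB\nMemFree:          204800 kB\n"
total_kb = parse_mem_total_kb(example_meminfo)
print(caches_fit(total_kb, [8192, 8192]))  # two 8 GB caches
print(caches_fit(total_kb, [4096, 4096]))  # two 4 GB caches
```

On a real host the text would be read from /proc/meminfo instead of the example string. Several concurrent jobs each add their own memory demand on top of the caches, which is why option 2 also helps.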
HTH
Regards
Gi-Sung Jang