top -H + multiple thread dumps (with real snapshots)
Find the Java PID first:
ps -ef | grep java
# or
pgrep -f 'java|tomcat|catalina'
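If the JDK tools are on your PATH, jps also lists JVM processes directly (an optional shortcut; output format varies slightly by JDK):
jps -lvm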
Now capture per-thread CPU:
PID=<pid>
# refresh every 5 seconds (good default)
top -H -p $PID -d 5
# or capture 6 samples at 10-second interval (for evidence)
for i in {1..6}; do
date
top -b -n 1 -H -p $PID | head -n 40
sleep 10
done | tee top_threads_${PID}.log
Example output (top -H -p <pid>)
This is what you’re looking for (the key column is the Linux thread id, a.k.a. LWP/TID):
PID LWP USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2176 2294 tomcat 20 0 18.3g 6.1g 1.2g R 98.7 9.8 12:34.56 java
2176 2301 tomcat 20 0 18.3g 6.1g 1.2g R 75.3 9.8 9:12.01 java
Here, 2294 (LWP) is the hottest thread at that instant.
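As a quick sanity check, you can peek at that thread’s name straight from /proc (a sketch assuming a reasonably recent JDK, which sets the native thread name on Linux; the name is truncated to 15 characters):
# 2294 is the example LWP from the snapshot above
cat /proc/$PID/task/2294/comm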
Next, take thread dumps. Use one of these (pick what your access allows):
# best: jcmd (ships with the JDK)
jcmd $PID Thread.print -l > threaddump_1.txt
sleep 10
jcmd $PID Thread.print -l > threaddump_2.txt
sleep 10
jcmd $PID Thread.print -l > threaddump_3.txt
Alternatives:
# jstack (also fine)
jstack -l $PID > threaddump_1.txt
# kill -3 writes to stdout/stderr of the process (often catalina.out)
kill -3 $PID
Map the top thread id (decimal) to the JVM thread nid (hex)
In HotSpot thread dumps, the native thread id is printed as nid=0x....
From top, you get LWP/TID in decimal. Convert it to hex:
TID_DEC=2294
printf "0x%x\n" $TID_DEC
# => 0x8f6 (example)
Now search in the thread dumps for that nid:
grep -n "nid=0x8f6" threaddump_*.txt
You already stated the most important rule: if the stack is not moving across multiple dumps, it’s a strong signal that this code path is burning CPU (tight loop / heavy regex / busy spin / excessive logging / crypto / serialization, etc.).
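To see the actual frames instead of just the matching header line, a little context around each hit is usually enough (adjust -A to taste):
grep -A 30 "nid=0x8f6" threaddump_*.txt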
In your dump, the thread:
name: Log4j2-TF-1-AsyncLogger...
nid=0x8f6
state: RUNNABLE
stack: heavy regex activity inside your custom PII obfuscator, with frames going through java.util.regex into LMSThreadNamePIIObfuscator.format(LMSThreadNamePIIObfuscator.java:42).
A second view of the same hot path shows the same thread still in regex-heavy matching (lots of Pattern$GroupHead.match, loop/tail/greedy char property frames).
That’s a classic “CPU sink” pattern: regex backtracking + large input + frequent invocation.
In a dump from another timestamp, the same nid=0x8f6 thread is WAITING on the disruptor wait strategy, meaning it was not the hot CPU thread at that moment (it’s blocked/parked).
So the correct conclusion comes only after correlating:
top -H hottest LWP at that exact time, and
matching nid in dumps taken in the same window.
PID=<pid>
# 1) capture top thread CPU for 60 seconds
for i in {1..6}; do
date "+%F %T"
top -b -n 1 -H -p $PID | head -n 60
sleep 10
done > topH_${PID}.log
# 2) capture 3 thread dumps during same minute
for i in 1 2 3; do
date "+%F %T" > threaddump_${i}.txt
jcmd $PID Thread.print -l >> threaddump_${i}.txt
sleep 10
done
From topH_${PID}.log, pick the top LWP (decimal).
Convert: printf "0x%x\n" <LWP>
grep for that "nid=0x..." in all thread dumps (see the sketch below).
Compare stacks:
same method(s) repeating ⇒ likely CPU culprit
stack changes ⇒ transient / contention / periodic workload
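A small loop makes that comparison quick: grepping only the thread header line (name, cpu=, nid, state) in each dump shows at a glance whether the thread stays RUNNABLE or drops into WAITING between samples. A minimal sketch, using the example LWP 2294:
LWP=2294                               # decimal thread id taken from topH_${PID}.log
NID=$(printf '0x%x' "$LWP")            # HotSpot prints it as nid=0x... in the dump
grep -H "nid=$NID" threaddump_*.txt    # one header line per dump, including the thread state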
Use these simple heuristics:
Hot RUNNABLE in app code → fix code path (algorithm/regex/loop/logging).
Hot RUNNABLE in GC/VM threads → check GC logs, allocation rate, humongous objects; may need heap/GC tuning.
Hot JIT compiler threads → warmup / new code paths / too many generated classes / instrumentation.
Hot agent threads (APM/security) → verify agent overhead/config; consider sampling settings.
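To see which of those thread families has accumulated the most CPU over the life of the process (GC, C2 compiler, agent threads, and so on), per-thread cumulative CPU time from ps is a quick cross-check. A sketch assuming procps ps and a JDK recent enough to propagate Java thread names to the native comm field:
# cumulative CPU time per thread, busiest first (thread names truncated to 15 chars)
ps -L -o lwp,time,pcpu,comm -p $PID --sort=-time | head -n 15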
Your thread dump also reports very high cumulative CPU time for OneAgent threads (oneagentautosensor, for example).
That doesn’t automatically mean “agent is the root cause”, but it’s an important branch in the investigation: if top -H consistently points to those agent LWPs, you focus on APM configuration rather than app logic.
Using your PII-obfuscation / regex example:
If you consistently see java.util.regex.Pattern...match() and Matcher.find() leading into LMSThreadNamePIIObfuscator.format(...):
Check the regex pattern for catastrophic backtracking (nested quantifiers, ambiguous alternations).
Reduce input size (truncate thread names / sanitize earlier).
Cache precompiled patterns (avoid compiling per call).
Lower the rate of logging or move expensive formatting off the hot path (e.g., avoid heavy lookups/regex in layout pattern for every event).
Confirm by re-running the same evidence loop and showing reduced %CPU for that LWP.
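To make that before/after comparison concrete, you can pull the %CPU samples for that one LWP out of the 60-second top log. A sketch assuming top's default batch-mode layout:
# assumes thread id in field 1 and %CPU in field 9 (the default top -b -H layout);
# if your layout has a separate LWP column like the snapshot above, use $2 and $10 instead
awk '$1 == 2294 {print $9}' topH_${PID}.log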
Thread dumps + top -H are the traditional, reliable first pass. When you need exact method-level CPU attribution, add JFR (low overhead on JDK 11+):
jcmd $PID JFR.start name=cpu settings=profile duration=60s filename=/tmp/cpu.jfr
This will usually confirm (or refute) whether the top CPU is coming from regex/logging, GC, crypto, JSON serialization, Kafka polling, or instrumentation.
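Once the recording finishes, you don't necessarily need JMC for a first read; on newer JDKs (12+) the jfr command-line tool can dump the sampled stacks directly:
jfr summary /tmp/cpu.jfr                                       # quick overview of event counts
jfr print --events jdk.ExecutionSample /tmp/cpu.jfr | less     # sampled stacks = the CPU profile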