In the statisticsserver trace below, look at the memory consumption of statisticsserver (the hdbstatisticsse line under "IPMM short info"). Pay attention to the PAL (process allocation limit), AB (allocated bytes) and U (used) values. When the U value is close to, equal to, or greater than the PAL value, an out-of-memory situation has occurred.
[27787]{-1}[-1/-1] 2014-09-25 16:10:22.205322 e Memory ReportMemoryProblems.cpp(00733) : OUT OF MEMORY occurred.
Failed to allocate 32816 byte.
Current callstack:
1: 0x00007f2d0a1c99dc in MemoryManager::PoolAllocator::allocateNoThrowImpl(unsigned long, void const*)+0x2f8 at PoolAllocator.cpp:1069 (
2: 0x00007f2d0a24b900 in ltt::allocator::allocateNoThrow(unsigned long)+0x20 at memory.cpp:73 (libhdbbasis.so)
3: 0x00007f2cf78060dd in __alloc_dir+0x69 (libc.so.6)
4: 0x00007f2d0a247790 in System::UX::opendir(char const*)+0x20 at SystemCallsUNIX.cpp:126 (libhdbbasis.so)
5: 0x00007f2d0a1016dc in FileAccess::DirectoryEntry::findFirst()+0x18 at SimpleFile.cpp:511 (libhdbbasis.so)
6: 0x00007f2d0a1025da in FileAccess::DirectoryEntry::DirectoryEntry(char const*)+0xf6 at SimpleFile.cpp:98 (libhdbbasis.so)
7: 0x00007f2d0a04872f in Diagnose::TraceSegmentCompressorThread::run(void*&)+0x26b at TraceSegment.cpp:150 (libhdbbasis.so)
8: 0x00007f2d0a0c0dcb in Execution::Thread::staticMainImp(void**)+0x627 at Thread.cpp:475 (libhdbbasis.so)
9: 0x00007f2d0a0c0f6d in Execution::Thread::staticMain(void*)+0x39 at Thread.cpp:543 (libhdbbasis.so)
Memory consumption information of last failing ProvideMemory, PM-INX=103393:
Memory consumption information of last failing ProvideMemory, PM-INX=103351:
IPMM short info:
GLOBAL_ALLOCATION_LIMIT (GAL) = 200257591012b (186.50gb), SHARED_MEMORY = 17511289776b (16.30gb), CODE_SIZE = 6850695168b (6.37gb)
PID=27562 (hdbnameserver), PAL=190433938636, AB=2844114944, UA=0, U=1599465786, FSL=0
PID=27674 (hdbcompileserve), PAL=190433938636, AB=752832512, UA=0, U=372699315, FSL=0
PID=27671 (hdbpreprocessor), PAL=190433938636, AB=760999936, UA=0, U=337014040, FSL=0
PID=27746 (hdbstatisticsse), PAL=10579663257, AB=10512535552, UA=0, U=9137040196, FSL=0
PID=27749 (hdbxsengine), PAL=190433938636, AB=3937583104, UA=0, U=2352228788, FSL=0
PID=27743 (hdbindexserver), PAL=190433938636, AB=155156312064, UA=0, U=125053733102, FSL=10200547328
Total allocated memory= 198326363056b (184.70gb)
Total used memory = 163214166171b (152gb)
Sum AB = 173964378112
Sum Used = 138852181227
Heap memory fragmentation: 17% (this value may be high if defragmentation does not help solving the current memory request)
Top allocators (ordered descending by inclusive_size_in_use).
1: / 9137040196b (8.50gb)
2: Pool 8130722166b (7.57gb)
3: Pool/StatisticsServer 3777958248b (3.51gb)
4: Pool/StatisticsServer/ThreadManager 3603328480b (3.35gb)
5: Pool/StatisticsServer/ThreadManager/Stats::Thread_3 3567170192b (3.32gb)
6: Pool/RowEngine 1504441432b (1.40gb)
7: AllocateOnlyAllocator-unlimited 887088552b (845.99mb)
8: Pool/AttributeEngine-IndexVector-Single 755380040b (720.38mb)
9: AllocateOnlyAllocator-unlimited/FLA-UL<3145728,1>/MemoryMapLevel2Blocks 660602880b (630mb)
10: AllocateOnlyAllocator-unlimited/FLA-UL<3145728,1> 660602880b (630mb)
11: Pool/RowEngine/RSTempPage 609157120b (580.93mb)
12: Pool/NameIdMapping 569285760b (542.91mb)
13: Pool/NameIdMapping/RoDict 569285696b (542.91mb)
14: Pool/RowEngine/LockTable 536873728b (512mb)
15: Pool/malloc 429013452b (409.13mb)
16: Pool/AttributeEngine 253066781b (241.34mb)
17: Pool/RowEngine/Internal 203948032b (194.50mb)
18: Pool/malloc/libhdbcs.so 179098372b (170.80mb)
19: Pool/StatisticsServer/LastValuesHolder 167034760b (159.29mb)
20: Pool/AttributeEngine/Delta 157460489b (150.16mb)
Top allocators (ordered descending by exclusive_size_in_use).
1: Pool/StatisticsServer/ThreadManager/Stats::Thread_3 3567170192b (3.32gb)
2: Pool/AttributeEngine-IndexVector-Single 755380040b (720.38mb)
3: AllocateOnlyAllocator-unlimited/FLA-UL<3145728,1>/MemoryMapLevel2Blocks 660602880b (630mb)
4: Pool/RowEngine/RSTempPage 609157120b (580.93mb)
5: Pool/NameIdMapping/RoDict 569285696b (542.91mb)
6: Pool/RowEngine/LockTable 536873728b (512mb)
7: Pool/RowEngine/Internal 203948032b (194.50mb)
8: Pool/malloc/libhdbcs.so 179098372b (170.80mb)
9: Pool/StatisticsServer/LastValuesHolder 167034760b (159.29mb)
10: StackAllocator 116301824b (110.91mb)
11: Pool/AttributeEngine/Delta/LeafNodes 95624552b (91.19mb)
12: Pool/malloc/libhdbexpression.so 93728264b (89.38mb)
13: Pool/AttributeEngine-IndexVector-Sp-Rle 89520328b (85.37mb)
14: AllocateOnlyAllocator-unlimited/ReserveForUndoAndCleanupExec 84029440b (80.13mb)
15: AllocateOnlyAllocator-unlimited/ReserveForOnlineCleanup 84029440b (80.13mb)
16: Pool/RowEngine/CpbTree 68672000b (65.49mb)
17: Pool/RowEngine/SQLPlan 63050832b (60.12mb)
18: Pool/AttributeEngine-IndexVector-SingleIndex 57784312b (55.10mb)
19: Pool/AttributeEngine-IndexVector-Sp-Indirect 56010376b (53.41mb)
20: Pool/malloc/libhdbcsstore.so 55532240b (52.95mb)
[28814]{-1}[-1/-1] 2014-09-25 16:09:19.284623 e Mergedog Mergedog.cpp(00198) : catch ltt::exception in mergedog watch thread run(): exception 1: no.1000002 (ptime/common/pcc/pcc_MonitorAlloc.h:59)
Allocation failed
exception throw location:
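The PAL/AB/U check described above can also be done mechanically. Below is a minimal Python sketch that parses the "IPMM short info" process lines and lists the processes whose U value is close to PAL. The function names and the 0.85 threshold are my own assumptions for illustration; they are not SAP-documented values.

```python
import re

# Pattern for one process line of the "IPMM short info" section in the trace.
IPMM_LINE = re.compile(
    r"PID=(?P<pid>\d+) \((?P<name>[^)]+)\), PAL=(?P<pal>\d+), AB=(?P<ab>\d+), "
    r"UA=(?P<ua>\d+), U=(?P<u>\d+), FSL=(?P<fsl>\d+)"
)


def parse_ipmm(trace_text):
    """Return one dict per process line found in the trace text."""
    rows = []
    for m in IPMM_LINE.finditer(trace_text):
        row = {"pid": int(m["pid"]), "name": m["name"]}
        for key in ("pal", "ab", "u"):
            row[key] = int(m[key])
        rows.append(row)
    return rows


def near_limit(rows, threshold=0.85):
    """Names of processes whose used memory (U) is at least threshold * PAL.

    The 0.85 default is an illustrative assumption, not an official value.
    """
    return [r["name"] for r in rows if r["u"] >= threshold * r["pal"]]


# Two lines copied from the trace above.
sample = """\
PID=27746 (hdbstatisticsse), PAL=10579663257, AB=10512535552, UA=0, U=9137040196, FSL=0
PID=27743 (hdbindexserver), PAL=190433938636, AB=155156312064, UA=0, U=125053733102, FSL=10200547328
"""

rows = parse_ipmm(sample)
print(near_limit(rows))  # ['hdbstatisticsse']
```

With the values from this trace, only hdbstatisticsse is reported: its U is about 86% of PAL (and AB about 99%), while the indexserver sits at roughly 66%.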
You can use the 2 solutions below if the HANA system is not ready to switch to the embedded statisticsserver for any reason.
1) If the statisticsserver is down and inaccessible, you need to kill the hdbstatisticsserver PID at OS level. The statisticsserver will be restarted immediately by the HDB daemon.
2) Check memory consumed by statisticsserver:
3) Check whether the statisticsserver deletes old data: go to Catalog -> _SYS_STATISTICS -> Tables, pick a few tables starting with GLOBAL* and HOST* at random, and sort them by snapshot_id in ascending order. Ensure the oldest date matches the retention period.
Alternatively, you can run the command: select min(snapshot_id) from _SYS_STATISTICS.<TABLE>
4) Check the retention period of each table in Configuration -> Statisticsserver -> statisticsserver_sqlcommands, e.g. 30 days for HOST_WORKLOAD.
5) If there is data older than 30 days (or you want to delete old data by shortening the retention period), follow 1929538 - HANA Statistics Server - Out of Memory -> Option 1:
Create the procedure using the file attached to note 1929538 and run call set_retention_days(20);
6) Once done, you'll see that data older than 20 days has been deleted:
Memory consumption of the statisticsserver is reduced:
Also, the min snapshot_id gets updated and now lies 20 days back, matching the new retention period:
7) You can reset the retention period to the default value at any time by running call set_retention_days(30); or by restoring every SQL command to its default in statisticsserver_sqlcommands.
i) Follow 1929538 - HANA Statistics Server - Out of Memory and increase the allocationlimit for the statisticsserver. This can be done only when the statisticsserver is up and accessible. Otherwise, you need to kill and restart it first.
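To make the retention check from steps 3) to 5) concrete, here is a small Python sketch. It builds the min(snapshot_id) query for a given table and works out by how many days the oldest snapshot exceeds the retention period. The helper names and the sample dates are hypothetical, and the sketch assumes the snapshot_id value has already been read from HANA and converted to a datetime.

```python
from datetime import datetime, timedelta


def min_snapshot_query(table):
    """Build the check query from step 3 for one _SYS_STATISTICS table."""
    return f"select min(snapshot_id) from _SYS_STATISTICS.{table}"


def overdue_days(min_snapshot, retention_days, now):
    """Whole days by which the oldest snapshot exceeds the retention period.

    Returns 0 when the oldest snapshot is still inside the retention window.
    """
    cutoff = now - timedelta(days=retention_days)
    if min_snapshot >= cutoff:
        return 0
    return (cutoff - min_snapshot).days


# Hypothetical values for illustration only.
now = datetime(2014, 9, 25)
oldest = datetime(2014, 8, 1)

print(min_snapshot_query("HOST_WORKLOAD"))
print(overdue_days(oldest, 30, now))  # 25
print(overdue_days(oldest, 20, now))  # 35
```

A result greater than 0 means data older than the retention period is still in the table, which is the case to look into before touching the allocation limit.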
The script HANA_Histories_RetentionTime_Rev70+ from Note 1969700 - SQL statement collection for SAP HANA provides a good overview of retention times.
My 2 cents: for any statisticsserver OOM error, always check the memory usage of the statisticsserver and ensure obsolete data gets deleted after the retention period, instead of blindly increasing the allocation limit for the statisticsserver.
Additionally, you can refer to 2084747 - Disabling memory intensive data collections of standalone SAP HANA statisticsserver to disable data collections that consume a lot of memory.
Hope it helps,
Thanks,
Nicholas Chang