cancel
Showing results for 
Search instead for 
Did you mean: 

Issue with long running PAL Function on HANA

BachnerA
Discoverer
0 Kudos
2,450

We are facing an issue with a long running PAL Function (PAL Additive Model Forecast) on HANA.

The PAL function is called via AMDP script in an end routine of a BW transformation.
For 99% of our data the routine is running about 3 sec and everthing works fine, but for a specific subset of data the function is running 1 hour and then throws a time out on the scriptserver.

The function call looks like this:

 

call _SYS_AFL.PAL_ADDITIVE_MODEL_ANALYSIS(:lt_data, :holiday_tbl, :param_tbl, lt_model);

 

 lt_data:

FISCYEARSUM_ST
2018-01-0116
2019-01-015
2020-01-0133
2021-01-0111

lt_holiday_tbl: empty

param_tbl:

NAMEINT_VALUEDOUBLE_VALUESTRING_VALUE
GROWTH  linear

Active SQL-STATEMENTS on HANA DB:

 

call _SYS_AFL.PAL_ADDITIVE_MODEL_ANALYSIS("SYS"."_SYS_SS2_TMP_TABLE_119235207_LT_DATA_2_DA206DDCEB223042BE5C8EB168FEB187_3", "SYS"."_SYS_SS2_TMP_TABLE_119235207_HOLIDAY_TBL_2_DA206DDCEB223042BE5C8EB168FEB187_3", "SYS"."_SYS_SS2_TMP_TABLE_119235207_PARAM_TBL_2_DA206DDCEB223042BE5C8EB168FEB187_3", ?)
CALL "SAPHANADB"."/BIC/DP3UWF3KQQQ0HRQ1J6MG_A=>GLOBAL_END"('DTP_0WB0ZSP4QW51W5L6UF79IKB20', 'ODP_HANA', 'VCVP_AI_SALES_SHARE           ODP_HANA', 'DTASRC', 'EAXBPC04', 'ADSO', '20240822082208000034000', "SAPHANADB"."_SYS_SS2_TMP_TABLE_3_119235207_3367455460654702955_1", ?)
INSERT INTO "SAPHANADB"."/BIC/AEAXBPC041" ("RECORDMODE", "FISCYEAR", "PLANALG", "PLANMONTH", "PROD_SHARE", "SALESORG", "SUM_ST", "SUM_TOTAL", "ZVPBEPROD", "ZVPPLANKU" ,   "REQTSN" , "RECORD" , "DATAPAKID")  SELECT  "0RECORDMODE" "RECORDMODE",  "FISCYEAR" "FISCYEAR",  "PLANALG" "PLANALG",  "PLANMONTH" "PLANMONTH",  "PROD_SHARE" "PROD_SHARE",  "SALESORG" "SALESORG",  "SUM_ST" "SUM_ST",  "SUM_TOTAL" "SUM_TOTAL",  "ZVPBEPROD" "ZVPBEPROD",  "ZVPPLANKU" "ZVPPLANKU" ,'20240822082208000034000' AS "REQTSN",  ROW_NUMBER() OVER ( ORDER BY  "SALESORG", "#SOURCE#.4.SQL__PROCEDURE__SOURCE__RECORD" ) AS "RECORD" , '000001' AS "DATAPAKID" FROM "SAPHANADB"."/1BCAMDP/0BW:DAP:TR_0WB0ZSP4QW51W5L6UF79IKB20" ('PLACEHOLDER'=('$$DATUM$$.FORMULA','20240822'), 'PLACEHOLDER'=('TR_0WB0ZSP4QW51W5L6UF79IKB20.$$dtis_cross_join$$','"#DTISCROSS_ERROR" != ''X'''), 'PLACEHOLDER'=('TR_0WB0ZSP4QW51W5L6UF79IKB20.$$error_stack_semantic_join$$','"REQTSN" = ''20240822082208000034000'''), 'PLACEHOLDER'=('TR_0WB0ZSP4QW51W5L6UF79IKB20.$$filter$$','( ( "REQTSN" = ''20240822082208000034000'' ) )'), 'PLACEHOLDER'=('TR_0WB0ZSP4QW51W5L6UF79IKB20.$$i_error_handling$$','FALSE'), 'PLACEHOLDER'=('TR_0WB0ZSP4QW51W5L6UF79IKB20.$$i_req_dtp_name$$','DTP_0WB0ZSP4QW51W5L6UF79IKB20'), 'PLACEHOLDER'=('TR_0WB0ZSP4QW51W5L6UF79IKB20.$$i_req_logsys$$','ODP_HANA'), 'PLACEHOLDER'=('TR_0WB0ZSP4QW51W5L6UF79IKB20.$$i_req_requid$$','20240822082208000034000'), 'PLACEHOLDER'=('TR_0WB0ZSP4QW51W5L6UF79IKB20.$$i_req_src_name$$','VCVP_AI_SALES_SHARE           ODP_HANA'), 'PLACEHOLDER'=('TR_0WB0ZSP4QW51W5L6UF79IKB20.$$i_req_src_type$$','DTASRC'), 'PLACEHOLDER'=('TR_0WB0ZSP4QW51W5L6UF79IKB20.$$i_req_tgt_name$$','EAXBPC04'), 'PLACEHOLDER'=('TR_0WB0ZSP4QW51W5L6UF79IKB20.$$i_req_tgt_type$$','ADSO'), 'PLACEHOLDER'=('$$keydate$$','20240822'),'PLACEHOLDER' = ('ce_settings', '{"enable_partitioned_execution": "0" }')  )   ORDER BY "SALESORG", "#SOURCE#.4.SQL__PROCEDURE__SOURCE__RECORD"

 

 

BW-DTP (successful execution with other selection):

BachnerA_0-1724316228041.png

BW-DTP (time out on srciptserver):

BachnerA_1-1724318721422.png

scritpserver.log:

[17541]{349749}[9/967925937] 2024-08-22 11:22:17.000634 i TraceContext TraceContext.cpp(01367) : UserName=, ApplicationUserName=, ApplicationName=ABAP:, ApplicationSource=CL_SQL_STATEMENT==============CP:321, Client=111, StatementHash=4e5161bdd5d9c1209af3a9b2e13cf6f6, EppRootContextId=5D07CA1F0E4C4645A11D1F467F9B08F5, EppTransactionId=E04DBA5667800330E0066C67F643FB8C, EppConnectionId=2DEBC340605F11EF9CDFF8940A6ED1A2, EppConnectionCounter=14, EppComponentName=, EppAction=BIDTPR_20240822082208000034_1, StatementExecutionID=844437896413266

[17541]{349749}[9/967925937] 2024-08-22 11:22:17.000634 e AFL_PAL_COMMONFU PALObj.cpp(06096) : %%% [error: 73001066] [correlation id: 0xc03c810d100b0036] Error occurred while executing PAL function: PAL error[73001066]:Time out: the algorithm has executed with 3600 seconds.

[17541]{349749}[9/967925937] 2024-08-22 11:22:17.000674 E AFL_PAL_COMMONFU PALObj.cpp(06096) : [correlation id: 0xc03c810d100b0036] exception 1: no.73001066 (/data/jenkins_prod/workspace/all4qq0thb/s/AFL/PAL/palCommonFunction.cpp:35) TID: 17541

PAL error[73001066]:Time out: the algorithm has executed with 3600 seconds.

exception throw location:

0: 0x00007fc16348218b in AFL_PAL::TimeOutWatcher::assertTimeOut() const+0x57 at /data/jenkins_prod/workspace/all4qq0thb/s/AFL/PAL/palCommonFunction.cpp:35 (libaflpal.so)

1: 0x00007fc165216fb6 in double stan::model::log_prob_grad<true, false, AFL_PAL::pal_prophet::StanModelDecorator<prophet_model_namespace::prophet_model> >(AFL_PAL::pal_prophet::StanModelDecorator<prophet_model_namespace::prophet_model> const&, std::vector<double, std::allocator<double> >&, std::vector<int, std::allocator<int> >&, std::vector<double, std::allocator<double> >&, std::ostream*) [clone .isra.0]+0x1c2 at /data/jenkins_prod/workspace/all4qq0thb/s/AFL/PAL/TimeSeries/Prophet/PALProphet.cpp:36 (libaflpal.so)

2: 0x00007fc16525639c in stan::optimization::ModelAdaptor<AFL_PAL::pal_prophet::StanModelDecorator<prophet_model_namespace::prophet_model> >::operator()(Eigen::Matrix<double, -1, 1, 0, -1, 1> const&, double&, Eigen::Matrix<double, -1, 1, 0, -1, 1>&)+0x88 at /data/jenkins_prod/workspace/all4qq0thb/hm/c/dependencies/afl/stan-2.19.1-sap0-linuxx86_64/src/stan-2.19.1/src/stan/optimization/bfgs.hpp:354 (libaflpal.so)

3: 0x00007fc165256794 in int stan::optimization::WolfLSZoom<stan::optimization::ModelAdaptor<AFL_PAL::pal_prophet::StanModelDecorator<prophet_model_namespace::prophet_model> >, double, Eigen::Matrix<double, -1, 1, 0, -1, 1> >(double&, Eigen::Matrix<double, -1, 1, 0, -1, 1>&, double&, Eigen::Matrix<double, -1, 1, 0, -1, 1>&, stan::optimization::ModelAdaptor<AFL_PAL::pal_prophet::StanModelDecorator<prophet_model_namespace::prophet_model> >&, Eigen::Matrix<double, -1, 1, 0, -1, 1> const&, double const&, double const&, double const&, double const&, Eigen::Matrix<double, -1, 1, 0, -1, 1> const&, double, double, double, double, double, double, double const&) [clone .isra.0]+0x1b0 at /data/jenkins_prod/workspace/all4qq0thb/hm/c/dependencies/afl/stan-2.19.1-sap0-linuxx86_64/src/stan-2.19.1/src/stan/optimization/bfgs_linesearch.hpp:146 (libaflpal.so)

4: 0x00007fc165256ff9 in int stan::optimization::WolfeLineSearch<stan::optimization::ModelAdaptor<AFL_PAL::pal_prophet::StanModelDecorator<prophet_model_namespace::prophet_model> >, double, Eigen::Matrix<double, -1, 1, 0, -1, 1> >(stan::optimization::ModelAdaptor<AFL_PAL::pal_prophet::StanModelDecorator<prophet_model_namespace::prophet_model> >&, double&, Eigen::Matrix<double, -1, 1, 0, -1, 1>&, double&, Eigen::Matrix<double, -1, 1, 0, -1, 1>&, Eigen::Matrix<double, -1, 1, 0, -1, 1> const&, Eigen::Matrix<double, -1, 1, 0, -1, 1> const&, double const&, Eigen::Matrix<double, -1, 1, 0, -1, 1> const&, double const&, double const&, double const&, double const&, double const&)+0x3a5 at /data/jenkins_prod/workspace/all4qq0thb/hm/c/dependencies/afl/stan-2.19.1-sap0-linuxx86_64/src/stan-2.19.1/src/stan/optimization/bfgs_linesearch.hpp:283 (libaflpal.so)

5: 0x00007fc165257227 in stan::optimization::BFGSMinimizer<stan::optimization::ModelAdaptor<AFL_PAL::pal_prophet::StanModelDecorator<prophet_model_namespace::prophet_model> >, stan::optimization::LBFGSUpdate<double, -1>, double, -1>::step()+0x143 at /data/jenkins_prod/workspace/all4qq0thb/hm/c/dependencies/afl/stan-2.19.1-sap0-linuxx86_64/src/stan-2.19.1/src/stan/optimization/bfgs.hpp:205 (libaflpal.so)

6: 0x00007fc16525d1bc in int stan::services::optimize::lbfgs<AFL_PAL::pal_prophet::StanModelDecorator<prophet_model_namespace::prophet_model> >(AFL_PAL::pal_prophet::StanModelDecorator<prophet_model_namespace::prophet_model>&, stan::io::var_context&, unsigned int, unsigned int, double, int, double, double, double, double, double, double, int, bool, int, stan::callbacks::interrupt&, stan::callbacks::logger&, stan::callbacks::writer&, stan::callbacks::writer&) [clone .isra.0]+0x408 at /data/jenkins_prod/workspace/all4qq0thb/hm/c/dependencies/afl/stan-2.19.1-sap0-linuxx86_64/src/stan-2.19.1/src/stan/services/optimize/lbfgs.hpp:123 (libaflpal.so)

7: 0x00007fc16525f238 in void AFL_PAL::pal_stan::execute<AFL_PAL::pal_prophet::StanModelDecorator<prophet_model_namespace::prophet_model> >(AFL_PAL::pal_prophet::StanModelDecorator<prophet_model_namespace::prophet_model>&, stan::io::array_var_context&, AFL_PAL::pal_stan::stan_parameters&, AFL_PAL::pal_stan::stan_model&, std::vector<unsigned long, std::allocator<unsigned long> >&)+0x11a4 at /data/jenkins_prod/workspace/all4qq0thb/s/AFL/PAL/Utilities/StanCommon.hpp:156 (libaflpal.so)

8: 0x00007fc16521c34f in AFL_PAL::pal_prophet::PALProphet::execute()+0x261b at /data/jenkins_prod/workspace/all4qq0thb/s/AFL/PAL/Utilities/StanCommon.hpp:246 (libaflpal.so)

9: 0x00007fc16340e458 in CPAL::ADDITIVE_MODEL_ANALYSIS_ANY(ltt::smartptr_handle<TRexCommonObjects::InternalTable> const&, ltt::smartptr_handle<TRexCommonObjects::InternalTable> const&, ltt::smartptr_handle<TRexCommonObjects::InternalTable> const&, ltt::smartptr_handle<TRexCommonObjects::InternalTable>&) const+0x54 at /data/jenkins_prod/workspace/all4qq0thb/s/AFL/PAL/PALObj.cpp:6096 (libaflpal.so)

10: 0x00007fc163591e77 in ADDITIVE_MODEL_ANALYSIS_ANY__Wrapper+0x153 at AFL/PAL/PALWrapper.h:24572 (libaflpal.so)

11: 0x00007fc3116de508 in ljit::lib::AFLLibFunction::invoke(ljit::llang::LlangRuntimeContext*, void**, ljit::lib::AFLLibFunctionInfo const*)+0x24 at AFL/PackageManager/ComponentClass/AFLPM_ApplicationFunctionExecutor.hpp:118 (libhdbljitlib.so)

12: 0x00007fc213b7716c in ljit::dynamic/main-export+0x128 at afl/cen__SYS_AFL:_SYS_SS_CE_102741987_vers3_lang12_type1_[ FISCYEAR:TIMESTAMP SUM_ST:DECIMAL(18,3) ][ TS:TIMESTAMP HOLIDAY:VARCHAR(1) ][ NAME:VARCHAR(6) INT_VALUE:INT DOUBLE_VALUE:DOUBLE STRING_VALUE:VARCHAR(6) ]_140503528138752:3_TMP/3:0 (<unknown>)

13: 0x00007fc310589c68 in ljit::types::LlangManagedInvoke::invoke()+0xb4 at llvm/llangci/ljit_llangci_runtime.h:148 (libhdbljittypes1.so)

14: 0x00007fc30fdb4c88 in TrexCalculationEngine::ceLjitPop::executePopLlang(ltt_adp::vector<TrexCalculationEngine::ceDataSet*, ltt::integral_constant<bool, true> > const&, ltt_adp::vector<ltt::smartptr_handle<TRexCommonObjects::InternalTable>, ltt::integral_constant<bool, true> >&, ltt_adp::vector<ltt_adp::vector<TrexCalculationEngine::PlanViewAttribute, ltt::integral_constant<bool, true> >, ltt::integral_constant<bool, true> > const&, TRexCommonObjects::TRexApiError&, Executor::ExecutionInfo const&)+0x2724 at calculationEngine/pops/cePopCustomLjit.cpp:570 (libhdbcalcenginepops.so)

15: 0x00007fc30fdae2c3 in TrexCalculationEngine::ceLjitPop::executePopInternal(ltt_adp::vector<Executor::PlanData*, ltt::integral_constant<bool, true> > const&, ltt_adp::vector<Executor::PlanData*, ltt::integral_constant<bool, true> > const&, TRexCommonObjects::TRexApiError&, Executor::ExecutionInfo const&)+0x6e0 at calculationEngine/pops/cePopCustomLjit.cpp:136 (libhdbcalcenginepops.so)

16: 0x00007fc30fd55ac0 in TrexCalculationEngine::cePlanOperation::executePop(ltt_adp::vector<Executor::PlanData*, ltt::integral_constant<bool, true> > const&, ltt_adp::vector<Executor::PlanData*, ltt::integral_constant<bool, true> > const&, TRexCommonObjects::TRexApiError&, Executor::ExecutionInfo const&)+0x20 at calculationEngine/pops/cePopBase.cpp:1369 (libhdbcalcenginepops.so)

17: 0x00007fc3323d2d0e in Executor::X2::runPopTask(Executor::X2::PopTaskInfo&, int&, ltt::allocator&, ltt::allocator&)+0x2bea at executor/X2.cpp:2874 (libhdbexecutor.so)

18: 0x00007fc3323c8550 in Executor::X2::runPopJob(Executor::X2Job*)+0x70 at ltt/ext/hnd/releasable_handle.hpp:444 (libhdbexecutor.so)

19: 0x00007fc3323e1ba7 in Executor::X2Job::run(Execution::Context&, Execution::JobObject&)+0x423 at executor/X2.cpp:5954 (libhdbexecutor.so)

20: 0x00007fc31424ec0b in Execution::JobObjectImpl::run(Execution::JobWorker*)+0x13d7 at Basis/Execution/impl/JobExecutorRunJob.cpp:772 (libhdbbasis.so)

21: 0x00007fc31425658a in Execution::JobWorker::runJob(ltt::smartptr_handle<Execution::JobObjectForHandle>&)+0x8a6 at Basis/Execution/impl/JobExecutorThreads.cpp:379 (libhdbbasis.so)

22: 0x00007fc314258278 in Execution::JobWorker::run(Execution::ThreadRC&)+0x744 at Basis/Execution/impl/JobExecutorThreads.cpp:1382 (libhdbbasis.so)

23: 0x00007fc314298fd4 in Execution::Thread::staticMainImp(Execution::Thread*)+0x610 at Basis/Execution/impl/Thread.cpp:612 (libhdbbasis.so)

24: 0x00007fc31429f5bf in Execution::pthreadFunctionWrapper(Execution::Thread*)+0x1eb at Basis/Execution/impl/ThreadInterposition.cpp:684 (libhdbbasis.so)

25: 0x00007fc313c5d6ea in start_thread+0xd8 (libpthread.so.0)

26: 0x00007fc31358458f in __GI___clone+0x3d (libc.so.6)

exception type information:

- 0: public ltt::exception@0x00007fc313b34290

[17541]{349749}[9/967925937] 2024-08-22 11:22:17.025428 e ceExecute cePopCustomLjit.cpp(00580) : Llang Runtime Error: Exception::SQLException423: exception 73001066: PAL error[73001066]:Time out: the algorithm has executed with 3600 seconds.

 

at main (line 4) (_SYS_AFL.AFLPAL:ADDITIVE_MODEL_ANALYSIS_ANY:)

[17541]{349749}[9/967925937] 2024-08-22 11:22:17.025444 e ceExecute cePopCustomLjit.cpp(00605) : _SYS_AFL.AFLPAL:ADDITIVE_MODEL_ANALYSIS_ANY: [423] (range 3) AFL error exception: exception 73001066: PAL error[73001066]:Time out: the algorithm has executed with 3600 seconds.

 

Has anyone faced a similar issue?
Can anyone explain why this happens or has an advice how to avoid the long runtime of the function?

 

roland_szajko
Product and Topic Expert
Product and Topic Expert
0 Kudos

Is it also running to a timeout when you call the procedure directly from the database?

DO BEGIN 
lt_data = <fill your data from lt_data>; 
lt_holiday = <provide the empty lt_holiday table>; 
lt_param = <fill the table from param_tbl>;
CALL _SYS_AFL.PAL_ADDITIVE_MODEL_ANALYSIS (:lt_data, :lt_holiday, :lt_param, lt_model); 
INSERT INTO PAL_ADDITIVE_MODEL_ANALYSIS_MODEL SELECT * FROM :lt_model; 
END; 
SELECT * FROM PAL_ADDITIVE_MODEL_ANALYSIS_MODEL;
BachnerA
Discoverer
0 Kudos

Yes, also when running it directly on hanadb, I still get the timeout.

BachnerA_0-1724332275465.png

 

roland_szajko
Product and Topic Expert
Product and Topic Expert
0 Kudos
In that case, the problem apparently lies within the procedure itself and it is not related to the BW DTP/TRFN framework. I am from BW so we need to wait for someone with experience in this specific HANA procedure.
View Entire Topic
zhengwang
Associate
Associate
I'm from PAL team. Thank you for sharing your finding with us. After analyzing the issue, we have determined that the optimizer is likely stuck in local optima, preventing it from converging and exiting properly. To address this, we have implemented a new mechanism that proactively detects and intervenes in such situations, making the algorithm more robust. Initial testing on our development branch looks promising, and we aim to begin the downport process next week.
Vitaliy-R
Developer Advocate
Developer Advocate
0 Kudos
Are there any hyperparameters that would help the current algorithm not to stuck in that local optima? Thank you.