Data Services Job server starts job multiple times

strigas1
Explorer

Dear all,

we are experiencing a very weird issue:

We are on DS 4.2 SP10

Jobs are scheduled with the Data Service Scheduler

Sometimes (and it can be different jobs) a job starts multiple times: two or more runs, sometimes within seconds of each other, sometimes minutes apart. It does not happen every day, and in the Task Scheduler there is only one task per schedule.

The duplicated runs sometimes fail and sometimes succeed. The problem is that when a job succeeds and ran three times, we get tripled rows, which causes a huge issue.

Here is a screenshot of a job that ran twice.

Has anyone of you experienced something similar?

I would really appreciate some help, as this is causing a lot of pain.

Kind Regards

Chris


Hi Christopher,

We have faced the same issue of double triggering of Batch Jobs in Prod, as well as in Test.

Reason:

During PROD/TEST source system downtime, we unscheduled the jobs and then rescheduled them for the next run. This created duplicate job entries in the DSP/DST server Task Scheduler.

Tasks to perform:

1. Delete duplicate job entries from the DSP/DST server task scheduler.

2. After the downtime, make sure to activate the job instead of rescheduling.

NOTE:

If the jobs still trigger in duplicate, we might need to follow SAP Note 1250124: delete the complete schedules and the .bat files for the affected jobs, then recreate them.

strigas1
Explorer

Hi Sravanth,

the job entries only exist once in the task scheduler.

The issue does not occur every day; sometimes the job runs twice, sometimes once, sometimes three or four times.

I have deactivated, deleted, and recreated the jobs many times, and it never solved the issue.

Kind Regards

Chris

Accepted Solutions (0)

Answers (8)



Wenjie, if you go to the properties of the job in Designer, you can select the single-instance option, which will only allow one instance to run. This will stop two instances from running at the same time, but will not stop the AL_RWJoblauncher from starting the job multiple times.

I've recently run into issues where the history cleanup done by the EIM Adaptive Processing services causes locking on the repository database on MS SQL Server, which ends up causing a job submission request to time out. Once the lock is released, the job actually starts running even though the submission request has timed out. If the job submission is retried after the first timeout, you will see two jobs running on the job server once the database lock is released.

Try turning off history cleanup and see if your multiple submission issue goes away.
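The timeout-then-retry race Adam describes can be sketched in a few lines. This is a purely illustrative Python simulation of the assumed model (not Data Services internals): the first submission succeeds late, after the client has given up, so a blind retry produces a second run, while a retry that first checks run history stays idempotent.

```python
# Illustrative simulation of the timeout-then-retry race (assumed model,
# not actual Data Services code).

submitted_runs = []   # what the "job server" ends up executing
history = set()       # run history keyed by (job, scheduled slot)

def server_submit(job, slot, delayed):
    """Server accepts the request; 'delayed' models a repository lock that
    makes the submission outlive the client's timeout."""
    submitted_runs.append((job, slot))
    history.add((job, slot))
    return not delayed  # client only sees success when there is no delay

def client_submit(job, slot, delayed_first_try):
    # First attempt: "times out" from the client's point of view,
    # even though the server eventually runs the job anyway.
    ok = server_submit(job, slot, delayed_first_try)
    if not ok:
        # A blind retry here would duplicate the run; checking the
        # history first keeps the retry idempotent.
        if (job, slot) not in history:
            server_submit(job, slot, delayed=False)

client_submit("DAILY_LOAD", "2020-03-16 00:10", delayed_first_try=True)
print(len(submitted_runs))  # 1 with the history check; 2 without it
```

Dropping the `in history` guard reproduces the double run described above.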

Thx,

Adam

former_member664034
Discoverer

Hi Jim,

May I ask what is the "RWJobLaunch" configuration you updated? Is it this below:

[int]

........

RWJobLaunchSemaphoreNumber = 4

Thanks

Wenjie Wu

former_member664034
Discoverer

Hi Jim,

I was on a similar version to yours: we were on 4.2 SP08 and are now on 4.2 SP12. This is where we saw the schedule duplication (runs twice). I am still talking with SAP about potential solutions. Based on my testing, if I create a brand-new repository and migrate the jobs there, the issue does not happen, at least not yet (maybe it will come back later?).

Another workaround to avoid the duplicate run is to add the code below to a script at the beginning of the job; you may also try it.

*****************************

$L_Count = sql('REPO_DATASTORE_NAME',
    'select count(0)
       from ( select DENSE_RANK() OVER (ORDER BY START_TIME desc) as "rnk", STATUS
                from AL_HISTORY
               where service = \'JOB_NAME\' )
      where "rnk" = 2 AND STATUS = \'S\'');

if ($L_Count <> 0)
begin
    raise_exception('Earlier instance of this job is still executing');
end

*****************************
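To see what this check does, here is a small Python/sqlite3 re-creation of the query against a toy AL_HISTORY table (table and column names come from the post above; the rows are made up). Rank 2 by descending START_TIME is the previous run; if its STATUS is still 'S', the count is non-zero and the job should abort.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE AL_HISTORY (SERVICE TEXT, STATUS TEXT, START_TIME TEXT)")
# Current run (rank 1), a duplicate that is still executing (rank 2),
# and yesterday's completed run (rank 3).
con.executemany(
    "INSERT INTO AL_HISTORY VALUES (?, ?, ?)",
    [("JOB_NAME", "S", "2020-03-16 00:10:05"),   # this instance
     ("JOB_NAME", "S", "2020-03-16 00:10:01"),   # duplicate, still running
     ("JOB_NAME", "D", "2020-03-15 00:10:00")],  # yesterday, done
)

# Same shape as the script above: count rows where the
# second-most-recent start still has status 'S'.
(count,) = con.execute("""
    SELECT count(0)
      FROM (SELECT DENSE_RANK() OVER (ORDER BY START_TIME DESC) AS rnk, STATUS
              FROM AL_HISTORY
             WHERE SERVICE = 'JOB_NAME')
     WHERE rnk = 2 AND STATUS = 'S'
""").fetchone()

print(count)  # 1 -> an earlier instance is still executing, so raise
```

With the duplicate row removed (or its status set to 'D'), the count is 0 and the job proceeds.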

Thanks

Wenjie

former_member213365
Active Participant

I gave up on finding a solution within the Data Services configuration. There was a Job Launcher parameter in the DSConfig.txt file that I tried, and that didn't work either.

The "solution" is that the job now checks to see if a prior instance is already running. I encapsulated the logic in a custom function. If the function returns a non-null value then another instance of the job is already running.

Variable declarations:

$LV_Run_ID - varchar(20)

$LV_Job_Name - varchar(256) based on the size of the column in al_history

$LV_Prior_Run_ID - varchar(20)

This could be done with no variables but it helps to have them in case you need to debug.

Create a Datastore that points to the local repository that the job will be running out of, and change the Datastore name used in the sql() call below to the name of your Datastore. Pasting the text into the SAP site resulted in mangled formatting, so you may have to adjust it yourself or get the file from the attachment: get-job-prior-instance.txt
# #############################################################################
# Function: Get_Job_Prior_Instance
# Date    : 03/16/2020
# Author  : Jim Egan (ProKarma, Inc.)
# Purpose : Find the run ID of a running instance of the same job
#         : that was started before this instance.
#
# Modifications
# Date    :
# Author  :
# Purpose :
# #############################################################################

# Get metadata for this instance of the job
$LV_Run_ID   = job_run_id();
$LV_Job_Name = job_name();

# Get the run ID of a job with the same name that is running (status of S or SR)
# and was started within the past 30 seconds.
# If there is no other job that started before this instance, the return value is NULL.
# Status S is for jobs started normally; SR is for jobs started with the
# "Enable Recovery" option.
$LV_Prior_Run_ID = sql('Runtime_Repo',
    'SELECT MAX(OBJECT_KEY)
       FROM AL_HISTORY
      WHERE OBJECT_KEY < [$LV_Run_ID]
        AND SERVICE = {$LV_Job_Name}
        AND STATUS IN (\'S\',\'SR\')
        AND START_TIME >= (SYSDATE - (30 / 60 / 60 / 24))');

return $LV_Prior_Run_ID;

I call the above function in a conditional. If the return value is null, the job continues with the normal flow located in the TRUE branch of the conditional. If the return value is not null, the FALSE branch is executed: I print a message to the log saying that another instance is already running, and the job ends without throwing an exception.
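Jim's query can be exercised the same way with a Python/sqlite3 sketch. SYSDATE is Oracle syntax, so the 30-second window is rewritten with sqlite's datetime arithmetic; the OBJECT_KEY ordering and the S/SR statuses follow the function above, and the job name and key values are made up.

```python
import sqlite3
from datetime import datetime, timedelta

now = datetime(2020, 3, 16, 0, 10, 5)
fmt = "%Y-%m-%d %H:%M:%S"

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE AL_HISTORY "
            "(OBJECT_KEY INTEGER, SERVICE TEXT, STATUS TEXT, START_TIME TEXT)")
con.executemany(
    "INSERT INTO AL_HISTORY VALUES (?, ?, ?, ?)",
    [(100, "DAILY_LOAD", "S", now.strftime(fmt)),                           # this run
     (99,  "DAILY_LOAD", "S", (now - timedelta(seconds=5)).strftime(fmt)),  # duplicate
     (98,  "DAILY_LOAD", "D", (now - timedelta(days=1)).strftime(fmt))],    # old, done
)

# The check from the function above, with the SYSDATE window rewritten
# for sqlite: the highest OBJECT_KEY below ours, for the same job,
# still running, started within the last 30 seconds.
(prior,) = con.execute(
    """SELECT MAX(OBJECT_KEY)
         FROM AL_HISTORY
        WHERE OBJECT_KEY < ?
          AND SERVICE = ?
          AND STATUS IN ('S', 'SR')
          AND START_TIME >= datetime(?, '-30 seconds')""",
    (100, "DAILY_LOAD", now.strftime(fmt))).fetchone()

print(prior)  # 99 -> a prior instance is running, so take the FALSE branch
```

The completed row (key 98, status 'D') and anything older than 30 seconds fall outside the filter, so a normal run returns NULL and the TRUE branch executes.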

former_member213365
Active Participant

Production server is DS 4.2 SP13 on Windows Server 2016. This was a new install performed the prior week to move up from an existing production system that was on 4.2 SP08. The new environment uses the same source and target databases as well as the same repository database as the old environment. Besides the difference in DS versions, the only other difference is that the old environment used Windows Server 2012 R2. This is a single job server, it is not part of a server group.

Any time a job is initiated from a scheduled job in the Management Console, it runs twice. These jobs are triggered through the Windows Task Scheduler at 12:10 am.

If I manually start the job through the Management Console it runs only once.

If I manually start the job through the Windows Task Scheduler (using the existing task that was created by the Management Console) it runs only once.

I used the Export Execution Command to create a .bat file and then scheduled execution of the .bat through the Windows Task Scheduler. The job runs twice.

I've tried deleting the schedule and recreating it. That didn't help. I verified the entry in AL_SCHED_INFO and AL_MACHINE_INFO looks good too.

The prior version (SP08) would have this problem once a year at most. The new install has the problem EVERY DAY.

former_member664034
Discoverer

Hi Chris, Jessica,

Thanks for creating this thread on the BODS scheduling issue. Is there any solution yet? I have faced the same issue but haven't found the root cause.

1. When I deactivate/reactivate the jobs, the issue disappears for a day, then starts happening again at random times on random jobs. Mostly it happens between midnight and 5 am, when most of our daily jobs start and resource (CPU/memory) usage is higher.

2. I confirmed there are no duplicate entries in the Windows Task Scheduler or in the .bat/.txt file location.

If you have found a solution, could you please share it with me? Thank you very much!

Email:wenjiewu@hotmail.ca

Thanks

Wenjie Wu

former_member466847
Active Participant

Chris,
Based on that error, the job server is failing to connect during the request handshake. If the job schedule fails to trigger, the AL_RWJoblauncher will attempt to launch the job again, which may explain the duplicate entries.

If you would like further troubleshooting, please create a ticket and provide the following logs:

- AL_RWJoblauncher.log, found at <InstallDir>\logs\

- Server_event.log, found at <InstallDir>\logs\<job server>\server_eventlog_<date>.txt

- ATL and execution logs, found at <InstallDir>\logs\<job server>\repository\


If you are utilizing a server group please review KBA 2477561 - How to troubleshoot SAP Data Services Server Group

former_member466847
Active Participant

Good Morning,
Thank you for providing the Data Services version and the screenshot of the multiple executions.
Can you please review the following KBAs for assistance in resolving your multiple executions:
https://launchpad.support.sap.com/#/notes/2132053

https://launchpad.support.sap.com/#/notes/1250124

https://launchpad.support.sap.com/#/notes/1572605

strigas1
Explorer

Hi Jessica,

thank you for those, but they did not help solve my issue. What I have found out from the log files is that whenever a job is multiplied, the log records this line:

JobServer: Error <RWSecureSocketError: in RWSecureSocket::send: SYSTEM_ERROR> while processing request <2> from client <172.16.53.10>. (BODI-850260)

Any ideas on this?

KR Chris