Application Development Blog Posts
Learn and share on deeper, cross technology development topics such as integration and connectivity, automation, cloud extensibility, developing at scale, and security.
TomVanDoo
Active Contributor

You may have been in a situation where you had to process a large number of objects. When you do this sequentially, it takes quite some time before the processing finishes. One solution is to schedule your program in the background and just let it run there.

But what if you don't have the luxury of scheduling your report in the background? Or what if the sheer number of objects is so large that processing them all would take a day, with the risk of overrunning your nightly timeframe and impacting the daily work?

Multi Threading

It would be better if you could launch multiple processing blocks at the same time. Each block processes a single object and, when it finishes, releases its slot so the next block can be launched.

That means multiple objects can be updated at the same time. Imagine 10 objects being processed at once rather than one after another: your runtime could drop to roughly 10% of the original report's.

It's actually not that hard. If you create a remote-enabled function module containing the processing logic for one object, with the necessary parameters, you can simply launch it in a new task. That creates a new process (you can monitor it in transaction SM50) which ends as soon as your object is processed.

Here's a piece of pseudo-code to realise this principle.


data: lt_object type ty_object_tab. "big table full of objects to update
field-symbols: <line> type ty_object.

while lt_object[] is not initial.

     loop at lt_object assigning <line>.

          call function 'ZUPDATE'
               starting new task lv_taskname
               exporting
                    is_object = <line>
               exceptions
                    others = 9.

          if sy-subrc = 0.
               delete lt_object. "launched successfully, so drop it from the list
          endif.

     endloop.

endwhile.






Something like that.

Notice the loop within a loop, to make sure we keep trying until every object has been handed off to a task. Once an object has been launched successfully, it's removed from the list.

Queue clog

But there's a catch with that approach. As long as the processing of an individual object doesn't take up too much time, and you have enough DIAlog processes available, things will work fine. As soon as a process ends, it's freed up to take on a new task.

But what if new tasks are spawned faster than the old ones finish? That means that within the blink of an eye, all your processes will be taken up and new tasks will be queued. It also means that no one else can work on the system, because all dialog processes are being hogged by your program.

(Notice how the queue keeps launching processes even after your main report has already ended.)

You do not want that to happen.

The first time that happened to me was on my very first assignment, where I had to migrate 200K maintenance notifications. I brought the development system to its knees on multiple occasions.

The solution back then was to double the number of dialog processes. One notification process then finished fast enough before the main report could schedule 19 new tasks, so the system never got overloaded.

Controlled Threading

So what you want is to control the number of threads that can run at any given time. You want to be able to say that only 5 processes may be used, leaving 5 more for any other operations. (That means you could even run these mass programs during the day!)

But how do you do that?

Well, you'll have to receive the result of each task, so you can keep a counter of active threads and prevent new ones from being spawned as long as the counter is at its limit.

caller:


data: lt_object type ty_object_tab. "big table full of objects to update
field-symbols: <line> type ty_object.

while lt_object[] is not initial.

     loop at lt_object assigning <line>.

          call function 'ZUPDATE'
               starting new task lv_taskname
               calling receive_result on end of task
               exporting
                    is_object = <line>
               exceptions
                    others = 9.

          if sy-subrc = 0.
               delete lt_object.
               add 1 to me->processes.
          endif.

     endloop.

endwhile.

receiver:

method receive_result.
     receive results from function 'ZUPDATE'.
     subtract 1 from me->processes.
endmethod.





This still just launches tasks as fast as possible, with no throttling. It only keeps the counter; we still have to do something with that counter.

And here's the trick. There's a WAIT statement you can use to check whether the number of used processes is less than whatever limit you specify.

But this number is not updated after a receive unless your logical unit of work is updated, and that is only done after a commit or a WAIT statement.

But wait, we already have a wait statement, won't that update it?

Why yes, it will, but then it's updated after you've waited, which is pretty daft, because then you're still not quite sure whether it worked.

So here's a trick to get around that.

caller:


data: lt_object type ty_object_tab. "big table full of objects to update
field-symbols: <line> type ty_object.

while lt_object[] is not initial.

     loop at lt_object assigning <line>.

          while me->processes >= 5.
               wait until me->processes < 5.
          endwhile.

          call function 'ZUPDATE'
               starting new task lv_taskname
               calling receive_result on end of task
               exporting
                    is_object = <line>
               exceptions
                    others = 9.

          if sy-subrc = 0.
               delete lt_object.
               add 1 to me->processes.
          endif.

     endloop.

endwhile.





That'll keep the number of threads under control and still allow you to achieve massive performance improvements on mass processes!

Alternatives

Thanks to robin.vleeschhouwer for pointing out destination groups. By starting your RFC in a specific destination group, your system administrators can control the number of processes in that group. The downside is that it's not as flexible as a parameter on your mass-processing report, and you have to run everything past your sysadmins.
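To make that concrete, here's a sketch of the destination-group variant. The group name 'PARALLEL' is an assumption, and ZUPDATE with its is_object parameter are the hypothetical names from the examples above.

```abap
" Sketch: let the RFC server group (maintained in RZ12) do the throttling.
" 'PARALLEL' is an assumed group name; ZUPDATE / is_object are hypothetical.
call function 'ZUPDATE'
     starting new task lv_taskname
     destination in group 'PARALLEL'
     exporting
          is_object = <line>
     exceptions
          system_failure        = 1
          communication_failure = 2
          resource_failure      = 3.
if sy-subrc = 3.
     "no free work process in the group right now: back off and retry
     wait up to 1 seconds.
endif.
```

The nice part is that RESOURCE_FAILURE gives you an immediate return code instead of a clogged queue, so the caller can decide how to back off.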

Another sweet addition came from shai.sinai in the form of bgRFC. I have to admit I had never even heard of it, so there's not much I can say at this point in time. Except that, skimming through the documentation, it looks like something pretty nifty.
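From that first skim of the documentation, a bgRFC call looks roughly like this. This is a sketch only: the destination name 'NONE' and the transactional unit type are assumptions, and ZUPDATE is the hypothetical function module from above.

```abap
" Sketch of a transactional bgRFC unit, based on the standard bgRFC API.
data: lo_dest type ref to if_bgrfc_destination_outbound,
      lo_unit type ref to if_trfc_unit_outbound.

lo_dest = cl_bgrfc_destination_outbound=>create( 'NONE' ).
lo_unit = lo_dest->create_trfc_unit( ).

call function 'ZUPDATE'
     in background unit lo_unit
     exporting
          is_object = <line>.

commit work. "the unit is only processed after the commit
```

The scheduler then takes care of executing the queued units, so throttling is handled by the system rather than by your report.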

12 Comments
RobinV
Participant

Hi Tom,

 

Thanks for the information.

There are different types of RFCs. You are talking about parallel RFC (pRFC), which is an extension of asynchronous RFC (aRFC). An option for load balancing is to use a destination group:

 

call function 'function'

starting new task 'task'

destination in group 'group'

exporting ...

exceptions ...

 

You can set up the destination group in transaction RZ12.

 

For more information please look at the SAP help:

Asynchronous RFC

 

Best regards,

 

Robin Vleeschhouwer

 

Edit: It is also wise to use the SPBT function modules for parallel processing.
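For reference, initializing the parallel environment with SPBT_INITIALIZE looks roughly like this. The group name is just a placeholder; the parameter and exception names match the function's interface as quoted further down in this thread.

```abap
data: lv_max_wps  type i,
      lv_free_wps type i.

call function 'SPBT_INITIALIZE'
     exporting
          group_name   = 'parallel_generators' "placeholder RFC server group
     importing
          max_pbt_wps  = lv_max_wps  "total work processes in the group
          free_pbt_wps = lv_free_wps "currently free work processes
     exceptions
          invalid_group_name           = 1
          internal_error               = 2
          pbt_env_already_initialized  = 3
          currently_no_resources_avail = 4
          no_pbt_resources_found       = 5
          others                       = 6.
```

After initialization, the companion SPBT function modules can be used to query free resources before spawning the next task.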

Former Member

Hi,

I would recommend that you also check out the "newer" technique: bgRFC.

TomVanDoo
Active Contributor

Right you are about the destination groups. The only problem is that they're controlled by BC, so it's a fixed number of threads.

 

The bgRFC is new to me. I'll have to take a look and play around with it a bit.

Thanks for the additions!

I'll weave them in.

former_member182550
Active Contributor

Queue clog.....

 

<chuckles>

 

Yup.  Been there - done that.  And you're sitting there thinking 'hmmm... this is taking a while' whilst around you other people are starting to smack their mouse or curse how slow the system has become and you suddenly realise that you might have something to do with that..... and you start to sweat..... a lot.....

 

Panic sets in when you go see the basis guys and they can't even log in. Thirty minutes later, still nothing is moving, so the plug is pulled.

 

Result? You don't do that again AND you put a count on the number of parallel processes that can be spawned in one go.

 

Rich

raymond_giuseppi
Active Contributor

Where is SPBT_INITIALIZE?

former_member182550
Active Contributor

FUNCTION SPBT_INITIALIZE.

*"       IMPORTING

*"             VALUE(GROUP_NAME) LIKE  RZLLITAB-CLASSNAME

*"                             DEFAULT SPACE

*"       EXPORTING

*"             VALUE(MAX_PBT_WPS) TYPE  I

*"             VALUE(FREE_PBT_WPS) TYPE  I

*"       EXCEPTIONS

*"              INVALID_GROUP_NAME

*"              INTERNAL_ERROR

*"              PBT_ENV_ALREADY_INITIALIZED

*"              CURRENTLY_NO_RESOURCES_AVAIL

*"              NO_PBT_RESOURCES_FOUND

*"              CANT_INIT_DIFFERENT_PBT_GROUPS

  DATA: RC TYPE  I,

        LEN TYPE I,

        DO_MONITOR LIKE TRUE VALUE FALSE,

        FUNCTION_NAME LIKE TFDIR-FUNCNAME VALUE 'SPBT_INITIALIZE'.

  data: l_seed       type i,

        l_time_stamp type TIMESTAMPL.

*

* If the PBT environment has not been initialized yet, remember the

* group name. Otherwise -> error

*

 

What version of SAP are you running ?

 

Rich

raymond_giuseppi
Active Contributor

I was not looking at SAP code, but at this document's parallelization, which has no actual check of the number of free processes...

oliver_wurm
Active Participant

Thank you for sharing this, but there's one thing I really don't understand. I think the following construct loops forever:

 

   while me->processes ≤ 5

     wait until me->processes < 5.

   endwhile.

 

Regards

Oliver

former_member182550
Active Contributor

When you create a parallel task (i.e. using the STARTING NEW TASK clause), you should also specify a procedure to perform when the task is complete - a callback, if you will. This is specified using the PERFORMING xxx ON END OF TASK clause.

 

This procedure then gets the results of the parallel task back using the RECEIVE command.

 

So, in the procedure where you start the task, you check your count of tasks. If it's more than a maximum, I generally enqueue_sleep for a while. Once the number of tasks falls below the maximum, you start your task and increment the task count.

 

In the end-of-task procedure, which is called asynchronously, apart from handling the task results you decrement the task counter. So yes, if you implement parallelism properly, your loop will terminate, because at some point your me->processes (which must be global) will fall below your maximum process count.
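Put together, the pattern described above might look like this minimal skeleton. All names are hypothetical, the syntax assumes a reasonably recent ABAP release, and note that me->processes only compiles inside a class (which also explains the "Field ME is unknown" error reported further down).

```abap
" Minimal sketch of throttled aRFC dispatching with a method callback.
class lcl_dispatcher definition.
  public section.
    data processes type i.
    methods: run importing it_objects type ty_object_tab,
             on_end_of_task importing p_task type clike.
endclass.

class lcl_dispatcher implementation.
  method run.
    data(lt_object) = it_objects.
    while lt_object is not initial.
      loop at lt_object assigning field-symbol(<line>).
        while processes >= 5.
          wait until processes < 5. "gives the callbacks a chance to run
        endwhile.
        call function 'ZUPDATE'
          starting new task |{ sy-tabix }|
          calling on_end_of_task on end of task
          exporting is_object = <line>
          exceptions others = 9.
        if sy-subrc = 0.
          delete lt_object. "dispatched, so drop it from the worklist
          processes = processes + 1.
        endif.
      endloop.
    endwhile.
    wait until processes = 0. "drain the last running tasks
  endmethod.

  method on_end_of_task.
    receive results from function 'ZUPDATE'. "mandatory to free the task
    processes = processes - 1.
  endmethod.
endclass.
```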

 

Regards

 

Rich

oliver_wurm
Active Participant

This is very clear, but I still think (if using a WHILE ... ENDWHILE) it must be

 

   while me->processes ≥ 5

     wait until me->processes < 5.

   endwhile.

 

The WAIT statement itself should do the same job ...

 

Regards

Oliver

former_member182550
Active Contributor

Oliver Wurm wrote:


 


This is very clear but I still think (if using a WHILE ... ENDWHILE) it must be


 


   while me->processes ≥ 5


     wait until me->processes < 5.


   endwhile.


 


The WAIT Statement itself should do the same Job ...


 


Regards


Oliver



Sorry - I need to either a) keep my glasses on, or b) set this font to gigantic... I see what you are saying now.... < versus > .... hummm.....

 

Rich

 

 

 


I followed your instructions, but I get this error:

"Field "ME" is unknown. It is neither in one of the specified tables nor defined by a "DATA" statement.

Please help, thank you so much.

GiangNH.