This is part 2 of a blog in 3 parts (at the moment) on how IdM manages queue processing and the audits and logs created during processing
Part 1: Tables, views and processing logic: http://scn.sap.com/community/netweaver-idm/blog/2014/02/05/on-queue-processing-or-the-lack-thereof
Part 2: Viewing the current queues: http://scn.sap.com/community/netweaver-idm/blog/2014/02/10/on-queue-processing-or-the-lack-thereof-p...
Part 3: Post execution audits and logs: http://scn.sap.com/community/netweaver-idm/blog/2014/02/21/on-queue-processing-what-just-happened-pa...
Feedback is most welcome, and additions as well as corrections can be expected.
Edit 20140224: Oracle versions of queries added.
One of the most important things to do in case of a productive stand-still or issue is to get an overview of what's in the different queues.
Link evaluations, approvals and workflows have separate queues and processing of them is done by different threads in the dispatcher(s).
Jobs are simply set to state=1 and scheduletime < now in the mc_jobs table.
As mentioned above, jobs do not really have a queue. They are scheduled to run by having scheduletime set and state set to 1. The dispatcher will start runtime(s) to process jobs if the mc_dispatcher_check procedure returns 1 or more standard jobs to run. The java runtime will use the procedure mc_getx_job to reserve a job from the available jobs. Once running the state in mc_jobs changes to 2.
Just to clarify, a job sits outside the Identity Store(s) in a job folder and usually works with bulk processing and contains 1 or many passes. Actions are inside the workflow of an Identity Store and can only contain 1 pass and process 1 entry at a time. To slightly confuse the matter, the configuration of an action task is a job, in the mc_jobs table, and the logs it creates are stored in the mc_logs table. There's a link between the task in mxp_tasks to mc_jobs on mxp_tasks.jobguid = mc_jobs.jobgui.
With this knowledge a query listing jobs and provisioning actions that are running can look like this:
SQL Server:
select name,case when provision=1 then 'Action' else 'Job' end type, CurrentEntry, Current_Machine from mc_jobs with(nolock) where state = 2
Oracle:
select name,case when provision=1 then 'Action' else 'Job' end type, CurrentEntry, Current_Machine from mc_jobs where state = 2;
This produces output like this:
Note that the CurrentEntry column in mc_jobs is updated every 100 entry, or every 30 seconds by the runtimes.
The provisioning queue is based on the table mxp_provision. To process parts of the queue a dispatcher must first set a semaphore that indicates that other dispatchers should keep away from processing the same type of task. This is done by setting a semaphore (basically its own Id as owner along with a timestamp) in the mc_semaphore table. The timestamp is updated as the thread is processing entries, and a semaphore whose timestamp is older than 300 seconds is considered dead. This means that if you have conditional statements taking a very long time to run so that the dispatcher thread is not able to update the timestamp within 300 seconds the semaphore is released and another dispatcher will start processing conditional statements as well. That means trouble because the two threads risk running the same conditional mskey,action,audit combination!
The provisioning queue is divided into views according to the threads in the dispatcher: mxpv_grouptasks_ordered/mxpv_grouptasks_unordered/mxpv_grouptasks_conditional/mxpv_grouptasks_switch/mxpv_grouptasks_approval/
mxpv_grouptasks_attestation
These views will at most contain 1000 entries from a Top 1000 limiter. As mentioned in part #1, actions that are to be processed by runtime engines are picked up by a procedure and has no view.
This queue contains assignments that need to be evaluated. Any mxi_link entry with mcCheckLink < now is in this queue. This includes role/privilege assignments and entry references such as manager.
The dispatcher processes this from the view mxpv_links. This view will contain 0 entries in normal situations, up to 1000 under load. To get the real number of links that need evaluations you can run:
SQL Server:
SELECT count(mcUniqueId) FROM mxi_link with(NOLOCK) WHERE (mcCheckLink < getdate()) AND (mcLinkState IN (0,1))
Oracle:
SELECT count(mcUniqueId) FROM mxi_link WHERE (mcCheckLink < sysdate) AND (mcLinkState IN (0,1))
To see if a specific user has privileges that are queued for evaluation, or if a privilege has entries where it's state is still to be evaluated:
SQL Server:
-- Assignments to evaluate for 'User Tony Zarlenga'
SELECT count(mcUniqueId) FROM mxi_link with(NOLOCK) WHERE (mcCheckLink < getdate()) AND (mcLinkState IN (0,1)) and
mcThisMskey in (select mcmskey from idmv_entry_simple where mcMskeyValue = 'User Tony Zarlenga')
-- User assignments to evaluate for privilege 'PRIV.WITH.APPROVAL'
SELECT count(mcUniqueId) FROM mxi_link with(NOLOCK) WHERE (mcCheckLink < getdate()) AND (mcLinkState IN (0,1)) and
mcOtherMskey in (select mcmskey from idmv_entry_simple where mcMskeyValue = 'PRIV.WITH.APPROVAL')
Oracle:
-- Assignments to evaluate for 'User Tony Zarlenga'
SELECT count(mcUniqueId) FROM mxi_link WHERE (mcCheckLink <sysdate) AND (mcLinkState IN (0,1)) and
mcThisMskey in (select mcmskey from idmv_entry_simple where mcMskeyValue = 'User Tony Zarlenga')
-- User assignments to evaluate for privilege 'PRIV.WITH.APPROVAL'
SELECT count(mcUniqueId) FROM mxi_link WHERE (mcCheckLink < sysdate) AND (mcLinkState IN (0,1)) and
mcOtherMskey in (select mcmskey from idmv_entry_simple where mcMskeyValue = 'PRIV.WITH.APPROVAL')
Runtime actions are listed in the provisioning queue with actiontype=0. Combined with state=2 (ready to run) and exectime < now the entry is ready to be processed by a runtime. A very basic query listing number of entries ready for processing by different actions is:
SQL Server:
select count(P.mskey) numEntries,P.actionid, t.taskname from mxp_provision P with(NOLOCK), mxp_tasks T with(NOLOCK)
where P.ActionType=0 and T.taskid = P.ActionID
group by p.ActionID,t.taskname
Oracle:
select count(P.mskey) numEntries,P.actionid, t.taskname from mxp_provision P , mxp_tasks T
where P.ActionType=0 and T.taskid = P.ActionID
group by p.ActionID,t.taskname
Unless you have a lot of actions with delay before start configured actions will usually have an exectime in the past. This query will create a simple result showing the entries that can be processed by runtimes:
In most cases this is only part of the full picture. You really want to know if a runtime is actually working on those entries as well. Let's add mc_jobs and mc_job_state to the query to get a bit more detail:
SQL Server:
select count(P.mskey) numEntries,P.actionid, t.taskname,js.name as jobState
from mxp_provision P with(NOLOCK)
inner join mxp_tasks T with(NOLOCK) on T.taskid = P.ActionID
left outer join mc_jobs J with(NOLOCK) on J.JobGuid = T.JobGuid
left outer join mc_job_state JS with(NOLOCK) on j.State = JS.State
where P.ActionType=0 and P.state=2 and T.taskid = P.ActionID
group by p.ActionID,t.taskname,js.name
Oracle:
select count(P.mskey) numEntries,P.actionid, t.taskname,js.name as jobState
from mxp_provision P
inner join mxp_tasks T on T.taskid = P.ActionID
left outer join mc_jobs J on J.JobGuid = T.JobGuid
left outer join mc_job_state JS on j.State = JS.State
where P.ActionType=0 and P.state=2 and T.taskid = P.ActionID
group by p.ActionID,t.taskname,js.name
The current reason for my system not processing anything is getting clearer:
No actions are running so something is blocking or stopping the runtimes from starting and I know to look at the dispatcher. Since I've manually stopped it it's no big surprise and troubleshooting is simple.
If you think that not enough runtimes are being started and see situations like this:
You should look at item 5 in the checklist below and also have a look at the properties and policy of the dispatcher
Max rt engines to start determines how many runtimes a dispatcher starts when it finds X actions ready to run in the queue. In this case, even if 100 are ready to run it will only start 1 in this check interval (see the picture to the right). Max concurrent rt engines controls how many runtimes a dispatcher will have active at the same time. 1 is always reserved for the Windows Runtime though. So my system is now limited to a single active java runtime at any time. Max loops for rt engine is also a very useful setting. Starting a java runtime processes and loading all the classes can often take a second or three and in a low-load scenario this can be the slowest operation in the system. This setting tells the runtime that once it's done with an action/job it should enter a small loop with a 1 second delay to check for additional actions that are available. This also increases performance as it is independent of the check interval(see two pictures below) | |
Also notice the global setting for max concurrent rt engines. If you have 3 dispatchers that can run 50 simultaneous runtimes each you can still limit the total active runtime count to 100 for instance. | |
The check interval controls how frequently the dispatcher connects to the database to check for available tasks, actions and jobs. A general recommendation is to increase this in systems with multiple dispatchers so that the average interval is around 5 seconds. So when running 2 dispatcher, check interval is 10 on both, with 4 dispatcher interval is 20 and so on. |
By increasing the Max concurrent rt engines setting to 10, and max rt engines to start to 3 the situation is quickly changed to the point where it's difficult to create a screenshot:
A quick checklist for troubleshooting:
I would usually start off with the following query that lists the number of entries per task, per state and including the state of the linked Job for action tasks.
SQL Server:elect
count(P.mskey) numEntries,t.taskid,t.taskname,A.Name ActionType,s.name StateName, ISNULL(JS.Name,'not an action task') JobState
from
mxp_provision P with(nolock)
inner join mxp_Tasks T with(nolock) on T.taskid = P.actionid
inner join mxp_state S with(nolock) on S.StatID = P.state
inner join MXP_ActionType A with(nolock) on A.ActType=P.ActionType
left outer join mc_jobs J with(nolock) on J.JobGuid = T.JobGuid
left outer join mc_job_state JS with(nolock) on j.State = JS.State
group by
t.taskid,T.taskname, A.name, S.name, JS.Name
order by
A.name, S.name
Oracle:
select
count(P.mskey) numEntries,t.taskid,t.taskname,A.Name ActionType,s.name StateName, NVL(JS.Name,'not an action task') JobState
from
mxp_provision P
inner join mxp_Tasks T on T.taskid = P.actionid
inner join mxp_state S on S.StatID = P.state
inner join MXP_ActionType A on A.ActType=P.ActionType
left outer join mc_jobs J on J.JobGuid = T.JobGuid
left outer join mc_job_state JS on j.State = JS.State
group by
t.taskid,T.taskname, A.name, S.name, JS.Name
order by
A.name, S.name
I've started my dispatcher test task described in Part #1 for 1000 entries The query above gives me a result like this during the processing
(click to enlarge) | A quick explanation of some of the type/state combinations and what would process them Action Task/Ready To Run: Action that is ready to be processed by a runtime + JobStatus: The state of the job linked to the Action Task. If it's Idle it means a runtime has not picked this up yet. Conditional, Switch and (un)Ordered Tasks are processed by dispatchers that have a policy that allows Handle Tasks. Ready to run for conditional or switch task means i'ts ready for evaluation Ready to run for Ordered/Unorderd task means the workflow can be expanded into the queue Expanded OK means the workflow at this level is expanded Waiting generally means that its waiting for a sub-process or child event to finish |
Since the mc_jobs table contains a column named CurrentEntry we can also see how many entries running actions have processed using:
SQL Server:
select
count(P.mskey) numEntries,t.taskid,t.taskname,A.Name ActionType,s.name StateName,
case when js.name = 'Running' then 'Running, processed:'+cast(ISNULL(J.CurrentEntry,0) as varchar) else js.name end state
from
mxp_provision P with(nolock)
inner join mxp_Tasks T with(nolock) on T.taskid = P.actionid
inner join mxp_state S with(nolock) on S.StatID = P.state
inner join MXP_ActionType A with(nolock) on A.ActType=P.ActionType
left outer join mc_jobs J with(nolock) on J.JobGuid = T.JobGuid
left outer join mc_job_state JS with(nolock) on j.State = JS.State
group by
t.taskid,T.taskname, A.name, S.name, case when js.name = 'Running' then 'Running, processed:'+cast(ISNULL(J.CurrentEntry,0) as varchar) else js.name end
order by
A.name, S.name
Oracle:
select
count(P.mskey) numEntries,t.taskid,t.taskname,A.Name ActionType,s.name StateName,
case when js.name = 'Running' then 'Running, processed:'||to_char(NVL(J.CurrentEntry,0)) else js.name end as Jstate
from
mxp_provision P
inner join mxp_Tasks T on T.taskid = P.actionid
inner join mxp_state S on S.StatID = P.state
inner join MXP_ActionType A on A.ActType=P.ActionType
left outer join mc_jobs J on J.JobGuid = T.JobGuid
left outer join mc_job_state JS on j.State = JS.State
group by
t.taskid,T.taskname, A.name, S.name,case when js.name = 'Running' then 'Running, processed:'||to_char(NVL(J.CurrentEntry,0)) else js.name end
order by
A.name, S.name
The result is quite useful as it's now possible to see how many entries the actions that are running have processed so far (click to see):
This will have to do for revision one. If I get time to add more this week I will, but there is patches and SPs to work on as well.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
13 | |
10 | |
7 | |
7 | |
7 | |
6 | |
6 | |
5 | |
5 | |
5 |