Scheduled reports causing slowness in XIr2

Former Member
0 Kudos

Is it common that once a certain number of scheduled reports are running that the system slows down dramatically? Our config is:

2 W2K boxes in a cluster, each has 2 dual-cores

Tomcat/Java

Oracle

Beyond having limited the max jobs allowed in the job servers to 10 each (therefore max 20 concurrent running scheduled reports), what else can be done?

More to the point, what exactly is causing the slowness? We've monitored the CMSs and they seem fine, nowhere near maxing out memory or processing. However, the Unix box hosting the Oracle CMS is taking a major beating while these 10-15 scheduled reports are running. Does anyone know the cause? Is this situation improved any in r3.0 or r3.1?

How have you more experienced folks addressed this problem? We have 4k+ users and obviously need to beef up the architecture, but for now we're taking a beating on this.

Thanks,

Don

Accepted Solutions (1)

Former Member
0 Kudos

Stratos, thanks for such a quick reply. We're at SP3/FP5, unless I'm mistaken. All reports are Webi, and many run in a minute or two, others longer, but none more than 15-20 minutes, I don't think. (Haven't examined them all.)

Have not monitored load on the data sources, no. My rationale has been that with 100+ concurrent users already on the "main" data source, 15 more queries against it should not create an issue. I could be wrong.

The Oracle db for the repository is on a separate Unix box, yes. Oracle version for the CMS is 10g, I believe. I don't think we've generated stats for a while, but when we did early on (when we noticed the system getting so slow during scheduled reports), my dba reported this:

"The report shows 82.15% of time spent on CPU operation. Most of the activity is taking place in memory. Most events are running well with the exception of "Execute to Parse". This can only be improved by increasing sql re-use through bind variables. You will notice the top executed query is the CMS insert into CMS_FRONTIER5. That query is taking an average of 5 seconds to parse and execute. Most queries are from CMS. Why are there so many calls to the repository? The database is in choose mode and no tables are analyzed."
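For readers unfamiliar with the "Execute to Parse" figure quoted above: it is conventionally derived from the parse and execute counts as 100 * (1 - parses / executions), so a low value means most executions are re-parsed rather than reusing a cursor. A minimal Python sketch, assuming that conventional Statspack-style definition; the function name and the counts are made up for illustration and are not figures from this system:

```python
def execute_to_parse_pct(parse_count: int, execute_count: int) -> float:
    """Percentage of executions that did not require a parse call."""
    if execute_count <= 0:
        raise ValueError("execute_count must be positive")
    return 100.0 * (1.0 - parse_count / execute_count)


# Hypothetical counts, for illustration only.
# Every execution is parsed (no statement reuse): the ratio is 0%.
no_reuse = execute_to_parse_pct(parse_count=1000, execute_count=1000)

# One parse is reused for many executions (e.g. via bind variables):
# the ratio climbs towards 100%.
good_reuse = execute_to_parse_pct(parse_count=10, execute_count=1000)

print(f"no reuse:   {no_reuse:.1f}%")
print(f"good reuse: {good_reuse:.1f}%")
```

This only illustrates how the metric behaves; whether the CMS insert into CMS_FRONTIER5 can actually be made to reuse cursors is a question for the vendor and the dba.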

Stratos, note that even while 200 users are on in Webi at one time, we do NOT see system latency like we do when we have 10 scheduled reports running at once. Also, our Webi users are 80% developers, not just consumers. Does the CMS_FRONTIER5 ring any bells? Our dba has tuned the db according to the XIr2 sizing guide.

Again, many thanks!

0 Kudos

Hi Don,

You could be facing a cluster synchronisation problem here. But first I would recommend generating the statistics for your DB repository.

How many job servers do you start on each cluster node? Can you consider adding some more, even if this does not comply with the sizing guide?

Also check the logging directory (<BOBJ installation root>\BusinessObjects Enterprise 11.5\logging) to see if your job/report servers generate any error messages.

What is the target format for the scheduled reports? Are you using other destinations such as email, unmanaged disks, etc.?

Regards,

Stratos

0 Kudos

Hi Don,

To avoid slowing the system down for interactive users, I would also recommend using server groups to separate the "on-demand" processing servers from the ones used for scheduled reports.

Regards,

Stratos

Former Member
0 Kudos

Stratos,

We have created server groups, yes, but only for large reports. I think at the moment we're underpowered with our two servers and 4000 users, usually around 150-200 concurrent. Even if we created another server group for scheduled reports only, I'm not sure how that would help, as everyone is sharing the same resources and the bottleneck APPEARS to be the massive number of writes to the CMS, which I don't really understand.

-


Timestamp ProcessID ThreadID Message

[Sun May 03 16:22:07 2009] 2948 2960 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:ctResourceBundle:Message not found for id: : 344 [ResourceBundle.cpp;310]).

[Sun May 03 16:22:07 2009] 2948 2960 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:ctResourceBundle:Message not found for id: : 344 [ResourceBundle.cpp;310]).

[Sun May 03 18:42:07 2009] 2948 4088 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:ctResourceBundle:Message not found for id: : 344 [ResourceBundle.cpp;310]).

[Sun May 03 18:42:07 2009] 2948 4088 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:ctResourceBundle:Message not found for id: : 344 [ResourceBundle.cpp;310]).

[Sun May 03 18:45:48 2009] 2948 4296 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:ctResourceBundle:Message not found for id: : 344 [ResourceBundle.cpp;310]).

[Sun May 03 18:45:48 2009] 2948 4296 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:ctResourceBundle:Message not found for id: : 344 [ResourceBundle.cpp;310]).

[Sun May 03 19:02:46 2009] 2948 2728 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:ctResourceBundle:Message not found for id: : 344 [ResourceBundle.cpp;310]).

[Sun May 03 19:02:46 2009] 2948 2728 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:ctResourceBundle:Message not found for id: : 344 [ResourceBundle.cpp;310]).

[Sun May 03 20:32:43 2009] 2948 2472 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:ctResourceBundle:Message not found for id: : 344 [ResourceBundle.cpp;310]).

[Sun May 03 20:32:43 2009] 2948 2472 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:ctResourceBundle:Message not found for id: : 344 [ResourceBundle.cpp;310]).

[Sun May 03 20:45:15 2009] 2948 2844 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:DocExpressStorage:File not found in storage : D:\Program Files\Business Objects\BusinessObjects Enterprise 11.5\Data\ROCKA0ACDCBOXI2\ROCKA0ACDCBOXI2.Web_IntelligenceReportServer(2)\sessions\_AQgrR5mDcftNo.TZTEHvoTs\ [kdgDocExpressStorage.cpp;360]).

[Sun May 03 20:45:15 2009] 2948 2844 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:DocExpressStorage:m_strDomain : DX_ [kdgDocExpressStorage.cpp;361]).

[Sun May 03 20:45:15 2009] 2948 2844 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:DocExpressStorage:m_strFile : 1938443.wid [kdgDocExpressStorage.cpp;362]).

[Sun May 03 20:45:15 2009] 2948 2844 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:DocExpressStorage:l_strPath : [kdgDocExpressStorage.cpp;363]).

[Sun May 03 20:45:15 2009] 2948 2844 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:DocExpressStorage:l_strIndex : [kdgDocExpressStorage.cpp;364]).

[Sun May 03 20:45:15 2009] 2948 2844 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:DocExpressStorage:l_strFileName : [kdgDocExpressStorage.cpp;365]).

[Sun May 03 20:45:15 2009] 2948 2844 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:DocExpressStorage:l_strTmpName : [kdgDocExpressStorage.cpp;366]).

[Sun May 03 20:45:15 2009] 2948 2844 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:DocExpressStorage:l_strRepFileNoExt : [kdgDocExpressStorage.cpp;367]).

[Sun May 03 20:45:15 2009] 2948 2844 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:DocExpressStorage:l_strTmpFileNoExt : [kdgDocExpressStorage.cpp;368]).

[Sun May 03 20:45:15 2009] 2948 2844 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:dg_storeCE:OpenState failed : D:\Program Files\Business Objects\BusinessObjects Enterprise 11.5\Data\ROCKA0ACDCBOXI2\ROCKA0ACDCBOXI2.Web_IntelligenceReportServer(2)\sessions\_AQgrR5mDcftNo.TZTEHvoTs\rdx1_2948.tmp [kdgstoreCE.cpp;572]).

[Sun May 03 20:45:15 2009] 2948 2844 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:dg_storeCE:ZipStgCreateDocfile : 2 [kdgstoreCE.cpp;579]).

[Sun May 03 20:45:15 2009] 2948 2844 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:dg_storeCE:OpenStateFromFile failed : D:\Program Files\Business Objects\BusinessObjects Enterprise 11.5\Data\ROCKA0ACDCBOXI2\ROCKA0ACDCBOXI2.Web_IntelligenceReportServer(2)\sessions\_AQgrR5mDcftNo.TZTEHvoTs\rdx1_2948.tmp [kdgstoreCE.cpp;432]).

[Sun May 03 20:45:33 2009] 2948 4900 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:DocExpressStorage:File not found in storage : D:\Program Files\Business Objects\BusinessObjects Enterprise 11.5\Data\ROCKA0ACDCBOXI2\ROCKA0ACDCBOXI2.Web_IntelligenceReportServer(2)\sessions\_AQgrR5mDcftNo.TZTEHvoTs\ [kdgDocExpressStorage.cpp;360]).

[Sun May 03 20:45:33 2009] 2948 4900 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:DocExpressStorage:m_strDomain : DX_ [kdgDocExpressStorage.cpp;361]).

[Sun May 03 20:45:33 2009] 2948 4900 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:DocExpressStorage:m_strFile : 1938443.wid [kdgDocExpressStorage.cpp;362]).

[Sun May 03 20:45:33 2009] 2948 4900 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:DocExpressStorage:l_strPath : [kdgDocExpressStorage.cpp;363]).

[Sun May 03 20:45:33 2009] 2948 4900 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:DocExpressStorage:l_strIndex : [kdgDocExpressStorage.cpp;364]).

[Sun May 03 20:45:33 2009] 2948 4900 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:DocExpressStorage:l_strFileName : [kdgDocExpressStorage.cpp;365]).

[Sun May 03 20:45:33 2009] 2948 4900 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:DocExpressStorage:l_strTmpName : [kdgDocExpressStorage.cpp;366]).

[Sun May 03 20:45:33 2009] 2948 4900 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:DocExpressStorage:l_strRepFileNoExt : [kdgDocExpressStorage.cpp;367]).

[Sun May 03 20:45:33 2009] 2948 4900 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:DocExpressStorage:l_strTmpFileNoExt : [kdgDocExpressStorage.cpp;368]).

[Sun May 03 20:45:33 2009] 2948 4900 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:dg_storeCE:OpenState failed : D:\Program Files\Business Objects\BusinessObjects Enterprise 11.5\Data\ROCKA0ACDCBOXI2\ROCKA0ACDCBOXI2.Web_IntelligenceReportServer(2)\sessions\_AQgrR5mDcftNo.TZTEHvoTs\rdx2_2948.tmp [kdgstoreCE.cpp;572]).

[Sun May 03 20:45:33 2009] 2948 4900 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:dg_storeCE:ZipStgCreateDocfile : 2 [kdgstoreCE.cpp;579]).

-
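With this many assert lines, it can help to tally them by component before reading them one at a time. A rough Python sketch, assuming every entry follows the "[timestamp] ProcessID ThreadID assert failure: ... **ASSERT:<component>:" layout seen in the excerpt above; the regular expression and the function name are illustrative and not part of any BusinessObjects tooling:

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
# [Sun May 03 16:22:07 2009] 2948 2960 assert failure: (...). (false : TraceLog: **ASSERT:<component>:<message>
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<pid>\d+)\s+(?P<tid>\d+)\s+"
    r"assert failure:.*?\*\*ASSERT:(?P<component>\w+):"
)


def count_asserts(lines):
    """Return a Counter of assert failures keyed by component name."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # header lines and blank lines simply do not match
            counts[match.group("component")] += 1
    return counts


# Three lines copied from the log above.
sample = [
    r"[Sun May 03 16:22:07 2009] 2948 2960 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:ctResourceBundle:Message not found for id: : 344 [ResourceBundle.cpp;310]).",
    r"[Sun May 03 20:45:15 2009] 2948 2844 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:DocExpressStorage:m_strFile : 1938443.wid [kdgDocExpressStorage.cpp;362]).",
    r"[Sun May 03 20:45:15 2009] 2948 2844 assert failure: (.\TraceLog.cpp:1644). (false : TraceLog: **ASSERT:dg_storeCE:ZipStgCreateDocfile : 2 [kdgstoreCE.cpp;579]).",
]

for component, n in count_asserts(sample).most_common():
    print(component, n)
```

To run it against a real log, pass an open file object instead of `sample`. The tally only shows which components assert most often; it says nothing about which of them, if any, relates to the slowdown.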


Currently each box has one job server configured, and each box has 4 Webi report servers. We don't run Deski. Destinations are either email or user inbox, no disk or FTP.

Regards,

Don

Edited by: Don Davis on May 4, 2009 8:19 AM

Former Member
0 Kudos

Stratos, yes, we have set up server groups, right now just for people who are scheduling large reports that kick out extracts >10 MB, not really just for scheduled reports. Would it make a difference for our configuration? (2 nodes, 1 Webi job server per box, 4 report servers per box.) It seems that the bottleneck is the CMS getting hammered, since there appears to be no other load on the box in terms of CPU or memory usage. I'll get with the dba.

How do I determine if there is a cluster synchronisation issue?

Destinations are email or user inbox only, no ftp or unmanaged disk.

Also, yes, lots of "assert failures" for WebiReport logs on both machines. Any ideas? I can include here if that would help.

Don

0 Kudos

Hi Don,

To check whether this is a cluster sync problem, you need to shut down one node and see how the other node behaves. In fact, I would recommend doing this:

1) Check if there are any time differences between the system clocks of the two servers.

2) Shut down one of your BOBJ servers and see how the system behaves.

I know that Step 2 may be difficult to do on a production system but it may be worth it.
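Step 1 comes down to comparing two clock readings taken at the same moment. A minimal Python sketch of that comparison, assuming the readings have already been collected from each node by some other means; the timestamps and the 5-second tolerance are hypothetical and not a documented BOBJ limit:

```python
from datetime import datetime, timedelta


def clock_skew(node_a_time: datetime, node_b_time: datetime) -> timedelta:
    """Absolute difference between two clock readings taken at the same moment."""
    return abs(node_a_time - node_b_time)


# Hypothetical readings, captured as close to simultaneously as possible.
node_a = datetime(2009, 5, 4, 8, 0, 0)
node_b = datetime(2009, 5, 4, 8, 2, 30)

# Assumed tolerance for illustration only.
MAX_SKEW = timedelta(seconds=5)

skew = clock_skew(node_a, node_b)
print(f"skew = {skew.total_seconds():.0f}s, within tolerance: {skew <= MAX_SKEW}")
```

In practice the lasting fix is to point both nodes at the same time source rather than to correct the clocks by hand.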

Another option is to increase the number of job servers in order to overcome potential contention problems at this level.

Speaking of contention problems, I have another idea: have you checked what your email server (SMTP) is doing? From what I have heard, modern spam filters simulate bottlenecks in order to discourage senders of mass mail. Just ask your postmaster. Maybe this is also a bottleneck here.

Regards,

Stratos

Former Member
0 Kudos

Stratos, just checked, the nodes had a time difference of 2.5 minutes between them. Trying to have the admins fix that now.

Can't shut down either of the boxes at the moment, but I'll keep that in mind next scheduled maintenance day.

I'll check the SMTP server also.

Can you elaborate on why you think another job server per node might help? Do you think the bottleneck may be caused by each job server being overwhelmed?

Currently we have one job server per node, and each job server allows 10 jobs. Total of 20 concurrent scheduled reports. I think the lowest setting for a report job server is 5.

Which is more efficient?

1 report job server per node that allows 10 concurrent jobs

or

2 report job servers per node, each allowing 5 concurrent jobs?

I'm not sure why or how it could matter, except that I do know we don't see a load in terms of CPU or memory on either node, even when there are 15-20 scheduled reports running and the system is terribly slow.
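For what it is worth, the two options give the same cluster-wide ceiling on concurrent scheduled jobs; they differ only in how many job server processes share that ceiling. A small Python sketch of the arithmetic, using the figures from this post (the function name is made up for illustration):

```python
def max_concurrent_jobs(nodes: int, job_servers_per_node: int, max_jobs_per_server: int) -> int:
    """Upper bound on scheduled jobs running at once across the cluster."""
    return nodes * job_servers_per_node * max_jobs_per_server


# Option 1: one job server per node, 10 jobs each.
option_1 = max_concurrent_jobs(nodes=2, job_servers_per_node=1, max_jobs_per_server=10)

# Option 2: two job servers per node, 5 jobs each.
option_2 = max_concurrent_jobs(nodes=2, job_servers_per_node=2, max_jobs_per_server=5)

print(option_1, option_2)  # 20 20
```

So any difference between the two would have to come from contention inside a single job server process, not from the total number of jobs allowed.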

Again, many thanks!

Don

0 Kudos

Hi Don,

The servers should be synchronized. A deviation of 2.5 minutes can cause cluster sync problems, and it could be the reason why the system is slow.

"No load" does not mean "no contention". Especially if contention takes place outside the BOBJ system (see Email server or Oracle).

Theoretically, 1 job server should be enough. Still, it is worth testing whether contention is taking place within one process or within the BO cluster. If starting more than 1 job server does help, then the problem lies in the architecture of the job server itself.

Regards,

Stratos

Former Member
0 Kudos

Stratos, ok, let me get the time issue fixed, check the db and SMTP servers, and see what's what. Back to you in a few days, and thanks a million for your assistance and insight!

Don

Answers (1)

0 Kudos

Hi Don,

What is your current Service Pack / Fix Pack level? What kind of reports are we talking about here?

Have you monitored the load caused by the scheduled reports on your data sources? Is the Oracle DB for your CMS repository running on a separate machine? Which version of Oracle DB are you using there? Have you tried to generate statistics (Oracle-side) on the Oracle user used for the DB repository?

Are you talking about report bursting (1 report is scheduled but this is distributed to N users) here?

Regards,

Stratos