In this document I would like to share some experiences and the basic steps for troubleshooting intermittent issues on BI platforms that have more than one processing server or node configured. I will focus on the most frequently used report types, such as WebIntelligence, Crystal Reports and Dashboards, and only on on-demand refresh errors.
Introduction – scalability
SAP BusinessObjects Business Intelligence Platform services can be scaled vertically, using the multi-core CPUs of a single machine to take full advantage of the hardware they run on, and horizontally, to take advantage of multiple server machines across a network.
For example, you can run several processing servers on the same machine (vertical scaling), or run several processing servers on separate machines (horizontal scaling).
Such well-sized, properly scaled landscapes are commonly used as production environments, where BI Platform services cannot be stopped or restarted on ad-hoc request, only during maintenance windows. Troubleshooting intermittent errors in them is difficult, because one of the most commonly used settings lets a report use the first available server for each on-demand processing request. Consecutive refreshes of the same report can therefore run on different servers, and even on different nodes.
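To illustrate why the "first available server" setting makes a problem on a single server look intermittent, here is a minimal Python simulation. It is a sketch under stated assumptions: the server names are hypothetical, exactly one server is assumed to be misconfigured, and the platform's real dispatching is simplified to a uniform random choice.

```python
import random

# Hypothetical landscape: two nodes, each running two processing servers.
# The names are illustrative only, not real BI Platform server names.
SERVERS = [
    "node1.ProcessingServer1",
    "node1.ProcessingServer2",
    "node2.ProcessingServer1",
    "node2.ProcessingServer2",
]

# Assumption: exactly one server has an inconsistent configuration.
MISCONFIGURED = {"node2.ProcessingServer1"}


def refresh(rng: random.Random) -> bool:
    """Simulate one on-demand refresh; return True if it succeeds.

    'First available server' is modelled as a uniform random choice,
    which is a simplification of the platform's real load balancing.
    """
    server = rng.choice(SERVERS)
    return server not in MISCONFIGURED


def failure_rate(runs: int, seed: int = 42) -> float:
    """Fraction of simulated refreshes that fail."""
    rng = random.Random(seed)
    failures = sum(1 for _ in range(runs) if not refresh(rng))
    return failures / runs


if __name__ == "__main__":
    # With one faulty server out of four, roughly one refresh in four fails,
    # which business users experience as an "intermittent" error.
    print(f"Failure rate over 1000 refreshes: {failure_rate(1000):.1%}")
```

Nothing in the report itself changes between refreshes; only the server that happens to pick up the request does.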
The 3i steps for troubleshooting
1. Identification – the environment details and proper processing workflow
As a best practice, generate a System Inspection (SI) report with the Platform Support Tool. The SI report gives a high-level overview of the BI Platform and collects information about the BI landscape, such as server settings, command-line arguments, memory settings and performance metrics. (The tool and more information can be found here: http://wiki.scn.sap.com/wiki/display/BOBJ/SAP+BI+Platform+Support+Tool)
Next, identify the exact workflow that the BI Platform executes while the report is viewed or refreshed. Knowing it is essential, because it tells you which servers take part in processing the request. Below I have collected the available workflows by reporting application type (the complete list of workflows can be found at http://scn.sap.com/docs/DOC-8292):
Process flow: viewing a dashboard when the query result is in the cache
Process flow: viewing a dashboard when the query result is not in the cache
Once the workflow has been identified, the next step is to find out which node, or which processing services and servers, are failing. When the landscape contains several nodes and several processing servers handle the on-demand requests raised by business users, it is hard to find the server that fails during processing or has an incorrect or inconsistent configuration.
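If you can attribute each successful or failed refresh to the server that processed it (for example from server traces, or from audit data if your landscape records it), a simple per-server tally already narrows down the suspects. The sketch below assumes a hypothetical list of `(server, succeeded)` records; the record format and the `threshold` value are illustrative assumptions, not a platform export format.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical record: (name of the server that processed the request, succeeded?)
Record = Tuple[str, bool]


def failure_rates(records: Iterable[Record]) -> Dict[str, float]:
    """Return the failure rate of each server seen in the records."""
    total: Counter = Counter()
    failed: Counter = Counter()
    for server, succeeded in records:
        total[server] += 1
        if not succeeded:
            failed[server] += 1
    return {server: failed[server] / total[server] for server in total}


def suspects(records: Iterable[Record], threshold: float = 0.5) -> List[str]:
    """List servers whose failure rate reaches the threshold, sorted by name."""
    rates = failure_rates(records)
    return sorted(server for server, rate in rates.items() if rate >= threshold)


if __name__ == "__main__":
    sample = [
        ("node1.ProcessingServer1", True),
        ("node1.ProcessingServer2", True),
        ("node2.ProcessingServer1", False),
        ("node2.ProcessingServer1", False),
        ("node2.ProcessingServer2", True),
        ("node1.ProcessingServer1", True),
    ]
    print(suspects(sample))  # ['node2.ProcessingServer1']
```

In a real landscape this attribution is exactly what is usually missing, which is why the isolation step that follows is needed.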
2. Isolation – dedicating the report to a server group
In the Business Intelligence Platform, a dedicated server group can be assigned to a report for on-demand and for scheduled processing. With the out-of-the-box setting, when a report is executed, the platform turns to the first available server to process it.
When report execution is dedicated to a specific server group containing a chosen set of processing servers, we say that report execution is isolated, because we know exactly which server on which node takes part in processing.
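The effect of isolation can be sketched with a toy model: when the report may run on any server, failures are intermittent, but once it is restricted to a server group, every refresh lands on a known server and the outcome becomes repeatable. The server names, the misconfigured server and the random dispatch are hypothetical assumptions; in the real platform the group is assigned to the report through the platform's own settings, not in code.

```python
import random
from typing import Optional, Sequence

# Hypothetical landscape and faulty server (illustrative names only).
ALL_SERVERS = [
    "node1.ProcessingServer1",
    "node1.ProcessingServer2",
    "node2.ProcessingServer1",
    "node2.ProcessingServer2",
]
MISCONFIGURED = {"node2.ProcessingServer1"}


def refresh(rng: random.Random,
            server_group: Optional[Sequence[str]] = None) -> bool:
    """Simulate one refresh; return True on success.

    Without a server group the report may run on any server; with a group,
    only the servers in the group are candidates. The uniform random choice
    is a simplification of real load balancing.
    """
    candidates = list(server_group) if server_group else ALL_SERVERS
    return rng.choice(candidates) not in MISCONFIGURED


def failure_rate(runs: int,
                 server_group: Optional[Sequence[str]] = None,
                 seed: int = 7) -> float:
    """Fraction of simulated refreshes that fail for the given group."""
    rng = random.Random(seed)
    return sum(1 for _ in range(runs) if not refresh(rng, server_group)) / runs


if __name__ == "__main__":
    node1_group = ["node1.ProcessingServer1", "node1.ProcessingServer2"]
    faulty_group = ["node2.ProcessingServer1"]
    print("any server:   ", failure_rate(1000))
    print("node1 group:  ", failure_rate(1000, node1_group))
    print("faulty group: ", failure_rate(1000, faulty_group))
```

With an isolated group the failure rate is either 0 or 1, so the result of each test run points directly at the servers in the group.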
Follow one of these articles to find the settings and steps for report isolation:
3. Investigation – finding and tracing the failing server
For the investigation, the best approach is to change the servers in the isolation group until the error occurs consistently. Once you have found the specific node, or the processing services and servers, where the issue or observed behaviour can always be reproduced, those services can be traced individually, either by setting their trace level to high in the server properties or by using the End-to-End (E2E) trace process of the SAP BI Platform Support Tool.
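The search described above, putting one candidate server at a time into the isolation group and refreshing until the error becomes constant, can be written down as a small loop. This is a sketch only: `refresh_with_group` is a hypothetical callback standing for the manual work (assign the group, refresh the report, note whether it failed), and the default of 10 `attempts` is an arbitrary assumption about how many refreshes count as "constant".

```python
from typing import Callable, Iterable, Optional


def find_failing_server(
    candidates: Iterable[str],
    refresh_with_group: Callable[[str], bool],
    attempts: int = 10,
) -> Optional[str]:
    """Isolate each candidate server alone and refresh the report repeatedly.

    refresh_with_group(server) is a hypothetical callback: it stands for
    restricting the report to a server group that contains only `server`,
    refreshing the report once, and returning True if the refresh succeeded.
    Returns the first server on which every attempt failed, or None.
    """
    for server in candidates:
        # any() stops at the first successful refresh, so a healthy server
        # is ruled out without using up all the attempts.
        if not any(refresh_with_group(server) for _ in range(attempts)):
            return server
    return None


if __name__ == "__main__":
    # Toy stand-in for the real platform: one server always fails.
    faulty = "node2.ProcessingServer1"
    servers = [
        "node1.ProcessingServer1",
        "node1.ProcessingServer2",
        "node2.ProcessingServer1",
        "node2.ProcessingServer2",
    ]
    print(find_failing_server(servers, lambda s: s != faulty))
    # prints node2.ProcessingServer1
```

The server returned by the loop is the one whose services are worth tracing at high level or with an E2E trace.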
To close, I think creating server groups is a good and easy way to start troubleshooting intermittent report issues in BI Platform 4.x. With these three steps, the report can be quickly isolated in production environments and the issue localized.