Introduction:
SAP Datasphere enables you to store large volumes of complex data and perform sophisticated analysis on it. These complex scenarios can sometimes lead to less-than-ideal performance for end users. However, there are things you can do when designing models that will help SAP Datasphere run at optimal performance levels. In this post, I'll share some best practices and tips to help keep things running smoothly.
Implementation Best Practices and Guidelines
Spaces
Spaces provide a way to partition an SAP Datasphere (formerly SAP Data Warehouse Cloud) tenant into independent virtual work environments for departments, lines of business (LOBs), or data domains. A space is required before you can start modeling data.
1. Use more than one space within a tenant.
2. Set up dedicated spaces to further segregate access to particularly sensitive data sets.
3. Share data across spaces rather than copying it, following the concept of storing data once and using it in many different contexts.
Connections
SAP Datasphere supports remote connections to various source systems. Full details are described under Supported Connection Type Features in the SAP Help documentation.
1. Use one connection per source system to avoid redundant remote tables.
2. If specific data must be restricted or exposed, base the design on user-specific authorizations or analytical privileges on the relevant tables and/or views.
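The following minimal, stdlib-only Python sketch shows what restricting data per user means in practice. A permissions table decides which rows each user may see, similar in spirit to analytical privileges or data access controls. The user IDs, the `company_code` criterion, and the `PERMISSIONS` mapping are hypothetical. In SAP Datasphere this restriction is configured in the modeling tools, not hand-coded.

```python
from typing import Dict, List, Set

# Hypothetical permissions table: user ID -> allowed values of the criterion column.
PERMISSIONS: Dict[str, Set[str]] = {
    "ALICE": {"1000", "2000"},
    "BOB": {"3000"},
}


def rows_visible_to(user: str, rows: List[dict], criterion: str = "company_code") -> List[dict]:
    """Return only the rows whose criterion value the user is permitted to see.

    Users without an entry in PERMISSIONS see nothing (deny by default).
    """
    allowed = PERMISSIONS.get(user, set())
    return [row for row in rows if row.get(criterion) in allowed]


# Hypothetical sales rows shared by all users.
sales = [
    {"company_code": "1000", "amount": 120.0},
    {"company_code": "2000", "amount": 80.0},
    {"company_code": "3000", "amount": 50.0},
]

print([r["amount"] for r in rows_visible_to("ALICE", sales)])  # [120.0, 80.0]
print(rows_visible_to("CAROL", sales))  # []
```

The deny-by-default choice means a user who was never granted a value sees an empty result rather than everything.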
Remote Tables
Tables sourced from SAP S/4HANA CDS views or S/4HANA tables can be created as remote tables.
1. Base remote tables on CDS views rather than directly on S/4HANA tables. Make exceptions only when a specific use case requires it, and do so judiciously.
2. Choose between real-time replication and (scheduled) snapshots depending on whether the business use case calls for a full load or a delta approach. Replication applies when the data must be persisted for a further harmonized corporate layer.
3. Apply filters in the remote table configuration to improve performance, based on the use case. Note that filters cannot be applied for SDA-based connections.
4. When the remote result set is persisted in Datasphere, partition the table to optimize load performance; federation, by contrast, leads to higher network load.
The transaction code ODQMON can be used in the SAP GUI of the source system to validate or monitor the load for real-time replication.
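The partitioning advice can be illustrated with a small Python sketch. The key range of a replicated table is split into non-overlapping ranges, and each range is loaded as a separate, smaller request. The key column `doc_id`, the `fetch` callback, and the in-memory source are hypothetical stand-ins. In SAP Datasphere the partitions are configured in the tool rather than written in code.

```python
from typing import Callable, List, Tuple


def range_partitions(low: int, high: int, count: int) -> List[Tuple[int, int]]:
    """Split the inclusive key range [low, high] into `count` contiguous,
    non-overlapping (start, end) ranges of near-equal size."""
    if count < 1 or high < low:
        raise ValueError("need count >= 1 and high >= low")
    total = high - low + 1
    count = min(count, total)  # never create empty partitions
    size, remainder = divmod(total, count)
    ranges = []
    start = low
    for i in range(count):
        # The first `remainder` partitions take one extra key each.
        end = start + size - 1 + (1 if i < remainder else 0)
        ranges.append((start, end))
        start = end + 1
    return ranges


def load_in_partitions(fetch: Callable[[int, int], List[dict]],
                       ranges: List[Tuple[int, int]]) -> List[dict]:
    """Load each partition with its own request and combine the results."""
    target: List[dict] = []
    for start, end in ranges:
        target.extend(fetch(start, end))
    return target


# Hypothetical source: ten documents keyed 1..10.
source = [{"doc_id": n} for n in range(1, 11)]


def fetch(start: int, end: int) -> List[dict]:
    # Stand-in for one remote request restricted to a key range.
    return [row for row in source if start <= row["doc_id"] <= end]


parts = range_partitions(1, 10, 3)
print(parts)  # [(1, 4), (5, 7), (8, 10)]
print(len(load_in_partitions(fetch, parts)))  # 10
```

Smaller per-partition requests keep each load step bounded. In a real system, a failed partition could also be retried on its own.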
Data Flows
Data flows provide ETL functionality to transform data and prepare it for a harmonization layer or corporate memory layer.
1. Avoid performing complex calculations or transformations on federated data.
2. Perform data type conversions in Datasphere, in the underlying CDS views, or in the source system, depending on their complexity and what each option supports.
3. Use filter nodes or the script operator (Python data frames) for transformations, multiple or complex calculations, and restricted key measures/dimensions wherever possible.
4. Persist the data when heavy transformations or calculations are involved.
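The sketch below shows the kind of logic a data flow's script operator might hold: filter early, derive a calculated measure, then derive a restricted measure. The function is named `transform` to mirror the script operator's entry point, which works on pandas data frames. Check the SAP Help documentation for its exact contract. To keep the example runnable anywhere, it assumes a plain list of dicts in place of the data frame. The column names and the EMEA restriction are hypothetical.

```python
from typing import List


def transform(data: List[dict]) -> List[dict]:
    """Filter first, then derive columns, in a single pass over the rows.

    Hypothetical columns: 'status', 'quantity', 'unit_price', 'region'.
    """
    result = []
    for row in data:
        # 1. Filter early so the later steps process fewer rows.
        if row["status"] != "COMPLETED":
            continue
        out = dict(row)
        # 2. Calculated measure: net value = quantity * unit price.
        out["net_value"] = row["quantity"] * row["unit_price"]
        # 3. Restricted measure: net value for region EMEA only, otherwise 0.
        out["net_value_emea"] = out["net_value"] if row["region"] == "EMEA" else 0.0
        result.append(out)
    return result


orders = [
    {"status": "COMPLETED", "quantity": 2, "unit_price": 10.0, "region": "EMEA"},
    {"status": "OPEN", "quantity": 5, "unit_price": 3.0, "region": "EMEA"},
    {"status": "COMPLETED", "quantity": 1, "unit_price": 7.5, "region": "APJ"},
]

for row in transform(orders):
    print(row["net_value"], row["net_value_emea"])
```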
Data Modeling
Data modeling practice generally recommends building reusable views instead of one huge view.
1. To filter data, use a WHERE/filter condition on the joined result set instead of an inner join; this performs better.
2. Avoid using WHERE conditions in the join ON clause to filter data; filter the data before the join instead.
3. Enforce data aggregation by reducing the number of group-by dimensions and using less granular ones.
4. Calculate key performance indicators (KPIs) and other derived measures in SAP Datasphere wherever possible.
5. Model master data, including text associations and hierarchies, independently so that it can be shared across functional areas.
6. Persist SQL views or graphical views when complex or heavy business logic produces a result set that is consumed by reporting tools; this gives users better performance. However, set the persistency refresh frequency according to the business reporting needs.
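The filtering and aggregation guidelines can be shown outside SAP Datasphere with a plain Python sketch. The customer side is filtered before the join, and the result is aggregated with fewer or coarser group-by dimensions. In a real model these steps would be filter, join, and aggregation nodes in a graphical view, or the corresponding clauses in an SQL view. The tables and column names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def filter_then_join(sales: List[dict], customers: List[dict], country: str) -> List[dict]:
    """Filter the customer side first, then join only the surviving rows."""
    # Filtering before the join shrinks both the lookup and the join output.
    wanted = {c["customer_id"]: c for c in customers if c["country"] == country}
    return [{**s, "country": wanted[s["customer_id"]]["country"]}
            for s in sales if s["customer_id"] in wanted]


def aggregate(rows: List[dict], dims: Tuple[str, ...], measure: str) -> Dict[tuple, float]:
    """Sum `measure` grouped by `dims`; fewer or coarser dims give fewer output rows."""
    totals: Dict[tuple, float] = defaultdict(float)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)


customers = [{"customer_id": 1, "country": "DE"}, {"customer_id": 2, "country": "US"}]
sales = [
    {"customer_id": 1, "year": 2023, "month": 1, "amount": 100.0},
    {"customer_id": 1, "year": 2023, "month": 2, "amount": 50.0},
    {"customer_id": 2, "year": 2023, "month": 1, "amount": 70.0},
]

de = filter_then_join(sales, customers, "DE")
print(aggregate(de, ("year", "month"), "amount"))  # {(2023, 1): 100.0, (2023, 2): 50.0}
print(aggregate(de, ("year",), "amount"))          # {(2023,): 150.0}
```

Dropping `month` from the group-by collapses two rows into one, which is the effect item 3 describes.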
References
For connection-related options, please refer to the blog https://blogs.sap.com/2023/03/29/sap-datasphere-life-cycle-management-and-deployment-options/
Summary
This blog aims to jump-start the development and implementation of SAP Datasphere application designs. The intention is to distill the experience gained across project implementations into guidelines. These are based on the features available in the current version and will be continuously enhanced as we learn more.
Please share your feedback or thoughts in the comment section, and follow my profile for similar content.
Helpful Links:
https://community.sap.com/topics/datasphere
https://blogs.sap.com/tags/73555000100800002141/
https://answers.sap.com/tags/73555000100800002141
http://help.sap.com