Key Concept
There are three main types of data repositories from reporting & analytics perspective – data warehouse, data mart and data lake. It is important to distinguish between the three repositories since each one serves a different purpose and follow distinct design principles. Technically SAP HANA can be configured and designed to support any of the three repositories (without going into the cost benefit analysis) however it is important to understand and adhere to the unique design principles of the three repositories while configuring and modeling the HANA database. In part one of this three part series I’ll focus on the use of HANA as a data mart.
Introduction
Since its launch SAP HANA has become an integral part of the reporting and analytics landscape for thousands of SAP customers. It is being used to provide access to transactional data for near real time reporting, accelerate SAP ECC transactions and optimize complex data processing operations. Based on my personal experience of working on multiple HANA implementation projects I strongly felt the need to have a reference framework during the design phase. The objective of this reference framework would be to ensure adherence to some pre-agreed core design principles while modeling HANA to support any of the data repositories that I mentioned earlier. The reference framework can be used by the project team to make informed decisions during the design phase, it can also be used as a checklist for performing doing quality reviews of the build.
Before we dive into the details of a sample reference framework let’s quickly level set the on the basic definition of a data mart. A data mart is as subset of data warehouse that is focused on a single subject or discipline like sales, manufacturing, procurement etc. It provides easy access to curated, integrated and de-normalized data to the users for a variety of reporting needs. The data stored in a data mart is non-volatile and highly governed to provide consistent, re-producible results every time it is queried.
Reference framework while modeling SAP HANA as a data mart.
The above definition has some key words like governed, integrated and non-volatile which defines the core characteristics of a data mart hence it is recommended that the project team should put together a framework consisting of key design and modeling considerations for SAP HANA to ensure optimum use of the technology. The design framework could be used as a tool to enforce consistency across all data models in HANA and would provide a common checklist to perform the quality reviews of the design. Below I have outlined a sample framework that you can use as a starting point during the design phase of your project, it can be refined over time to make it more suitable for your organization.
Governance
- Wikipedia style data dictionary: Setup a Wikipedia style data dictionary for the data mart, provide access to both business and IT resources to submit, review and edit content for the dictionary.
- Data Lineage: Consider configuring SAP HANA ESS (Enterprise Semantic Services) for developers and business users to build knowledge graphs. There is an excellent blog on this topic on blogs.sap.com.
- Impact Assessment Matrix: To maintain the integrity of data models in SAP HANA it is very important to perform an impact assessment before approving enhancements or changes to existing data models that are live in production.
- Change Review Board would facilitate the change request review and approval process for efficient demand management.
- Planned release cycles: Having a release calendar helps to set the expectation with business on the upcoming delivery schedule.
Non volatile
- Data snapshots & Physicalized layers: Data marts should be able to provide same consistent response to user queries over any duration of time which may require physicalizing the output of complex calculations and data flows in HANA. The architects should make informed decisions on what intermediate or final results would be saved in physical tables to provide consistent output from HANA data mart.
- Validated and reconciled datasets helps to establish the credibility of data marts among business users. It is highly recommended to setup guidelines on how data will be validated and reconciled across the HANA models during the design phase.
- Business approved mappings & data transformation rules: Many times there is need to transform and align the data using standard definitions and hierarchies of key data elements. It is recommended to define and document the mapping ownership and workflows to maintain the integrity of data in your HANA data mart.
- Controlled batch job schedule: Data extraction from source systems and its progression across different layers of the data mart is often automated using a third party tool. It is important to have controls in place to track and maintain the entire catalog of batch jobs along with dependencies. Any change to the dependency, sequence or addition of new jobs should be thoroughly tested in lower environments before promoting the revised schedule to production.
- Scripted Regression Testing – There is a high likelihood that some or several models in the data mart would be enhanced over time hence it is important to have well documented test plan and scenarios for doing regression testing in quality and pre-production systems.
Agile
- Create De-normalized semantic layer: Data mart should provide easy and fast access to curated data for answering variety of business questions, this should be a key guideline while designing the calculation views in SAP HANA and can be achieved by creating de-normalized, semantic layer for efficient reporting and analysis purposes.
- Use Graphical views as much as possible: There will be need for design trade-offs during the course of the project to build models to support complex, business requirements however as a best practise data marts should consist of graphical models as much as possible to keep them nimble. When faced with the dilemma to select between graphical vs. scripted views team should adopt the principle of 'why not' graphical view to stick with the guiding principle.
- Enable prototyping with production like data in the development system, it will significantly speed up the design phase and would help to avoid the need for big enhancements during user acceptance testing.
- Consider using star joins in calculation views to combine key figures from multiple tables, provide linkage to multiple dimensions including hierarchy enabled dimensions and keep the model simple. The calculation engine in HANA has been optimized to handle such workload efficiently.
Conclusion
It is important for project teams to setup a reference framework for establishing design guidelines to make the best use of SAP HANA as a data mart. The reference framework can complement a detailed architecture guide and can be used as an effective tool to level set the same, consistent guidelines for the entire project team.