SAP Data Intelligence 3.1, on-premise edition is now available.
Within this blog post, you will find updates on the latest enhancements in 3.1. We want to share and describe the new functions and features of SAP Data Intelligence for the Q4 2020 release.
If you would like to review what was made available in the previous release, please have a look at this blog post.
This section gives only a quick preview of the main developments in each topic area. The details are described in the following sections for each individual topic area.
SAP Data Intelligence 3.1
Connectivity & Integration
This topic area focuses mainly on all kinds of connection and integration capabilities which are used across the product - for example in the Metadata Explorer or on operator level in the Pipeline Modeler.
Connectivity to SAP HANA Cloud
Support of SAP HANA Cloud in the Metadata Explorer, Connection Management, and Pipeline Modeler (incl. SAP HANA, Connectivity, Flowagent and Structured Data Operators), which allows you to consume and persist data in SAP HANA Cloud and the data lake.
Support SAP BW connection through cloud connector
Now you can connect to on-premise SAP Business Warehouse (SAP BW, SAP BW on HANA and SAP BW/4HANA) systems that are running behind a firewall using the SAP Cloud Platform, Cloud Connector.
Data Preview on Consumer Operators
New Structured File Consumer and Structured Table Consumer operators now provide an option to preview the data of a chosen file or table within the Pipeline Modeler.
Integration with cloud solutions from SAP and ABAP (legacy) systems
The new Application Consumer is now available; it supports reading from OData, CDI, and SCP – Open Connectors.
The new Application Producer is now available; it supports:
BW Service – writing into Advanced DataStore Objects under the /Datastore folder in the BW connection. This only works for SAP BW/4HANA.
OData Service – OData as target for OData services that are write-enabled
Flowagent file producer operators enhancements
Flowagent file producers provide a new option to define the maximum package size (in records) per file. Once a file reaches this size, a new file is created automatically.
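The effect of a maximum package size can be pictured with a small, product-independent sketch. The function name `split_into_packages` and the sample records are hypothetical and are not part of the Flowagent operators; the sketch only illustrates how a record stream is cut into files of at most N records each.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def split_into_packages(records: Iterable[dict], max_records: int) -> Iterator[List[dict]]:
    """Yield consecutive packages of at most max_records records (one package per file)."""
    if max_records < 1:
        raise ValueError("max_records must be at least 1")
    iterator = iter(records)
    while True:
        # Take the next slice of records; an empty slice means the input is exhausted.
        package = list(islice(iterator, max_records))
        if not package:
            return
        yield package


# Hypothetical sample data: seven records, at most three records per file.
records = [{"id": i} for i in range(7)]
packages = list(split_into_packages(records, max_records=3))
package_sizes = [len(package) for package in packages]
print(package_sizes)  # [3, 3, 1]
```

With seven records and a package size of three, the last file simply holds the remaining record.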
Metadata & Governance
In this topic area you will find all features dealing with discovering metadata, working with it and also data preparation functionalities. Sometimes you will also find information about newly supported systems.
Import metapedia terms from SAP Information Steward into the business glossary of SAP Data Intelligence
Increase your return on investment in SAP Information Steward by accessing metapedia terms from SAP Data Intelligence. Support a federated solution approach for SAP Information Steward and SAP Data Intelligence.
Import rules from SAP Information Steward into SAP Data Intelligence
Regain data management investment made in SAP Information Steward to add value to SAP Data Intelligence for product cost savings and efficiencies.
Automate the extraction of lineage information from pipeline and data preparation
Increase efficiency with automation of extraction of lineage and publishing to the catalog. Enrich catalog with lineage information in the datasets processed by pipeline and data preparation.
Link Metadata Explorer artifacts with business glossary
Associate Metadata Explorer artifacts, including fact sheets, rules, and lineage, with the business glossary, which is the common vocabulary used by the business.
Support UNION ALL in data preparation
Data preparation now allows merging multiple datasets with the UNION ALL option.
Support aggregation of values in a column in data preparation
Feature parity with agile data preparation. Increased functionality for business users and business analysts to perform data preparation in SAP Data Intelligence.
Support multiple glossaries to match SAP Information Steward
Support businesses with more than one glossary. A glossary is a collection of categories, and a category is a collection of business terms. This provides feature parity with SAP Information Steward and the flexibility to define business glossaries across lines of business.
Support right outer join in data preparation
Join a preparation with another preparation (no self-join) while retaining all of the records of the right preparation even if unmatched, such as with a right outer join.
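For readers less familiar with join types, the following is a minimal sketch of right outer join semantics in plain Python. It does not use any SAP Data Intelligence API; the function `right_outer_join` and the sample datasets are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def right_outer_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Keep every row of `right`; add matching `left` columns or None when unmatched."""
    # Index the left rows by join key so that each lookup is cheap.
    left_by_key: Dict[Any, List[Row]] = {}
    for row in left:
        left_by_key.setdefault(row[key], []).append(row)

    # Left columns other than the join key; they are padded with None for unmatched rows.
    left_columns = sorted({column for row in left for column in row if column != key})

    joined: List[Row] = []
    for right_row in right:
        matches = left_by_key.get(right_row[key], [])
        if matches:
            for left_row in matches:
                joined.append({**left_row, **right_row})
        else:
            padding = {column: None for column in left_columns}
            joined.append({**padding, **right_row})
    return joined


# Hypothetical sample data.
customers = [
    {"customer_id": "C1", "name": "Ada"},
    {"customer_id": "C2", "name": "Grace"},
]
orders = [
    {"customer_id": "C1", "order": "O-100"},
    {"customer_id": "C3", "order": "O-200"},
]
joined = right_outer_join(customers, orders, key="customer_id")
for row in joined:
    print(row)
```

The order for customer C3 is retained with an empty name, while customer C2, who has no order, does not appear in the result.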
Support union to remove duplicates in data preparation
Union a preparation with another preparation (no self-union) while removing duplicate records, such as with distinct union.
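The difference between UNION ALL and a distinct union can be sketched in a few lines of plain Python. The helper names `union_all` and `union_distinct` and the sample rows are hypothetical; the sketch assumes that two records count as duplicates when all of their column values are equal.

```python
from typing import Dict, List

Row = Dict[str, str]


def union_all(first: List[Row], second: List[Row]) -> List[Row]:
    """Append the rows of both datasets and keep duplicates."""
    return list(first) + list(second)


def union_distinct(first: List[Row], second: List[Row]) -> List[Row]:
    """Append the rows of both datasets and keep only the first copy of each record."""
    seen = set()
    result: List[Row] = []
    for row in union_all(first, second):
        # A hashable fingerprint of the record, independent of column order.
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            result.append(row)
    return result


# Hypothetical sample data: one record appears in both datasets.
first_dataset = [{"region": "EMEA", "product": "A"}, {"region": "APJ", "product": "B"}]
second_dataset = [{"region": "APJ", "product": "B"}, {"region": "AMER", "product": "C"}]

all_rows = union_all(first_dataset, second_dataset)
distinct_rows = union_distinct(first_dataset, second_dataset)
print(len(all_rows), len(distinct_rows))  # 4 3
```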
Data preview of JSON, PDF and Image files
The fact sheet now supports previewing JSON, PDF, and image files.
The screen below shows preview of a JSON file:
Advanced Rule Script Editing
A new advanced mode is available for creating an advanced script for a new or imported rule, with buttons for parameters, operators, and functions.
Rule Filtering Result Enhancements
Now, when specifying filtering within a rule, the results screen provides a specific count and percentage at the rule level for the number of records that have been filtered.
Rename enrich result column
Data preparation now allows renaming the output columns by double-clicking the column name that you want to change and entering a new name.
Source and target file mapping
When data preparation is run on multiple source files and generates multiple result files, you can now map each source file to its target (result) file.
Redesigned fact sheet UI to have 360-degree views of data without switching context
You can now view all the information about a dataset, better organized into relevant sections, in a single fact sheet: an overview of its metadata, a preview of the data, the lineage, and tagging. You can also view the related terms, rules, and rulebooks of the dataset, and add comments, descriptions, and tags to the dataset to extend relationships and information.
This topic area covers new operators and enhancements of existing operators, as well as improvements and new functionality in the Pipeline Modeler and in the development of pipelines.
Enhanced Graph Snippet functionality
New enhanced capabilities of the graph snippets to support more concepts of design-time configuration of a graph and to simplify the creation process including:
Support for editing existing graph snippets
Support for group configuration in a snippet, e.g. group multiplicity
Support for using an SVG image file as an icon for a graph snippet
Support for adding an additional description next to parameters
Resetting all configuration parameters
Definition of shared parameters that can be used by multiple operator configurations of the same graph snippet
Now users can change an operator and save it as a new version. Multiple versions of the same operator can exist and statuses for versions can be identified as active or deprecated. This lets operator owners release new versions to users, while keeping the option of using the deprecated operators.
Run pipeline in debug mode
Users can now run a pipeline in debug mode to see the messages exchanged between operators. In debug mode, the pipeline runtime view shows tracepoints for each edge in the pipeline graph. From a tracepoint, you can open a Wiretap (message viewer) to see the edge traffic.
Import/export files from Data Pipelines Modeler
Users can now directly import and export files from the Data Pipelines Modeler repository browser. It is also possible to directly export solutions.
Admin features for schedules
Now an administrator can search and manage the schedules of all users of the tenant to control workload generation.
Archiving of pipelines
Pipelines can now be archived (formerly cleaned-up) to free tenant resources but still have audit information about past executions.
Improved UI style
Style updates for improved user experience and alignment with UI5 standards.
This topic area includes all improvements, updates and way forward for Machine Learning in SAP Data Intelligence.
Usability improvements in Machine Learning applications
Several usability improvements in existing Machine Learning applications have been implemented:
Show run tags in Metrics Explorer (see screenshots below)
Show run name in Metrics Explorer
Improved Error Message for Duplicate Scenarios in Machine Learning Scenario Manager
Tracking SDK: fetch runs under Run Collection
Model training usually produces multiple runs, which are grouped under specific run collections. It is now possible to fetch all runs grouped under a specific run collection.
Multi-Model Serving capabilities
It is possible to deploy and run multiple models in a model server (in a single node). As a consequence, the end user can save inference costs. Moreover, sharing of GPU resources is possible as well.
Content templates for content delivery
Content template packages are collections of SAP Data Intelligence resources (pipelines and notebooks) that can be imported into an SAP Data Intelligence tenant using standard import capabilities in SAP Data Intelligence. Content template packages can be used to speed up implementation of ML scenarios.
SAP HANA Python Client API for Machine Learning algorithms: support for Time Series algorithms
HANA ML operators now offer integration with SAP HANA's PAL (Predictive Analytics Library) and APL (Automated Predictive Library) Time Series analysis tasks for selected algorithms. In addition, requests to the HANA ML inference operator can now include inference parameters when applicable.
The new HANA ML Forecast operator enables the use of Time Series algorithms from SAP HANA's PAL (Predictive Analytics Library) and APL (Automated Predictive Library) in a combined fit and predict step, without persisting trained models.
Improvements for HANA ML operators, which now support:
outputting the result of the HANA ML inference and HANA ML Forecast in JSON format in row-based order
writing the HANA ML inference or HANA ML forecast result directly in a HANA table
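As an illustration of what "row-based order" means for JSON output, the sketch below converts a column-oriented result into a list of row objects. The column names, the values, and the helper `to_row_based` are hypothetical; the actual output schema of the HANA ML operators is not described here and may differ.

```python
import json
from typing import Any, Dict, List


def to_row_based(columns: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Turn {"column": [values...]} into [{"column": value, ...}, ...]."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of values")
    names = list(columns)
    # zip(*...) walks the columns in parallel, producing one tuple per row.
    return [dict(zip(names, values)) for values in zip(*columns.values())]


# Hypothetical forecast result in column-based form.
column_based = {
    "month": ["2020-10", "2020-11"],
    "forecast": [120.5, 131.0],
}
rows = to_row_based(column_based)
row_json = json.dumps(rows)
print(row_json)
```

In the row-based form, each JSON object is one complete record, which is often easier for a downstream consumer to process record by record.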
Store ML scenarios in central Solution Repository
A more convenient and consistent way is now provided to back up ML scenarios created in the ML Scenario Manager.
Improvements for supporting large artifacts
In a pipeline, it is now possible to register and retrieve artifacts in batches, which makes it easier to process larger volumes.
Way forward for ML scenarios in SAP Data Intelligence
The focus of SAP Data Intelligence is on ML Orchestration as well as on selected ML Operationalization use cases. For additional information on changes in SAP Data Intelligence Data Science tooling, please refer to SAP Note 2958072.
Boundary conditions for ML reference scenarios in SAP Data Intelligence:
Data Integration & Data Management is a crucial matter (with a particular focus on SAP applications)
Focus is on orchestration of data-driven ML processes & operationalization of selected ML scenarios
This topic area includes all services that are provided by the system - like administration, user management or system management.
User resource quotas
Users can now be granted resource quotas for pipelines and application usage to allow for fair resource distribution among the users. The following resources can be limited for users and groups using the policy framework:
Number of Kubernetes Pods
Application start policies
New resources have been added to the policy framework to permit access to the individual SAP Data Intelligence applications (Connection Manager, Pipeline Modeler, Metadata Explorer, etc.). This allows administrators to create new user roles with specific permissions to use applications.
IdP support for system command-line client (vctl)
Users can now authenticate to vsystem with the system command-line client (vctl) using their credentials from an external Identity Provider (IdP).
Improved resource efficiency of system applications
The Launchpad and System Management applications now need fewer system resources when used by several users, because the underlying instances are shared among all tenant users.
Simplified application lifecycle
Applications in SAP Data Intelligence can now be re-started, if needed, instead of stopping and starting them on demand. It is also ensured that only a single logical instance is running per tenant or user (depending on the type of the application).
Import/export to Solution Repository
Users can now import and export solutions to/from the Solution Repository for reliable content sharing. Solutions can be installed to the tenant directly from the Solution Repository.
Solution import with conflict resolution
When you import solutions to the User Workspace, conflict resolution detects existing files and asks how to resolve each conflict. This works with imports from the Solution Repository and from the file system.
Tenant resource quotas
Administrators can create Memory, CPU, and Kubernetes Pod quotas for their SAP Data Intelligence tenant.
Deployment & Delivery
This focus area describes all functions and features that deal with the setup process, installation, or deployment.
Optional deployment of SAP Vora
It is now optional to deploy SAP Vora when creating a new SAP Data Intelligence cluster.
Minimum cluster size is now 2 dynamic nodes (down from 3), which results in a lower TCO.
Integrated Backup & Recovery
SAP Data Intelligence can be configured for point-in-time backups of the complete system state (pipelines, connections, users, Vora tables, system configuration).
Backups can be stored on all major object stores (S3, ADLS, GCS, OSS, Ceph)
Backups are taken online (system is fully operational)
These are the new functions, features and enhancements in SAP Data Intelligence 3.1, on-premise edition release.
We hope you like them and, by reading the above descriptions, have already identified some areas you would like to try out.