Technology Blog Posts by Members
NarasinghaPrasadPatro

Hello Friends,

Welcome back to another series of blog posts on an exciting feature of SAP Datasphere: data flows. In this blog post, we will understand this concept with an example.

First, let us understand the different ETL objects available in SAP Datasphere. In this blog, we will mainly discuss data flows using Python scripts.

Datasphere ETL Objects:

IL11.PNG

Data Flows 

Data flows are a key component of SAP Datasphere: they allow users to perform complex data transformations, enrich data, and structure it for reporting and analytics.

In this blog, we will explore the basic features and steps to create a data flow in SAP Datasphere.

A Data Flow is a graphical ETL (Extract, Transform, Load) tool that allows users to:

✔ Extract data from various sources (SAP and non-SAP).
✔ Apply transformations such as filtering, aggregations, joins, and calculated columns.
✔ Load transformed data into target tables for reporting and analytics.
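To make the three steps above more concrete, here is a minimal pandas sketch of the same kinds of transformation (calculated column, filter, join, aggregation). It is only an illustration outside SAP Datasphere: the tables `orders` and `customers` and all column names are made up for this example, and in a real data flow these steps would be modeled with the graphical operators.

```python
import pandas as pd

# Hypothetical source data standing in for two extracted tables.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 10, 20, 30],
    "quantity": [2, 1, 5, 3],
    "unit_price": [10.0, 20.0, 4.0, 8.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 20, 30],
    "country": ["DE", "IN", "DE"],
})

# Calculated column: amount per order line.
orders["amount"] = orders["quantity"] * orders["unit_price"]

# Filter: keep only orders with more than one item.
filtered = orders[orders["quantity"] > 1]

# Join: enrich the orders with the customer's country.
joined = filtered.merge(customers, on="customer_id", how="inner")

# Aggregation: total amount per country. This frame plays the role
# of the data that would be loaded into the target table.
result = joined.groupby("country", as_index=False)["amount"].sum()
print(result)
```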

IL11.PNG

 

Remote Tables:

IL11.PNG

Replication flows:

IL11.PNG

IL11.PNG

Transformation flows:

IL11.PNG

Task chains:

IL11.PNG

Let us discuss an example of a data flow that uses the Python script operator. The remaining operators are covered by other authors in their blog posts.

Now get set and go.

First, get familiar with the different operators provided in SAP Datasphere. Currently, SAP Datasphere does not have full-fledged ETL capabilities the way SAP BODS does. But in future releases, SAP Data Intelligence will be added as a feature that can be used in end-to-end ETL scenarios.

IL11.PNG

IL11.PNG

Python Script 

IL11.PNG

Python libraries currently supported in SAP Datasphere:

NumPy and pandas.

https://pandas.pydata.org/docs/user_guide/index.html#user-guide

https://numpy.org/learn/  
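As a rough idea of what goes into the script operator, here is a minimal sketch using both libraries. As far as I understand, the operator expects a function named `transform` that receives the incoming rows as a pandas DataFrame and returns a DataFrame. The column names (`GROSS_AMOUNT`, `DISCOUNT`, `NET_AMOUNT`, `SIZE_CLASS`) and the threshold of 100 are assumptions for illustration only; any new column would also have to be defined as an output column of the operator. The imports are included so that the sketch also runs outside Datasphere.

```python
import numpy as np
import pandas as pd


def transform(data):
    # 'data' is assumed to be a pandas DataFrame holding the incoming rows.
    # Hypothetical columns: GROSS_AMOUNT and DISCOUNT (DISCOUNT may be empty).
    data["NET_AMOUNT"] = data["GROSS_AMOUNT"] - data["DISCOUNT"].fillna(0)

    # NumPy helps with vectorized conditions: classify each row by net amount.
    data["SIZE_CLASS"] = np.where(data["NET_AMOUNT"] >= 100, "LARGE", "SMALL")

    # The returned DataFrame is what flows on to the next operator.
    return data
```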

The Python script operator's editor is not as mature or as user-friendly as a Jupyter notebook. If the script contains a syntax error, it is tricky to locate, because the exact error only shows up in the data integration monitor after the data flow has been executed. I hope SAP will integrate a Jupyter notebook IDE with SAP Datasphere and support many more libraries, so that data science and AI capabilities can be explored further.
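One way to work around this is to try the script on a small sample on your own machine before pasting it into the operator. The sketch below is just one possible approach, not an SAP feature: `check_transform` is a hypothetical helper, and the sample columns (`FIRST_NAME`, `LAST_NAME`, `FULL_NAME`) are made up for the example.

```python
import pandas as pd


def transform(data):
    # Example script body to be tested (hypothetical column names).
    data["FULL_NAME"] = (
        data["FIRST_NAME"].str.strip() + " " + data["LAST_NAME"].str.strip()
    )
    return data


def check_transform(func, sample, expected_columns):
    """Run func on a copy of sample and return a list of problems found."""
    try:
        result = func(sample.copy())
    except Exception as exc:
        # Runtime errors surface here instead of in the monitor.
        return [f"transform raised {type(exc).__name__}: {exc}"]
    if not isinstance(result, pd.DataFrame):
        return ["transform did not return a DataFrame"]
    missing = [c for c in expected_columns if c not in result.columns]
    return [f"missing output columns: {missing}"] if missing else []


sample = pd.DataFrame({"FIRST_NAME": [" Ada "], "LAST_NAME": ["Lovelace"]})
problems = check_transform(
    transform, sample, ["FIRST_NAME", "LAST_NAME", "FULL_NAME"]
)
print(problems or "transform looks fine on the sample")
```

This does not replace a real run in Datasphere, but it catches syntax errors and missing columns early.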

I will cover error handling and the running and scheduling of data flows in a separate post on the data integration monitor.

Target table and options.

IL11.PNG

Now deploy and run the data flow. This is how we create a data flow in SAP Datasphere. We can also schedule the data flow directly, or add it to a task chain and schedule the task chain together with other flows.

Points to note and limitations of data flows:

When we create a new remote table, it is created virtually using the data federation mechanism. Original data stays in the remote source system, and SAP Datasphere just points to that table.

We cannot use the remote table directly inside a dataflow. We need to create a view on top of that remote table and then consume the view inside a dataflow.

Kindly test this scenario from your side as well. Thank you for reading this blog post, and I hope you liked the content.

Watch out for the next set of topics in the coming days.

Thanks,

Narasingha
