ETL Objects in SAP Datasphere and Dataflows Example

NarasinghaPrasadPatro · ‎2025 Apr 07

Hello Friends,

Welcome back to another blog post in my series on an exciting feature of SAP Datasphere: data flows. In this post, we will understand the concept with an example.

First, let us understand the different ETL objects available in SAP Datasphere. In this blog, we will mainly discuss data flows that use Python scripts.

Datasphere ETL Objects:

Data Flows

Data flow is a key component of SAP Datasphere as it allows users to perform complex data transformations, enrich data, and structure it for reporting and analytics.

In this blog, we will explore the basic features and steps to create a data flow in SAP Datasphere.

A Data Flow is a graphical ETL (Extract, Transform, Load) tool that allows users to:

✔ Extract data from various sources (SAP and non-SAP).
✔ Apply transformations such as filtering, aggregations, joins, and calculated columns.
✔ Load transformed data into target tables for reporting and analytics.
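To make the transformation types listed above concrete, here is a minimal pandas sketch of the same steps (filter, join, calculated column, aggregation). It is only an analogy that runs outside SAP Datasphere; the table and column names (`orders`, `customers`, `unit_price`, `region`, and so on) are hypothetical and not taken from any real system.

```python
import pandas as pd

# Hypothetical sample data standing in for two source tables.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 10, 20, 30],
    "quantity": [2, 1, 5, 3],
    "unit_price": [10.0, 20.0, 4.0, 0.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 20, 30],
    "region": ["EMEA", "APJ", "EMEA"],
})

# Filter: keep only rows with a positive unit price.
filtered = orders[orders["unit_price"] > 0]

# Join: enrich each order with the customer's region.
joined = filtered.merge(customers, on="customer_id", how="inner")

# Calculated column: amount per order line.
joined["amount"] = joined["quantity"] * joined["unit_price"]

# Aggregation: total amount per region.
result = joined.groupby("region", as_index=False)["amount"].sum()
print(result)
```

In a real data flow, each of these steps would correspond to a graphical operator, and the final result would be written to a target table.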

Remote Tables:

Replication flows:

Transformation flows:

Task chains:

Let us discuss an example of a data flow that uses the Python script operator. Examples of the other operators are already covered by other authors in their blog posts.

Now get set and go.

Take some time to understand the different operators provided in SAP Datasphere. Currently, SAP Datasphere does not offer full-fledged ETL capabilities the way SAP BODS does. In future releases, however, SAP Data Intelligence is expected to be added as a feature that can be used in end-to-end ETL scenarios.

Python Script

Supported Python Libraries as of now in SAP Datasphere:

NumPy and pandas.

https://pandas.pydata.org/docs/user_guide/index.html#user-guide

https://numpy.org/learn/
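As a rough illustration of what a script for the Python operator can look like, here is a minimal sketch. It assumes that the operator calls a function named `transform` with the incoming rows as a pandas DataFrame and expects a DataFrame back, and that pandas and NumPy are already available inside the operator as `pd` and `np`. The imports are included only so that the snippet also runs on a local machine. The column names `GROSS_AMOUNT`, `TAX_AMOUNT`, `NET_AMOUNT`, and `SIZE_CLASS` are hypothetical.

```python
import numpy as np   # assumption: already provided as np inside the operator
import pandas as pd  # assumption: already provided as pd inside the operator


def transform(data):
    """Receive the incoming rows as a DataFrame and return the transformed rows."""
    # Calculated column using plain pandas arithmetic (hypothetical columns).
    data["NET_AMOUNT"] = data["GROSS_AMOUNT"] - data["TAX_AMOUNT"]

    # Conditional column using NumPy: classify each row by its net amount.
    data["SIZE_CLASS"] = np.where(data["NET_AMOUNT"] >= 1000, "LARGE", "SMALL")

    return data
```

Any new column that the script adds would most likely also need to be defined as an output column of the operator so that it can be mapped to the target table.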

The editor of the Python script operator is not as mature or as user-friendly as a Jupyter notebook. If the script contains a syntax error, it is a little tricky to locate, because the exact error only shows up in the data integration monitor after the data flow has been executed. Hopefully SAP will integrate a Jupyter notebook IDE with SAP Datasphere and support many more libraries, so that data science and AI capabilities can be explored further.

I will cover error handling and the running and scheduling of data flows in a separate post on the data integration monitor.
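Since errors are hard to spot inside the operator, one possible workaround is to check the script on a local machine before pasting it into the data flow. The sketch below is not an SAP tool; `check_script`, `SCRIPT`, and the `QTY` column are hypothetical names made up for this illustration. It parses the script for syntax errors and then runs its `transform` function on a small sample DataFrame.

```python
import ast

import numpy as np
import pandas as pd

# Hypothetical script text, as it would be pasted into the operator.
SCRIPT = '''
def transform(data):
    data["QTY_DOUBLE"] = data["QTY"] * 2
    return data
'''


def check_script(source, sample):
    """Return (ok, message) after a syntax check and a trial run on sample data."""
    # Step 1: a syntax check only, nothing is executed yet.
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return False, f"syntax error on line {exc.lineno}: {exc.msg}"

    # Step 2: run the script with pd and np predefined, then call transform.
    namespace = {"pd": pd, "np": np}
    try:
        exec(source, namespace)
        result = namespace["transform"](sample.copy())
    except Exception as exc:
        return False, f"runtime error: {exc!r}"

    if not isinstance(result, pd.DataFrame):
        return False, "transform did not return a DataFrame"
    return True, f"ok, columns: {list(result.columns)}"


sample = pd.DataFrame({"QTY": [1, 2, 3]})
print(check_script(SCRIPT, sample))
# A script with a missing colon is reported without being executed.
print(check_script("def transform(data)\n    return data", sample))
```

A local check like this only catches Python-level mistakes. Problems that depend on the actual source data or on the operator's column definitions would still only surface when the data flow runs.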

Target table and options.

Now, deploy and run the data flow. This is how we create a data flow in SAP Datasphere. We can also schedule the data flow directly, or add it to a task chain and schedule the task chain together with other flows.

Points to note and limitations of data flows:

When we create a new remote table, it is created virtually using the data federation mechanism. Original data stays in the remote source system, and SAP Datasphere just points to that table.

We cannot use the remote table directly inside a dataflow. We need to create a view on top of that remote table and then consume the view inside a dataflow.

Kindly test this scenario from your side as well. Thank you for reading this blog post, and I hope you liked the content.

Watch out for the next set of topics in the coming days.

Thanks,

Narasingha
