XaviPolo
Active Contributor
SAP DWC will soon incorporate Data Flow functionality, enabling the definition of more advanced ETL flows that complement existing data federation and replication services.

In data flows we will be able to use a series of standard transformations graphically and without the need for programming knowledge, but we will also have the possibility of creating script-based transformations.

And what is the difference between Data Views and Data Flows? Mainly that the former are oriented to creating views that transform the data at the moment it is read, without persistence (although this will change in the future), producing a single output structure, while the latter transform the data and persist the changes in one or multiple structures.

In a Data Flow we can use views or tables that we already have in our DWC, or use connections to get data from other systems; in that case, we should first create all the necessary connections in our space.


Create connection


 

Creating Data Flows


In the Data Builder, where we create tables, views and E/R models, we can now find a new "Data Flow" object, which has its own editor, the Data Flow Builder.


Create Data Flow


 

Here we have access to the different tables and views of DWC, as well as the data sources we have connected in our space. We can add these sources or targets to the Data Flow with drag and drop.


Sample Data Flow


 

We connect these sources to standard transformations such as:

  • Join

  • Union

  • Projection

  • Aggregation


With projections you can choose which fields to pass to the next step, apply filters, and create calculated fields with the help of 84 functions grouped into categories such as conversion, date, and mathematics.
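To make this concrete, here is roughly what a projection step does, expressed in Pandas terms (an analogy with made-up column names, not DWC's own expression syntax):

```python
import pandas as pd

# Hypothetical input with more columns than we need
orders = pd.DataFrame({
    "ORDER_ID": [1, 2, 3],
    "NET_AMOUNT": [100.0, 250.0, 80.0],
    "TAX_RATE": [0.21, 0.21, 0.10],
    "INTERNAL_NOTE": ["a", "b", "c"],
})

# Projection: keep only the fields to pass to the next step
projected = orders[["ORDER_ID", "NET_AMOUNT", "TAX_RATE"]]

# Filter: keep only the rows we want to propagate
filtered = projected[projected["NET_AMOUNT"] > 90]

# Calculated field: derive a new column from existing ones
result = filtered.assign(
    GROSS_AMOUNT=filtered["NET_AMOUNT"] * (1 + filtered["TAX_RATE"])
)
```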


Expressions


 

Script Transformation


This transformation allows the use of a scripting language to perform the required custom transformations. The first supported language is Python 3 (more will be supported in the future), and it allows the use of the well-known Pandas and NumPy libraries, which makes it easier to apply some data science techniques to our data flows. We can't use all of Python, though: for example, we can't import other libraries or perform I/O actions such as saving to a file or a database, or using an HTTP connection to download or send data.
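As an illustration, the body of a script transformation could look something like this (a minimal sketch: it assumes an entry point along the lines of transform(data) that receives and returns a Pandas DataFrame, and the column names are hypothetical):

```python
import numpy as np

def transform(data):
    # data arrives as a pandas DataFrame; only Pandas and NumPy
    # are available (no other imports, no file/DB/HTTP I/O)

    # Normalize a hypothetical text column
    data["COUNTRY"] = data["COUNTRY"].str.upper()

    # Derive a flag from a hypothetical numeric column
    data["IS_LARGE"] = np.where(data["AMOUNT"] > 1000, 1, 0)

    # The returned DataFrame is what flows to the next step
    return data
```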

 


Script Transformation


 

Basically, the data that enters the transformation is a DataFrame to which we can apply all the transformation possibilities that Python 3 (currently 3.6) and Pandas give us, such as pivoting data, stacking, cross tabulations, etc.
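For example, the pivot, stack and cross-tabulation operations just mentioned look like this in Pandas (a self-contained sketch with made-up data):

```python
import pandas as pd

# Hypothetical sales rows: one record per (year, region)
sales = pd.DataFrame({
    "YEAR": [2019, 2019, 2020, 2020],
    "REGION": ["EMEA", "APJ", "EMEA", "APJ"],
    "REVENUE": [100, 80, 120, 95],
})

# Pivot: regions become columns, one row per year
wide = sales.pivot(index="YEAR", columns="REGION", values="REVENUE")

# Stack: fold the region columns back into rows
long = wide.stack()

# Cross tabulation: row counts per year and region
xtab = pd.crosstab(sales["YEAR"], sales["REGION"])
```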

 


Pandas Pivot



Pandas Stack


 

An existing table has to be indicated as the destination of the transformations. Unlike SAP Data Services or HANA SDI, Data Flow in the beta does not have the option of creating the target tables directly from the data flow, although they can be created manually in the Data Builder.

 


Target Table


 

Later in the Data Flow you can add the table, select it as the target, and configure the data insertion mode (APPEND or TRUNCATE).
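In rough Pandas terms (an analogy, not DWC's implementation), the difference between the two modes is:

```python
import pandas as pd

existing = pd.DataFrame({"ID": [1, 2], "VALUE": [10, 20]})
new_load = pd.DataFrame({"ID": [3], "VALUE": [30]})

# APPEND: keep the existing rows and add the new ones
appended = pd.concat([existing, new_load], ignore_index=True)

# TRUNCATE: discard the existing rows, keep only the new load
truncated = new_load.copy()
```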


Append


 

With that, the Data Flow has sources, transformations and a destination, so it can be executed.

 

Running Data Flows


When a Data Flow is executed, its status can be monitored in the Data Flow Monitor, within the Data Integrator Monitor.

 


Data Flow Monitor


 

It is possible to see the history of the executions, as well as more information if an error occurs.

 


Monitor Errors


 

In the beta it is not yet possible to schedule Data Flows for later execution, but this option will be available when the product reaches GA.

 

Conclusions


The tested version is the beta, and although things may change in the final version, it gives us an idea of how things will work, what it will look like, and which components we will find.

Many things are still missing for it to be a complete ETL tool, but this is a first version, and I would dare to say that it brings the ETL world closer to business users for basic transformations.

I have been pleasantly surprised by the inclusion of Python as a scripting language, as this will allow complex transformations to be made.

I think that in the near future Data Flow will be able to help us in the SAP DWC project we are currently developing, so I hope it soon reaches all SAP DWC customers and that new functionalities are added.

 

This post first appeared on LinkedIn.
16 Comments
enioterrs
Explorer
Hi Xavier, excellent content! Thanks for sharing. Are you using the partner tenant?

BR

Enio Terra
XaviPolo
Active Contributor
No, this is a specific beta tenant to test the Data Flow functionality for a DWC project at one of our customers.
JuanCLazaro
Product and Topic Expert

Really nice blog Xavi, you can edit the target table creation comment. 🙂 Now you can create them on the fly inside the dataflow. Step by step we will be adding more and more functionality so DWC becomes closer to a tool like Data Services.

karenave
Explorer
With data flows, is there functionality to indicate whether or not a field in a table should be transferred, thus reducing the records sent from a data source? For example, if an extractor data source has 150 columns but I really only need 20, that would reduce the volume of data for remote query or replication. In BW, the data source currently has a check on each field indicating whether or not to transfer it; I have not found this functionality documented in the DWC issues or in the roadmap.
XaviPolo
Active Contributor


In a DF there is a component named "Projection" that allows you to exclude fields from the source and select only those that you need.

Basically it converts the “select *” into “select <selected fields>”

karenave
Explorer
Hi Xavier Polo, thank you very much for your clarification.

Can I also ask you a question about modeling and data source changes?
XaviPolo
Active Contributor
Of course, but I recommend that you fill out the question form; this way you will have more visibility, others will be able to respond, and perhaps the question and its answers will help other people too.

Ask a Question - SAP Q&A

You can always mention me with @
karenave
Explorer
OK, I'm going to write the question as you indicate and I'll mention you.
karenave
Explorer
Hi Xavier, I asked the question in the community but I can't find you to mention you.
This is the link:

https://answers.sap.com/questions/13199276/opcion-replace-table-and-datasource-on-views-of-dw.html

Thank you so much.
karenave
Explorer
Hi Xavier

I was doing a test exercise with the data flow, and I can't find how to get update behavior in tables.
For example, I am loading a master data table; when I send a duplicate record by ID I get an error and nothing is inserted, so I have no way to apply update behavior to the data.
I tried both APPEND and TRUNCATE and the same thing happens.

I have also not found how to do a DELETE FROM a specific table with a WHERE condition.

Thanks a lot.
XaviPolo
Active Contributor
If your data source has duplicates and the ID field is set as a key in DWC, it will give you an error regardless of whether or not you delete the target data, and the entire transaction will be rolled back.

If the duplicate occurs because the ID already exists in the target table, setting the TRUNCATE option will delete the data from the table before loading the new data, so you should not get an error.

Make sure your case is not the first one.

 
karenave
Explorer

Hi Xavier.

Yes, as you say, I have an ID in the table and I have transactional data in the source table. I am trying to obtain master data based on a transactional table.

If I remove the key from the target table, it inserts records successfully, but then, for example, the same client is inserted 4 times.

The behavior I am trying to replicate is that of a traditional BW, where overwrite behavior can be indicated.


XaviPolo
Active Contributor

What comes to mind is to use a view as the source instead of the table, since in the view you can put the table and activate the DISTINCT option, something that is not available when you add the table directly to a DF.
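Alternatively, since the script transformation exposes Pandas, the deduplication could also be done there. A minimal sketch, assuming a hypothetical ID key column:

```python
def transform(data):
    # Keep one row per ID (the last occurrence) before the keyed
    # target table is written; "ID" is an assumed column name
    return data.drop_duplicates(subset=["ID"], keep="last")
```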

karenave
Explorer
Hi Xavier.

Ok I'm going to take this test and I'll tell you how it goes.

Today I was asked a question about surrogate keys. This is a concept from traditional databases that carried over into data warehousing, but how is this concept understood here in DWC? In SAP BW they were created automatically as the SIDs.
Hello Xavier,

We have created a dataflow that loads data from our source (table) to a target. The source data is increasing every day, as records are being inserted.

So, with our dataflow we only want to load the new data from this table. For now, with the DWC options (APPEND, TRUNCATE), we can't do it. Is there any way to achieve this?

Thanks!

 

Maria
XaviPolo
Active Contributor
Hello Luis,

If you are talking about a table to which you are adding or updating records, the easiest way is to see if real-time replication is supported. But if you are doing it with dataflows you probably don't have that option.

There is no single way to do it; it all depends on the data source and which options are supported in DWC. But none of them are "automatic" or configured at the touch of a button 😞

The first thing is to see how to identify the changes in the source table, for example by a LAST_UPDATE field, or, if you only insert records, by a sequential ID. If you have such a field, the simplest technique is to do an initial load that carries all the data. Then, using the target table, you can look up MAX(LAST_UPDATE) or MAX(ID), i.e. the time when the last record was updated or inserted at the source, or its identifier if you use a sequence.

For the delta load, the idea is to filter the source table for records whose LAST_UPDATE (or sequential ID) is greater than that MAX in the target table.
The problem is that in a DF you can't aggregate a date with MAX (only COUNT), so you have to do an extra step:
create a DWC view on the target table that calculates the maximum of the date, and join it with the source table with a condition like SOURCE.LAST_UPDATE > QUERYMAX.LAST_UPDATE.

Then, in the DF, you use that DWC view as the source and save into the target table with APPEND, activating the "Update Records by Primary Key" option.
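A minimal sketch of the delta logic, expressed in Pandas terms with hypothetical field names (the real implementation is the DWC view plus the DF described above):

```python
import pandas as pd

# Hypothetical extracts of the source and target tables
source = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "LAST_UPDATE": pd.to_datetime(
        ["2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01"]),
})
target = pd.DataFrame({
    "ID": [1, 2],
    "LAST_UPDATE": pd.to_datetime(["2020-01-01", "2020-02-01"]),
})

# High-water mark: when the last loaded record changed at the source
high_water = target["LAST_UPDATE"].max()

# Delta: only the source rows newer than the high-water mark,
# i.e. the SOURCE.LAST_UPDATE > QUERYMAX.LAST_UPDATE condition
delta = source[source["LAST_UPDATE"] > high_water]
```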

Is it a lot of work? Yes. Is it not 100% optimal, and do you have to check the pushdown of the DWC view's join? Also yes. But you have the opportunity to use your imagination to look for this kind of alternative 🙂

Kind regards,

 