Technology Blog Posts by Members
Abhishek_Hazra

Loop control is a powerful programming concept for executing a piece of logic repeatedly over segments of a dataset. While working with SQL, set-based operations and window functions can often help avoid executing code blocks in loops. However, there are use cases where a loop is the most practical way to achieve the desired outcome. The objective of this blog post is to explore a less discussed topic so far - using conditional loops in Datasphere (based on a hypothetical use case).
I have taken a scenario with two measures, namely closing stock & inbound business units, per article & store combination over a period of time, i.e. 16.02.2025 - 26.02.2025. I want to derive a calculated measure called interim new stock, defined as the present day's inbound quantity added to the previous day's closing stock quantity for the same article-site combination. (For the sake of simplification, I have not taken any outbound movement into consideration :)).
Let's say I have a table with the below inbound & closing stock information within my space in Datasphere.

 

| Calendar_Day | Site | Article    | Unit | Purchase_Unit | Inbound_BUn | Closing_Stock |
|--------------|------|------------|------|---------------|-------------|---------------|
| 2025-02-16   | S567 | 3475896901 | EA   | CA1           | 7           | 20            |
| 2025-02-16   | S546 | 10464863   | EA   | CA1           | 3           | 24            |
| 2025-02-16   | S546 | 92105721   | EA   | CA2           | 1           | 12            |
| 2025-02-17   | S567 | 3475896901 | EA   | CA1           | 1           | 23            |
| 2025-02-17   | S546 | 10464863   | EA   | CA1           | 6           | 27            |
| 2025-02-17   | S546 | 92105721   | EA   | CA2           | 12          | 19            |
| 2025-02-18   | S567 | 3475896901 | EA   | CA1           | 2           | 19            |
| 2025-02-18   | S546 | 10464863   | EA   | CA1           | 1           | 23            |
| 2025-02-18   | S546 | 92105721   | EA   | CA2           | 7           | 21            |
| 2025-02-19   | S567 | 3475896901 | EA   | CA1           | 3           | 12            |
| 2025-02-19   | S546 | 10464863   | EA   | CA1           | 1           | 18            |
| 2025-02-19   | S546 | 92105721   | EA   | CA2           | 24          | 36            |
| 2025-02-20   | S567 | 3475896901 | EA   | CA1           | 60          | 40            |
| 2025-02-20   | S546 | 10464863   | EA   | CA1           | 9           | 20            |
| 2025-02-20   | S546 | 92105721   | EA   | CA2           | 12          | 40            |
| 2025-02-21   | S567 | 3475896901 | EA   | CA1           | 1           | 34            |
| 2025-02-21   | S546 | 10464863   | EA   | CA1           | 1           | 17            |
| 2025-02-21   | S546 | 92105721   | EA   | CA2           | 3           | 41            |
| 2025-02-22   | S567 | 3475896901 | EA   | CA1           | 6           | 20            |
| 2025-02-22   | S546 | 10464863   | EA   | CA1           | 6           | 18            |
| 2025-02-22   | S546 | 92105721   | EA   | CA2           | 10          | 32            |
| 2025-02-23   | S567 | 3475896901 | EA   | CA1           | 48          | 60            |
| 2025-02-23   | S546 | 10464863   | EA   | CA1           | 12          | 21            |
| 2025-02-23   | S546 | 92105721   | EA   | CA2           | 3           | 27            |
| 2025-02-24   | S567 | 3475896901 | EA   | CA1           | 2           | 48            |
| 2025-02-24   | S546 | 10464863   | EA   | CA1           | 5           | 20            |
| 2025-02-24   | S546 | 92105721   | EA   | CA2           | 7           | 20            |
| 2025-02-25   | S567 | 3475896901 | EA   | CA1           | 10          | 43            |
| 2025-02-25   | S546 | 10464863   | EA   | CA1           | 9           | 24            |
| 2025-02-25   | S546 | 92105721   | EA   | CA2           | 3           | 21            |
| 2025-02-26   | S567 | 3475896901 | EA   | CA1           | 4           | 32            |
| 2025-02-26   | S546 | 10464863   | EA   | CA1           | 15          | 27            |
| 2025-02-26   | S546 | 92105721   | EA   | CA2           | 12          | 30            |

I need a new target table which holds the last 4 days' information from the above table plus a new calculated field called New_Interim_Stock, derived as previous day's closing stock + present day's inbound quantity. For example, on 2025-02-26, for article 10464863 in site S546 the new interim stock would be 24+15 = 39. Below, I will demonstrate how we can do this by applying loops in Datasphere through different approaches.
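As a quick illustration of the derivation rule before diving into the Datasphere approaches, here is a minimal pandas sketch. The two-row frame is a hypothetical extract of the table above (site S546, article 10464863, last two days only), just enough to reproduce the 24+15 = 39 example via a self-join on a shifted date:

```python
import pandas as pd

# Hypothetical two-row extract of the sample table (S546 / 10464863).
stock = pd.DataFrame({
    "Calendar_Day": pd.to_datetime(["2025-02-25", "2025-02-26"]),
    "Site": ["S546", "S546"],
    "Article": ["10464863", "10464863"],
    "Inbound_BUn": [9, 15],
    "Closing_Stock": [24, 27],
})

# Self-join on a shifted date: re-date every row one day forward, so that
# joining on Calendar_Day pairs each day with the previous day's closing stock.
prev = stock.assign(Calendar_Day=stock["Calendar_Day"] + pd.Timedelta(days=1))
merged = stock.merge(
    prev[["Calendar_Day", "Site", "Article", "Closing_Stock"]],
    on=["Calendar_Day", "Site", "Article"],
    how="left",
    suffixes=("", "_prev"),
)
# New_Interim_Stock = previous day's closing stock + present day's inbound
merged["New_Interim_Stock"] = merged["Closing_Stock_prev"] + merged["Inbound_BUn"]
print(merged[["Calendar_Day", "New_Interim_Stock"]])
```

The 2025-02-26 row yields 24 + 15 = 39; the first row has no predecessor in this mini-extract and stays empty, which is exactly the boundary case the approaches below have to handle.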

 

Approach #1 - using SQLScript view

SAP Datasphere supports SQLScript syntax when the view editor language is set to SQLScript (Table Function). Below is the view definition script with a conditional FOR loop that iterates over the last 4 days of the dataset and derives the calculated field New_Interim_Stock. I created two date variable pointers for handling the current day (for inbound quantity) & the previous day (for closing stock quantity), & a self join of the same dataset restricted by these date variables individually. The conditional FOR loop walks backwards day by day until the calculation has covered the last four days.

 

declare control_date date;
declare today date default current_date; -- assumes the table is loaded up to the previous day
declare date_var_inbound nvarchar(8);
declare date_var_stock nvarchar(8);
declare counter int default 1;
-- table variable collecting the result rows across loop iterations
declare result_table table (
        "Calendar_Day" date,
        "Site" nvarchar(100),
        "Article" nvarchar(100),
        "Unit" nvarchar(100),
        "Purchase_Unit" nvarchar(100),
        "Inbound_BUn" int,
        "Closing_Stock" int,
        "New_Interim_Stock" int
        );

-- walk backwards one day per iteration over the last four days of data
for counter in 1..4 do
    control_date := add_days(:today, -:counter);
    date_var_inbound := to_nvarchar(:control_date, 'YYYYMMDD');            -- current day (inbound)
    date_var_stock := to_nvarchar(add_days(:control_date, -1),'YYYYMMDD'); -- previous day (closing stock)

    -- self join: current day's inbound row paired with the previous day's
    -- closing stock row for the same article & site
    insert into :result_table 
            select 
                inb."Calendar_Day",
                inb."Site",
                inb."Article",
                inb."Unit",
                inb."Purchase_Unit",
                inb."Inbound_BUn",
                inb."Closing_Stock",
                sto."Closing_Stock" + inb."Inbound_BUn" "New_Interim_Stock"
            from "inbound_stock_data" inb
            left join "inbound_stock_data" sto
            on  inb."Site" = sto."Site"
            and inb."Article" = sto."Article"
            -- keep the previous-day filter in the ON clause; filtering sto in
            -- the WHERE clause would turn the left join into an inner join
            and sto."Calendar_Day" = :date_var_stock
            where inb."Calendar_Day" = :date_var_inbound;
end for;

return
select * from :result_table order by "Calendar_Day";

 

When the view is deployed, we can preview the data & validate the output against the data in the source table.
Data preview of SQLScript view

Note: The output above comes from the data preview option of the view, which executes the conditional loop behind the scenes every time we preview data. For larger datasets, an ideal solution would be to create a data flow with this view as a source & a target table with the output structure in 'Append' data write mode. This ensures the output data is persisted & querying it is free of calculation latency.
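As an aside, the introduction mentioned that window functions can often avoid loops entirely. For completeness, here is a hedged loop-free sketch of the same derivation using a window-style shift per Site/Article partition in pandas. The three-row frame is hypothetical sample data with the column names of the table above, and the sketch assumes one row per consecutive calendar day:

```python
import pandas as pd

# Hypothetical mini-frame: three consecutive days for one article-site pair.
df = pd.DataFrame({
    "Calendar_Day": pd.to_datetime(["2025-02-24", "2025-02-25", "2025-02-26"]),
    "Site": ["S546"] * 3,
    "Article": ["10464863"] * 3,
    "Inbound_BUn": [5, 9, 15],
    "Closing_Stock": [20, 24, 27],
})

df = df.sort_values(["Site", "Article", "Calendar_Day"])
# shift(1) within each partition plays the role of the previous-day pointer
# that the loop-based approaches maintain explicitly.
df["New_Interim_Stock"] = (
    df.groupby(["Site", "Article"])["Closing_Stock"].shift(1) + df["Inbound_BUn"]
)
print(df["New_Interim_Stock"].tolist())
```

This computes 20+9 = 29 and 24+15 = 39 in one pass; the loop-based approaches below remain useful when the iteration itself carries logic that a single window pass cannot express.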

 

Approach #2 - using Data Flow Script operator

In this approach, I created a data flow with the source data table as a source, a target table for persisting the calculated result, & a Script operator in between. The scripting language used here is Python.

Dataflow Scripting.JPG

I have attached the Python script below. The logic is to slice the dataset to the last 4 days, counting backwards from the maximum date available in the dataset, i.e. today. This time I work with two Pandas dataframes. The first, df_first, holds the complete dataset and is used to handle the oldest day of the slice: since the sliced dataset starts at 23.02.2025, the closing stocks for 22.02.2025 need to be fetched from the original dataset held in df_first. The second, df, is the sliced dataset, which is grouped by Article & Site; each sorted group (grouped_df) is then processed in a for loop alongside two date variables pointing to the present day's data row & the previous day's data row. The derivation of New_Interim_Stock happens in the if-else block, where the first day of the data slice is handled via a lookup into df_first for the same article-site combination. Finally, the field New_Interim_Stock is written back into df & the result dataset is returned sorted by Calendar_Day.

 

def transform(data):

    # pandas is available as "pd" inside the Script operator runtime
    df_first = pd.DataFrame(data)
    # normalise Calendar_Day to plain date objects
    df_first['Calendar_Day'] = pd.to_datetime(df_first['Calendar_Day'], format='%Y-%m-%d').dt.date
    df = df_first.copy()

    # slice the last 4 days, counting backwards from the maximum date in the data
    today = df['Calendar_Day'].max()
    relevant_days = [today - pd.Timedelta(days = date_diff) for date_diff in range(0, 4)]
    df = df[df['Calendar_Day'].isin(relevant_days)]

    # process each article & site combination separately
    for (article, site), group in df.groupby(['Article', 'Site']):
        grouped_df = group.sort_values(by = 'Calendar_Day', ascending = False)

        for date_diff in range(0, 4):
            current_day = today - pd.Timedelta(days = date_diff)
            current_day_row = grouped_df[grouped_df['Calendar_Day'] == current_day]

            prev_day = current_day - pd.Timedelta(days = 1)
            prev_day_row = grouped_df[grouped_df['Calendar_Day'] == prev_day]

            if not prev_day_row.empty:
                # previous day lies inside the slice
                new_stock = prev_day_row['Closing_Stock'].values[0] + current_day_row['Inbound_BUn'].values[0]
            else:
                # oldest day of the slice: fetch the previous day's closing
                # stock from the unsliced dataset held in df_first
                prev_day_row_from_first = df_first[
                    (df_first['Calendar_Day'] == prev_day) &
                    (df_first['Article'] == article) &
                    (df_first['Site'] == site)
                ]
                new_stock = prev_day_row_from_first['Closing_Stock'].values[0] + current_day_row['Inbound_BUn'].values[0]

            # write the derived value back into the sliced dataframe
            df.loc[(df['Article'] == article) & (df['Site'] == site) & (df['Calendar_Day'] == current_day), 'New_Interim_Stock'] = new_stock

    data = df.sort_values(by = 'Calendar_Day')

    return data

 

After the data flow is saved & deployed, execute it & preview the persisted output data in the target table.

Output of Script operator

 

Approach #3 - using Open SQL procedures

The last option essentially brings back the conventional HANA studio experience. In Datasphere you can create procedures in your Open SQL schema if you have a DB user for the space in context or a DB user group user to work with. Using the DB user credentials, we can launch the database explorer or open the integrated HANA database explorer console in VS Code, and then create procedures just as in HANA. The logic here follows the same structure that I used in the first approach with the SQLScript view.

Remember that in this case the output table has to be created manually in the same Open SQL schema attached to the space, where the procedure has write access. Open SQL schemas are access restricted, so for others to view the content of the output table, it needs to be imported & deployed in the space by dragging it into a view editor console for first-time use. For the same reason, a local table in a space might not be readable from inside a procedure in your Open SQL schema unless SELECT rights are explicitly granted. The easiest alternative here is to create a consumption-enabled view on the local table, which is then accessible from the Open SQL schemas attached to the space.

Below is the script for the procedure. After the procedure is created, I simply call it in the console to insert the output data in loops into the target table. For everyday usage, the procedure can be included in a task chain.

 

create procedure 
"PROC_FILL_NEW_INTERIM_STOCK"
language sqlscript as
begin

declare control_date date;
declare today date default current_date;
declare date_var_inbound nvarchar(8);
declare date_var_stock nvarchar(8);
declare counter int default 1;

-- same loop structure as the SQLScript view: one iteration per day,
-- walking backwards over the last four days of data
for counter in 1..4 do
    control_date := add_days(:today, -:counter);
    date_var_inbound := to_nvarchar(:control_date, 'YYYYMMDD');            -- current day (inbound)
    date_var_stock := to_nvarchar(add_days(:control_date, -1),'YYYYMMDD'); -- previous day (closing stock)

    -- target table must exist in the same Open SQL schema beforehand
    insert into table_proc_new_interim_stock
            select 
                inb."Calendar_Day",
                inb."Site",
                inb."Article",
                inb."Unit",
                inb."Purchase_Unit",
                inb."Inbound_BUn",
                inb."Closing_Stock",
                sto."Closing_Stock" + inb."Inbound_BUn" "New_Interim_Stock"
            from "Inbound_Stock_View" inb
            left join "Inbound_Stock_View" sto
            on  inb."Site" = sto."Site"
            and inb."Article" = sto."Article"
            -- previous-day filter kept in the ON clause to preserve the left join
            and sto."Calendar_Day" = :date_var_stock
            where inb."Calendar_Day" = :date_var_inbound;
end for;

end;

 

Simply call & execute the procedure to insert the result dataset into the target table.

 

call "PROC_FILL_NEW_INTERIM_STOCK";

 

To consume the table back in Datasphere modelling, import & deploy it & create a view on top of it.

Procedure based output.JPG

To wrap up this blog post, a final observation based on my experience:

  • When the source data is not persisted & is accessed from a remote source, the dynamic filter push-down control of the Script operator is not as strong as that of a SQLScript view or a DB procedure, where you can steer the filter push-down by scripting in a specific manner. Keep in mind that this observation can vary with the connection type used between Datasphere & the remote source system.

Cheers,
Abhi 
