Technology Blog Posts by Members
Abhishek_Hazra

Loop control is a powerful programming concept for executing a piece of logic repeatedly over segments of a dataset. While working with SQL, set-based operations and window functions can often help avoid executing code blocks in loops. However, there are use cases where a loop is the most practical way to achieve the desired outcome. The objective of this blog post is to explore a less discussed topic so far - using conditional loops in Datasphere (based on a hypothetical use case).
I have taken a scenario with two measures, namely closing stock & inbound business units, per article & store combination over a period of time, i.e. 16.02.2025 - 26.02.2025. I want to derive a calculated measure called interim new stock, defined as the present day's inbound quantity added to the previous day's closing stock quantity for the same article-site combination. (For the sake of simplification, I have not taken any outbound movement into consideration :)).
Let's say I have a table with the below inbound & closing stock information within my space in Datasphere.

 

| Calendar_Day | Site | Article    | Unit | Purchase_Unit | Inbound_BUn | Closing_Stock |
|--------------|------|------------|------|---------------|-------------|---------------|
| 2025-02-16   | S567 | 3475896901 | EA   | CA1           | 7           | 20            |
| 2025-02-16   | S546 | 10464863   | EA   | CA1           | 3           | 24            |
| 2025-02-16   | S546 | 92105721   | EA   | CA2           | 1           | 12            |
| 2025-02-17   | S567 | 3475896901 | EA   | CA1           | 1           | 23            |
| 2025-02-17   | S546 | 10464863   | EA   | CA1           | 6           | 27            |
| 2025-02-17   | S546 | 92105721   | EA   | CA2           | 12          | 19            |
| 2025-02-18   | S567 | 3475896901 | EA   | CA1           | 2           | 19            |
| 2025-02-18   | S546 | 10464863   | EA   | CA1           | 1           | 23            |
| 2025-02-18   | S546 | 92105721   | EA   | CA2           | 7           | 21            |
| 2025-02-19   | S567 | 3475896901 | EA   | CA1           | 3           | 12            |
| 2025-02-19   | S546 | 10464863   | EA   | CA1           | 1           | 18            |
| 2025-02-19   | S546 | 92105721   | EA   | CA2           | 24          | 36            |
| 2025-02-20   | S567 | 3475896901 | EA   | CA1           | 60          | 40            |
| 2025-02-20   | S546 | 10464863   | EA   | CA1           | 9           | 20            |
| 2025-02-20   | S546 | 92105721   | EA   | CA2           | 12          | 40            |
| 2025-02-21   | S567 | 3475896901 | EA   | CA1           | 1           | 34            |
| 2025-02-21   | S546 | 10464863   | EA   | CA1           | 1           | 17            |
| 2025-02-21   | S546 | 92105721   | EA   | CA2           | 3           | 41            |
| 2025-02-22   | S567 | 3475896901 | EA   | CA1           | 6           | 20            |
| 2025-02-22   | S546 | 10464863   | EA   | CA1           | 6           | 18            |
| 2025-02-22   | S546 | 92105721   | EA   | CA2           | 10          | 32            |
| 2025-02-23   | S567 | 3475896901 | EA   | CA1           | 48          | 60            |
| 2025-02-23   | S546 | 10464863   | EA   | CA1           | 12          | 21            |
| 2025-02-23   | S546 | 92105721   | EA   | CA2           | 3           | 27            |
| 2025-02-24   | S567 | 3475896901 | EA   | CA1           | 2           | 48            |
| 2025-02-24   | S546 | 10464863   | EA   | CA1           | 5           | 20            |
| 2025-02-24   | S546 | 92105721   | EA   | CA2           | 7           | 20            |
| 2025-02-25   | S567 | 3475896901 | EA   | CA1           | 10          | 43            |
| 2025-02-25   | S546 | 10464863   | EA   | CA1           | 9           | 24            |
| 2025-02-25   | S546 | 92105721   | EA   | CA2           | 3           | 21            |
| 2025-02-26   | S567 | 3475896901 | EA   | CA1           | 4           | 32            |
| 2025-02-26   | S546 | 10464863   | EA   | CA1           | 15          | 27            |
| 2025-02-26   | S546 | 92105721   | EA   | CA2           | 12          | 30            |

I need a new target table which holds the last 4 days' information from the above table plus a new calculated field called New_Interim_Stock, derived as previous day's closing stock + present day's inbound quantity. For example, on 2025-02-26, for article 10464863 in site S546 the new interim stock would be 24+15 = 39. Below, I will demonstrate how we can do this by applying loops in Datasphere through different approaches.
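As a quick illustration of the derivation rule before diving into the Datasphere approaches, here is a minimal pandas sketch. The two-row frame is a hypothetical extract of the table above (site S546, article 10464863, last two days only), just enough to reproduce the 24+15 = 39 example via a self-join on a shifted date:

```python
import pandas as pd

# Hypothetical two-row extract of the sample table (S546 / 10464863).
stock = pd.DataFrame({
    "Calendar_Day": pd.to_datetime(["2025-02-25", "2025-02-26"]),
    "Site": ["S546", "S546"],
    "Article": ["10464863", "10464863"],
    "Inbound_BUn": [9, 15],
    "Closing_Stock": [24, 27],
})

# Self-join on a shifted date: re-date every row one day forward, so that
# joining on Calendar_Day pairs each day with the previous day's closing stock.
prev = stock.assign(Calendar_Day=stock["Calendar_Day"] + pd.Timedelta(days=1))
merged = stock.merge(
    prev[["Calendar_Day", "Site", "Article", "Closing_Stock"]],
    on=["Calendar_Day", "Site", "Article"],
    how="left",
    suffixes=("", "_prev"),
)
# New_Interim_Stock = previous day's closing stock + present day's inbound
merged["New_Interim_Stock"] = merged["Closing_Stock_prev"] + merged["Inbound_BUn"]
print(merged[["Calendar_Day", "New_Interim_Stock"]])
```

The 2025-02-26 row yields 24 + 15 = 39; the first row has no predecessor in this mini-extract and stays empty, which is exactly the boundary case the approaches below have to handle.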

 

Approach #1 - using SQLScript view

SAP Datasphere supports SQLScript syntax when the view editor language is set to SQLScript (Table Function). Below is the view definition script with a conditional FOR loop that iterates over the last 4 days of the dataset and derives the calculated field New_Interim_Stock. I created two date variable pointers for handling the current day (for inbound quantity) & the previous day (for closing stock quantity), & a self join of the same dataset restricted by these date variables individually. The conditional FOR loop walks backwards day by day until the calculation has covered the last four days.

 

declare control_date date;
declare today date default current_date; -- assumes the table is loaded up to the previous day
declare date_var_inbound nvarchar(8);
declare date_var_stock nvarchar(8);
declare counter int default 1;
-- table variable collecting the result rows across loop iterations
declare result_table table (
        "Calendar_Day" date,
        "Site" nvarchar(100),
        "Article" nvarchar(100),
        "Unit" nvarchar(100),
        "Purchase_Unit" nvarchar(100),
        "Inbound_BUn" int,
        "Closing_Stock" int,
        "New_Interim_Stock" int
        );

-- walk backwards one day per iteration over the last four days of data
for counter in 1..4 do
    control_date := add_days(:today, -:counter);
    date_var_inbound := to_nvarchar(:control_date, 'YYYYMMDD');            -- current day (inbound)
    date_var_stock := to_nvarchar(add_days(:control_date, -1),'YYYYMMDD'); -- previous day (closing stock)

    -- self join: current day's inbound row paired with the previous day's
    -- closing stock row for the same article & site
    insert into :result_table 
            select 
                inb."Calendar_Day",
                inb."Site",
                inb."Article",
                inb."Unit",
                inb."Purchase_Unit",
                inb."Inbound_BUn",
                inb."Closing_Stock",
                sto."Closing_Stock" + inb."Inbound_BUn" "New_Interim_Stock"
            from "inbound_stock_data" inb
            left join "inbound_stock_data" sto
            on  inb."Site" = sto."Site"
            and inb."Article" = sto."Article"
            -- keep the previous-day filter in the ON clause; filtering sto in
            -- the WHERE clause would turn the left join into an inner join
            and sto."Calendar_Day" = :date_var_stock
            where inb."Calendar_Day" = :date_var_inbound;
end for;

return
select * from :result_table order by "Calendar_Day";

 

When the view is deployed, we can preview the data & validate the output against the data in the source table.
Data preview of SQLScript view

Note: The output above comes from the data preview option of the view, which executes the conditional loop behind the scenes every time we preview data. For larger datasets, an ideal solution would be to create a data flow with this view as a source & a target table with the output structure in 'Append' data write mode. This ensures the output data is persisted & querying it is free of calculation latency.
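As an aside, the introduction mentioned that window functions can often avoid loops entirely. For completeness, here is a hedged loop-free sketch of the same derivation using a window-style shift per Site/Article partition in pandas. The three-row frame is hypothetical sample data with the column names of the table above, and the sketch assumes one row per consecutive calendar day:

```python
import pandas as pd

# Hypothetical mini-frame: three consecutive days for one article-site pair.
df = pd.DataFrame({
    "Calendar_Day": pd.to_datetime(["2025-02-24", "2025-02-25", "2025-02-26"]),
    "Site": ["S546"] * 3,
    "Article": ["10464863"] * 3,
    "Inbound_BUn": [5, 9, 15],
    "Closing_Stock": [20, 24, 27],
})

df = df.sort_values(["Site", "Article", "Calendar_Day"])
# shift(1) within each partition plays the role of the previous-day pointer
# that the loop-based approaches maintain explicitly.
df["New_Interim_Stock"] = (
    df.groupby(["Site", "Article"])["Closing_Stock"].shift(1) + df["Inbound_BUn"]
)
print(df["New_Interim_Stock"].tolist())
```

This computes 20+9 = 29 and 24+15 = 39 in one pass; the loop-based approaches below remain useful when the iteration itself carries logic that a single window pass cannot express.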

 

Approach #2 - using Data Flow Script operator

In this approach, I created a data flow with the source data table as a source, a target table for persisting the calculated result, & a Script operator in between. The scripting language used here is Python.

Dataflow Scripting.JPG

I have attached the Python script below. The logic is to slice the dataset to the last 4 days, counting backwards from the maximum date available in the dataset, i.e. today. This time I work with two Pandas dataframes. The first, df_first, holds the complete dataset and is used to handle the oldest day of the slice: since the sliced dataset starts at 23.02.2025, the closing stocks for 22.02.2025 need to be fetched from the original dataset held in df_first. The second, df, is the sliced dataset, which is grouped by Article & Site; each sorted group (grouped_df) is then processed in a for loop alongside two date variables pointing to the present day's data row & the previous day's data row. The derivation of New_Interim_Stock happens in the if-else block, where the first day of the data slice is handled via a lookup into df_first for the same article-site combination. Finally, the field New_Interim_Stock is written back into df & the result dataset is returned sorted by Calendar_Day.

 

def transform(data):

    # pandas is available as "pd" inside the Script operator runtime
    df_first = pd.DataFrame(data)
    # normalise Calendar_Day to plain date objects
    df_first['Calendar_Day'] = pd.to_datetime(df_first['Calendar_Day'], format='%Y-%m-%d').dt.date
    df = df_first.copy()

    # slice the last 4 days, counting backwards from the maximum date in the data
    today = df['Calendar_Day'].max()
    relevant_days = [today - pd.Timedelta(days = date_diff) for date_diff in range(0, 4)]
    df = df[df['Calendar_Day'].isin(relevant_days)]

    # process each article & site combination separately
    for (article, site), group in df.groupby(['Article', 'Site']):
        grouped_df = group.sort_values(by = 'Calendar_Day', ascending = False)

        for date_diff in range(0, 4):
            current_day = today - pd.Timedelta(days = date_diff)
            current_day_row = grouped_df[grouped_df['Calendar_Day'] == current_day]

            prev_day = current_day - pd.Timedelta(days = 1)
            prev_day_row = grouped_df[grouped_df['Calendar_Day'] == prev_day]

            if not prev_day_row.empty:
                # previous day lies inside the slice
                new_stock = prev_day_row['Closing_Stock'].values[0] + current_day_row['Inbound_BUn'].values[0]
            else:
                # oldest day of the slice: fetch the previous day's closing
                # stock from the unsliced dataset held in df_first
                prev_day_row_from_first = df_first[
                    (df_first['Calendar_Day'] == prev_day) &
                    (df_first['Article'] == article) &
                    (df_first['Site'] == site)
                ]
                new_stock = prev_day_row_from_first['Closing_Stock'].values[0] + current_day_row['Inbound_BUn'].values[0]

            # write the derived value back into the sliced dataframe
            df.loc[(df['Article'] == article) & (df['Site'] == site) & (df['Calendar_Day'] == current_day), 'New_Interim_Stock'] = new_stock

    data = df.sort_values(by = 'Calendar_Day')

    return data

 

After the data flow is saved & deployed, execute it & preview the persisted output data in the target table.

Output of Script operator

 

Approach #3 - using Open SQL procedures

The last option essentially brings back the conventional HANA studio experience. In Datasphere you can create procedures in your Open SQL schema if you have a DB user for the space in context or a DB user group user to work with. Using the DB user credentials, we can launch the database explorer or open the integrated HANA database explorer console in VS Code, and then create procedures just as in HANA. The logic here follows the same structure that I used in the first approach with the SQLScript view.

Remember that in this case the output table has to be created manually in the same Open SQL schema attached to the space, where the procedure has write access. Open SQL schemas are access restricted, so for others to view the content of the output table, it needs to be imported & deployed in the space by dragging it into a view editor console for first-time use. For the same reason, a local table in a space might not be readable from inside a procedure in your Open SQL schema unless SELECT rights are explicitly granted. The easiest alternative here is to create a consumption-enabled view on the local table, which is then accessible from the Open SQL schemas attached to the space.

Below is the script for the procedure. After the procedure is created, I simply call it in the console to insert the output data in loops into the target table. For everyday usage, the procedure can be included in a task chain.

 

create procedure 
"PROC_FILL_NEW_INTERIM_STOCK"
language sqlscript as
begin

declare control_date date;
declare today date default current_date;
declare date_var_inbound nvarchar(8);
declare date_var_stock nvarchar(8);
declare counter int default 1;

-- same loop structure as the SQLScript view: one iteration per day,
-- walking backwards over the last four days of data
for counter in 1..4 do
    control_date := add_days(:today, -:counter);
    date_var_inbound := to_nvarchar(:control_date, 'YYYYMMDD');            -- current day (inbound)
    date_var_stock := to_nvarchar(add_days(:control_date, -1),'YYYYMMDD'); -- previous day (closing stock)

    -- target table must exist in the same Open SQL schema beforehand
    insert into table_proc_new_interim_stock
            select 
                inb."Calendar_Day",
                inb."Site",
                inb."Article",
                inb."Unit",
                inb."Purchase_Unit",
                inb."Inbound_BUn",
                inb."Closing_Stock",
                sto."Closing_Stock" + inb."Inbound_BUn" "New_Interim_Stock"
            from "Inbound_Stock_View" inb
            left join "Inbound_Stock_View" sto
            on  inb."Site" = sto."Site"
            and inb."Article" = sto."Article"
            -- previous-day filter kept in the ON clause to preserve the left join
            and sto."Calendar_Day" = :date_var_stock
            where inb."Calendar_Day" = :date_var_inbound;
end for;

end;

 

Simply call & execute the procedure to insert the result dataset into the target table.

 

call "PROC_FILL_NEW_INTERIM_STOCK";

 

To consume the table back in Datasphere modelling, import & deploy it & create a view on top of it.

Procedure based output.JPG

To wrap up this blog post, a final observation based on my experience:

  • When the source data is not persisted & is accessed from a remote source, the dynamic filter push-down control of the Script operator is not as strong as that of a SQLScript view or a DB procedure, where you can steer the filter push-down by scripting in a specific manner. Keep in mind that this observation can vary with the connection type used between Datasphere & the remote source system.

Cheers,
Abhi 
