
SAP Datasphere is a subset of the new SAP Business Data Cloud offering, which aims to enable every data professional to deliver seamless and scalable access to mission-critical business data.
Datasphere offers a host of connectors to all kinds of sources, SAP and non-SAP alike. One of the most versatile is the "Generic OData" connection, which allows consumption of a wide range of sources.
In this blog, we will see how we can utilize the Data Flow feature to consume complex OData responses.
Note: This is a rather niche guide to the Script operator in Data Flows. If you are looking for more high-level information, I would recommend the official SAP Datasphere documentation, or the Datasphere tag to browse blog posts.
The Generic OData connector in Datasphere can't "flatten" a nested structure and will end up storing nested structures as strings. The Script operator in Data Flows allows developers to handle complex OData entities with Python, but it can be cumbersome, to say the least.
Your best option is to consider some sort of middleware or integration platform between the source and Datasphere that allows you to handle OData entities properly.
If you want to follow along, you will need:
I want to be transparent going into this: I don't think this is a very good or elegant solution.
In an ideal world, I would prefer some sort of integration platform, CPI or something else, between the OData service and Datasphere to correctly unpack and transform the data.
If you don't have that, you may consider the following.
In your development space, you can set up a connection to the services.
OData.org exposes both V2 and V4 OData services.
Clicking on the service will provide you with the service URL:
In Datasphere, create a Generic OData Connection:
Add the /Suppliers entity set to a new Data Flow and preview the data.
Note: You may get an error that the source table is not "Deployed".
If you do, deploy the table and preview the data again.
If you try to consume the service in a tool like Postman, or just through a Python request, you will get the following JSON response:
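For reference, here is a minimal sketch (run outside Datasphere, e.g. on your own machine) of what that looks like - the URL below is the public OData.org V2 reference service, so adjust it if you picked a different source:

import requests

# Public OData.org V2 reference service - adjust if you chose another source
url = "https://services.odata.org/V2/OData/OData.svc/Suppliers"
response = requests.get(url, params={"$format": "json"})
suppliers = response.json()["d"]["results"]

# Each supplier entry is a nested dictionary, roughly of this shape
# (metadata keys omitted):
# {
#   "ID": 0,
#   "Name": "Exotic Liquids",
#   "Address": {
#     "Street": "NE 228th", "City": "Sammamish", "State": "WA",
#     "ZipCode": "98074", "Country": "USA"
#   },
#   "Concurrency": 0
# }
print(suppliers[0]["Address"]["City"])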
You can see that there are two dictionaries here: the top-level properties, which contain the keys (1) ID, (2) Name, (3) Address, and (4) Concurrency, and another nested under Address, which contains Street, City, State, ZipCode, and Country.
There are many ways to interpret this data, but the easiest would be to unpack it into the following table:
| ID | Name | Address_Street | Address_City | Address_State | Address_ZipCode | Country | Concurrency |
|---|---|---|---|---|---|---|---|
| 0 | Exotic Liquids | NE 228th | Sammamish | WA | 98074 | USA | 0 |
As you can see, Address is unpacked into its five columns, and we get a lovely, simple table.
However, the table deployed in Step 2 in Datasphere looks like this:
Which is an absolute nightmare. Extracting the data in SQL views (graphical or otherwise) will be substring-hell, as there are no JSON interpreters in HANA SQLScript.
Alright, now let's transform the data with a Python Script.
Firstly, add a script to your canvas and connect the source.
The schema of the script operator is not dynamic. That means you will need to manually add the required columns to the operator:
The absolute 'funniest' quirk is that the output of the script is based on the order of the columns, not the names / keys.
That means, if your resulting DataFrame has this schema:
ID | Name | Address_Street | Address_City | Address_State | Address_ZipCode | Country | Concurrency |
The output columns defined on the script operator must be in exactly the same order.
Otherwise the value of ID may end up being written to, say, the "State" column.
Fun stuff 🙂
Drag and drop until your columns are in the required order:
Note: This problem may very well be solved in the future, or it may be working-as-designed and always just be a massive pain in the donkey.
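In the meantime, a defensive habit I would suggest (my own workaround, not a Datasphere feature) is to end the script by forcing the DataFrame into the exact column order you defined on the operator, so an accidental swap does not silently corrupt the output:

# Place this just before "return data" inside the transform function.
# The list below uses the example schema from the table above - replace it
# with the columns you actually defined on the script operator, in that order.
expected_order = ["ID", "Name", "Address_Street", "Address_City",
                  "Address_State", "Address_ZipCode", "Country", "Concurrency"]
data = data[expected_order]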
Alrighty, now that we have the setup in place, we can begin to create a Python script to solve our problem.
The script, in simple terms, has to do two things:
However, the Python Environment in Datasphere is highly restricted and does not allow for additional imports.
At the time of writing, the following libraries are supported:
You can see the full list of supported and disabled features in the documentation.
The base code editor looks like this:
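In plain text, the template is roughly the following (docstring abridged here - the full wording appears in the complete script at the end of this post):

def transform(data):
    """
    This function body should contain all the desired transformations on incoming DataFrame.
    ...
    :param data: Pandas DataFrame
    :return: Pandas DataFrame
    """
    #####################################################

    #####################################################
    return data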
As you may notice, the script takes the form of a function - transform - that takes one parameter: data. The source data will be read just like you see it in the Data Preview, as a Pandas DataFrame called 'data'.
A section has been marked off between two rows of #, which is where you will add your code.
Note: At the time of writing, the editor has no code completion or helper utilities of any kind, and it does not allow you to place breakpoints or write output to the console - or anywhere else.
This makes debugging extremely tedious.
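One workaround I can suggest (my own habit, not an official feature) is to catch exceptions and write the error text into an extra output column, so it at least shows up in the target table. A rough sketch, assuming you also add a 'debug' column to the operator's output:

def transform(data):
    try:
        # ... your actual transformations go here ...
        data["debug"] = ""
    except Exception as exc:
        # There is no console, so surface the error in the data itself
        data["debug"] = str(exc)
    return data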
The JSON data is stored in Datasphere as a string, and we have to transform it without the json module, which would otherwise make the exercise trivial.
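For contrast, outside the sandbox the whole exercise would boil down to a couple of lines - a hypothetical example, assuming the stored string is valid JSON (if it uses single quotes, ast.literal_eval would be the equivalent):

import json
import pandas as pd

# Hypothetical example of the kind of string stored in the Address column
address_str = '{"Street": "NE 228th", "City": "Sammamish", "State": "WA", "ZipCode": "98074", "Country": "USA"}'
flat = pd.json_normalize(json.loads(address_str))
print(flat.columns.tolist())  # ['Street', 'City', 'State', 'ZipCode', 'Country']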
So, we will do the following:
To accomplish step 1, we will use a regular expression that can identify the key-value pairs. The following formats are supported:
That is, it does not matter whether your data uses ' or ".
r'["\'](\w+)["\']\s*:\s*(\{.*?\}|["\'].*?["\']|\d+)'
Yes, I know it looks like a pocket-calculator having a stroke, but it works.
You can use the regex-tester to play around with it for yourself.
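To see what the pattern actually captures, here is a small standalone sketch (runnable anywhere outside Datasphere) applied to a shortened version of the example string used further down:

import re

key_val_pattern = r'["\'](\w+)["\']\s*:\s*(\{.*?\}|["\'].*?["\']|\d+)'
sample = "{'ID': 0, 'Name': 'Exotic Liquids', 'Address': { 'Street': 'NE 228th', 'City': 'Sammamish' }}"

# re.findall returns one (key, value) tuple per match; a nested dictionary is
# captured as one raw string, which we will later parse recursively.
for key, value in re.findall(key_val_pattern, sample):
    print(key, "->", value)
# Output:
# ID -> 0
# Name -> 'Exotic Liquids'
# Address -> { 'Street': 'NE 228th', 'City': 'Sammamish' }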
Python allows a function to be defined within another function, so we can begin to package our logic into a function.
def process_dict(dict_str):
    # Assume the input string is a dictionary and parse it
    key_val_pattern = r'["\'](\w+)["\']\s*:\s*(\{.*?\}|["\'].*?["\']|\d+)'  # Matches key-value pairs, supporting ' and "
    matches = re.findall(key_val_pattern, dict_str)
The variable "matches" will contain a list of all the (key, value) pairs in the input.
Now, we can loop through the matches and create our dictionary:
def process_dict(dict_str) -> dict:
    # Assume the input string is a dictionary and parse it
    key_val_pattern = r'["\'](\w+)["\']\s*:\s*(\{.*?\}|["\'].*?["\']|\d+)'  # Matches key-value pairs, supporting ' and "
    matches = re.findall(key_val_pattern, dict_str)
    parsed_dict = {}
    for key, value in matches:
        # Store keys as lower case, so lookups are case-insensitive
        if value.startswith('{'):  # If the value is a nested dictionary
            # Recursively process the nested dictionary
            parsed_dict[str(key).lower()] = process_dict(value)
        elif value.startswith('"') or value.startswith("'"):  # If the value is a string
            parsed_dict[str(key).lower()] = value.strip('"').strip("'")  # Remove quotes (either ' or ")
        else:  # Otherwise the value is a number
            parsed_dict[str(key).lower()] = int(value)
    return parsed_dict
Note that in line 9, we check for a nested dictionary and handle it recursively.
If we pass this string
"{'ID': 0, 'Name': 'Exotic Liquids', 'Address': { 'Street': 'NE 228th', 'City': 'Sammamish', 'State': WA', 'ZipCode': 98074, 'Country': 'USA' }, 'Concurrency': 0 }"
We will get the following result:
{'id': 0, 'name': 'Exotic Liquids', 'address': { 'street': 'NE 228th', 'city': 'Sammamish', 'state': 'WA', 'zipcode': 98074, 'country': 'USA' }, 'concurrency': 0 }
Which we can then use to get the data we need.
Note: The full script also contains handling for a list of dictionaries.
Now that you have your data in a variable, you can manipulate the DataFrame.
There are countless ways of implementing this, but the most readable is a new function that uses the one above and extracts the required info column by column.
There are faster ways, but for the sake of the example, readability > speed.
def get_address_data(row, newcolumn: str):
    address = row["Address"]
    address_info = process_dict(address)
    return_value = address_info["address"].get(newcolumn.lower(), '')
    return return_value

data["State"] = data.apply(lambda row: get_address_data(row, "State"), axis=1)
In line 7, we create a new column, State, and pass in the entire row along with the name of the value we want out.
pandas then iterates through the DataFrame and applies the function to every row.
The full script will then look like this.
def transform(data):
    """
    This function body should contain all the desired transformations on incoming DataFrame. Permitted builtin functions
    as well as permitted NumPy and Pandas objects and functions are available inside this function.
    Permitted NumPy and Pandas objects and functions can be used with aliases 'np' and 'pd' respectively.
    This function executes in a sandbox mode. Please refer the documentation for permitted objects and functions. Using
    any restricted functions or objects would cause an internal exception and result in a pipeline failure.
    Any code outside this function body will not be executed and inclusion of such code is discouraged.
    :param data: Pandas DataFrame
    :return: Pandas DataFrame
    """
    #####################################################
    def parse_string_to_list(data_str):
        result_list = []
        if data_str in ['', '-', 'NULL', None]:
            return result_list
        # Check if the string represents a list
        if data_str.startswith('['):
            # Extract individual elements of the list using regex
            # Matches nested dictionaries or flat dictionaries
            list_pattern = r'\{[^{}]*\{[^{}]*\}[^{}]*\}|\{[^{}]*\}'
            list_items = re.findall(list_pattern, data_str)
        else:
            # If it's not a list, treat the whole string as a single dictionary
            list_items = [data_str]
        # Step 2: Process each item in the list (or the single string)
        for item in list_items:
            result_list.append(process_dict(item))
        return result_list

    def process_dict(dict_str) -> dict:
        # Assume the input string is a dictionary and parse it
        key_val_pattern = r'["\'](\w+)["\']\s*:\s*(\{.*?\}|["\'].*?["\']|\d+)'  # Matches key-value pairs, supporting ' and "
        matches = re.findall(key_val_pattern, dict_str)
        parsed_dict = {}
        for key, value in matches:
            # Store keys as lower case, so lookups are case-insensitive
            if value.startswith('{'):  # If the value is a nested dictionary
                # Recursively process the nested dictionary
                parsed_dict[str(key).lower()] = process_dict(value)
            elif value.startswith('"') or value.startswith("'"):  # If the value is a string
                parsed_dict[str(key).lower()] = value.strip('"').strip("'")  # Remove quotes (either ' or ")
            else:  # Otherwise the value is a number
                parsed_dict[str(key).lower()] = int(value)
        return parsed_dict

    def get_address_data(row, newcolumn: str):
        # Get the value of the "Address" column
        address = row["Address"]
        # Process the input and read the first element of the list
        address_info = parse_string_to_list(address)[0]
        # Get the requested value, falling back to an empty string
        return_value = address_info["address"].get(newcolumn.lower(), '')
        return return_value

    data["State"] = data.apply(lambda row: get_address_data(row, "State"), axis=1)
    data["Zip"] = data.apply(lambda row: get_address_data(row, "zipcode"), axis=1)
    data["Street"] = data.apply(lambda row: get_address_data(row, "Street"), axis=1)
    data["Country"] = data.apply(lambda row: get_address_data(row, "Country"), axis=1)
    data["City"] = data.apply(lambda row: get_address_data(row, "City"), axis=1)
    #####################################################
    return data
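As mentioned earlier, the column-by-column approach favours readability over speed. If the row count ever becomes a problem, a variant I would consider (a sketch of my own, not part of the script above) is to parse the Address string once per row and expand it into all five columns in a single pass. It would replace the five .apply calls inside transform:

def explode_address(address_str):
    # Parse the Address string once and return all five values as a Series,
    # which pandas then expands into the listed columns
    parsed = parse_string_to_list(address_str)
    address = parsed[0].get("address", {}) if parsed else {}
    return pd.Series({
        "State": address.get("state", ''),
        "Zip": address.get("zipcode", ''),
        "Street": address.get("street", ''),
        "Country": address.get("country", ''),
        "City": address.get("city", ''),
    })

data[["State", "Zip", "Street", "Country", "City"]] = data["Address"].apply(explode_address)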
When you've added a target local table, you can run the data flow, and you should get the following result:
Support for nested OData entities, and for Python code in general, in Datasphere is lacking, and will hopefully be expanded greatly in the future.
Whether you have the option to change the OData source directly or you are consuming a public service, you should seriously consider some sort of middleware between the source and Datasphere.
Until then, you do have the option to manually transform JSON stored as strings in Datasphere, but this can be a massive headache.