SAP Datasphere CLI & Python: Exporting Modeling Objects to CSV Files for Each Artifact

vikasparmar88
Explorer

Introduction

In this blog post, we'll explore how to use Python alongside the SAP Datasphere CLI to extract modeling objects and export them to CSV files. The script lets users handle artifacts such as remote tables, views, replication flows, and more, for each space in SAP Datasphere.
This solution is particularly useful for automating repetitive tasks and ensuring structured data handling across different modeling objects.

Prerequisites

Steps to install SAP Datasphere CLI: 

https://help.sap.com/docs/SAP_DATASPHERE/d0ecd6f297ac40249072a44df0549c1a/f7d5eddf20a34a1aa48d8e2c68... 

https://community.sap.com/t5/technology-blogs-by-sap/sap-datasphere-external-access-overview-apis-cl... 
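The CLI itself is distributed as a Node.js package; assuming Node.js and npm are already installed, it can be installed globally and verified as follows:

npm install -g @sap/datasphere-cli
datasphere --version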

Step-by-Step Process

Step 1: Prepare the Login.json file

Create an OAuth client with the purpose set to Interactive Usage and the redirect URI set to http://localhost:8080.

Get the values of all the fields below from the OAuth client and prepare the Login.json file.

{
"client_id": "",
"client_secret": "",
"authorization_url": "",
"token_url": "",
"access_token": "",
"refresh_token": ""
}
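For illustration, a filled-in Login.json might look like this (all values below are hypothetical; copy the real ones from your OAuth client details, and note that access_token and refresh_token can initially be left empty):

{
"client_id": "sb-xxxxxxxx-1234!b1234|client!b1234",
"client_secret": "AbCdEfGh123456789=",
"authorization_url": "https://mytenant.authentication.eu10.hana.ondemand.com/oauth/authorize",
"token_url": "https://mytenant.authentication.eu10.hana.ondemand.com/oauth/token",
"access_token": "",
"refresh_token": ""
}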

Step 2: Create the Model_Object.py file with the code below

dsp_host: enter the URL of your Datasphere tenant.

secrets_file: enter the path to your Login.json file.

import subprocess
import pandas as pd
import sys

def manage_Modeling_Object(Modeling_Object):
    # Step 1: Log in to Datasphere using the host and secrets file
    dsp_host = '<URL of Datasphere>'
    secrets_file = '<path>/Login.json'
    command = ['datasphere', 'login', '--host', dsp_host, '--secrets-file', secrets_file]
    subprocess.run(command)  # Execute the login command
    
    # Step 2: Retrieve a list of all spaces in JSON format
    command = ['datasphere', 'spaces', 'list', '--json']
    result_spaces = subprocess.run(command, capture_output=True, text=True)  # Run the command and capture output
    
    # Step 3: Parse the list of spaces from the command's output
    spaces = result_spaces.stdout.splitlines()  # Split output into individual lines
    
    ModelingObject_data = []  # Initialize a list to store Modeling Object data
    
    # Step 4: Check if the Modeling Object is 'spaces'
    if Modeling_Object == 'spaces':
        for space in spaces:
            if space.strip() in ("[", "]"):
                continue  # Skip the opening and closing brackets of the JSON array
            space_id = space.strip().replace('"', '').replace(',', '')  # Extract the space ID
            
            # Add space details to the data list
            ModelingObject_data.append({
                'Space ID': space_id,
                'Technical Name': space_id,
                'TYPE': Modeling_Object[:-1].upper()  # Singular, uppercase version of the Modeling Object name
            })
    
    # Step 5: Process Modeling Objects for each space
    else:
        for space in spaces:
            if space.strip() in ("[", "]"):
                continue  # Skip the opening and closing brackets of the JSON array
            space_id = space.strip().replace('"', '').replace(',', '')  # Extract the space ID
            
            # Step 6: Retrieve Modeling Objects for the current space
            command = ['datasphere', 'objects', Modeling_Object, 'list', '--space', space_id]
            result_ModelingObject = subprocess.run(command, capture_output=True, text=True)  # Run the command
            
            # Step 7: Parse the Modeling Object data from the output
            ModelingObject_info = result_ModelingObject.stdout.splitlines()  # Split output into individual lines
            print("Checking " + Modeling_Object.upper() + " for space: " + space_id)  # Log the space being checked
            
            # Step 8: Process each Modeling Object
            if len(ModelingObject_info) > 1:
                for flow in ModelingObject_info:
                    if '{' in flow or '}' in flow or '[' in flow or ']' in flow:
                        continue  # Skip lines containing brackets or braces
                    # Strip the JSON key, quotes, and trailing comma to keep only the technical name
                    cleaned_flow = flow.replace('"technicalName":', '').replace('"', '').replace(',', '').strip()
                    
                    # Step 9: Add Modeling Object details to the data list
                    ModelingObject_data.append({
                        'Space ID': space_id,
                        'Technical Name': cleaned_flow,
                        'TYPE': Modeling_Object[:-1].upper()  # Singular, uppercase version of the Modeling Object name
                    })
    
    # Step 10: Write the collected data into a CSV file
    if ModelingObject_data:
        df = pd.DataFrame(ModelingObject_data)  # Create a DataFrame from the data list
        df.to_csv(Modeling_Object.upper() + '.csv', index=False)  # Save the DataFrame to a CSV file without the index
        print("All " + Modeling_Object.upper() + " per space have been written to " + Modeling_Object.upper() + ".csv.")  # Log success message
    else:
        print("No Modeling Objects found.")  # Log message if no data was collected
    
    print('-' * 132)  # Separator for readability

if __name__ == "__main__":
    # If a Modeling Object name is passed on the command line, process only that one;
    # otherwise process the predefined list of Modeling Objects.
    if len(sys.argv) > 1:
        manage_Modeling_Object(sys.argv[1])
    else:
        for obj in ['remote-tables', 'local-tables', 'views', 'intelligent-lookups',
                    'data-flows', 'replication-flows', 'transformation-flows',
                    'task-chains', 'analytic-models', 'data-access-controls']:
            manage_Modeling_Object(obj)
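Note that the line-by-line parsing in the script assumes the CLI pretty-prints its JSON output with one entry per line, for example (space names are hypothetical):

[
"SALES_SPACE",
"FINANCE_SPACE"
]

The bracket-skipping and quote-stripping steps rely on this layout; if the output format ever differs on your tenant, parsing result.stdout with Python's json module is a more robust alternative.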

Step 3: Open a command prompt and execute the Model_Object.py file
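For example, from the folder containing the script (assuming Python 3 and pandas are installed):

python Model_Object.py
python Model_Object.py views

The first call processes the predefined list of Modeling Objects; the second exports only views.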

(Screenshot: running Model_Object.py from the command prompt)

Once the program execution is complete, it generates CSV files for all the Datasphere artifacts mentioned in the Python code.

(Screenshot: generated CSV files)

Each CSV file will have three columns:

1) Space ID: the name of the space

2) Technical Name: the exact technical name of the object

3) TYPE: the type of object (e.g. VIEW, LOCAL-TABLE, REMOTE-TABLE, REPLICATION-FLOW, etc.)
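For example, a generated VIEWS.csv might contain the following (space and object names are hypothetical):

Space ID,Technical Name,TYPE
SALES_SPACE,V_CUSTOMER,VIEW
SALES_SPACE,V_ORDERS,VIEW
FINANCE_SPACE,V_LEDGER,VIEW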

 

Conclusion

This script demonstrates how Python and the SAP Datasphere CLI can work together to streamline artifact management and export data systematically. By following the steps above, users can extend or adapt the code to suit their own requirements.

Vitaliy-R
Developer Advocate

Thank you for sharing. Have you considered going via APIs (instead of managing CLI commands) as an alternative?

https://api.sap.com/api/Catalog/resource/SAP_Data_Warehouse_Cloud_Consumption_Catalog

I haven't tried; therefore, I am interested to hear your opinion.

vikasparmar88
Explorer
No, I haven't tried via the API yet, but I will surely check this out.