Technology Blog Posts by Members
Introduction

In this blog post, we'll explore how to use Python alongside the SAP Datasphere CLI to extract modeling objects and export them to CSV files. The script handles artifacts such as remote tables, views, replication flows, and more, for each space in SAP Datasphere.
This solution is particularly useful for automating repetitive tasks and ensuring structured data handling across different modeling objects.

Prerequisites

Steps to install SAP Datasphere CLI: 

https://help.sap.com/docs/SAP_DATASPHERE/d0ecd6f297ac40249072a44df0549c1a/f7d5eddf20a34a1aa48d8e2c68... 

https://community.sap.com/t5/technology-blogs-by-sap/sap-datasphere-external-access-overview-apis-cl... 

Step-by-Step Process

Step 1: Prepare the Login_<TENANT>_DSP.json file

Create an OAuth client with Purpose set to Interactive Usage and Redirect URL set to http://localhost:8080.

Retrieve the necessary field values for the secret JSON file by running the following command.

datasphere config secrets show

 Prepare the Login_<TENANT>_DSP.json file for both DEV and PRD Tenants.

{
    "client_id": "",
    "client_secret": "",
    "authorization_url": "",
    "token_url": "",
    "access_token": "",
    "refresh_token": "",
    "host": "",
    "browser": ""
}
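Before running the extraction, it can save a failed login round-trip to check that the secrets file actually contains the fields shown above. The following is a minimal sketch (not part of the original post); it assumes the key names from the template, treating `browser` as optional:

```python
import json

# Keys the secrets file is expected to contain, based on the template
# above; 'browser' is treated as optional here. Adjust if your tenant's
# `datasphere config secrets show` output differs.
REQUIRED_KEYS = {
    "client_id", "client_secret", "authorization_url",
    "token_url", "access_token", "refresh_token", "host",
}

def validate_secrets(path):
    """Return a sorted list of required keys missing from the secrets file."""
    with open(path, encoding="utf-8") as f:
        secrets = json.load(f)
    return sorted(REQUIRED_KEYS - secrets.keys())
```

An empty list means the file has all the expected fields; anything else tells you which entries still need to be filled in before calling `datasphere login`.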

 

 Step 2: Create the Model_Objects.py file with the code below

dsp_host – Provide the URL for both DEV and PRD Datasphere tenants.

secrets_file – Specify the path to the Login_<TENANT>_DSP.json file.

 

import subprocess
import pandas as pd
import sys
import json

def login_to_datasphere(Tenant):
    """
    Step 1: Logs into Datasphere using the appropriate credentials based on the tenant.
    - If `Tenant` is 'PRD', it uses production credentials.
    - Otherwise, it uses development (DEV) credentials.
    - It also ensures the session is fresh by logging out first.
    """
    if Tenant == 'PRD':
        dsp_host = '<PRD URL>'
        secrets_file = '<PATH>/Login_PRD_DSP.json'
    else:
        dsp_host = '<DEV URL>'
        secrets_file = '<PATH>/Login_DEV_DSP.json'

    # Logout first to ensure a fresh session
    subprocess.run("datasphere logout", shell=True)
    print(f'Set Host : datasphere config host set {dsp_host}')
    
    # Configure the Datasphere host based on the environment
    subprocess.run(f'datasphere config host set {dsp_host}', shell=True)
    
    # Perform login using the secrets file
    print(f'Login : datasphere login --host {dsp_host} --secrets-file {secrets_file}')
    subprocess.run(f'datasphere login --host {dsp_host} --secrets-file {secrets_file}', shell=True)
    
    print("Login to "+Tenant+" is successful")
        
    # Step 2: Retrieve and manage predefined modeling objects
    manage_Modeling_Object('spaces')
    manage_Modeling_Object('local-tables')
    manage_Modeling_Object('views')
    manage_Modeling_Object('analytic-models')
    manage_Modeling_Object('replication-flows')
    manage_Modeling_Object('remote-tables')
    manage_Modeling_Object('task-chains')
    manage_Modeling_Object('transformation-flows')
    manage_Modeling_Object('data-flows')
    manage_Modeling_Object('intelligent-lookups')
    manage_Modeling_Object('data-access-controls')

    return dsp_host  # Returning host info if needed in other operations


def fetch_modeling_objects(space_id, modeling_object="views", batch_size=200):
    """
    Step 4: Retrieves modeling objects using pagination.
    - Loops through available objects in batches of `batch_size` (default 200).
    - Uses Datasphere CLI commands to retrieve data dynamically.
    """

    all_objects = []  # List to store all retrieved modeling objects.
    skip = 0  # Offset for pagination, increasing in multiples of 200 to retrieve objects in batches.
    print(f"Checking {modeling_object.upper()} for space: {space_id}")
    
    # Keep fetching objects until the CLI returns no more data
    while True:
       
        # Fetch modeling objects in batches using CLI
        command = f'datasphere objects {modeling_object} list --space {space_id} --top {batch_size} --skip {skip}'
        result = subprocess.run(command, capture_output=True, shell=True, text=True)

        try:
            output = json.loads(result.stdout)
        except json.JSONDecodeError:
            print(f"Error decoding JSON: {result.stdout}")
            break

        if not output:
            break # Stop when no more objects are found
        
        # Clean the fetched data to write in csv file
        for obj in output:
            cleaned_flow = obj.get("technicalName", "").strip()
            all_objects.append({
                'Space ID': space_id,
                'Technical Name': cleaned_flow,
                'TYPE': modeling_object[:-1].upper()
            })

        skip += batch_size # Increment batch offset

    return all_objects

def manage_Modeling_Object(Modeling_Object):
    """
    Step 3: Retrieves all spaces in tenant.
    - Loops through available spaces and get the models for each space.
    - prepate csv file with all retrived data.
    """
    
    # Retrieve the list of all spaces as JSON
    command = 'datasphere spaces list --json'
    result_spaces = subprocess.run(command, capture_output=True, shell=True, text=True)  # Run the command and capture output

    # Parse the JSON array of space IDs from the command's output
    try:
        spaces = json.loads(result_spaces.stdout)
    except json.JSONDecodeError:
        print(f"Error decoding spaces list: {result_spaces.stdout}")
        return

    ModelingObject_data = []  # Initialize a list to store Modeling Object data

    # Check if the Modeling Object is 'spaces'
    if Modeling_Object == 'spaces':
        for space_id in spaces:
            # Add space details to the data list
            ModelingObject_data.append({
                'Space ID': space_id,
                'Technical Name': space_id,
                'TYPE': Modeling_Object[:-1].upper()  # Set the TYPE as the uppercase singular form of the object name
            })

    # Process Modeling Objects for each space
    else:
        for space_id in spaces:
            # Get all the objects from the given space
            objects = fetch_modeling_objects(space_id, Modeling_Object)
            ModelingObject_data.extend(objects)
    
    # Write the collected data into a CSV file
    if ModelingObject_data:
        df = pd.DataFrame(ModelingObject_data)  # Create a DataFrame from the data list
        df.to_csv(Modeling_Object.upper()+'.csv', index=False)  # Save the DataFrame to a CSV file without the index
        print("Space vise all "+Modeling_Object.upper()+" have been written to "+Modeling_Object.upper()+".csv.")  # Log success message
    else:
        print("No Modeling Objects found.")  # Log message if no data was collected
    
    print('------------------------------------------------------------------------------------------------------------------------------------')  # Separator for readability
        
if __name__ == "__main__":
    # Check if two arguments are provided via the command line
    if len(sys.argv) > 1:
        Tenant = sys.argv[1]
        login_to_datasphere(Tenant)
    else:
        print("Please provide Tenant as argument.")  # Log error message if argument is missing

 

Step 3: Open a command prompt and execute the Model_Objects.py file

python Model_Objects.py DEV
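If you want to export both tenants in one go, a small wrapper can invoke the script once per tenant. This is a minimal sketch, not part of the original post; it assumes the script is named Model_Objects.py and sits in the current directory:

```python
import subprocess
import sys

# Run the extraction script once per tenant. The script path and the
# tenant names ("DEV", "PRD") mirror the steps above; adjust to match
# your setup.
def run_for_tenants(script="Model_Objects.py", tenants=("DEV", "PRD")):
    for tenant in tenants:
        # check=True stops the loop on the first failed run
        subprocess.run([sys.executable, script, tenant], check=True)
```

Using `sys.executable` ensures the wrapper launches the same Python interpreter it is running under, which matters when several Python installations exist on the machine.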


Once execution completes, the script generates a CSV file for each of the Datasphere artifacts referenced in the Python code.


Each generated CSV file will contain three columns:

  1. Space ID – The name of the space.
  2. Technical Name – The exact technical name of the object.
  3. Type – The category of the object (e.g., view, local-table, remote-table, replication flow, etc.)
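Since every generated file shares these three columns, they can easily be combined into a single tenant-wide overview. The snippet below is a sketch (an addition, not part of the original script) that merges the CSVs and counts objects per space and type; the column names follow the script above, and the glob pattern is an assumption you may need to adjust:

```python
import glob
import pandas as pd

# Merge the CSVs produced by Model_Objects.py (e.g. VIEWS.csv,
# REMOTE-TABLES.csv) and count objects per space and type. The column
# names 'Space ID' and 'TYPE' match the script's output.
def summarize_exports(pattern="*.csv"):
    frames = [pd.read_csv(path) for path in glob.glob(pattern)]
    combined = pd.concat(frames, ignore_index=True)
    return combined.groupby(["Space ID", "TYPE"]).size().reset_index(name="Count")
```

The resulting DataFrame gives a quick inventory per space, which is handy when comparing the DEV and PRD exports side by side.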

Conclusion

This script demonstrates how Python and the SAP Datasphere CLI can work together to streamline artifact management and export data systematically. By following the steps above, users can extend or adapt the code to suit their own requirements.

Best regards,
Vikas Parmar
