In this blog post, we'll explore how to use Python alongside SAP Datasphere CLI to extract modeling objects and export them to CSV files. The script allows users to handle artifacts such as remote tables, views, replication flows, and more, for each space in SAP Datasphere.
This solution is particularly useful for automating repetitive tasks and ensuring structured data handling across different modeling objects.
Prerequisites
Steps to install SAP Datasphere CLI:
Step-by-Step Process
Step 1: Prepare the Login_<TENANT>_DSP.json file
Create an OAuth client with Purpose set to Interactive Usage and Redirect URL set to http://localhost:8080.
Retrieve the necessary field values for the secrets JSON file by running the following command.
datasphere config secrets show
Prepare the Login_<TENANT>_DSP.json file for both the DEV and PRD tenants.
{
"client_id": "",
"client_secret": "",
"authorization_url": "",
"token_url": "",
"access_token": "",
"refresh_token": "",
"host": "",
"browser": ""
}
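Before logging in, it can save a round of failed CLI calls to check that the secrets file actually contains every field from the template above. The snippet below is a minimal sketch of such a check; `validate_secrets_file` is a hypothetical helper name, not part of the Datasphere CLI.

```python
import json

# Fields expected in Login_<TENANT>_DSP.json, mirroring the template above.
REQUIRED_KEYS = {
    "client_id", "client_secret", "authorization_url",
    "token_url", "access_token", "refresh_token", "host", "browser",
}

def validate_secrets_file(path):
    """Return the set of required keys missing from the secrets JSON file."""
    with open(path, encoding="utf-8") as f:
        secrets = json.load(f)
    return REQUIRED_KEYS - secrets.keys()
```

Calling it with the path to your Login_<TENANT>_DSP.json returns an empty set when the file is complete, and the names of the missing keys otherwise.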
Step 2: Create the Model_Objects.py file with the code below
dsp_host – Provide the URL for both DEV and PRD Datasphere tenants.
secrets_file – Specify the path to the Login_<TENANT>_DSP.json file.
import subprocess
import pandas as pd
import sys
import json

def login_to_datasphere(Tenant):
    """
    Step 1: Logs into Datasphere using the appropriate credentials based on the tenant.
    - If `Tenant` is 'PRD', it uses production credentials.
    - Otherwise, it uses quality assurance credentials.
    - It also ensures the session is fresh by logging out first.
    """
    if Tenant == 'PRD':
        dsp_host = '<PRD URL>'
        secrets_file = '<PATH>/Login_PRD_DSP.json'
    else:
        dsp_host = '<DEV URL>'
        secrets_file = '<PATH>/Login_Q_DSP.json'
    # Logout first to ensure a fresh session
    subprocess.run("datasphere logout", shell=True)
    print(f'Set Host : datasphere config host set {dsp_host}')
    # Configure the Datasphere host based on the environment
    subprocess.run(f'datasphere config host set {dsp_host}', shell=True)
    # Perform login using the secrets file
    print(f'Login : datasphere login --host {dsp_host} --secrets-file {secrets_file}')
    subprocess.run(f'datasphere login --host {dsp_host} --secrets-file {secrets_file}', shell=True)
    print("Login to " + Tenant + " is successful")
    # Step 2: Retrieve and manage predefined modeling objects
    for modeling_object in ['spaces', 'local-tables', 'views', 'analytic-models',
                            'replication-flows', 'remote-tables', 'task-chains',
                            'transformation-flows', 'data-flows',
                            'intelligent-lookups', 'data-access-controls']:
        manage_Modeling_Object(modeling_object)
    return dsp_host  # Returning host info if needed in other operations
def fetch_modeling_objects(space_id, modeling_object="views", batch_size=200):
    """
    Step 4: Retrieves modeling objects using pagination.
    - Loops through available objects in batches of `batch_size` (default 200).
    - Uses Datasphere CLI commands to retrieve data dynamically.
    """
    all_objects = []  # List to store all retrieved modeling objects
    skip = 0  # Offset for pagination, increased in multiples of batch_size
    print(f"Checking {modeling_object.upper()} for space: {space_id}")
    # Keep fetching objects until the CLI returns no more data
    while True:
        # Fetch one batch of modeling objects using the CLI
        command = f'datasphere objects {modeling_object} list --space {space_id} --top {batch_size} --skip {skip}'
        result = subprocess.run(command, capture_output=True, shell=True, text=True)
        try:
            output = json.loads(result.stdout)
        except json.JSONDecodeError:
            print(f"Error decoding JSON: {result.stdout}")
            break
        if not output:
            break  # Stop when no more objects are found
        # Clean the fetched data before writing it to the CSV file
        for obj in output:
            cleaned_flow = obj.get("technicalName", "").strip()
            all_objects.append({
                'Space ID': space_id,
                'Technical Name': cleaned_flow,
                'TYPE': modeling_object[:-1].upper()
            })
        skip += batch_size  # Move to the next batch
    return all_objects
def manage_Modeling_Object(Modeling_Object):
    """
    Step 3: Retrieves all spaces in the tenant.
    - Loops through the available spaces and gets the models for each space.
    - Prepares a CSV file with all retrieved data.
    """
    # Retrieve a list of all spaces in JSON format
    command = 'datasphere spaces list --json'
    result_spaces = subprocess.run(command, capture_output=True, shell=True, text=True)
    # Parse the JSON array of space IDs from the command's output
    spaces = json.loads(result_spaces.stdout)
    ModelingObject_data = []  # Initialize a list to store modeling object data
    # Check if the modeling object is 'spaces'
    if Modeling_Object == 'spaces':
        # For 'spaces', each space ID is itself the modeling object
        for space_id in spaces:
            ModelingObject_data.append({
                'Space ID': space_id,
                'Technical Name': space_id,
                'TYPE': Modeling_Object[:-1].upper()  # Uppercase, singular form of the object name
            })
    # Process modeling objects for each space
    else:
        for space_id in spaces:
            # Get all the objects from the given space
            objects = fetch_modeling_objects(space_id, Modeling_Object)
            ModelingObject_data.extend(objects)
    # Write the collected data into a CSV file
    if ModelingObject_data:
        df = pd.DataFrame(ModelingObject_data)  # Create a DataFrame from the data list
        df.to_csv(Modeling_Object.upper() + '.csv', index=False)  # Save the DataFrame without the index
        print("Space-wise all " + Modeling_Object.upper() + " have been written to " + Modeling_Object.upper() + ".csv.")
    else:
        print("No Modeling Objects found.")  # Log message if no data was collected
    print('-' * 132)  # Separator for readability
if __name__ == "__main__":
    # Check if the tenant argument is provided via the command line
    if len(sys.argv) > 1:
        Tenant = sys.argv[1]
        login_to_datasphere(Tenant)
    else:
        print("Please provide Tenant as argument.")  # Log error if the argument is missing
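The pagination pattern at the heart of fetch_modeling_objects (fetch batches of `--top` items at increasing `--skip` offsets until an empty page comes back) can be illustrated in isolation. In this sketch the CLI call is replaced by an in-memory list; `paginate` and `fake_fetch` are illustrative names, not part of the script above.

```python
def paginate(fetch_page, batch_size=200):
    """Collect all items from a paged source until an empty page is returned."""
    items, skip = [], 0
    while True:
        page = fetch_page(top=batch_size, skip=skip)
        if not page:
            break  # An empty page means the source is exhausted
        items.extend(page)
        skip += batch_size  # Advance the offset by one batch
    return items

# Stand-in for the CLI call: 450 fake object names served in pages.
fake_objects = [f"VIEW_{i}" for i in range(450)]

def fake_fetch(top, skip):
    return fake_objects[skip:skip + top]
```

With a batch size of 200, `paginate(fake_fetch)` issues three non-empty requests (200, 200, and 50 items) plus one final empty request, and returns all 450 names in order.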
Step 3: Open a command prompt and execute the Model_Objects.py file
python Model_Objects.py DEV
Once the program execution is done, it will generate CSV files for all the Datasphere artifacts mentioned in the Python code.
Each generated CSV file will contain three columns: Space ID, Technical Name, and TYPE.
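Since every export shares those three columns, the per-object CSVs can easily be merged into a single tenant-wide inventory with pandas. This is a minimal sketch; `consolidate_exports` and the output file name ALL_OBJECTS.csv are illustrative choices, not produced by the script above.

```python
import glob
import pandas as pd

def consolidate_exports(pattern="*.csv", out_file="ALL_OBJECTS.csv"):
    """Merge the per-object CSV exports matching `pattern` into one inventory file."""
    frames = [pd.read_csv(path) for path in glob.glob(pattern)]
    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(out_file, index=False)
    return combined
```

The resulting file keeps the same three columns, so the TYPE column can then be used to filter or pivot the full inventory in one place.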
This script demonstrates how Python and the SAP Datasphere CLI can work together to streamline artifact management and export data systematically. By following the steps provided, users can extend or adapt the code to suit their requirements.
Best regards,
Vikas Parmar