Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
Sefan_Linders
Product and Topic Expert

Welcome to this technical blog post where we'll delve into generating SAP Datasphere SQL views from SAP Datasphere remote tables, and subsequently deploying them into SAP Datasphere using the SAP Datasphere Command-Line Client (CLI). Most of the text in this blog consists of the actual Python code that accomplishes this task.

In a follow-up blog, we'll look at generating Transformation Flows with partitions.


Changelog:
  • 16/5/2023: Initial version
  • 19/7/2023: Changed from dwc to datasphere commands
  • 6/11/2023: Adapted to DSP CLI command changes
  • 6/5/2024: Moved credentials to file, added transformation flow partitioning feature

This last change has not been added to the code below; instead, I created a dedicated folder in the following Git repo: https://github.com/SAP-samples/btp-global-center-of-excellence-samples/blob/main/DSP_object_generati...

May 15, 2024: Please note that the code in this blog is outdated and does not make use of the latest CRUD feature set of the SAP Datasphere CLI. Even though the code still worked when I last tested it, I advise you to take a look at the Git repo, which contains the updated code.

Use case

Let's set the stage with some background. Recently, I handled a case where a customer used SAP Datasphere to create a considerable number of remote tables. Apart from modelling these tables within SAP Datasphere, the objective was to make them accessible to external tools as well, using a database user that connects to the Open SQL schema. Another requirement was that HDI containers would be able to read these remote tables.

However, remote tables, at the time of writing, cannot be exposed directly to the Open SQL schema or to an HDI container; the option to grant these privileges to Open SQL schema users simply does not exist in the UI. A solution is to encapsulate each remote table within a simple view and make these views accessible to the Open SQL schema. This solution, although effective, would need to be repeated manually for hundreds of tables, and redone whenever an underlying remote table is modified. Therefore, we turned to object generation.
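Conceptually, each wrapper is just a 1:1 projection of the remote table. As a rough sketch of what gets generated per table (the table and column names here are invented for illustration; the actual script below produces CSN definitions rather than raw SQL statements):

```python
# Sketch: build the SQL text of a 1:1 "wrapper" view for a remote table.
# Table and column names are invented examples, not from a real system.
def wrapper_view_sql(table_name, columns, suffix="_V"):
    cols = ", ".join(f'"{c}"' for c in columns)
    return f'CREATE VIEW "{table_name}{suffix}" AS SELECT {cols} FROM "{table_name}"'

print(wrapper_view_sql("SALES_ORDERS", ["ORDER_ID", "AMOUNT"]))
# CREATE VIEW "SALES_ORDERS_V" AS SELECT "ORDER_ID", "AMOUNT" FROM "SALES_ORDERS"
```

Doing this by hand for hundreds of tables is exactly the repetitive work the generation script removes.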

Object generation

All modelling objects in SAP Datasphere are stored as text-based objects, known as CSN (Core Schema Notation) models. They can be exported and imported using the SAP Datasphere user interface, but also using the Command-Line Client that comes with the product. This allows us to solve the use case: we take a remote table CSN and create a brand-new view CSN from it. Since the remote table CSN already contains all the metadata we need, all we have to do is write a script that performs the actual conversion. I've done this in Python; the code is pasted below. Since there is a 1:1 relation between the SQL view and the remote table, the view isn't that hard to generate. Basically, we read the remote table's metadata, mainly the column names, and write it into a view definition. A SQL view definition is preferable to a graphical view definition, as its structure is simpler and therefore easier to generate.
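To make the conversion concrete before diving into the full script, here is a minimal, self-contained sketch of the CSN-to-CSN transformation on a toy remote-table definition. All names are invented, and the real script below additionally sets the SAP-specific annotations needed for Open SQL exposure:

```python
import json

# Toy remote-table CSN (invented names); real CSNs carry more metadata.
rt_csn = {
    "definitions": {
        "MY_REMOTE_TABLE": {
            "kind": "entity",
            "elements": {
                "ID": {"type": "cds.Integer"},
                "NAME": {"type": "cds.String", "length": 100},
            },
        }
    }
}

def to_view_csn(rt_csn, suffix="_V"):
    """Derive a minimal SQL-view CSN that selects every column of the remote table."""
    rt_name = next(iter(rt_csn["definitions"]))
    elements = rt_csn["definitions"][rt_name]["elements"]
    view_name = rt_name + suffix
    return {
        "definitions": {
            view_name: {
                "kind": "entity",
                "elements": elements,  # same columns as the source table
                "query": {
                    "SELECT": {
                        "from": {"ref": [rt_name]},
                        "columns": [{"ref": [e]} for e in elements],
                    }
                },
            }
        }
    }

print(json.dumps(to_view_csn(rt_csn), indent=2))
```

The full script follows the same pattern, with the column list and annotations filled in from the remote table's metadata.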

Pre-requisites

  • Create a Database Analysis User (with Space Schema access) in SAP Datasphere for the connection to the SAP HANA database; see this SAP Help page.
  • If you want to avoid a login process while running the code, adapt the code under "Logon procedure". See this SAP help page.
  • To use the hdbcli package, install using pip: pip3 install hdbcli

The last retest was with version 2023.19.0 of the Command-Line Client. However, please use the latest version yourself, and do let me know if something stopped working ;-).

The code


Below is the full Python code to generate SQL views from remote tables and push them into SAP Datasphere. After inserting your SAP Datasphere connection info, you can basically run the script, open your SAP Datasphere Data Builder file overview, and wait for the generated view definitions to pop up. Alternatively, you can watch the printed log statements to see what is going right or wrong ;-).

A few additional notes:

  • You will have to adjust the static variables to connect to SAP Datasphere and the underlying SAP HANA Database;
  • The code contains a SQL query that selects all remote tables inside a space. Adjust this query if you want to generate only a subset;
  • View definitions are pushed one by one. Technically objects can be combined in one space definition, but that might make the resulting CSN too large to be successfully processed by SAP Datasphere;
  • A wait function is implemented to allow the view to deploy fully before the next view is pushed. A product fix released in May '23 made this unnecessary, so the wait time has been set to 0, meaning that objects are now deployed in parallel;
  • Access to the remote table metadata is provided through the SAP HANA Database. This proved easier than fetching object definitions from reading the space definition;
  • In the code I deliberately used the long form of the CLI options, as these are more readable and static in nature, which keeps the code compatible with future versions.

# -----------------
# Introduction
# -----------------
# Datasphere (DSP) View generation based on Remote Tables
# This Python script generates design time SQL views in DSP for each Remote Table (RT) that 
# resides in a given space and then pushes the view definition to DSP via the DSP Command Line Interface.
# The view is "exposed" so that it can be accessed from the Open SQL schema
#
# Author: Sefan Linders
# Changelog: 
# - 16/5/2023: Initial version
# - 19/7/2023: Changed from dwc to datasphere commands
# - 6/11/2023: Adapted to DSP CLI command changes, wait function set to 0

# Prereqs
# - Install Datasphere CLI (npm install -g @sap/datasphere-cli)
# - Create Database Analysis User (with Space Schema access) on Datasphere for connection to HDB. https://help.sap.com/docs/SAP_DATASPHERE/9f804b8efa8043539289f42f372c4862/c28145bcb76c4415a1ec6265dd2a4c11.html?locale=en-US
# - Prepare DSP CLI secrets file according to https://help.sap.com/docs/SAP_DATASPHERE/d0ecd6f297ac40249072a44df0549c1a/eb7228a171a842fa84e48c899d48c970.html?locale=en-US#log-in-by-passing-oauth-client-information-in-a-secrets-file
# - If you want to avoid a login process, use the following instructions and adapt code under "Logon procedure": https://help.sap.com/docs/SAP_DATASPHERE/d0ecd6f297ac40249072a44df0549c1a/eb7228a171a842fa84e48c899d48c970.html?locale=en-US#avoid-running-a-login-command-by-extracting-access-and-refresh-tokens
# - To use the hdbcli package, install using pip: pip3 install hdbcli

# -----------------------------------------------------------
# Static variables, these you have to adjust
# -----------------------------------------------------------

# dsp
dsp_host = 'https://<your-host-url>'  # Placeholder for the SAP Datasphere URL
dsp_space = '<your-dsp-space>'  # Placeholder for the technical space name

# hdb
hdb_address='<your-hdb-address>'  # Placeholder for the SAP HANA database (hdb) address
hdb_port=443  # Port for the hdb, usually 443
hdb_user='<your-hdb-user>'  # Placeholder for the hdb user
hdb_password='<your-hdb-password>'  # Placeholder for the hdb password

# settings
view_suffix = '_V'  # Suffix for the generated views
secrets_file = '<path-to-your>/dsp_cli_secrets.json'  # Path for the secrets file
deploy_wait_time_in_seconds = 0 # wait time between object deployments, to prevent overload. After DSP fix May '23, this seems not needed anymore.
export_folder_path = '<path-to-your>/scripts/dsp_cli_view_generation/'  # Path to export folder

# -----------------------------------------------------------
# Package import
# -----------------------------------------------------------

import subprocess # For OS commands on DSP CLI
import json # For handling the CSN files which are in JSON format
import time # For wait function to let an object deploy finish before starting the following
from hdbcli import dbapi # To connect to SAP HANA Database underneath SAP Datasphere to fetch Remote Table metadata

# -----------------
# print versions of relevant components
# -----------------

command = 'node --version'
print(command)
subprocess.run(command, shell=True)

command = 'npm --version'
print(command)
subprocess.run(command, shell=True)

command = 'datasphere --version'
print(command)
subprocess.run(command, shell=True)

# -----------------------------------------------------------
# CLI logon procedure with oAuth authentication to DSP CLI
# -----------------------------------------------------------

# logout is needed to have the login consider new creds file, e.g., in case it is replaced with new client id/secret
command = f'datasphere logout'
print(command)
subprocess.run(command, shell=True)

# login
command = f'datasphere login --host {dsp_host} --secrets-file {secrets_file}'
print(command)
subprocess.run(command, shell=True)

# Optional command to debug or to get the access and refresh token to avoid login command (see header comments)
command = 'datasphere config secrets show'
print(command)
subprocess.run(command, shell=True)

# -----------------------------------------------------------
# Wait function, used to prevent overload of space deployment
# -----------------------------------------------------------

def wait(seconds):
    for i in range(seconds):
        if i > 0 and i % 10 == 0:
            print()
        print('.', end='', flush=True)
        time.sleep(1)
    print()

# -----------------------------------------------------------
# This function takes a Remote Table (RT) CSN as input and transforms it to a SQL View CSN
# -----------------------------------------------------------

def remote_table_to_view(rt_csn):
    rt_name = list(rt_csn['definitions'].keys())[0]
    view_name = rt_name + view_suffix

    elements = rt_csn["definitions"][rt_name]["elements"]
    elements_string = ', '.join(f'"{element}"' for element in elements)

    view_csn = {
        "definitions": {
            view_name: {
                "kind": "entity",
                "elements": elements,
                "query": {
                    "SELECT": {
                        "from": {"ref": [rt_name]},
                        "columns": [{"ref": [element]} for element in elements]
                    }
                },
                "@EndUserText.label": view_name,
                "@ObjectModel.modelingPattern": {"#": "DATA_STRUCTURE"},
                "@ObjectModel.supportedCapabilities": [{"#": "DATA_STRUCTURE"}],
                "@DataWarehouse.consumption.external": True,
                "@DataWarehouse.sqlEditor.query": f"SELECT {elements_string}\nFROM \"{rt_name}\""
            }
        }
    }
    
    return view_csn

# -----------------------------------------------------------
# Generates space csn (including view definition) with view csn as input
# -----------------------------------------------------------
def generate_space_csn_with_view(view_csn):
    space_csn = {}
    space_csn = { dsp_space : view_csn }
    return space_csn

# -----------------------------------------------------------
# Write space definition to csn file and return file name
# -----------------------------------------------------------
def write_space_csn(space_csn):
    space_csn_pretty = json.dumps(space_csn, indent=4)
    view_name = next(iter(space_csn[next(iter(space_csn.keys()))]["definitions"].keys()))
    space_csn_file = f'{export_folder_path}space_{dsp_space}_object_{view_name}.csn'
    with open(space_csn_file, 'w') as f:
        f.write(space_csn_pretty)
    return space_csn_file

# -----------------------------------------------------------
# Push view csn to DSP with space definition csn as input
# -----------------------------------------------------------
def push_space_csn_to_DSP(space_csn_file):
    command = f'datasphere spaces create --host {dsp_host} --space {dsp_space} --file-path {space_csn_file} --force-definition-deployment --verbose'
    print(command)
    subprocess.run(command, shell=True)

# -----------------------------------------------------------
# Fetch RT metadata from HANA DB, and for each RT create a view and push this to DSP
# -----------------------------------------------------------

# Connect to HDB
conn = dbapi.connect(
    address=hdb_address,
    port=hdb_port,
    user=hdb_user,
    password=hdb_password
)
cursor = conn.cursor()

# select statement to fetch remote table csn's. Selection on highest ARTIFACT_VERSION for each object.
st = f'''
    SELECT A.ARTIFACT_NAME, A.CSN, A.ARTIFACT_VERSION
    FROM "{dsp_space}$TEC"."$$DeployArtifacts$$" A
    INNER JOIN (
      SELECT ARTIFACT_NAME, MAX(ARTIFACT_VERSION) AS MAX_ARTIFACT_VERSION
      FROM "{dsp_space}$TEC"."$$DeployArtifacts$$"
      WHERE PLUGIN_NAME='remoteTable'
      GROUP BY ARTIFACT_NAME
    ) B
    ON A.ARTIFACT_NAME = B.ARTIFACT_NAME
    AND A.ARTIFACT_VERSION = B.MAX_ARTIFACT_VERSION;
'''
print('>>> SELECT statement to fetch remote table definitions')
print(st)
cursor.execute(st)

# Loop over the remote tables, create a view definition, and push its csn to DSP
rows = cursor.fetchall()
conn.close()
total_rows = len(rows)
print ('Total rows: ' + str(total_rows))

for i, row in enumerate(rows):
    rt_name = row[0]
    csn = row[1]
    
    # Load remote table csn and print it
    rt_csn = json.loads(csn)
    rt_csn_pretty = json.dumps(rt_csn, indent=4)
    print('>>> Remote table csn of: ' + rt_name)
    print(rt_csn_pretty)
    
    # Get view definition based on remote table and print it
    view_csn = remote_table_to_view(rt_csn)
    view_csn_pretty = json.dumps(view_csn, indent=4)
    print('>>> Generated view csn of: ' + rt_name)
    print(view_csn_pretty)
    
    # Generate space csn
    space_csn_new = generate_space_csn_with_view(view_csn)
    space_csn_new_pretty = json.dumps(space_csn_new, indent=4)
    print('>>> Generated space csn of: ' + rt_name)
    print(space_csn_new_pretty)
          
    # Write space csn to file
    space_file_write = write_space_csn(space_csn_new)
          
    # Push space csn to DSP
    push_space_csn_to_DSP(space_file_write)
    
    # Check if it's the last row
    if i < total_rows - 1:
        # Wait to prevent overload of space deployment
        print(f'waiting {deploy_wait_time_in_seconds} seconds for space deployer to finish previous deploy')
        wait(deploy_wait_time_in_seconds)

Conclusion

You could think of many scenarios in which you want to generate objects and push them into SAP Datasphere. In this blog, I described a method to generate SAP Datasphere SQL views from remote tables and push them into the SAP Datasphere repository. I hope it helps you build your own generation scenario, and gives you an idea of what is possible with a bit of code and the SAP Datasphere CLI.
