henri_hosang
Product and Topic Expert

Querying and Managing SAP Datasphere with Python, Postman, Open SQL and the Command Line Interface

 

 

Introduction

This blog post aims to provide an overview of the different external tooling options for SAP Datasphere, as these resources are scattered across different Help pages, the SAP Business Accelerator Hub and other community blogs. It does not aim to be exhaustive or cover every detail, but lists the different ways to perform actions in Datasphere or to create, read, update or delete objects and data using external tools like Postman, Python, the CLI or Open SQL.

Here is an overview of the topics covered in this blog post:

  1. REST API: TLS server certificates API, Connections API, Data Sharing Cockpit API and SCIM 2.0 API for user management
  2. Command Line Interface: Manage user access, spaces, modeling objects, the Data Marketplace, tasks and task chains as well as connectivity
  3. OData API: Get assets from the SAP Datasphere Catalog; consume the datasets and metadata of consumable data assets
  4. Open SQL schema and ODBC/JDBC: Query the SAP HANA Cloud database with database users, using SQL statements like SELECT, CREATE, UPDATE, INSERT and ALTER on tables and views

Every agenda item is split into two sections: What? and How? The first section explains which use cases are supported by the shown technology and provides links to the relevant documentation. The second shows a simple example for each technology, using the appropriate tool like Postman, Python, SQL or the CLI. This example can then easily be adapted and extended to the other options explained in the What? section by following the documentation. Additionally, further links are provided to gain a deeper understanding of possible use case scenarios. Often the same action can be achieved in multiple ways; e.g., it is possible to create and list connections via the CLI or via the REST API.

Of course, there is also the option to integrate SAP Datasphere directly with third-party applications, e.g. via OData or ODBC/JDBC connections, or to push data to target systems like AWS S3 or GCP Cloud Storage using Replication Flows. However, these options are not part of this blog post, because application-specific scenarios must be considered.

 

 

REST API

REST APIs are based on a standard architecture that uses HTTP methods like GET, POST, PUT, and DELETE. They allow you to perform actions in Datasphere regarding User & Role Management, Connection & Certificate Management and the usage of the Data Sharing Cockpit. You can call them via an API platform such as Postman or via programming languages like Python or TypeScript.

What?

How?

Using the REST API generally involves two steps:

  1. Creating an OAuth 2.0 Client to Authenticate Against SAP Datasphere
  2. Using Postman or another technology to obtain an access token and then calling the REST API with the access token

1 Creating an OAuth 2.0 Client to Authenticate Against SAP Datasphere

To create an OAuth 2.0 client, users need the DW Administrator role. Under System -> Administration -> App Integration, a new OAuth client can be added.


In the OAuth Client configuration enter a name and choose the following settings:

  1. Purpose: API Access
  2. Access: Select the appropriate access (e.g. User Provisioning if you want to use the SCIM 2.0 API)
  3. Security: Client Credentials
  4. Token Lifetime

Click Add and copy the Client ID and Client Secret from the next screen (the client secret can only be copied now and you need to create a new client if you lose it!).


Additionally, please copy the Authorization URL and Token URL from the App Integration overview as they are needed later to authenticate.


2 Using Postman or another technology to obtain an access token and calling the REST API

Now we can use the Authorization URL, Token URL, Client ID and Client Secret to first obtain an access token and then call the REST API that we need.

The next steps will be shown in (1) Postman and (2) Python as two alternative approaches.

(1) Postman: Create a collection and a new GET request within that collection. Provide the copied token URL and add grant_type=client_credentials in the Parameters.


Then switch to the Authorization tab, select Authorization type Basic Auth, and enter the Client ID as username and the Client Secret as password. You can now send the request and receive the access_token with its lifetime as the result. Copy the token for the next step.
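The same token request can also be reproduced on the command line with curl, if you prefer that over Postman (a sketch; substitute the values you copied):

$ curl -u "<client ID>:<client secret>" "<token URL>?grant_type=client_credentials"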


Now we can call the REST API endpoint as documented in the resources linked in the What? section (mainly https://api.sap.com/package/sapdatasphere/rest). For this simple example we will just get a list of connections from one of the Datasphere spaces.


Follow these steps:

  1. Create a new GET request in Postman
  2. Enter the URL https://<host>/api/v1/datasphere/spaces/<spaceId>/connections
    • Host refers to the URL of your Datasphere tenant; it can be copied from the browser address bar. Copy everything up to the first “/”.
    • The space ID is the ID of the space from which you want to get the connections. You can see the available spaces if you navigate to Space Management in Datasphere.
  3. Add x-sap-sac-custom-auth=true in the Headers section in Postman
  4. Specify the Auth Type as Bearer Token and use the token from the previous step.


Once you send the request, a list of all connections from that space is returned. You can now adapt this example to any other REST API mentioned above by simply changing the URL and HTTP method to the ones specified in the documentation and adding the relevant parameters.

Note: If you want to use the SCIM 2.0 API to create, modify or delete users, you also need a so-called CSRF token, obtained by calling https://<host>/api/v1/csrf with the access token and the headers x-sap-sac-custom-auth=true and x-csrf-token=fetch.

 

(2) Python: The steps in Python are similar to the ones in Postman, using the same credentials and parameters. First an access_token needs to be obtained, and then the connections API is called.

To simplify the scenario, the Authorization URL, Token URL, Client ID and Client Secret are stored in global variables. However, in a productive scenario, a secret store or environment variables should be used instead.

For this demonstration Python 3.9 is used. The only import that is needed is the requests library to make the API calls. Additionally, the credentials are stored as variables as mentioned. That is all for the setup.

import requests

token_url = "<token URL>"
username = "<client ID>"      # the client ID acts as the username
password = "<client secret>"  # the client secret acts as the password

To get the list of connections, again two calls are made: (1) to get the access_token and (2) to the actual connections API endpoint. Each call is implemented as a function.

The get_token function specifies the authentication context using the client ID as username and the client secret as password. Via the requests library, the token_url is called with this authentication context and the parameter grant_type=client_credentials (as in the Postman example). If the call was successful (code 200), the access_token is read from the API response.

def get_token():
    
    # Use basic authentication with the client ID and client secret
    auth = requests.auth.HTTPBasicAuth(username, password)
    
    # API call with grant_type=client_credentials (as in the Postman example)
    params = {"grant_type": "client_credentials"}
    response = requests.get(token_url, auth=auth, params=params)
    
    # Check result and return the access token
    if response.status_code == 200:
        return response.json()['access_token']
    else:
        print("HTTP Error occurred", response.status_code)

In the second function, the first function is called to get the access_token. Then it is passed on with the header x-sap-sac-custom-auth=true to the URL https://<host>/api/v1/datasphere/spaces/<spaceId>/connections as shown in the Postman section.

def get_connections():
    connection_url = "https://<host>/api/v1/datasphere/spaces/<spaceId>/connections"

    # Get token
    bearer_token = get_token()
    
    # Define headers and use the bearer token as authentication method
    headers = {
        "Authorization" : f"Bearer {bearer_token}",
        "x-sap-sac-custom-auth" : "true",
        "Content-Type": "application/json"
    }
    
    # API call
    response = requests.get(connection_url, headers=headers)
    
    # Check result and return
    if response.status_code == 200:
        return response.json()
    else:
        print("HTTP Error occurred", response.reason)

As with the Postman option, you can now adapt this example to use any other REST API mentioned above by simply changing the URL and HTTP method to the ones specified in the documentation and adding the relevant parameters.

To use the SCIM API, first a CSRF token needs to be generated and then sent in combination with the bearer token as headers to the endpoint for deleting, modifying or creating users in the system. Instead of calling the API endpoints directly via requests, a session needs to be created via requests.Session(), so that obtaining the CSRF token and calling the create-user endpoint happen within one session.
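Here is a minimal sketch of this session handling, reusing the get_token function from above. The SCIM endpoint path and user payload are assumptions for illustration; take the exact values from the SCIM 2.0 API documentation linked in the What? section.

import requests

host = "https://<host>"

# Reuse one session so the CSRF token and the SCIM call share the same context
session = requests.Session()
session.headers.update({
    "Authorization": f"Bearer {get_token()}",
    "x-sap-sac-custom-auth": "true",
})

# (1) Fetch the CSRF token (endpoint taken from the note in the Postman section)
csrf_response = session.get(f"{host}/api/v1/csrf", headers={"x-csrf-token": "fetch"})
csrf_token = csrf_response.headers["x-csrf-token"]

# (2) Call the SCIM endpoint with the CSRF token.
# The path and payload below are illustrative assumptions; check the
# SCIM 2.0 API documentation for the exact endpoint and user schema.
new_user = {"userName": "JDOE", "emails": [{"value": "jdoe@example.com"}]}
response = session.post(
    f"{host}/api/v1/scim2/Users",
    json=new_user,
    headers={"x-csrf-token": csrf_token},
)
print(response.status_code, response.reason)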

 

 

Command Line Interface (CLI)

The Command Line Interface is a business-user-friendly toolset to achieve a variety of tasks in Datasphere without needing to write any code or use API platforms. Datasphere end users can run simple one-line statements in the command line after authenticating, to perform admin tasks as well as to work with modeling objects in Datasphere.

What?

How?

To use the command line with SAP Datasphere, it is recommended to use an OAuth 2.0 client with interactive usage. The setup is quite similar to the one shown above for the REST API, but some parameters have to be set up differently.

Again, to create an OAuth 2.0 client, users need the DW Administrator role. Under System -> Administration -> App Integration, a new OAuth client can be added.


In the OAuth client configuration, different settings are used so that the client works with the CLI instead of the REST API:

  1. Purpose: Interactive Usage
  2. Authorization Grant: Authorization Code
  3. Redirect URI: This is the URI the user will be redirected to after authorization. For the Command Line Interface, you can simply start a localhost server on your machine using http://localhost:8080
  4. Token Lifetime
  5. Refresh Token Lifetime

Click Add and copy the Client ID and Client Secret from the next screen (the client secret can only be copied now and you need to create a new client if you lose it!).


Additionally, please copy the Authorization URL and Token URL from the App Integration overview as they are needed later to authenticate.


 

Using the Command Line Interface directly

To use the Datasphere CLI, Node.js >= 18 and <= 22 as well as npm >= 8 and <= 10 need to be installed (npm is installed automatically with Node.js). Node.js can be downloaded from https://nodejs.org/

You can test the installation by running the following commands in your command line:

$ node -v
$ npm -v

Then run the following command to install the Datasphere-related commands:

$ npm install -g @sap/datasphere-cli

Some packages will be installed, and you can verify the installation by running

$ datasphere --version

In summary, the installation commands are:

$ node -v
$ npm -v
$ npm install -g @sap/datasphere-cli
$ datasphere --version

The next step is to log in to Datasphere. As a best practice, always clean host, cache and secrets before logging in again (if this is the first time you are using the CLI, no credentials will be stored yet). Run these commands:

$ datasphere config host clean
$ datasphere config cache clean
$ datasphere config secrets reset

Then log in to Datasphere using the datasphere login command. You will be prompted for the necessary credentials. For this step you need the host URL you see when opening Datasphere in your browser (see above), as well as the client ID and client secret copied from the OAuth client.

$ datasphere login
✔ URL of the system to connect to: … <host>
✔ Please enter your client ID: … <client ID>
✔ Please enter your client secret: … <client secret>

After entering the client secret, a browser window to the redirect URI provided in the OAuth client will open, and the login is handled automatically. You can continue in the CLI, where you are now logged in. Just start running the Datasphere commands documented in the What? section above. Here is a simple example to get all the spaces in your Datasphere tenant.

$ datasphere spaces list -H <host>


Note: For now, you need to provide the host URL of the Datasphere tenant via the -H option in every command. But the host can also be set as the default, so it does not have to be passed in every statement:

$ datasphere config host set <host>

And now you can run

$ datasphere spaces list

Once you are done, log out from Datasphere again

$ datasphere logout


There are many more options when using the CLI to work with Datasphere. E.g. the credentials can be stored in a secrets file, so you don’t have to paste them every time you log in. Additionally, a refresh token can be extracted once you are logged in and passed on when running a command, so you do not have to log in at the beginning of every session. Please refer to the documentation for more details.

 

Using the Command Line Interface within a scripting language

While this introduction uses the CLI directly, it should be mentioned that users can also use scripting languages like Python to automate the CLI usage by calling the commands directly from their code. In Python, the subprocess library is used to call CLI commands from code. This is a simple example to log in and list all available spaces in Datasphere. The credentials are stored in a JSON file:

import subprocess

# Clean host, cache and secrets before logging in (best practice, see above)
subprocess.run("datasphere config host clean", shell=True)
subprocess.run("datasphere config cache clean", shell=True)
subprocess.run("datasphere config secrets reset", shell=True)

# Log in, reading the credentials from a JSON options file
subprocess.run("datasphere login --host <host> --options-file ./dsp_cli_secrets.json", shell=True)

# List all available spaces
subprocess.run("datasphere spaces list -H <host>", shell=True)

Using this option is handy if you want to automate running CLI commands and chain multiple commands together. An example would be to first get all spaces, then use a for loop to get all connections per space, and use another command to validate all the connections; see the sketch below.
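Here is a rough sketch of such a chain, building on the login above. The output format of spaces list and the exact command for listing connections are assumptions here; check datasphere --help and the documentation for the exact syntax.

import json
import subprocess

# List all spaces and capture the CLI output
# (assumed here to be a JSON array of space IDs)
result = subprocess.run(
    "datasphere spaces list -H <host>",
    shell=True, capture_output=True, text=True,
)
spaces = json.loads(result.stdout)

# For each space, list its connections
# (command name is an assumption; check `datasphere --help`)
for space_id in spaces:
    subprocess.run(
        f"datasphere connections list --space {space_id} -H <host>",
        shell=True,
    )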

Here is a great blog post that shows how you can combine the power of CLI and Python to generate views in Datasphere: https://community.sap.com/t5/technology-blogs-by-sap/sap-datasphere-view-generation-with-python-and-...

 

 

OData API

So far, REST APIs and the CLI have been shown to perform certain actions in Datasphere. But what if you want to consume and report on data within the Datasphere tenant? In that case, the next two options can be used: the OData API and the Open SQL schema.

What?

How?

As mentioned, the OData API is mainly used to consume objects from Datasphere. It can be accessed directly from the browser, via an API platform like Postman, or via a scripting language like Python, and it can be used by third-party applications like SAP Analytics Cloud and Power BI to consume data from Datasphere in a reporting scenario.

 

Using the Browser to consume OData Requests

If you are privileged to see objects in Datasphere, you can access the OData API directly from the browser you are already logged in to Datasphere with, since the login context is simply reused for the OData API without any additional setup. If there is, e.g., an analytic model that you built and want to consume via OData, you can do so directly by opening a new window in the browser and pasting the OData request URL. You can craft the OData request URL yourself by referring to the documentation, or use the Generate OData Request function available in the Datasphere UI for all objects exposed for consumption.

To generate the OData request, open the asset from the Data Builder. If it is exposed for consumption (analytic models are exposed by default; views have a switch in the details pane to expose them), an icon appears in the header section under “Tools” to generate the OData request.


A pop-up opens to customize the OData URL. At the top, you can select whether the actual data of the object or its metadata should be retrieved. Additionally, variables and query parameters can be defined, as shown in the example below. If the Generate OData Request dialog is opened to retrieve data and the default settings are used, the OData request URL looks like this:

  • Exposed Views: <host>/api/v1/dwc/consumption/relational/<space Id>/<object technical name>/<object technical name>
  • Exposed Analytic Models: <host>/api/v1/dwc/consumption/analytical/<space Id>/<object technical name>/<object technical name>

As one can see, the relational URL is used for views while the analytical one is used for analytic models. This is because analytic models allow for more features like restricted measures and exception aggregations that are processed like a multidimensional statement and depend on the aggregation state defined via the variables and parameters. The relational URL for views just returns the result in a row-by-row fashion. Here is a blog post exploring the differences in more detail: https://community.sap.com/t5/technology-blogs-by-sap/sap-datasphere-analytical-and-relational-odata-...

Changes you make to the variables and query parameters are reflected in the OData request URL. Variables are defined during the object's modeling process. If a default value is set for a variable, it is used by default; if no default value is set, you must set a value for the variable to call the OData request. Query parameters, on the other hand, are not defined in the modeling process but are used only in that specific OData request. They are standard URL parameters used to filter, sort and limit the result set. Here is an overview of the usable query parameters (see the example after the list):

  • $select – return only specified columns
  • $filter – restrict result according to the provided criteria
  • $orderby – sorts the result by the specified column
  • $count – returns the count of the number of records
  • $top – limits the number of returned records to <n>
  • $skip – excludes the first <n> items
  • sap-language – returns the data in the specified language (if available)
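For example, the following request against an exposed view would return only the first 100 failed records, newest first (an illustrative URL; the column names STATUS and END_DATE are assumptions):

<host>/api/v1/dwc/consumption/relational/<space Id>/<view>/<view>?$select=STATUS,END_DATE&$filter=STATUS eq 'FAILED'&$orderby=END_DATE desc&$top=100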

In this example, an analytic model to monitor task chain runs is shown. It has one variable INCLUDE_FAILURES_ONLY with the default value YES, and query parameters are set to show only task chain steps with replication flows, ordered by the end date of the task chain run and limited to the last 100 results.


To see the result of the OData request, you can click Preview or copy the URL to a new browser window. If everything is configured correctly, a value array of objects is shown, where each object is one result row. As mentioned, this works without any additional setup, since the login context from your browser is reused.


Note: There is a limitation that a user can send a maximum of 300 OData requests per minute. Pagination is used by default with 50,000 records per page. Via the $skip and $top parameters, client-side pagination can be implemented, as sketched below.
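For example, a client could page through a large result set with successive requests like these (illustrative):

<host>/api/v1/dwc/consumption/relational/<space Id>/<view>/<view>?$top=50000&$skip=0
<host>/api/v1/dwc/consumption/relational/<space Id>/<view>/<view>?$top=50000&$skip=50000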

 

Using Postman to consume OData Requests

So far, we have seen how to consume data from Datasphere via the OData API directly from the browser without any additional setup. However, if you want to use a third-party tool like Postman or Python to call OData requests, you have to set up an OAuth client under System -> Administration -> App Integration with interactive usage and a redirect URI. Please refer to the How? section under Command Line Interface (CLI), as the setup is the same. Once you have the client ID, client secret, authorization URL and token URL, you can continue in Postman by creating a new GET request. Copy and paste the URL that you generated via the Generate OData Request function in Datasphere. You will see that the query parameters are automatically shown in the Parameters section of Postman.


The main step to use OData from Postman is to set up the authorization correctly. Use Auth Type = OAuth 2.0 and Header Prefix = Bearer (default).


Then configure a New Token in the section below.

  • Grant type = Authorization Code
  • Callback URL = <Your Callback URL set in the OAuth client (http://localhost:8080)>
  • Auth URL = <Authorization URL copied from System -> Administration -> App Integration>
  • Access Token URL = <Token URL copied from System -> Administration -> App Integration>
  • Client ID = <Client ID copied from the OAuth client>
  • Client Secret = <Client secret copied from the OAuth client>
  • Client Authentication = Send as Basic Auth Header.

Then you have to click on “Get New Access Token”.


Postman automatically handles the redirect and opens an embedded browser where you need to log in with your business user, using your normal Datasphere credentials. If the login is successful, an access token is generated, and you can click Use Token.


The token has the lifetime defined in the OAuth client. After it expires, you just click Get New Access Token again and use the new token. With the token in place, you can now send the API request to consume an analytic model or exposed view. Here we consume the same analytic model as when we generated the OData request and opened the URL in the browser.


That’s it for consuming data via OData in the browser and via Postman. Of course, you can also use Python or another language to replicate Postman’s behavior, handling the authorization against Datasphere by calling the authorization URL and handling the callback via the redirect URI. Since this approach involves a bit more coding than the other consumption options, I will publish the scenario in a separate blog post [TBD].

 

 

HANA Database Explorer & Open SQL Schema

The last option to consume data from Datasphere via external tools is using SQL statements to access the SAP HANA Cloud database underneath Datasphere directly, instead of querying the objects in the Datasphere modeling UI. Because SQL provides two-dimensional results in a row-by-row fashion, only views and tables can be consumed; analytic models are multidimensional statements, so the OData API should be used to consume them instead. Third-party tools like Tableau and Power BI use a JDBC/ODBC connection to the HANA Cloud to report on data exposed in views. This section shows how you can use this connection to consume and create objects in the HANA Cloud via the HANA Database Explorer and via Python.

What?

How?

To consume data via ODBC/JDBC, you need either a database user or a database analysis user. A database user is limited to reading from and/or writing to an Open SQL schema with restricted access to the space schema, whereas database analysis users have read-only access to all space schemas (if configured).

To simplify the scenario, we will use a database analysis user for this introduction, because this user has access to all the objects in Datasphere by default. To create a database analysis user, you need to be an administrator. Go to System -> Configuration -> Database Access -> Database Analysis User, then click Create to create a new user.


The database analysis user name starts with “DWCDBUSER#”, and you have to provide a custom suffix for the user. Additionally, enable Space Schema Access to also consume data that is available in the spaces of your Datasphere tenant. Click Create.


After the user is created, the database host name, port, password and username are displayed. Copy all four credentials, as they are needed to log in to the HANA Cloud. In contrast to the OAuth clients used for the CLI and the OData API, you can simply request a new password for this user if you lose the old one.


That is all the setup needed to access the HANA Cloud via the HANA Cockpit or HANA Database Explorer. To consume data in Datasphere, the HANA Database Explorer can be used. Open your HANA Database Explorer (e.g. via Space Management -> Edit -> Database Access -> Select a Database User -> Open Database Explorer). To use the newly created database analysis user, the HANA database instance has to be added again with that user: click the “+” sign in the upper left corner, select SAP HANA Database as the instance type, and paste the credentials from the user creation.


You will see the added connection in the upper left section of the HANA Database Explorer. The explorer helps you by automatically creating SELECT statements. To find an element to consume, open the instance and the Catalog option. Choose Tables to consume tables or Views to consume views. By default, all tables/views from all schemas are shown. You can search for a specific table/view or filter by schema. The schema option shows all schemas available on the HANA Cloud, but you can simply search for the name of one of your spaces that holds the object you want to consume.

In this example I select the table SalesOrders_TestUpload by selecting Tables from the catalog, filtering for my space schema COE_EMEA_DW_DM and searching for “test”. Right-clicking the table gives you multiple options. To just consume the data via Open SQL, click “Generate SELECT Statement” and execute the statement; the data will be shown.
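The generated statement looks like the following and can be extended with standard SQL clauses (the WHERE clause and column name in the second statement are illustrative):

SELECT * FROM "COE_EMEA_DW_DM"."SalesOrders_TestUpload";
SELECT * FROM "COE_EMEA_DW_DM"."SalesOrders_TestUpload" WHERE "GROSSAMOUNT" > 1000 LIMIT 10;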


Instead of using the HANA Database Explorer, you can also consume data in the HANA Cloud from any other database client (e.g. DBeaver) or from a scripting language like Python by defining a connection to that HANA database. To consume assets from Python, you first need to allowlist your environment's external IP address: go to System -> Configuration -> IP Allowlist -> Trusted IPs and add your external IP address. You can get your external IP address by running the command curl ifconfig.me on Linux/macOS or opening a website like https://ifconfig.me/. If you are using a VPN client, check its settings to see if the external IP address is provided there. For testing, the IP range 0.0.0.0/0 (all addresses) can be used.


Then install the hdbcli module for Python. It provides the Python client (following the standard Python Database API) to send SQL queries directly to the HANA Cloud. You can find the documentation at https://pypi.org/project/hdbcli/ and install it in the command line via:

$ pip install hdbcli

In a new Python file, import dbapi from hdbcli and define the connection with the four credentials from the database analysis user, just as in the HANA Database Explorer. Write a SQL command or copy the one generated by the HANA Database Explorer and execute it. Finally, run cursor.fetchall() to get a row-by-row result. Here is the full code:

from hdbcli import dbapi

# Connect with the credentials of the database analysis user
conn = dbapi.connect(
    address="<HANA host>",
    port=443,
    user="DWCDBUSER#<suffix>",
    password="<DB Analysis User Password>",
)

# Execute the SELECT statement and fetch the result row by row
sql = 'SELECT * FROM "COE_EMEA_DW_DM"."SalesOrders_TestUpload"'
cursor = conn.cursor()
cursor.execute(sql)
rows = cursor.fetchall()
print(rows)

You can now start creating more complex SQL statements or work with the result set, e.g. by passing the fetchall() result into a pandas DataFrame, as sketched below. As shown in the What? section, there are many more possibilities when working with the HANA Open SQL schema, like running SQL script procedures in task chains or executing HANA machine learning algorithms, which are beyond the scope of this introductory blog post. Take a look at the articles linked in the What? section to get started with additional scenarios.
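As a sketch, building on the connection above (pandas must be installed separately; the column names are read from the cursor metadata):

import pandas as pd

# Build a DataFrame from the fetched rows; column names come from the
# DB API cursor metadata (first element of each description tuple)
cursor.execute(sql)
df = pd.DataFrame(cursor.fetchall(), columns=[col[0] for col in cursor.description])
print(df.head())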

 

 

Conclusion

That’s it for this introduction to external access to Datasphere. By now you should have a good understanding of the major ways to interact with Datasphere from tools like the CLI, Postman and Python. As mentioned in the beginning, Replication Flows, OData and the ODBC/JDBC connection can also be used directly by third-party applications to retrieve data from Datasphere. This was not part of this blog post, as the configuration differs between the possible targets; there is extensive documentation and there are many community blogs explaining application-specific setups.

If you have questions or noticed a scenario I didn’t cover, feel free to leave a comment below the blog post.

Cheers, Henri
