Abstract

Integrating SAP Datasphere with Databricks opens up powerful possibilities for advanced analytics, machine learning pipelines, and cross-platform data processing.

In this guide, I'll share my experience automating SAP Databricks notebook execution using SAP Datasphere task chains and the Databricks Jobs API through a practical, step-by-step approach.

Common Use Cases:

  • Automated data processing pipelines across SAP and Databricks
  • Machine learning model training triggered by Datasphere data flows
  • Advanced analytics on SAP data using Databricks' compute power
  • Orchestrated cross-platform workflows with unified monitoring

Business Value:

  • Reduced manual intervention in data workflows
  • Faster time-to-insight with automated analytics
  • Unified orchestration across enterprise data platforms

Prerequisites

Before getting started, ensure you have:

  • SAP Datasphere access with permission to create connections and task chains
  • Databricks workspace with notebook development access
  • Databricks Personal Access Token (PAT) generation privileges
  • Basic understanding of REST APIs and HTTP methods
  • A Databricks notebook ready to be executed

Architecture Overview

The integration follows this flow:

┌─────────────────────┐
│  SAP Datasphere     │
│  Task Chain         │
└──────────┬──────────┘
           │
           │ HTTP POST Request
           │ (Jobs API)
           ▼
┌─────────────────────┐
│  Databricks         │
│  Jobs API           │
└──────────┬──────────┘
           │
           │ Triggers
           ▼
┌─────────────────────┐
│  Databricks         │
│  Notebook Run       │
└─────────────────────┘

Key Components:

  1. Datasphere Task Chain - Orchestration layer
  2. Generic HTTP Connection - API authentication setup
  3. Databricks Jobs API - /api/2.1/jobs/runs/submit endpoint
  4. PAT Authentication - Bearer token security
  5. Databricks Notebook - Target execution logic

Step 1: Create a Generic HTTP Connection in Datasphere

The first step is to establish a secure connection from Datasphere to Databricks. We'll use a Generic HTTP Connection with Bearer token authentication.

Navigate to Connections

In SAP Datasphere, go to Connections and create a new Generic HTTP connection.

Configure Connection Settings

  • Connection Name: databricks_rest_api (choose a descriptive name)
  • Host: <your-workspace>.cloud.databricks.com (your Databricks workspace hostname)
  • Port: 443 (Datasphere uses 443/HTTPS as the default)
  • Protocol: HTTPS (always use a secure connection)
  • Path: /api/2.1, optional (can be set here or in the API task later)

Authentication Setup

Databricks uses Bearer tokens (Personal Access Tokens / PATs) for API authentication. For simplicity in Datasphere, we'll configure this as a Username/Password authentication type:

  • Authentication Type: Username and Password
  • Username: token
  • Password: <Your Databricks PAT>

Note: The username is literally the string token. Databricks accepts basic authentication with this fixed username and the PAT as the password, treating it the same as Bearer token authentication.
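
If you want to double-check this mapping against your workspace, the same basic-auth form can be tested directly with curl (a quick sketch; <your-PAT> and the workspace hostname are placeholders):

curl -u token:<your-PAT> \
  https://<your-workspace>.cloud.databricks.com/api/2.1/jobs/list

This should return the same result as the Bearer-header call shown in the connection test below; it mirrors what Datasphere sends when the connection uses Username/Password authentication.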

Obtaining a Databricks Personal Access Token (PAT)

You'll need a PAT to authenticate your API calls. Here's how to generate one:

  1. In your Databricks workspace, click on your user profile in the top-right corner
  2. Select Settings
  3. Navigate to Developer under the User section
  4. Click on Manage next to Access Tokens

  5. Click Generate new token
  6. Provide a description (e.g., "Datasphere Task Chain Integration")
  7. Set an expiration period (recommended: 90 days for security)
  8. Copy the generated token immediately, as you won't be able to see it again!

Test Your Connection

Before proceeding, verify that your PAT is working correctly by making a test API call:

curl -H "Authorization: Bearer <your-PAT>" \
  https://<your-workspace>.cloud.databricks.com/api/2.1/jobs/list

If configured correctly, you should receive a JSON response listing your Databricks jobs (or an empty list if you have none, as in my case).
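
For reference, a successful call returns a JSON document roughly like the following (abbreviated and purely illustrative; the actual fields and values depend on the jobs in your workspace):

{
  "jobs": [
    {
      "job_id": 123456789,
      "settings": { "name": "my_existing_job" },
      "created_time": 1700000000000
    }
  ],
  "has_more": false
}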


Step 2: Create a Datasphere Task Chain

Now that the connection is established, let's create a task chain that will trigger the Databricks notebook.

Create a New Task Chain

  1. In Datasphere, navigate to Data Builder
  2. Create a new Task Chain
  3. Give it a descriptive name (e.g., trigger_databricks_notebook)

Add an API Task

From the task palette, drag the API task onto the canvas.


Step 3: Configure the API Task

This is where we define how Datasphere will call the Databricks Jobs API.

Basic Configuration

Click on the API task to open its configuration panel.

  • Connection: databricks_rest_api (the connection you created earlier)
  • HTTP Method: POST

API Endpoint Configuration

The endpoint depends on how you configured your connection:

  • Full endpoint (if Path is empty in the connection): /api/2.1/jobs/runs/submit
  • Partial endpoint (if Path is /api/2.1 in the connection): /jobs/runs/submit

This endpoint is used to submit a one-off notebook run without needing a pre-configured Databricks job. For more details, see the Databricks Jobs API documentation.
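
As a side note, if you already maintain a permanently configured job in Databricks, you could alternatively trigger it via the /api/2.1/jobs/run-now endpoint (not used in this guide; <your-job-id> is a placeholder):

curl -X POST \
  -H "Authorization: Bearer <your-PAT>" \
  -H "Content-Type: application/json" \
  -d '{"job_id": <your-job-id>}' \
  https://<your-workspace>.cloud.databricks.com/api/2.1/jobs/run-now

For ad-hoc notebook runs, however, runs/submit keeps the whole definition on the Datasphere side, which is what we use here.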

Request Body

The request body defines what notebook to run and with what parameters. Here's what I used:

{
    "name": "<job-name>",
    "tasks": [{
        "task_key": "run_notebook",
        "notebook_task": {
            "notebook_path": "/Workspace/Users/<your-user>/<your-notebook>"
        },
        "environment_key": "Default",
        "timeout_seconds": 3600
    }],
    "environments": [{
        "environment_key": "Default",
        "spec": {
            "client": "1"
        }
    }],
    "max_concurrent_runs": 1
}

Request Body Breakdown

  • name: a descriptive name for this job run (appears in the Databricks UI)
  • tasks[].task_key: unique identifier for this task within the job
  • tasks[].notebook_task.notebook_path: full path to your Databricks notebook
  • tasks[].environment_key: references the environment configuration
  • tasks[].timeout_seconds: maximum execution time (3600 = 1 hour)
  • environments[].environment_key: environment identifier (must match the task's reference)
  • environments[].spec.client: client specification for the environment
  • max_concurrent_runs: how many instances of this job can run simultaneously

Serverless Compute Note: The example above uses Databricks Serverless Compute, so no cluster specification is required. If you're using a dedicated cluster, you'll need to add a new_cluster or existing_cluster_id field in the task definition.
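
For illustration, a task running on an existing all-purpose cluster could look roughly like this (a sketch only, with <your-cluster-id> as a placeholder; the environment_key and environments block from the serverless example are then not needed):

"tasks": [{
    "task_key": "run_notebook",
    "notebook_task": {
        "notebook_path": "/Workspace/Users/<your-user>/<your-notebook>"
    },
    "existing_cluster_id": "<your-cluster-id>",
    "timeout_seconds": 3600
}]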

Response Configuration

For monitoring purposes, configure the response handling:

  • Response Type: HTTP Status Code
  • Success Status Codes: 200, 201

The API will return a 200 OK status if the notebook run was successfully triggered, along with a JSON response containing the run_id.
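
If you'd like to sanity-check the payload outside Datasphere first, you can submit it directly with curl (assuming you saved the JSON from above locally as request_body.json, a file name chosen here for illustration):

curl -X POST \
  -H "Authorization: Bearer <your-PAT>" \
  -H "Content-Type: application/json" \
  -d @request_body.json \
  https://<your-workspace>.cloud.databricks.com/api/2.1/jobs/runs/submit

# Expected response (the run_id value is illustrative):
# {"run_id": 987654321}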

Step 4: Test Your Task Chain

Before deploying to production, thoroughly test your integration.

Execute the Task Chain

  1. Save your task chain configuration
  2. Click Run to execute the task chain manually
  3. Monitor the execution in the Datasphere task chain log

Verify in Databricks

  1. Navigate to your Databricks workspace
  2. Go to Jobs & Pipelines
  3. You should see a new run with the name you specified in the request body (in my case, datasphere_job)
  4. Click on the run to see execution details and logs (or check the run via the API, as sketched below)
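
If you prefer to verify the run programmatically, you can pass the run_id returned by the submit call to the runs/get endpoint (a sketch; <run-id> is a placeholder):

curl -H "Authorization: Bearer <your-PAT>" \
  "https://<your-workspace>.cloud.databricks.com/api/2.1/jobs/runs/get?run_id=<run-id>"

The response contains a state object with life_cycle_state (e.g., PENDING, RUNNING, TERMINATED) and, once the run has finished, a result_state such as SUCCESS.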