Abstract

Integrating SAP Datasphere with Databricks opens up powerful possibilities for advanced analytics, machine learning pipelines, and cross-platform data processing.

In this guide, I'll share my experience automating SAP Databricks notebook execution using SAP Datasphere task chains and the Databricks Jobs API through a practical, step-by-step approach.

Common Use Cases:

  • Automated data processing pipelines across SAP and Databricks
  • Machine learning model training triggered by Datasphere data flows
  • Advanced analytics on SAP data using Databricks' compute power
  • Orchestrated cross-platform workflows with unified monitoring

Business Value:

  • Reduced manual intervention in data workflows
  • Faster time-to-insight with automated analytics
  • Unified orchestration across enterprise data platforms

Prerequisites

Before getting started, ensure you have:

  • SAP Datasphere access with permission to create connections and task chains
  • Databricks workspace with notebook development access
  • Databricks Personal Access Token (PAT) generation privileges
  • Basic understanding of REST APIs and HTTP methods
  • A Databricks notebook ready to be executed

Architecture Overview

The integration follows this flow:

┌─────────────────────┐
│  SAP Datasphere     │
│  Task Chain         │
└──────────┬──────────┘
           │
           │ HTTP POST Request
           │ (Jobs API)
           ▼
┌─────────────────────┐
│  Databricks         │
│  Jobs API           │
└──────────┬──────────┘
           │
           │ Triggers
           ▼
┌─────────────────────┐
│  Databricks         │
│  Notebook Run       │
└─────────────────────┘

Key Components:

  1. Datasphere Task Chain - Orchestration layer
  2. Generic HTTP Connection - API authentication setup
  3. Databricks Jobs API - /api/2.1/jobs/runs/submit endpoint
  4. PAT Authentication - Bearer token security
  5. Databricks Notebook - Target execution logic

Step 1: Create a Generic HTTP Connection in Datasphere

The first step is to establish a secure connection from Datasphere to Databricks. We'll use a Generic HTTP Connection with Bearer token authentication.

Navigate to Connections

In SAP Datasphere, go to Connections and create a new Generic HTTP connection.

Configure Connection Settings

  • Connection Name: databricks_rest_api (choose a descriptive name)
  • Host: <your-workspace>.cloud.databricks.com (your Databricks workspace hostname)
  • Port: 443 (Datasphere uses 443/HTTPS as the default)
  • Protocol: HTTPS (always use a secure connection)
  • Path: /api/2.1, optional (can be set here or in the API task later)

Authentication Setup

Databricks uses Bearer tokens (Personal Access Tokens / PATs) for API authentication. For simplicity in Datasphere, we'll configure this as a Username/Password authentication type:

  • Authentication Type: Username and Password
  • Username: token
  • Password: <Your Databricks PAT>

Note: The username is literally the string token. Databricks accepts basic authentication with this fixed username and the PAT as the password, treating it the same as Bearer token authentication.
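
If you want to double-check this mapping against your workspace, the same basic-auth form can be tested directly with curl (a quick sketch; <your-PAT> and the workspace hostname are placeholders):

curl -u token:<your-PAT> \
  https://<your-workspace>.cloud.databricks.com/api/2.1/jobs/list

This should return the same result as the Bearer-header call shown in the connection test below; it mirrors what Datasphere sends when the connection uses Username/Password authentication.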

Obtaining a Databricks Personal Access Token (PAT)

You'll need a PAT to authenticate your API calls. Here's how to generate one:

  1. In your Databricks workspace, click on your user profile in the top-right corner
  2. Select Settings
  3. Navigate to Developer under the User section
  4. Click on Manage next to Access Tokens

  5. Click Generate new token
  6. Provide a description (e.g., "Datasphere Task Chain Integration")
  7. Set an expiration period (recommended: 90 days for security)
  8. Copy the generated token immediately, as you won't be able to see it again!

Test Your Connection

Before proceeding, verify that your PAT is working correctly by making a test API call:

curl -H "Authorization: Bearer <your-PAT>" \
  https://<your-workspace>.cloud.databricks.com/api/2.1/jobs/list

If configured correctly, you should receive a JSON response listing your Databricks jobs (or an empty list if you have none, as in my case).
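
For reference, a successful call returns a JSON document roughly like the following (abbreviated and purely illustrative; the actual fields and values depend on the jobs in your workspace):

{
  "jobs": [
    {
      "job_id": 123456789,
      "settings": { "name": "my_existing_job" },
      "created_time": 1700000000000
    }
  ],
  "has_more": false
}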


Step 2: Create a Datasphere Task Chain

Now that the connection is established, let's create a task chain that will trigger the Databricks notebook.

Create a New Task Chain

  1. In Datasphere, navigate to Data Builder
  2. Create a new Task Chain
  3. Give it a descriptive name (e.g., trigger_databricks_notebook)

Add an API Task

From the task palette, drag the API task onto the canvas.


Step 3: Configure the API Task

This is where we define how Datasphere will call the Databricks Jobs API.

Basic Configuration

Click on the API task to open its configuration panel.

  • Connection: databricks_rest_api (the connection you created earlier)
  • HTTP Method: POST

API Endpoint Configuration

The endpoint depends on how you configured your connection:

  • Full endpoint (if Path is empty in the connection): /api/2.1/jobs/runs/submit
  • Partial endpoint (if Path is /api/2.1 in the connection): /jobs/runs/submit

This endpoint is used to submit a one-off notebook run without needing a pre-configured Databricks job. For more details, see the Databricks Jobs API documentation.
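
As a side note, if you already maintain a permanently configured job in Databricks, you could alternatively trigger it via the /api/2.1/jobs/run-now endpoint (not used in this guide; <your-job-id> is a placeholder):

curl -X POST \
  -H "Authorization: Bearer <your-PAT>" \
  -H "Content-Type: application/json" \
  -d '{"job_id": <your-job-id>}' \
  https://<your-workspace>.cloud.databricks.com/api/2.1/jobs/run-now

For ad-hoc notebook runs, however, runs/submit keeps the whole definition on the Datasphere side, which is what we use here.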

Request Body

The request body defines what notebook to run and with what parameters. Here's what I used:

{
    "name": "<job-name>",
    "tasks": [{
        "task_key": "run_notebook",
        "notebook_task": {
            "notebook_path": "/Workspace/Users/<your-user>/<your-notebook>"
        },
        "environment_key": "Default",
        "timeout_seconds": 3600
    }],
    "environments": [{
        "environment_key": "Default",
        "spec": {
            "client": "1"
        }
    }],
    "max_concurrent_runs": 1
}

Request Body Breakdown

  • name: a descriptive name for this job run (appears in the Databricks UI)
  • tasks[].task_key: unique identifier for this task within the job
  • tasks[].notebook_task.notebook_path: full path to your Databricks notebook
  • tasks[].environment_key: references the environment configuration
  • tasks[].timeout_seconds: maximum execution time (3600 = 1 hour)
  • environments[].environment_key: environment identifier (must match the task's reference)
  • environments[].spec.client: client specification for the environment
  • max_concurrent_runs: how many instances of this job can run simultaneously

Serverless Compute Note: The example above uses Databricks Serverless Compute, so no cluster specification is required. If you're using a dedicated cluster, you'll need to add a new_cluster or existing_cluster_id field in the task definition.
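
For illustration, a task running on an existing all-purpose cluster could look roughly like this (a sketch only, with <your-cluster-id> as a placeholder; the environment_key and environments block from the serverless example are then not needed):

"tasks": [{
    "task_key": "run_notebook",
    "notebook_task": {
        "notebook_path": "/Workspace/Users/<your-user>/<your-notebook>"
    },
    "existing_cluster_id": "<your-cluster-id>",
    "timeout_seconds": 3600
}]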

Response Configuration

For monitoring purposes, configure the response handling:

  • Response Type: HTTP Status Code
  • Success Status Codes: 200, 201

The API will return a 200 OK status if the notebook run was successfully triggered, along with a JSON response containing the run_id.
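
If you'd like to sanity-check the payload outside Datasphere first, you can submit it directly with curl (assuming you saved the JSON from above locally as request_body.json, a file name chosen here for illustration):

curl -X POST \
  -H "Authorization: Bearer <your-PAT>" \
  -H "Content-Type: application/json" \
  -d @request_body.json \
  https://<your-workspace>.cloud.databricks.com/api/2.1/jobs/runs/submit

# Expected response (the run_id value is illustrative):
# {"run_id": 987654321}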

Step 4: Test Your Task Chain

Before deploying to production, thoroughly test your integration.

Execute the Task Chain

  1. Save your task chain configuration
  2. Click Run to execute the task chain manually
  3. Monitor the execution in the Datasphere task chain log

Verify in Databricks

  1. Navigate to your Databricks workspace
  2. Go to Jobs & Pipelines
  3. You should see a new run with the name you specified in the request body (in my case, datasphere_job)
  4. Click on the run to see execution details and logs (or check the run via the API, as sketched below)
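
If you prefer to verify the run programmatically, you can pass the run_id returned by the submit call to the runs/get endpoint (a sketch; <run-id> is a placeholder):

curl -H "Authorization: Bearer <your-PAT>" \
  "https://<your-workspace>.cloud.databricks.com/api/2.1/jobs/runs/get?run_id=<run-id>"

The response contains a state object with life_cycle_state (e.g., PENDING, RUNNING, TERMINATED) and, once the run has finished, a result_state such as SUCCESS.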