Integrating SAP Datasphere with Databricks opens up powerful possibilities for advanced analytics, machine learning pipelines, and cross-platform data processing.
In this guide, I'll share my experience automating SAP Databricks notebook execution using SAP Datasphere task chains and the Databricks Jobs API through a practical, step-by-step approach.
Common Use Cases:
Business Value:
Before getting started, ensure you have:
The integration follows this flow:
┌─────────────────────┐
│ SAP Datasphere │
│ Task Chain │
└──────────┬──────────┘
│
│ HTTP POST Request
│ (Jobs API)
▼
┌─────────────────────┐
│ Databricks │
│ Jobs API │
└──────────┬──────────┘
│
│ Triggers
▼
┌─────────────────────┐
│ Databricks │
│ Notebook Run │
└─────────────────────┘

Key Components:
The first step is to establish a secure connection from Datasphere to Databricks. We'll use a Generic HTTP Connection with Bearer token authentication.
In SAP Datasphere, go to Connections and create a new Generic HTTP connection.
| Setting | Value | Notes |
|---|---|---|
| Connection Name | databricks_rest_api | Choose a descriptive name |
| Host | <your-workspace>.cloud.databricks.com | Your Databricks workspace hostname |
| Port | 443 | Datasphere uses 443 (HTTPS) as default |
| Protocol | HTTPS | Always use secure connections |
| Path | /api/2.1 (optional) | Can be set here or in the API task later |
Databricks uses Bearer tokens (Personal Access Tokens / PATs) for API authentication. For simplicity in Datasphere, we'll configure this as a Username/Password authentication type:
| Setting | Value |
|---|---|
| Authentication Type | Username and Password |
| Username | token |
| Password | <Your Databricks PAT> |
Note: The username is literally the string token. Databricks accepts HTTP basic authentication with this fixed username and the PAT as the password, treating it the same as a Bearer token.
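If you want to sanity-check this mapping outside Datasphere, a basic-auth call with the literal username token should behave exactly like the Bearer-token call shown in the next section. This is only a sketch; the workspace hostname and PAT are placeholders:

```bash
# Basic auth with the fixed username "token" and the PAT as the password --
# this mirrors what the Datasphere Username/Password connection sends
curl -u "token:<your-PAT>" \
  https://<your-workspace>.cloud.databricks.com/api/2.1/jobs/list
```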
You'll need a PAT to authenticate your API calls. Here's how to generate one:
Before proceeding, verify that your PAT is working correctly by making a test API call:
curl -H "Authorization: Bearer <your-PAT>" \
  https://<your-workspace>.cloud.databricks.com/api/2.1/jobs/list

If configured correctly, you should receive a JSON response listing your Databricks jobs (or an empty list if you have none, as in my case).
Now that the connection is established, let's create a task chain that will trigger the Databricks notebook.
From the task palette, drag the API task onto the canvas.
This is where we define how Datasphere will call the Databricks Jobs API.
Click on the API task to open its configuration panel.
| Setting | Value |
|---|---|
| Connection | databricks_rest_api (the connection you created earlier) |
| HTTP Method | POST |
The endpoint depends on how you configured your connection:
| Setting | Value |
|---|---|
| Full Endpoint (if Path is empty in connection) | /api/2.1/jobs/runs/submit |
| Partial Endpoint (if Path is /api/2.1 in connection) | /jobs/runs/submit |
This endpoint is used to submit a one-off notebook run without needing a pre-configured Databricks job. For more details, see the Databricks Jobs API documentation.
The request body defines what notebook to run and with what parameters. Here's what I used:
{
"name": "<job-name>",
"tasks": [{
"task_key": "run_notebook",
"notebook_task": {
"notebook_path": "/Workspace/Users/<your-user>/<your-notebook>"
},
"environment_key": "Default",
"timeout_seconds": 3600
}],
"environments": [{
"environment_key": "Default",
"spec": {
"client": "1"
}
}],
"max_concurrent_runs": 1
}

| Field | Purpose |
|---|---|
| name | A descriptive name for this job run (appears in Databricks UI) |
| tasks[].task_key | Unique identifier for this task within the job |
| tasks[].notebook_task.notebook_path | Full path to your Databricks notebook |
| tasks[].environment_key | References the environment configuration |
| tasks[].timeout_seconds | Maximum execution time (3600 = 1 hour) |
| environments[].environment_key | Environment identifier (must match task's reference) |
| environments[].spec.client | Client specification for the environment |
| max_concurrent_runs | How many instances of this job can run simultaneously |
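Before wiring this body into the API task, it can be helpful to submit it once directly against the endpoint, for example with curl, to confirm Databricks accepts it. The sketch below simply reuses the same placeholder values as the body above:

```bash
# One-off test of the same request body, outside Datasphere
curl -X POST \
  -H "Authorization: Bearer <your-PAT>" \
  -H "Content-Type: application/json" \
  https://<your-workspace>.cloud.databricks.com/api/2.1/jobs/runs/submit \
  -d '{
    "name": "<job-name>",
    "tasks": [{
      "task_key": "run_notebook",
      "notebook_task": {
        "notebook_path": "/Workspace/Users/<your-user>/<your-notebook>"
      },
      "environment_key": "Default",
      "timeout_seconds": 3600
    }],
    "environments": [{
      "environment_key": "Default",
      "spec": { "client": "1" }
    }],
    "max_concurrent_runs": 1
  }'
```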
Serverless Compute Note: The example above uses Databricks Serverless Compute, so no cluster specification is required. If you're using a dedicated cluster, you'll need to add a new_cluster or existing_cluster_id field in the task definition.
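As a rough sketch, a task entry targeting an existing all-purpose cluster could look like the following; <your-cluster-id> is a placeholder, and the environment_key/environments entries from the serverless example are dropped:

```json
{
  "task_key": "run_notebook",
  "notebook_task": {
    "notebook_path": "/Workspace/Users/<your-user>/<your-notebook>"
  },
  "existing_cluster_id": "<your-cluster-id>",
  "timeout_seconds": 3600
}
```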
For monitoring purposes, configure the response handling:
| Setting | Value |
|---|---|
| Response Type | HTTP Status Code |
| Success Status Codes | 200, 201 |
The API will return a 200 OK status if the notebook run was successfully triggered, along with a JSON response containing the run_id.
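Keep in mind that these status codes only confirm the run was submitted, not that the notebook finished successfully. If you want to check the outcome, you can take the run_id from the response and query the run manually (again with placeholder values):

```bash
# Check the status of a submitted run using the run_id from the submit response
curl -H "Authorization: Bearer <your-PAT>" \
  "https://<your-workspace>.cloud.databricks.com/api/2.1/jobs/runs/get?run_id=<run-id>"
```

The state object in the response (life_cycle_state and result_state) tells you whether the run is still executing or has completed.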
Before deploying to production, thoroughly test your integration.