Technology Blog Posts by SAP
Yogananda
Product and Topic Expert

As organizations move more of their SAP and enterprise workloads into modern lakehouse platforms, data governance becomes a critical priority. Sensitive data—such as personal identifiers, financial details, and confidential business information—must be identified quickly and protected consistently.

This is where Data Classification in SAP Databricks plays a vital role, especially when working with SAP data products that are shared from Datasphere via SAP Business Data Cloud (BDC) or loaded by pipelines into the Databricks Lakehouse.

In this blog, we’ll explore what data classification means, why it matters, and how Databricks simplifies classification for SAP‑related datasets. 

What Is Data Classification?

Data Classification is the process of automatically identifying and labeling sensitive information stored across your data environment.

SAP Databricks provides an intelligent scanner that examines:

  • Table metadata
  • Column names
  • Column content patterns
  • Statistical characteristics

Based on this analysis, Databricks applies tags such as:

  • PII (Personally Identifiable Information)
  • Financial Data
  • Confidential
  • Internal Only

These tags enable automated governance, access controls, lineage tracking, and risk assessments—all essential for SAP data that often contains personal or financial details.
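To make the idea of scanning column names and content patterns concrete, here is a minimal Python sketch. It is not how the Databricks scanner works internally; the rule tables, the `classify_column` function, and the 0.8 match threshold are all assumptions chosen for illustration.

```python
import re

# Hypothetical rules for illustration only; these are NOT the rules Databricks uses.
NAME_HINTS = {
    "PII": re.compile(r"(email|phone|name|address|tax_id)", re.IGNORECASE),
    "Financial Data": re.compile(r"(iban|account|payment)", re.IGNORECASE),
}
VALUE_PATTERNS = {
    # Very rough email shape: something@something.something
    "PII": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    # Very rough IBAN shape: country code, two check digits, 11 to 30 characters
    "Financial Data": re.compile(r"^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$"),
}


def classify_column(name, sample_values, min_match_ratio=0.8):
    """Return sorted tags suggested by the column name and sampled values."""
    tags = set()

    # Signal 1: the column name contains a known hint.
    for tag, pattern in NAME_HINTS.items():
        if pattern.search(name):
            tags.add(tag)

    # Signal 2: most non-empty sampled values match a content pattern.
    values = [v for v in sample_values if v]
    if values:
        for tag, pattern in VALUE_PATTERNS.items():
            matches = sum(1 for v in values if pattern.match(v))
            if matches / len(values) >= min_match_ratio:
                tags.add(tag)

    return sorted(tags)


# A cryptic SAP-style column name gives no hint, but the values look like emails.
print(classify_column("SMTP_ADDR", ["a@example.com", "b@example.org"]))
print(classify_column("MATNR", ["100-200", "100-300"]))
```

The first call shows why content patterns matter for SAP data: technical field names such as SMTP_ADDR rarely reveal what a column holds, so the sampled values carry the signal.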

How Databricks Enables Data Classification

The screenshot below shows the feature in SAP Databricks under Catalog Explorer → Details → Advanced for a catalog named “<mycatalog>”.

[Screenshot: Advanced section of the catalog Details tab]

Under the Advanced section, you can see:

Data Classification: Disabled

with an option to Enable. Once enabled, Databricks will:

  1. Scan all tables in the selected catalog
  2. Detect sensitive columns (e.g., customer numbers, emails, IBANs, tax IDs)
  3. Apply classification tags automatically
  4. Make tags visible to administrators, governance teams, and data stewards
  5. Feed the metadata into Unity Catalog governance policies


This is particularly powerful for SAP workloads because:

  • SAP systems generate complex, interconnected tables
  • Sensitive data is often embedded deep in transactional structures
  • Manual tracking is nearly impossible at scale

Why Data Classification Matters

When SAP data lands in Databricks, whether through zero-copy Delta Sharing from SAP Datasphere or through custom ETL flows, it frequently includes regulated data fields.

Data classification supports:

Compliance (GDPR, HIPAA, SOX, etc.)

Automatically tag and monitor sensitive data.

Least‑privilege access

Policies can enforce who can view what.

Secure analytics

Sensitive data is masked or tokenized before analytics or ML workloads access it.

Automated governance workflows

Classification integrates with Unity Catalog, allowing:

  • Row/column-level security
  • Access auditing
  • Change management
  • Data lineage tracking

How to Enable Data Classification in Databricks (Step-by-Step)

  1. Open Catalog Explorer in Databricks
  2. Select the target catalog (e.g., skf)
  3. Open the Details tab
  4. Scroll to the Advanced section
  5. Toggle Data Classification → Enable
  6. Databricks begins automatic scanning in the background
  7. Review classifications under Table → Column Details

After enabling, every new table ingested into the catalog will be scanned automatically.
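Besides the UI, you may want to review the resulting tags with a query. The sketch below builds such a query in Python, assuming the tags are exposed through the Unity Catalog `information_schema.column_tags` view and that classification tags share a common prefix such as `class.`. The function name `build_column_tags_query` and the prefix are illustrative assumptions, so check the view and tag names in your own workspace.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_PREFIX = re.compile(r"^[A-Za-z0-9.]+$")


def build_column_tags_query(catalog, tag_prefix=None):
    """Build a SQL string that lists column tags for one catalog.

    Assumes the catalog exposes information_schema.column_tags
    (an assumption to verify in your workspace).
    """
    if not _IDENT.match(catalog):
        raise ValueError(f"unsafe catalog name: {catalog!r}")

    lines = [
        "SELECT schema_name, table_name, column_name, tag_name, tag_value",
        f"FROM {catalog}.information_schema.column_tags",
    ]

    # Optionally keep only tags that start with a given prefix.
    if tag_prefix is not None:
        if not _PREFIX.match(tag_prefix):
            raise ValueError(f"unsafe tag prefix: {tag_prefix!r}")
        lines.append(f"WHERE tag_name LIKE '{tag_prefix}%'")

    lines.append("ORDER BY schema_name, table_name, column_name")
    return "\n".join(lines)


# "mycatalog" and "class." are placeholder values for illustration.
query = build_column_tags_query("mycatalog", tag_prefix="class.")
print(query)
# In a Databricks notebook the string could then be run with spark.sql(query).
```

Validating the catalog name and prefix before interpolating them keeps the generated SQL safe to run even when the inputs come from a config file or a widget.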

What Happens After Classification?

Once tags are generated, you can:

  • View sensitive columns
    • Under each table’s schema view.
  • Create governance rules
    • Using Unity Catalog’s policy engine (e.g., hide PII unless user is in allowed group).
  • Implement data masking
    • Auto mask email, phone, or ID fields for non‑privileged users.
  • Monitor sensitive data flows
    • Using lineage dashboards.
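As a rough sketch of the masking step above, the Python helper below generates the two SQL statements for a Unity Catalog column mask: a masking function that reveals the value only to members of an allowed group, and an ALTER TABLE that attaches it to a column. The helper `build_mask_statements`, the `mask_<column>` naming scheme, the group `pii_readers`, and the table names are hypothetical. The sketch also assumes a STRING column.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _ident(name):
    """Reject anything that is not a plain identifier before interpolating it."""
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def build_mask_statements(catalog, schema, table, column, allowed_group):
    """Return [create_function_sql, alter_table_sql] for a column mask.

    Names are illustrative; the mask function is assumed to live in the
    same schema as the table and the column is assumed to be a STRING.
    """
    fq_schema = f"{_ident(catalog)}.{_ident(schema)}"
    func = f"{fq_schema}.mask_{_ident(column)}"
    group = _ident(allowed_group)

    # Masking function: group members see the value, everyone else sees '***'.
    create_fn = (
        f"CREATE OR REPLACE FUNCTION {func}(val STRING)\n"
        f"RETURN CASE WHEN is_account_group_member('{group}') "
        f"THEN val ELSE '***' END"
    )

    # Attach the function to the column as its mask.
    apply_mask = (
        f"ALTER TABLE {fq_schema}.{_ident(table)} "
        f"ALTER COLUMN {column} SET MASK {func}"
    )
    return [create_fn, apply_mask]


for stmt in build_mask_statements(
    "mycatalog", "sales", "customers", "email", "pii_readers"
):
    print(stmt + ";")
```

Because the mask is attached to the column itself, every query against the table goes through it. That fits the "hide PII unless user is in allowed group" rule better than filtering in each downstream notebook.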

Real‑World Use Case: SAP Customer Data Migration

Imagine that, for an ML or analytics use case, you share customer tables from SAP Sales & Distribution, SAP CRM, or SAP S/4HANA into SAP Databricks through Delta Sharing from Datasphere. These tables contain:

  • Customer Names
  • Addresses
  • Contact Info
  • Tax IDs
  • Payment Terms

By enabling Data Classification:

  • Databricks identifies PII automatically
  • Data engineers do not manually inspect thousands of customer fields
  • Security teams gain full visibility
  • Access policies enforce compliance from day one

Conclusion

Data Classification in Databricks is a key enabler for secure, compliant, and scalable SAP analytics.
With just one click, you activate an automated governance engine that keeps your SAP datasets protected, discoverable, and compliant.
