Technology Blog Posts by Members
mike_zaschka

Preface

This blog post is the first in a two-part series introducing cds-kafka, an open-source CAP plugin designed for seamless integration with Apache Kafka.
In this post, we’ll cover the theoretical foundations of the plugin, including its features and capabilities. The second post will focus on the practical implementation, demonstrating how to use the plugin in a real-world application, both locally and on SAP BTP.

Introduction

Event-driven architectures (EDA), while not a new concept, have become a cornerstone for building scalable, decoupled applications in today’s dynamic cloud landscape. Instead of hard-wiring connections between different applications and services, EDA enables communication through a central broker or message queue. This intermediary ensures loose coupling between producers (which generate events) and consumers (which process them). Such an architecture allows systems to react to changes asynchronously, supporting real-time processing, enhanced scalability, and improved resilience.

SAP has embraced EDA by offering several message broker services on SAP Business Technology Platform (SAP BTP):

  • SAP Event Mesh (as part of Integration Suite)
  • SAP Advanced Event Mesh
  • SAP Cloud Application Event Hub

These services provide robust asynchronous messaging capabilities and integrate almost seamlessly with other SAP solutions. For instance, S/4HANA can emit pre-configured or custom events to these brokers, which can then be consumed by other SAP services like SAP Cloud Integration, SAP Build Process Automation, or applications built with the SAP Cloud Application Programming Model (CAP) using dedicated connectors.

The Ecosystem Perspective

From an organizational perspective, SAP’s proprietary messaging solutions are a natural fit for companies heavily invested in the SAP ecosystem. However, other IT departments may rely on open-source platforms such as Apache Kafka to cater to broader architectural needs for messaging and event streaming. This divergence often creates scenarios where integration between Kafka and SAP services becomes essential.

SAP has addressed some of these integration needs with available adapters.

However, certain scenarios may require or benefit from direct communication between Kafka and cloud applications running in the SAP ecosystem, particularly when no Kafka Bridge is available or when introducing another broker middleware would add an unnecessary layer of complexity.

Focus of this Blog Post

In this blog post, I will focus on the latter scenario — connecting SAP cloud applications directly to Apache Kafka. To address this need, I have created and published an open-source CAP plugin cds-kafka that implements the messaging service capabilities within CAP to seamlessly integrate with Apache Kafka.

Is Apache Kafka available on SAP BTP?

To directly address this: Apache Kafka is not available as a public service on SAP BTP. It appears neither in the Discovery Center nor in the BTP Service Metadata. However, there are indications that SAP uses Kafka internally for certain applications running on SAP BTP. For example, the official CAP documentation (source) mentions Kafka, and there is an undocumented Kafka Adapter embedded in the CAP sources. This adapter can be activated in a CAP project by running the command cds add kafka, which also adds a reference to a Kafka service in the mta.yaml file.

However, the Kafka service itself is not publicly available, and the existing Kafka Adapter in CAP is restricted, making it unsuitable for connecting to other Kafka instances outside of SAP BTP. To overcome this limitation, I decided to implement and offer the open-source adapter cds-kafka, capable of connecting to any Kafka instance, whether hosted locally or in the cloud.

Apache Kafka in a nutshell

Apache Kafka is an open-source, ecosystem-agnostic, distributed event streaming platform that was originally developed by LinkedIn and is now maintained by the Apache Software Foundation. Apache Kafka is highly successful and is used by many customers (including 80+ of the top 100 companies – source). Compared to SAP Event Mesh, which is essentially a message broker, Apache Kafka, as an event streaming platform, differs in certain areas. A high-level overview:

Kafka organizes messages into topics, which are partitioned for scalability and processed independently by consumers. Unlike traditional message brokers, which often emphasize guaranteed delivery, dead-letter queues, and strict message acknowledgment semantics, Kafka relies on a log-based storage mechanism with offsets to manage message consumption. An offset is a unique identifier for each message in a topic, representing its position in the log. Consumers are responsible for managing their offsets, allowing for flexible consumption patterns such as replaying messages, skipping unimportant ones, or starting from a specific point in the stream, as shown in the following, simplified illustration: 

[Figure: kafka-basic.png – simplified illustration of a Kafka topic log with offsets and consumers]

In traditional message brokers, messages are typically removed from the queue once consumed, and delivery guarantees often depend on acknowledgment mechanisms. Kafka, in contrast, does not remove messages upon consumption. Instead, it retains messages for a configurable period, irrespective of whether they are processed. This design allows multiple consumers (consumer groups) to process the same data independently and at their own pace without interfering with each other.
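The retention and offset behavior described above can be illustrated with a small, broker-free JavaScript sketch (purely illustrative, no Kafka client involved): messages stay in the log after being read, and each consumer group tracks its own offset independently.

```javascript
// Minimal, broker-free illustration of Kafka's log/offset model.
// A topic partition is just an append-only array; consumer groups
// keep independent offsets into it. Messages are never removed on read.
const log = []
const offsets = new Map() // groupId -> next offset to read

function produce(message) {
  log.push(message) // append-only, like a Kafka partition log
}

function consume(groupId) {
  const offset = offsets.get(groupId) ?? 0
  const messages = log.slice(offset) // read everything after the group's offset
  offsets.set(groupId, log.length)   // "commit" the new offset
  return messages
}

produce('m1')
produce('m2')
console.log(consume('group-A')) // group A reads both messages
produce('m3')
console.log(consume('group-A')) // group A only sees the new message
console.log(consume('group-B')) // group B starts at offset 0 and replays everything
```

Note how group-B receives all three messages although group-A has already read them: consumption by one group does not affect another, and replay is just a matter of resetting an offset.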

Kafka’s core strength lies in its ability to handle real-time data streams efficiently. With tools like Kafka Streams, developers can perform real-time data transformations, filtering, and aggregations directly on event streams without needing additional systems. For integration, Kafka Connect simplifies connecting Kafka to external data systems, providing pre-built connectors for databases, cloud storage, and other platforms. These capabilities enable Kafka to act not just as a message broker but as a central hub for processing, transforming, and routing data across an entire enterprise.

Introducing cds-kafka

The SAP Cloud Application Programming Model (CAP) is SAP’s golden-path framework for building cloud-native applications on SAP BTP. As a state-of-the-art framework, CAP has support for EDA and in fact treats events as a first-class concept, with event-driven behavior integrated into the core of the framework (source):
  • Everything happening at runtime is triggered by / in reaction to events
  • Providers subscribe to and handle events within the system.
  • Observers subscribe to external events asynchronously, often from remote systems.
When looking at the Kafka integration, cds-kafka focuses on remote and asynchronous events, allowing applications to send and consume events directly from Kafka. While the focus of the plugin is to use the internal CAP messaging APIs as much as possible, some Kafka concepts need to be handled a little differently.
 
cds-kafka is open-source, and the source code is available on GitHub: https://github.com/mikezaschka/cds-kafka

Features

Under the hood, cds-kafka uses KafkaJS to connect to Kafka. Since Kafka itself uses a binary protocol over TCP, KafkaJS is the required abstraction layer to access Kafka functionality in JavaScript. By leveraging the CAP messaging APIs and KafkaJS, cds-kafka offers the following capabilities:

  • CAP Integration: Seamlessly integrates with CAP’s native messaging framework for event handling.
  • Event Publishing: Enables sending messages/events to Apache Kafka.
  • Topic Subscription: Allows subscribing to one or multiple topics in Apache Kafka to receive incoming messages.
  • Advanced Topic Matching: Supports subscription to multiple topics using regular expressions.
  • CloudEvents Support: Provides built-in support for the CloudEvents specification.
  • Kafka-Specific Features: Leverages Kafka-specific capabilities, including the use of keys, partitions and topic subscription behavior.
  • Flexible Connectivity: Connects to Kafka instances hosted locally or in any cloud environment.
  • Topic Management: Automates the creation of topics in Kafka.

Initial configuration

cds-kafka is provided as an NPM module and is publicly available in the NPM repository (see here).
In any existing CAP project, cds-kafka can simply be installed via the following command:

 

npm i cds-kafka

 

After installing the plugin, the service configuration must be added to CAP. The following example registers a service kafka-messaging with the minimal service configuration. The name of the service can be chosen freely, with the exception of kafka (which would activate the internal Kafka adapter). Since kafka-messaging is just another service inside CAP, it can be used alongside other messaging services, making it possible to build hybrid proxy/bridge applications that connect SAP-internal (Event Mesh) and external messaging systems:

 

"cds": {
    "requires": {
        "kafka-messaging": {
            "kind": "kafka-service",
            "credentials": {
             ...
            }
        },
        "kafka-service": {
            "impl": "cds-kafka"
        }
    }
}
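As a sketch of such a hybrid setup, a second messaging service can be configured next to kafka-messaging. The following fragment is illustrative only; it assumes the standard enterprise-messaging kind for SAP Event Mesh:

```
"cds": {
    "requires": {
        "kafka-messaging": {
            "kind": "kafka-service",
            "credentials": {
             ...
            }
        },
        "kafka-service": {
            "impl": "cds-kafka"
        },
        "messaging": {
            "kind": "enterprise-messaging"
        }
    }
}
```

With both services in place, a handler can consume from one service and re-emit to the other, acting as a simple bridge between the two worlds.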

 

Sending messages

Sending a message to Kafka is possible by simply using the default messaging API in CAP. CAP and cds-kafka will take care of connecting to the broker and transporting the message. The following example sends a simple message to the Kafka topic some-topic:

 

const kafka = await cds.connect.to('kafka-messaging') 
await kafka.emit('some-topic', { subject: 'Hello World' })

 

By default, the name of the event and the name of the topic are expected to match. If the topic does not yet exist in Apache Kafka, cds-kafka will try to create it.

Subscribing to topics and consuming messages

Subscribing to a topic in Apache Kafka is also possible by simply using the CAP default messaging API. The following example creates a subscription to the some-topic topic in Kafka, and the callback will be triggered whenever a new message is published to the broker:

 

const kafka = await cds.connect.to('kafka-messaging') 
kafka.on('some-topic', (message) => console.log(message.data)) 

 

By default, each on call creates a subscription to the defined topic, so a CAP application can subscribe to and receive messages from multiple topics.

Advanced functionality

As the above examples have shown, using cds-kafka feels much like using any other messaging service in CAP. But since Kafka also has a special set of features, there are some more things to look at.

Topics and event types

By default, cds-kafka expects the name of the Kafka topic and the event type to be the same. For example, sending an event my/custom/event to Kafka results in a topic with the same name being created and used by CAP. However, it is possible to diverge from this rule: all event types can be routed to a single topic, or topics and event types can be mixed freely. Much of Kafka’s success comes from this kind of flexibility, which extends to how topics and their messages are handled.
cds-kafka also supports scenarios where the event type and the topic do not match (more on this below). To still have all the important information at hand, cds-kafka adds two headers to each message that contain the event and topic name:

  • x-sap-cap-event: Contains the name of the event in CAP.
  • x-sap-cap-kafka-topic: Indicates the actual Kafka topic from which the message originated.

This information can simply be accessed by inspecting the message headers:

 

kafka.on("my/custom/event", (message) => {
  console.log(message.headers["x-sap-cap-event"])
  console.log(message.headers["x-sap-cap-kafka-topic"])
})

 

Providing connection details

As mentioned before, cds-kafka internally uses KafkaJS to connect to and communicate with Kafka. Therefore, all connection options provided within the credentials section of the service configuration are directly routed to KafkaJS. This enables flexible access to local or remote Kafka systems, including authentication.
All available configuration options are listed in the KafkaJS documentation.

Next to the broker endpoints, a clientId can be specified. The purpose of the clientId within Kafka is to track the source of requests, allowing a logical application name to be included in server-side request logging. When not provided, cds-kafka creates a dynamic clientId based on the environment. A clientId should be provided and shared across all instances of a horizontally scaled application, but be distinct for each application.

The following shows an example configuration for accessing a local dockerized Kafka instance running on the default port 9092:

 

"cds": {
    "requires": {
        "kafka-messaging": {
            "kind": "kafka-service",
            "credentials": {
                "clientId": "my-cap-app",
                "brokers": [
                    "localhost:9092"
                ]
            }
        }
    }
}
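For a cloud-hosted broker, TLS and SASL authentication can be configured through the same credentials section, since all options are passed to KafkaJS as-is. The following is a sketch; the broker host and the credentials are placeholders:

```
"cds": {
    "requires": {
        "kafka-messaging": {
            "kind": "kafka-service",
            "credentials": {
                "clientId": "my-cap-app",
                "brokers": [
                    "my-broker.example.com:9093"
                ],
                "ssl": true,
                "sasl": {
                    "mechanism": "plain",
                    "username": "<user>",
                    "password": "<password>"
                }
            }
        }
    }
}
```

In a productive setup, the credentials should of course come from a bound service or environment variables rather than being hard-coded in the configuration.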

 

Subscribing to topics via Regular Expressions or catch-all (*)

By default, cds-kafka subscribes to topics based on the event name specified in the observer registration (on calls). Unlike standard CAP behavior, however, cds-kafka extends this capability by allowing subscriptions to multiple topics using regular expressions with matching patterns. Since CAP’s internal API only supports string-based subscriptions, the two additional headers (see above) are used to track the information:

  • x-sap-cap-event: Contains the registered regular expression.
  • x-sap-cap-kafka-topic: Indicates the actual Kafka topic from which the message originated.

It's important to note that regex-based subscriptions only work for topics that already exist at the time of subscription. If a new matching topic is added after the application has started, the application must be restarted to receive messages from that topic.

Next to regular expressions, there is also support for a catch-all subscription which allows listening to all topics of a broker, excluding Kafka’s private ones (those starting with __). Internally, a regex subscription is created to match all public topics dynamically.

The following snippet shows how both options can be used:

 

const kafka = await cds.connect.to('kafka-messaging') 

// Subscribe to all topics matching the pattern (e.g. topic-A-1, topic-B-1, ...)
kafka.on(/topic-(A|B|C)-.*/i, (message) => console.log(message))

// Subscribe to all topics (excluding Kafka's private ones starting with __)
kafka.on("*", (message) => console.log(message))

 

While these features provide flexibility, it’s essential to carefully manage regex and catch-all subscriptions in production environments to avoid unintended message processing or performance issues. These options are particularly useful for dynamic architectures where topic names are not static or predefined.

Using CloudEvents 

CAP has native support for the CloudEvents specification (see documentation). By setting the message format to cloudevents, supporting metadata will be added to all messages. cds-kafka also supports CloudEvents, adhering to the official CloudEvents specification for Apache Kafka. While the specification distinguishes between wrapping the message data and adding headers, cds-kafka enables both at the same time.
The reason for this is that CAP internally takes care of CloudEvents metadata, and incoming messages are already stripped of all CloudEvents metadata. Because this metadata might still be required, the header data remains available when processing the incoming message.
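Enabling the format is a matter of standard CAP messaging configuration. A minimal sketch:

```
"cds": {
    "requires": {
        "kafka-messaging": {
            "kind": "kafka-service",
            "format": "cloudevents",
            "credentials": {
             ...
            }
        }
    }
}
```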

The following example shows an inspection of a message in the event handler. The CloudEvents header fields (prefixed with ce-) are still visible and can be read programmatically.

 

{
  data: { subject: 'Hello World' },
  headers: {
    'ce-specversion': '1.0',
    'ce-id': 'df396250-cb0d-4f88-97c4-0aa25faa12ef',
    'ce-type': 'some-topic',
    'ce-source': 'cap-kafka-example-app-1',
    'ce-time': '2025-01-02T15:59:24.265Z',
    'ce-datacontenttype': 'application/json',
  },
  event: 'some-topic'
}

 

Kafka-specific features: Consumer Groups, Topics, Partitions and Message Keys

Kafka was briefly explained at the beginning of this post, and while the illustration showcased some of the core concepts, it was a little simplified. Apache Kafka’s architecture involves several advanced features that add flexibility and scalability but also introduce complexity. Key concepts such as topics, partitions, keys, consumer groups, and offsets play a crucial role in Kafka’s operation. cds-kafka supports all of these features, but before showcasing them, let’s explore the concepts in more detail.

[Figure: kafka-advanced.png – topics, partitions, keys, consumer groups and offsets in Kafka]

  • Messages and Keys: Each message in Kafka is sent to a topic, which acts as the logical container. A message contains the message data, metadata (headers) and an optional key. When a key is provided, Kafka ensures that all messages with the same key are sent to the same partition, maintaining their order. If no key is specified, Kafka distributes messages across partitions in a round-robin or balanced fashion. In such cases, the order of messages across the topic may not be preserved, especially when multiple partitions and consumers are involved.
  • Topics and Partitions: A topic in Kafka is divided into one or more partitions, which serve as the basic unit of parallelism and scalability. Messages within a topic are written to partitions either based on the key (if provided) or randomly (if no key is provided). Partitions allow Kafka to process messages in parallel, enabling higher throughput. However, this design also introduces complexity in maintaining message order across partitions, as order is only guaranteed within a single partition, not across the entire topic.
  • Consumer Groups: Kafka’s consumer group mechanism is a core feature for scalability and fault tolerance in message consumption. Each consumer group consists of one or more consumers that collectively process messages from a topic. Kafka ensures that each partition is assigned to only one consumer within a group, distributing the workload evenly. 
  • Offsets: Kafka tracks each consumer’s progress through the offset, a unique identifier for each message within a partition. Offsets enable replayability (by resetting the offset) and error recovery. Offsets are stored by Kafka per consumer group and partition combination, ensuring that every group maintains its independent progress.
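The key-to-partition mapping can be sketched without a broker. Kafka’s default partitioner hashes the message key (Kafka itself uses murmur2; a simple FNV-1a hash stands in for it here) and takes the result modulo the partition count, so equal keys always land in the same partition:

```javascript
// Broker-free sketch of key-based partition assignment.
// Kafka's default partitioner uses murmur2; a simple FNV-1a hash
// stands in here. The point is: same key => same partition.
function fnv1a(str) {
  let hash = 0x811c9dc5
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0 // keep it an unsigned 32-bit value
  }
  return hash
}

function partitionFor(key, numPartitions) {
  return fnv1a(key) % numPartitions
}

// All messages with key 'order-42' end up in the same partition,
// so their relative order is preserved there.
const p1 = partitionFor('order-42', 3)
const p2 = partitionFor('order-42', 3)
console.log(p1 === p2) // true: assignment is deterministic per key
```

This also explains why changing the number of partitions of a topic later can break key-based ordering: the modulo changes, and existing keys may map to different partitions.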

How to handle all of this in cds-kafka?

All of these aspects can be used with cds-kafka to connect to more complex Kafka architectures as well.

Consumer Group

Every CAP application using cds-kafka and connecting to Kafka for message consumption is treated as one consumer. If not specified, the groupId will automatically be generated (using the VCAP application or the local process ID). However, the groupId can also be provided via the configuration. Thus it is possible to connect the application to an existing consumer group or to have multiple instances of the same application running in parallel, each as an individual consumer within the consumer group.

 

"cds": {
    "requires": {
        "kafka-messaging": {
            ...
            "groupId": "my-group"
            ...
        }
    }
}

 

Initial offset configuration

By default, when subscribing to an existing topic with a new groupId, no existing messages are consumed; internally, the offset is set to the latest message, so only newly arriving messages will be consumed by the application. However, it is possible to override this behavior: by explicitly naming the topic in the configuration and setting fromBeginning: true, the service will start to consume all existing messages within the topic's retention period.

 

"cds": {
    "requires": {
        "kafka-messaging": {
            ...
            "topics": [{
                "some-topic": {
                    "fromBeginning": true
                }
            }]
            ...
        }
    }
}

 

Specifying topic, key and partition when emitting a message

When emitting a message with cds-kafka, the default behavior is to send the message to a random partition, with the event name used as the topic. However, this behavior can be customized to suit more complex architectural needs. The following custom headers can be used to override the default behavior:

  • @kafka.key:
    Assign a key to the message for logical grouping or ordering purposes. For example, all messages related to a specific entity can use the same key, ensuring they are processed in the same order.
  • @kafka.partition:
    Specify a specific partition to which the message should be sent. This is useful for advanced use cases requiring precise control over partition distribution.
  • @kafka.topic:
    Override the default topic name (derived from the event name) and specify a custom topic. This enables routing messages to different topics based on dynamic conditions.

The following example shows how those headers can be set. If set, these headers will not be sent to Kafka, but are only used internally for applying the specific behavior.

 

const messaging = await cds.connect.to('kafka-messaging')
await messaging.emit({
    event: 'ObjectCreated',
    data: { subject: 'Hello World' },
    headers: {
      '@kafka.key': '1234',
      '@kafka.partition': "0",
      '@kafka.topic': 'cap.test.object.created.v1',
    }
});

 

While the @kafka headers are only used internally, every message consumed by cds-kafka (even those not using the @kafka headers) will receive a set of additional headers providing information on Kafka internals. Next to the already described x-sap-cap-kafka-topic, the following header fields are available:

  • x-sap-cap-kafka-partition: Contains the partition the message was retrieved from.
  • x-sap-cap-kafka-offset: Contains the offset within the partition the message was retrieved from.
  • x-sap-cap-kafka-timestamp: Contains the timestamp at which the message was stored in Kafka.
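For convenience, these headers can be pulled out of a consumed message with a tiny helper. The function below is hypothetical (not part of cds-kafka), and the sample message only mimics the shape of the messages described above:

```javascript
// Hypothetical helper (not part of cds-kafka): collect the Kafka
// metadata headers of a consumed CAP message into a single object.
function kafkaMetadata(message) {
  const h = message.headers || {}
  return {
    topic: h['x-sap-cap-kafka-topic'],
    partition: Number(h['x-sap-cap-kafka-partition']),
    offset: Number(h['x-sap-cap-kafka-offset']),
    timestamp: h['x-sap-cap-kafka-timestamp']
  }
}

// Sample message shaped like the ones delivered by cds-kafka:
const meta = kafkaMetadata({
  data: { subject: 'Hello World' },
  headers: {
    'x-sap-cap-kafka-topic': 'some-topic',
    'x-sap-cap-kafka-partition': '0',
    'x-sap-cap-kafka-offset': '42',
    'x-sap-cap-kafka-timestamp': '1735833564265'
  }
})
console.log(meta.topic, meta.partition, meta.offset)
```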

These capabilities allow cds-kafka to support more sophisticated setups, such as architectures with specific partitioning strategies or dynamic topic routing, while not breaking CAP’s internal APIs.

Verdict

In this blog post, I explained the basic concepts of Apache Kafka and introduced cds-kafka, an open-source CAP plugin that extends CAP’s messaging capabilities to integrate with Kafka. I also highlighted the plugin’s features and how it bridges SAP applications with external Kafka instances.

In the upcoming second post, I will demonstrate a practical example application, showcasing how to use cds-kafka both locally and on SAP BTP, connecting to an external cloud-based Kafka instance.

While SAP-provided message brokers remain the preferred choice in a fully SAP-centric ecosystem, Apache Kafka and cds-kafka provide a powerful alternative for connecting SAP applications to external event streams. Additionally, with CAP being a great and versatile framework, it can serve as an excellent choice even outside the traditional SAP environment.
