This blog post is the first in a two-part series introducing cds-kafka, an open-source CAP plugin designed for seamless integration with Apache Kafka.
In this post, we’ll cover the theoretical foundations of the plugin, including its features and capabilities. The second post will focus on the practical implementation, demonstrating how to use the plugin in a real-world application, both locally and on SAP BTP.
Event-driven architectures (EDA), while not a new concept, have become a cornerstone for building scalable, decoupled applications in today’s dynamic cloud landscape. Instead of hard-wiring connections between different applications and services, EDA enables communication through a central broker or message queue. This intermediary ensures loose coupling between producers (which generate events) and consumers (which process them). Such an architecture allows systems to react to changes asynchronously, supporting real-time processing, enhanced scalability, and improved resilience.
SAP has embraced EDA by offering several message broker services on SAP Business Technology Platform (SAP BTP), such as:
- SAP Event Mesh
- SAP Integration Suite, advanced event mesh
These services provide robust asynchronous messaging capabilities and integrate almost seamlessly with other SAP solutions. For instance, S/4HANA can emit pre-configured or custom events to these brokers, which can then be consumed by other SAP services like SAP Cloud Integration, SAP Build Process Automation, or applications built with the SAP Cloud Application Programming Model (CAP) using dedicated connectors.
From an organizational perspective, SAP’s proprietary messaging solutions are a natural fit for companies heavily invested in the SAP ecosystem. However, other IT departments may rely on more open-source platforms, such as Apache Kafka, to cater to broader architectural needs for messaging and event streaming. This divergence often creates scenarios where integration between Kafka and SAP services becomes essential.
SAP has addressed some of these integration needs with available adapters, including:
- the Kafka adapter of SAP Integration Suite (Cloud Integration)
- the Kafka bridge capabilities of SAP Integration Suite, advanced event mesh
However, certain scenarios may require or benefit from direct communication between Kafka and cloud applications running in the SAP ecosystem, particularly when no Kafka Bridge is available or when introducing another broker middleware would add an unnecessary layer of complexity.
In this blog post, I will focus on the latter scenario: connecting SAP cloud applications directly to Apache Kafka. To address this need, I have created and published cds-kafka, an open-source CAP plugin that implements the messaging service capabilities within CAP to seamlessly integrate with Apache Kafka.
To address this directly: Apache Kafka is not available as a public service on SAP BTP. It appears neither in the Discovery Center nor in the BTP Service Metadata. However, there are indications that SAP uses Kafka internally for certain applications running on SAP BTP. For example, the official CAP documentation (source) mentions Kafka, and there is an undocumented Kafka adapter embedded in the CAP sources. This adapter can be activated in a CAP project by running the command cds add kafka, which also adds a reference to a Kafka service to the mta.yaml file.
However, the Kafka service itself is not publicly available, and the existing Kafka adapter in CAP is restricted, making it unsuitable for connecting to other Kafka instances outside of SAP BTP. To overcome this limitation, I decided to implement and offer the open-source adapter cds-kafka, which is capable of connecting to any Kafka instance, whether hosted locally or in the cloud.
Apache Kafka is an open-source, ecosystem-agnostic, distributed event streaming platform that was originally developed at LinkedIn and is now maintained by the Apache Software Foundation. Apache Kafka is highly successful and is used by many companies (including 80+ of the top 100 companies – source). Compared to SAP Event Mesh, which is essentially a message broker, Apache Kafka, as an event streaming platform, differs in certain areas. A high-level overview:
Kafka organizes messages into topics, which are partitioned for scalability and processed independently by consumers. Unlike traditional message brokers, which often emphasize guaranteed delivery, dead-letter queues, and strict message acknowledgment semantics, Kafka relies on a log-based storage mechanism with offsets to manage message consumption. An offset is a unique identifier for each message in a topic, representing its position in the log. Consumers are responsible for managing their offsets, allowing for flexible consumption patterns such as replaying messages, skipping unimportant ones, or starting from a specific point in the stream, as shown in the following simplified illustration:
In traditional message brokers, messages are typically removed from the queue once consumed, and delivery guarantees often depend on acknowledgment mechanisms. Kafka, in contrast, does not remove messages upon consumption. Instead, it retains messages for a configurable period, irrespective of whether they are processed. This design allows multiple consumers (consumer groups) to process the same data independently and at their own pace without interfering with each other.
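To make this concrete, here is a minimal sketch using the KafkaJS client (the same library cds-kafka builds on, as described below). It attaches a new consumer group that replays a topic from the beginning; the broker address, topic name, and groupId are just illustrative assumptions:

const { Kafka } = require('kafkajs')

const kafka = new Kafka({ clientId: 'replay-demo', brokers: ['localhost:9092'] })

async function replay() {
  // A fresh groupId maintains its own offsets, so it can re-read the full log
  // without affecting any other consumer group on the same topic.
  const consumer = kafka.consumer({ groupId: 'replay-group' })
  await consumer.connect()
  await consumer.subscribe({ topic: 'some-topic', fromBeginning: true })
  await consumer.run({
    eachMessage: async ({ partition, message }) => {
      console.log(partition, message.offset, message.value.toString())
    }
  })
}

replay().catch(console.error)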
Kafka’s core strength lies in its ability to handle real-time data streams efficiently. With tools like Kafka Streams, developers can perform real-time data transformations, filtering, and aggregations directly on event streams without needing additional systems. For integration, Kafka Connect simplifies connecting Kafka to external data systems, providing pre-built connectors for databases, cloud storage, and other platforms. These capabilities enable Kafka to act not just as a message broker but as a central hub for processing, transforming, and routing data across an entire enterprise.
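To give an impression of how such an integration is configured, the following sketch shows the well-known file-source quickstart connector for Kafka Connect, which streams the lines of a local file into a topic (file path and topic name are placeholders):

{
  "name": "local-file-source",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/tmp/input.txt",
    "topic": "some-topic"
  }
}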
Under the hood, cds-kafka uses KafkaJS to connect to Kafka. Since Kafka itself uses a binary protocol over TCP, KafkaJS provides the required abstraction layer to access Kafka functionality from JavaScript. By leveraging the CAP messaging APIs and KafkaJS, cds-kafka offers the following capabilities (all described in more detail below):
- emitting events to Kafka topics, including automatic topic creation
- subscribing to topics, including regex-based and catch-all subscriptions
- support for the CloudEvents message format
- control over Kafka internals such as consumer groups, partitions, keys, and offsets
cds-kafka is provided as an NPM module and is publicly available in the NPM registry.
In any existing CAP project, cds-kafka can simply be installed via the following command:
npm i cds-kafka
After installing the plugin, the service configuration must be added to CAP. The following example registers a service kafka-messaging and provides the minimal service configuration. The name of the service can be chosen freely, with the exception of kafka (which would use the internal Kafka adapter instead). Since kafka-messaging is just another service inside CAP, it can also be used next to other messaging services. This makes it possible to build hybrid proxy/bridge applications connecting SAP-internal (Event Mesh) and external messaging systems, as sketched further below:
"cds": {
"requires": {
"kafka-messaging": {
"kind": "kafka-service",
"credentials": {
...
}
},
"kafka-service": {
"impl": "cds-kafka"
}
}
}
Sending a message to Kafka is possible by simply using the default messaging API in CAP. CAP and cds-kafka will take care of connecting to the broker and transporting the message. The following example will send a simple message to the Kafka topic some-topic:
const kafka = await cds.connect.to('kafka-messaging')
await kafka.emit('some-topic', { subject: 'Hello World' })
By default, the name of the event and the name of the topic are expected to match. In case the topic does not yet exist in Apache Kafka, cds-kafka will try to create it.
Subscribing to a topic in Apache Kafka is also possible by simply using the CAP default messaging API. The following example creates a subscription to the some-topic topic in Kafka, and the callback will be triggered whenever a new message is published to the broker:
const kafka = await cds.connect.to('kafka-messaging')
kafka.on('some-topic', (message) => console.log(message.data))
By default, each on call will create a subscription to the defined topic. So it is also possible for a CAP application to subscribe to and receive messages from multiple topics.
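Combining emit and on, the hybrid proxy/bridge scenario mentioned earlier can be sketched in a few lines. The following is an illustrative sketch only: it assumes a second messaging service named em-messaging (e.g. bound to SAP Event Mesh) is configured in the project, and the event and topic names are placeholders:

const cds = require('@sap/cds')

cds.once('served', async () => {
  const em = await cds.connect.to('em-messaging')       // hypothetical Event Mesh service
  const kafka = await cds.connect.to('kafka-messaging') // the cds-kafka service from above
  // Forward every message arriving on the Event Mesh topic to a Kafka topic
  em.on('some/em/topic', async (msg) => {
    await kafka.emit('some-topic', msg.data)
  })
})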
As the above examples have shown, the usage of cds-kafka pretty much feels like using any other existing messaging service in CAP. But since Kafka also has a special set of features, there are some more things to look at.
By default, cds-kafka expects the name of the Kafka topic and the event type to be the same. As an example, sending an event my/custom/event to Kafka would result in a topic with the same name being created and used by CAP. However, it is possible to diverge from this rule: all event types can be routed to a single topic, or topics and event types can be mixed freely.
Part of Apache Kafka’s success comes from the great flexibility it provides, and this extends to the handling of topics and their messages.
cds-kafka also has support for scenarios where the event type and the topic do not match (more on this below). To still have all the important information at hand, cds-kafka adds two headers to each message that contain the event and topic name:
- x-sap-cap-event – the name of the CAP event
- x-sap-cap-kafka-topic – the name of the Kafka topic
This information can simply be accessed by inspecting the message headers:
kafka.on("my/custom/event", (message) => {
console.log(message.headers["x-sap-cap-event"])
console.log(message.headers["x-sap-cap-kafka-topic"])
})
As mentioned before, cds-kafka internally uses KafkaJS to connect to and communicate with Kafka. Therefore, all connection options provided within the credentials section of the service configuration are directly routed to KafkaJS. This enables flexible access to local or remote Kafka systems, including authentication.
All available configuration options are provided in the KafkaJS documentation.
Next to the broker endpoints, a clientId can be specified. The purpose of the clientId within Kafka is to track the source of requests, allowing a logical application name to be included in server-side request logging. When not provided, cds-kafka will create a dynamic clientId based on the environment. A clientId should be provided and shared across all instances of a horizontally scaled application, but be distinct for each application.
The following shows an example configuration for accessing a local dockerized Kafka instance running on the default port 9092:
"cds": {
"requires": {
"kafka-messaging": {
"kind": "kafka-service",
"credentials": {
"clientId": "my-cap-app",
"brokers": [
"localhost:9092"
]
}
}
}
}
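For a remote, secured Kafka cluster, the same credentials section can carry KafkaJS’s SSL and SASL options. The following is a sketch with placeholder values (host, user, and password are assumptions; the supported SASL mechanisms are listed in the KafkaJS documentation):

"cds": {
  "requires": {
    "kafka-messaging": {
      "kind": "kafka-service",
      "credentials": {
        "clientId": "my-cap-app",
        "brokers": ["my-cluster.example.com:9093"],
        "ssl": true,
        "sasl": {
          "mechanism": "plain",
          "username": "<user>",
          "password": "<password>"
        }
      }
    }
  }
}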
By default, cds-kafka subscribes to topics based on the event name specified in the observer registration (on calls). However, unlike standard CAP behavior, cds-kafka extends this capability by allowing subscriptions to multiple topics using regular expressions, which enable subscribing to all topics with matching names. Since CAP’s internal API only supports string-based subscriptions, the two additional headers (see above) are used to keep track of the actual event and topic.
It’s important to note that regex-based subscriptions only work for topics that already exist at the time of subscription. If a new matching topic is added after the application has started, the application must be restarted to receive messages from this topic.
Next to regular expressions, there is also support for a catch-all subscription which allows listening to all topics of a broker, excluding Kafka’s private ones (those starting with __). Internally, a regex subscription is created to match all public topics dynamically.
The following snippet shows how both options can be used:
const kafka = await cds.connect.to('kafka-messaging')
// Subscribe to all topics matching the pattern (e.g. topic-A-1, topic-B-1...)
kafka.on(/topic-(A|B|C)-.*/i, (message) => console.log(message))
// Subscribe to all topics (excluding Kafka's private ones starting with __)
kafka.on("*", (message) => console.log(message))
While these features provide flexibility, it’s essential to carefully manage regex and catch-all subscriptions in production environments to avoid unintended message processing or performance issues. These options are particularly useful for dynamic architectures where topic names are not static or predefined.
CAP has native support for the CloudEvents specification (see documentation). By setting the message format to cloudevents, supporting metadata will be added to all messages. cds-kafka also supports CloudEvents, adhering to the official CloudEvents specification for Apache Kafka. While the specification distinguishes between wrapping the message data (structured mode) and adding headers (binary mode), cds-kafka enables both at the same time.
The reason for this is that CAP internally takes care of CloudEvents metadata, and the incoming message is already stripped of all CloudEvents metadata. Because this metadata might still be required, the header data remains available when processing the incoming message.
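Assuming cds-kafka follows CAP’s standard format option for messaging services, CloudEvents can be enabled via the service configuration, for example:

"cds": {
  "requires": {
    "kafka-messaging": {
      "kind": "kafka-service",
      "format": "cloudevents",
      "credentials": {
        ...
      }
    }
  }
}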
The following example shows an inspection of a message in the event handler. The CloudEvents header fields (prefixed with ce-) are still visible and can be read programmatically.
{
  data: { subject: 'Hello World' },
  headers: {
    'ce-specversion': '1.0',
    'ce-id': 'df396250-cb0d-4f88-97c4-0aa25faa12ef',
    'ce-type': 'some-topic',
    'ce-source': 'cap-kafka-example-app-1',
    'ce-time': '2025-01-02T15:59:24.265Z',
    'ce-datacontenttype': 'application/json',
  },
  event: 'some-topic'
}
Kafka has been briefly explained at the beginning of this post, and while the illustration showcased some of the core concepts, it was a little simplified. Apache Kafka’s architecture involves several advanced features that add flexibility and scalability but also introduce complexity. Key concepts such as topics, partitions, keys, consumer groups, and offsets play a crucial role in Kafka’s operation. cds-kafka has support for all of those features, but before showcasing them, let’s explore these concepts in more detail:
- Topics are named streams of messages and are split into partitions.
- Partitions allow a topic to be distributed and processed in parallel; ordering is only guaranteed within a single partition.
- Keys determine the partition a message is written to; messages with the same key always end up in the same partition.
- Consumer groups let multiple consumers share the work of consuming a topic, with each partition being processed by exactly one consumer of the group.
- Offsets track, per consumer group and partition, up to which message consumption has progressed.
All of those aspects can be used with cds-kafka to also connect to more complex Kafka architectures.
Every CAP application using cds-kafka and connecting to Kafka for message consumption is treated as one consumer. If not specified, the groupId will automatically be generated (derived from the VCAP application information or the local process ID). However, the groupId can also be provided via the configuration. Thus, it is possible to connect the application to an existing consumer group or to have multiple instances of the same application running in parallel, each as an individual consumer within the consumer group.
"cds": {
"requires": {
"kafka-messaging": {
...
"groupId": "my-group"
...
}
}
}
By default, subscribing to an existing topic with a new groupId will not consume any existing messages; internally, the offset is set to the latest message, so only newly arriving messages will be consumed by the application. However, it is possible to override this behavior: by naming the topic directly in the configuration and setting fromBeginning: true, the service will start to consume all existing messages within the topic's retention period.
"cds": {
"requires": {
"kafka-messaging": {
...
"topics": [{
"some-topic": {
"fromBeginning": true
}
}]
...
}
}
}
When emitting a message with cds-kafka, the default behavior is to send the message to a random partition, with the event name used as the topic. However, this behavior can be customized to suit more complex architectural needs. The following custom headers can be used to override the default behavior:
- @kafka.key – the message key to send to Kafka
- @kafka.partition – the partition to send the message to
- @kafka.topic – the topic to send the message to (instead of the event name)
The following example shows how those headers can be set. If set, those headers will not be sent to Kafka but only used internally for applying the specific behavior.
const messaging = await cds.connect.to('kafka-messaging')
await messaging.emit({
  event: 'ObjectCreated',
  data: { subject: 'Hello World' },
  headers: {
    '@kafka.key': '1234',
    '@kafka.partition': '0',
    '@kafka.topic': 'cap.test.object.created.v1',
  }
});
While the @kafka headers are only used internally, every message consumed by cds-kafka (even those not using the @kafka headers) will receive a set of additional headers providing information on Kafka internals, such as the message's partition, offset, and key, next to the already described x-sap-cap-kafka-topic.
These capabilities allow cds-kafka to support more sophisticated setups, such as architectures with specific partitioning strategies or dynamic topic routing, while not breaking CAP's internal APIs.
In this blog post, I explained the basic concepts of Apache Kafka and introduced cds-kafka, an open-source CAP plugin that extends CAP’s messaging capabilities to integrate with Kafka. I also highlighted the plugin’s features and how it bridges SAP applications with external Kafka instances.
In the upcoming second post, I will demonstrate a practical example application, showcasing how to use cds-kafka both locally and on SAP BTP, connecting to an external cloud-based Kafka instance.
While SAP-provided message brokers remain the preferred choice in a fully SAP-centric ecosystem, Apache Kafka and cds-kafka provide a powerful alternative for connecting SAP applications to external event streams. Additionally, with CAP being a great and versatile framework, it can serve as an excellent choice even outside the traditional SAP environment.