As someone who regularly works in the space SAP refers to as ‘Integration Architect’, I recently decided to work through the openSAP course
Simplify Integration with SAP Cloud Platform Integration Suite (June 2019). Early in this material we see a demonstration of a pre-packaged CPI ‘Integration Flow’: SuccessFactors to SAP HCM ‘Employee Data Replication’. A second demonstration, later in the course, shows the replication of Exchange Rate data from S/4 to SuccessFactors using another pre-packaged CPI Integration Flow (or ‘iFlow’). It then becomes apparent that SAP has decided to solve the – reborn – problem of the ‘Golden Record’ using Data Replication: there is no ‘Golden Record’, but instead ‘all-records-are-equal’.
A question that remained, after seeing how easy it was to set up these iFlows using CPI, was: what happens if I also want to send employee data changed in SuccessFactors to a second or third system (on the basis of a brute-force, all-records-are-equal approach), and what if one of those (possibly non-SAP) systems is not covered by an existing CPI iFlow? The result is clear: you need to build a parallel solution, using an entirely different approach, and both solutions must then be maintained in parallel. At this point one must ask, given this complexity and the future ‘Lock-In’ risks, whether it wouldn’t be simpler to build a single custom integration that manages all of the company’s integration needs; one which does not carry any licensing costs. I suspect the answer should be straightforward for most.
And what of the second example, Exchange Rate Data Replication? Could we imagine this data needing to be distributed to more than just SuccessFactors? Could we even imagine a need for real-time Exchange Rate data? The latter would be problematic using the CPI iFlow approach, as this CPI Integration is scheduled, not event-driven. In any case, let us imagine a scenario where an enterprise has five diverse back-end systems that all depend upon near-real-time Exchange Rate data, and where all five are miraculously covered by pre-packaged CPI Integration Content. Each of the five separate iFlows would need to be scheduled to run every 2–5 seconds, meaning up to 150 job runs per minute on CPI (5 iFlows × 30 runs per minute at a 2-second interval) for a single data replication requirement, regardless of whether the Rate has changed or not. This turns the very simple need for an Exchange Rate ‘Golden Record’ into a costly and complex proposition; and that is if we pretend that all five back-ends were, by some lucky circumstance, covered by pre-packaged CPI Integration Content, and that this never changes (i.e. there is never a need to add a new, non-included system).
The Course goes on to discuss CPI ‘API Management’ for the ‘monetization of your developed APIs’. Here, we are told to “Think about launching an API like launching a product. It’s about doing research, finding the right API to build, identifying who will use the API, what will they be willing to pay”. At this point it occurred to me that in twenty years I have never seen an ERP client develop completely new SOAP APIs for its external partners. Indeed, such APIs are normally the domain of (software/) ERP vendors, not ERP clients. If an enterprise makes the decision to install a large and complex ERP landscape, it could be argued that it has also made a decision NOT to develop its own custom solutions; or at least to develop as few as possible. And for those custom integrations that it will inevitably need to put in place with its partners, doesn’t it usually exploit the large number of well-established, publicly-available standards for mass data exchange, such as EDIFACT, IDoc and cXML, to name but a few? Isn’t that precisely why such standards for data exchange are public, and stable?
We then learn about SAP’s ‘Digital Integration Hub’, a recent product offering from SAP “for implementing large-scale, high-throughput APIs by inserting a high-performance, In-Memory ‘Data Store Layer’, between the ‘API Service Layer’ and the System-of-Record”. It is the ‘Integration Layer’ component of the Digital Integration Hub that “Keeps the system of record sources and high-performance data store in sync”. As such, it seems that the fake-gold records that have already been duplicated across various back-end systems, via scheduled jobs running on CPI, must now also be duplicated into the Digital Integration Hub’s In-Memory ‘Data Store Layer’. Additionally, if you haven’t already developed a custom API for each of your custom integration needs, then you have nothing to expose to your partners in the Digital Integration Hub’s customer-facing ‘API Service Layer’. As this is a Cloud-based solution, you have probably also guessed that irrespective of any large investments your company has already made to beef up your existing On-Prem S/4 DB Server for the robust needs of In-Memory computing, none of that memory is useful in a Cloud-based scenario: you must instead buy a subscription for the Cloud-resident ‘In-Memory Data Grid’ of ‘HANA-as-a-Service’. You must then use HANA’s ‘Smart Data Integration’ (‘Data Provisioning Agent’) for ‘live’ Data Replication into the new Cloud-based ‘In-Memory Data Grid’ (multiplied by the number of exposed APIs). This solution nonetheless “Decouples front-end API services from the system of record for fast response time”; after all, you want your partners to be willing to pay you for API use. It also “Enables front-end API services to access data scattered across multiple back-end systems”; thereby making the aggregated and exposed fake-gold records ephemeral.
More recently launched by SAP, and therefore not covered in the mentioned Course, is the Cloud-based ‘SAP Graph’, a Beta product launched as recently as September 2019 that is “still under development”. SAP Graph “wraps the APIs of existing [SAP] products into a single harmonized API layer across the existing [SAP] source systems”. It is a “network of connected data objects that are stored and owned across different SAP solutions and technologies. The data objects that are made accessible through SAP Graph in an interconnected data model can be consumed through an HTTP-based API, which exposes data from multiple SAP systems in a unified schema” – in other words, a ‘harmonized Entity-Layer’ (albeit one with no concrete existence). However, SAP Graph will offer only a ‘curated’ set of APIs, re-exposing approximately 20% of existing back-end APIs with normalized signatures (because developers should hopefully find this approach less complex). And what if your solution requires one of the 80% of APIs that are not re-exposed by SAP Graph, or APIs that are hosted by non-SAP products? Once again, you can put in place a parallel solution, using a completely different technology, and hope that you never need to add non-SAP (or old-SAP) products to the integrations that you spent weeks developing with SAP Graph; at which point you will need to start again (having already paid the relevant subscription).
What is quite interesting about this is that I had only just heard, in the same openSAP CPI course, that thanks to the ‘High-Performance Data Store’ of SAP’s new Digital Integration Hub, we can now build “a single, consolidated view of entities, the data for which is stored in one or multiple [SAP or non-SAP] System-of-Records” (in an In-Memory cache). But what is most worrying of all about SAP’s ‘API-first’ dogma is the willingness to confuse APIs – the core building block of SOA macroservices – with data, which exists completely independently of any particular architectural pattern. SAP Graph, “a single harmonized API layer”, exposes a “network of connected data objects” in a harmonized data model, which “can be consumed through an HTTP-based API, which exposes data from multiple SAP systems” (but not from non-SAP systems).
Likewise, SAP Graph and SAP’s Digital Integration Hub – both being fully API-centric – have no role to play whatsoever in an ‘Event-Driven Architecture’ (something SAP also mentions from time to time when discussing its new ‘Enterprise Messaging’ product). Why do I mention ‘Events’? Because people typically talk about either data or transactions; about master data or transactional data. In fact, each of these things is an ‘Event’, and that is why ‘Event-Driven Architectures’ will quickly replace APIs for internal macroservices; APIs only ever being needed for external partners, and often not even then. To provide an illustration: if a customer requests a Sales Order, that Request represents an Event, and the eventual Order Creation also represents an Event; as does any subsequent Change to the Order (an Event typically referred to by SAP users as ‘VA02’). Events represent instances of ‘transactions’, and those events are stored in the Database as ‘data’ records; for which reason you will have a hard time finding any single record in the DB that is not the direct consequence of an ‘Event’ (e.g. ‘EmployeeHired’). This is precisely why an ‘Event-Driven Architecture’ so naturally lends itself to the creation of a harmonized ‘Entity-Layer’; something far more natural, and far more useful, than a harmonized ‘API-Layer’. There is, conversely, no conceptual link between data and APIs.
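To make that concrete, here is a minimal sketch – with entirely hypothetical field names, not taken from any SAP model – of what such a Business Entity Event might look like as a plain data record: it captures both the ‘transaction’ (something changed) and the ‘data’ (a record that can be stored, replayed and queried).

```java
import java.time.Instant;

// A hypothetical 'SalesOrderChanged' event: one immutable record that is at
// once the transaction (what happened, and when) and the data (a queryable,
// storable fact about the 'SalesOrder' entity it belongs to).
public final class SalesOrderChanged {
    public final String orderNumber;   // the entity this event belongs to
    public final String changedField;  // which field was changed (e.g. "NetValue")
    public final String newValue;      // the value after the change
    public final Instant occurredAt;   // when the transaction took place

    public SalesOrderChanged(String orderNumber, String changedField,
                             String newValue, Instant occurredAt) {
        this.orderNumber = orderNumber;
        this.changedField = changedField;
        this.newValue = newValue;
        this.occurredAt = occurredAt;
    }
}
```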
In order to help SAP towards a shift in mindset from the (SOAP-based) ‘Service-Oriented Architecture’ that reigned supreme between approximately 2007 and 2017, towards the ‘Event-Driven Architecture’ that is very quickly growing in popularity today – and which happens to be a perfect pattern for the macroservices needed in ERP-centric landscapes – I will need to make a possibly uncomfortable point: the Application needed to build an In-Memory, Event-Driven, fault-tolerant, high-availability, harmonized and centralized common ‘Entity-Layer’ – addressable by HTTP calls – is already freely available within the Open Source Community, and it fully supports On-Premise landscapes. The ‘High-Performance Data Store’ of SAP’s Digital Integration Hub – with which we can build “a single, consolidated view of entities, the data for which is stored in one or multiple System-of-Records” – is no more than the ‘State Store’ of Apache ‘Kafka Streams’; something that can be run In-Memory, out-of-the-box. Perhaps even more interesting in this regard is that SAP already provides free integration for Kafka (https://github.com/SAP/kafka-connect-sap).
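As a rough illustration of what that looks like in practice, the following is a minimal Kafka Streams sketch – the topic, store name, broker address and merge logic are illustrative assumptions, not SAP content – in which events arriving on a ‘SalesOrders’ Topic are folded, per Order number, into an in-memory State Store holding the consolidated view of each entity.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.Stores;
import java.util.Properties;

public class EntityLayerTopology {

    public static KafkaStreams build() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "entity-layer");      // illustrative
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption

        StreamsBuilder builder = new StreamsBuilder();

        builder.stream("SalesOrders", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey()   // key = the Sales Order number
               .reduce(
                   // deliberately naive merge: the newest event becomes the entity
                   // state; a real fold would merge each event's fields into the
                   // aggregated record
                   (currentState, newEvent) -> newEvent,
                   Materialized.<String, String>as(
                           Stores.inMemoryKeyValueStore("sales-order-entities")) // in-memory, out-of-the-box
                       .withKeySerde(Serdes.String())
                       .withValueSerde(Serdes.String()));

        return new KafkaStreams(builder.build(), props);
    }
}
```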
Incoming Events can be ‘folded’ into their corresponding ‘State Stores’ using the ‘Event Sourcing’ pattern, meaning that events are merged as they arrive, in real time – not on schedules – and that the aggregated entity record that can subsequently be queried by SAP or non-SAP clients represents the ‘Golden Record’ of each ‘Entity’. Given that SAP already uses Open Source solutions in its commercial products (e.g. PostgreSQL in ‘API Management’), there is no time like the present; SAP’s clients need a harmonized and centralized common ‘Entity-Layer’. The 1970s response to the Golden Record problem was the ‘ERP’ – where there is no (single) ‘Golden Record’, there is no ‘ERP’. Some fifty years later, it seems clear that ‘Event Sourcing’ is the ideal modern response to this re-emerged problem; one that ERPs can perhaps no longer solve, most evidently in the new context of IoT and Edge Computing.
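Sticking with the same illustrative store name, the sketch below shows how such an aggregated record could then be looked up via Kafka Streams’ interactive queries; wrapping this lookup in any ordinary HTTP endpoint is what makes the ‘Entity-Layer’ addressable by SAP and non-SAP clients alike.

```java
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

// Looks up the consolidated 'Golden Record' of a Sales Order from the
// in-memory State Store built by the topology sketched above.
public class GoldenRecordLookup {

    private final KafkaStreams streams;

    public GoldenRecordLookup(KafkaStreams streams) {
        this.streams = streams;
    }

    // Returns the current aggregated state of one Sales Order,
    // or null if no events have yet been folded in for that key.
    public String findSalesOrder(String orderNumber) {
        ReadOnlyKeyValueStore<String, String> store = streams.store(
                StoreQueryParameters.fromNameAndType(
                        "sales-order-entities", QueryableStoreTypes.keyValueStore()));
        return store.get(orderNumber);
    }
}
```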
What’s more, the various Kafka ‘Topics’ (e.g. ‘SalesOrders’) fed into the ‘State Stores’ provide a full – auditable – history of all Business Entity Events – coming from any number of SAP or non-SAP back-ends (Cloud or On-Prem) – that can easily be interrogated by each client using temporal queries in order to return only new, unprocessed events. In this case, the last-read ‘Offset’ of each Topic – always managed by each client – would represent the equivalent of an OData ‘e-Tag’. That is important, because it solves a fundamental problem of the ever-growing number of Offline-Mobile scenarios (in a fashion very similar to that which I described in my Blog:
How to build a ‘Rolling-Delta Database’ for Offline Mobile scenarios). Conversely, how do SAP Graph and SAP’s Digital Integration Hub help to meet the growing need for Delta-queries – multiplied by the number of mobile devices – on the ever-changing entities referenced in Offline-Mobile scenarios?
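For completeness, here is a minimal sketch of that delta mechanism from the client’s side – the broker address, topic, partition and offset value are all illustrative assumptions – in which a client resumes from its own last-read Offset and receives only the events it has not yet processed, much as an e-Tag scopes an OData delta request.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class DeltaSync {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumption
        props.put("group.id", "mobile-device-4711");       // one logical client per device (illustrative)
        props.put("enable.auto.commit", "false");          // the client manages its own offsets
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        // Persisted locally by the device after its previous sync; plays the
        // same role as an OData 'e-Tag' for delta queries.
        long lastProcessedOffset = 1500L;

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition partition = new TopicPartition("SalesOrders", 0);
            consumer.assign(Collections.singletonList(partition));
            consumer.seek(partition, lastProcessedOffset + 1);  // resume just after the last delta

            ConsumerRecords<String, String> delta = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> event : delta) {
                // apply only the new, unprocessed events, then persist event.offset() locally
                System.out.println(event.offset() + ": " + event.value());
            }
        }
    }
}
```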