Introduction
In this blog post I want to share some ideas and experiences from implementing a component based on the SAP Java Connector (JCo) that makes it easy to access data and to integrate non-SAP processes, like data pipelines in big data clusters, with the SAP Netweaver Application Server. This work finally led to our solution called i-OhJa.
About two years ago we started working on a project for a customer in the financial sector. The main task was to set up a Hadoop stack as the new company-wide BI platform for big data and streaming. One requirement in the migration phase was to load staged data from SAP BW (on Oracle) into Kafka, another one was to load data from Kafka into BW in streaming mode. At that time Kafka Streams was out of scope, so we decided to implement our dataflows in Spark Streaming. This was the first project to run i-OhJa on the customer side in a Spark cluster.
Prior to the project we had already started evaluating SAP JCo, a library that does not merely offer JDBC-style access to the database of a SAP Netweaver system, but is able to tightly integrate SAP and non-SAP applications in a flexible way. Our goal for i-OhJa was to implement a thin library that meets the requirements mentioned above in the highly regulated sector of financial institutions, without the need to introduce new complex infrastructures or clusters. JCo, being a mighty yet basic Java library with the ability to establish trusted and encrypted connections to SAP Netweaver systems using SNC and SSO provided by the SAP Cryptolib, was the best fit for this scenario.
To be honest, the last time I had implemented software in Java and J2EE using the Eclipse IDE was back in 2008. After a long period of mostly utilizing ABAP in SAP Netweaver based systems, I was pleased to work with a more modern IDE and with a functional and type-safe language like Scala, our language of choice for developing i-OhJa. In the above-mentioned scenario, we chose to use SAP BW Open Hubs and SAP Netweaver events to get data out of BW. For the opposite direction we used Webservice DataSources to load data into BW in a streaming and real-time manner. All these interfaces are based on RFC-enabled function modules and are tightly integrated into the monitoring and scheduling environment of SAP BW.
EL(T)
Since there are already a bunch of ETL tools out there and as we wanted to keep i-OhJa as thin as possible, we decided to cover only the Extraction and Load of ELT. This also follows some basic ideas of modern microservice architectures. In the scenarios mentioned above this means that the data held in Kafka is a close replication of the data being transferred via OpenHub or Webservice DataSource.
Type safety and wrappers
One of the things we like most about Scala, besides its functional style, is that it keeps our code type safe. Shortly after starting to implement i-OhJa we decided to wrap all necessary JCo classes in Scala classes, so that we can use them natively in e.g. pattern matching and collection functions while gaining type safety and immutability. We provide class wrappers for function templates, function requests, function responses, tables, records/structures, record fields, all elementary types available in JCo 3.1, parameter lists, the corresponding metadata classes of JCo, DDIC types, message types, selection ranges, aggregation functions, ABAP exceptions, JCo destinations, BAPI return messages, BAPI transaction commit and rollback in a JCo context, and more.
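To illustrate the wrapping idea (this is only a minimal sketch of the pattern, not the actual i-OhJa API, whose class names differ), turning a mutable JCoStructure into an immutable Scala case class with Option fields could look like this:

import com.sap.conn.jco.JCoStructure

// Hypothetical wrapper: an immutable, type-safe view on a BAPIRET2-like JCo structure
final case class ReturnMessage(msgType: Option[Char], id: Option[String], message: Option[String])

object ReturnMessage {
  // Option(...).filter(_.nonEmpty) turns JCo's empty values into None, so callers can pattern match safely
  def fromJCo(s: JCoStructure): ReturnMessage =
    ReturnMessage(
      msgType = Option(s.getString("TYPE")).filter(_.nonEmpty).map(_.head),
      id      = Option(s.getString("ID")).filter(_.nonEmpty),
      message = Option(s.getString("MESSAGE")).filter(_.nonEmpty)
    )
}

// Usage: ReturnMessage.fromJCo(ret) match { case ReturnMessage(Some('E'), _, Some(msg)) => ... }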
Code generation
With one of our customers we saw customer-specific ELT frameworks written in Scala that use JSON-like configuration files to define individual data pipelines. Concepts following this dynamic principle have some drawbacks:
- The configuration files need to be parsed with every execution of the batch or streaming pipeline
- Issues in the configuration files that go beyond syntactic correctness lead to runtime exceptions, because they bypass the type checks of the compiler.
- The ELT framework implementation makes heavy use of methods like asInstanceOf and isInstanceOf to cope with the vast number of different interface data structures and types. As all the ELT logic is kept in configuration files, the individual data types are only known once these files have been parsed, which happens at runtime of the pipelines.
- Using configuration files to define pipelines makes them more flexible: they can be changed and maintained without touching the underlying code, so changes don’t need to be recompiled and don’t have to follow the customer’s build and release cycle.
We consider this flexibility a drawback here. Changes to configuration files are comparable to code changes, yet they bypass all the code checks and quality gates implemented in your continuous integration processes.
Being supporters of static type safety, we decided to take a different approach to handle the challenge of diverse data structures: we implemented Scala code generators that produce case classes representing the structure of individual RFC function modules.
An example of the generated classes for a simple BAPI like BAPI_ODSO_GETLIST, delivering a list of available DSOs in a SAP BW system, would look like this:
case class JCoBapiOdsoGetlistWrapper(
    importing: JCoBapiOdsoGetlistWrapper.Importing = JCoBapiOdsoGetlistWrapper.Importing(),
    exporting: JCoBapiOdsoGetlistWrapper.Exporting = JCoBapiOdsoGetlistWrapper.Exporting(),
    changing: JCoBapiOdsoGetlistWrapper.Changing = JCoBapiOdsoGetlistWrapper.Changing(),
    tables: JCoBapiOdsoGetlistWrapper.Tables) extends JCoBapiOdsoGetlistWrapper.JCoFunctionWrapper {…}

case class Importing(objvers: Option[Char] = Some('R')) extends JCoParametersWrapper {…}

case class Exporting(`return`: Exporting.Return = Exporting.Return()) extends JCoParametersWrapper {…}

case class Changing() extends JCoParametersWrapper {…}

case class Tables(
    odsobjectlist: Seq[Tables.Odsobjectlist] = Nil,
    selodsobject: Seq[Tables.Selodsobject] = Nil,
    seltextlong: Seq[Tables.Seltextlong] = Nil) extends JCoParametersWrapper {…}

case class Return(
    `type`: Option[Char] = None,
    id: Option[String] = None,
    number: Option[String] = None,
    message: Option[String] = None,
    logNo: Option[String] = None,
    logMsgNo: Option[String] = None,
    messageV1: Option[String] = None,
    messageV2: Option[String] = None,
    messageV3: Option[String] = None,
    messageV4: Option[String] = None,
    parameter: Option[String] = None,
    row: Option[Int] = None,
    field: Option[String] = None,
    system: Option[String] = None) extends JCoRecordWrapper {…}

case class Odsobjectlist(
    odsobject: Option[String] = None,
    objvers: Option[Char] = None,
    textlong: Option[String] = None,
    objstat: Option[String] = None,
    activfl: Option[Char] = None,
    infoarea: Option[String] = None,
    odsotype: Option[Char] = None) extends JCoRecordWrapper {…}

case class Selodsobject(
    sign: Option[Char] = None,
    option: Option[String] = None,
    odsobjectlow: Option[String] = None,
    odsobjecthigh: Option[String] = None) extends JCoRecordWrapper {…}

case class Seltextlong(
    sign: Option[Char] = None,
    option: Option[String] = None,
    textlonglow: Option[String] = None,
    textlonghigh: Option[String] = None) extends JCoRecordWrapper {…}
Client and server
JCo consists of two types of library components: the JCo server and the JCo client. For now, it is enough to know that the JCo server is used in cases where SAP Netweaver ABAP initiates the communication (here: the OpenHub scenario), whereas the JCo client is used when non-SAP applications call RFCs (here: the Webservice DataSource push).
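To give an impression of the server side, here is a minimal sketch in plain JCo rather than the i-OhJa wrapper: it registers a handler for an RFC function that the SAP system calls. The server name "MY_SERVER", the function name Z_OPENHUB_NOTIFY and the parameter I_REQUID are illustrative assumptions.

import com.sap.conn.jco.JCoFunction
import com.sap.conn.jco.server.{DefaultServerHandlerFactory, JCoServerContext, JCoServerFactory, JCoServerFunctionHandler}

object OpenHubServerSketch extends App {
  // Handler invoked whenever the SAP system calls the registered RFC function on this server
  val handler = new JCoServerFunctionHandler {
    override def handleRequest(ctx: JCoServerContext, function: JCoFunction): Unit = {
      // Read an importing parameter sent by the ABAP side (parameter name is illustrative)
      val requestId = function.getImportParameterList.getString("I_REQUID")
      println(s"received OpenHub notification for request $requestId")
    }
  }

  val handlerFactory = new DefaultServerHandlerFactory.FunctionHandlerFactory()
  handlerFactory.registerHandler("Z_OPENHUB_NOTIFY", handler)

  // Server configuration is resolved through a registered ServerDataProvider
  val server = JCoServerFactory.getServer("MY_SERVER")
  server.setCallHandlerFactory(handlerFactory)
  server.start()
}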
BW and ODP specific client libraries
For interfaces, protocols and function libraries that are more complex than a single RFC function call, it makes sense to provide specific client library modules.
We have done so for the good old BW OpenHub connections, the BW BAPIs and Operational Data Provisioning (ODP). As one might expect, all these interfaces are based on plain RFC functions, even ODP as the successor of the business content DataSources, which were based on ALE/IDocs.
A challenge you will face when querying data through generic interfaces like OpenHubs, BW BAPIs or ODPs, which serve different kinds of data structures, is that the data is serialized as text-encoded strings, with a record being a concatenation of different data types in a single string. An obvious option to reconstruct the data structures would be to parse these strings directly into native Java or Scala objects. We took an alternative approach and parse the strings into wrapped JCo structures created on the fly, based on the meta information we get from the corresponding function modules. This way we use JCo to convert the strings into proper data types, and we can afterwards make use of all the methods i-OhJa provides for the respective wrapper classes.
JCo allows creating JCoStructure and JCoTable objects based on JCoRecordMetaData objects. In a normal use case one would get these metadata objects by querying the repository, but in this scenario the structures need to be built at runtime using the JCo.createRecordMetaData method. A challenge we faced was providing the byte lengths, offsets and alignments to the JCoRecordMetaData.add methods for all the data types a field of a structure can have. i-OhJa offers the ability to calculate the byte offsets and lengths for all JCo type wrappers. Together with conversion methods for all the different DDIC and InfoObject types, it can create structures and even whole JCoFunction instances based on different kinds of metadata.
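A minimal sketch of building such a structure at runtime with plain JCo illustrates the bookkeeping that i-OhJa automates. It assumes two CHAR fields with hard-coded non-Unicode and Unicode byte lengths and offsets:

import com.sap.conn.jco.{JCo, JCoMetaData, JCoStructure}

object RuntimeStructureSketch extends App {
  // Describe a two-field record: ODSOBJECT CHAR(30), OBJVERS CHAR(1)
  val meta = JCo.createRecordMetaData("ODSO_ROW")
  // add(name, type, nucByteLength, nucByteOffset, ucByteLength, ucByteOffset)
  meta.add("ODSOBJECT", JCoMetaData.TYPE_CHAR, 30, 0, 60, 0)
  meta.add("OBJVERS", JCoMetaData.TYPE_CHAR, 1, 30, 2, 60)
  meta.setRecordLength(31, 62) // total non-Unicode and Unicode record lengths
  meta.lock()

  // Parse one concatenated text record into typed fields via the runtime structure
  val structure: JCoStructure = JCo.createStructure(meta)
  val dsoName = "ZSALES_DSO"
  val rawRecord = f"$dsoName%-30sA" // fixed-width record as delivered by the generic interface
  structure.setString("ODSOBJECT", rawRecord.substring(0, 30).trim)
  structure.setString("OBJVERS", rawRecord.substring(30, 31))
  println(structure.getString("ODSOBJECT") + " / " + structure.getChar("OBJVERS"))
}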
Reactive OpenHub server using Akka
To be able to listen for new incoming data and requests, i-OhJa provides a server component with services wrapped around the JCo server. Services have to extend a defined service trait and implement the logic for listening to and processing incoming requests from a SAP system. They can also make use of the i-OhJa client libraries to request data from the SAP system. Services can be provided at runtime by registering them with an i-OhJa server instance. Some predelivered services, like the OpenHub extraction, provide another interface for registering so-called data adapters. The services pass the processed data to the registered data adapter, which contains the logic to finally forward or store the data. Predelivered data adapters for the OpenHub service can store data in a CSV file, print data to the console or publish data to a Kafka topic in a Kafka cluster. Further services or data adapters can be implemented according to customer-specific requirements.
To be able to react quickly to incoming requests, a dispatcher instance forwards requests instantly to the corresponding services. The OpenHub service uses Akka to process several OpenHub request packages asynchronously and in parallel. A backpressure algorithm increases or decreases the number of data packages extracted and processed in parallel according to the resources available.
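A minimal sketch of this processing pattern using Akka Streams, whose demand-driven backpressure gives this behavior largely for free, could look as follows. The OpenHubPackage and DataAdapter types are hypothetical stand-ins for the i-OhJa classes:

import akka.Done
import akka.stream.Materializer
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Future

// Hypothetical stand-ins for i-OhJa's OpenHub request packages and data adapters
final case class OpenHubPackage(requestId: Long, packetId: Int, rows: Seq[String])
trait DataAdapter { def write(pkg: OpenHubPackage): Future[Unit] }

object OpenHubStreamSketch {
  // Process up to `parallelism` packages concurrently; when the adapter cannot keep up,
  // the stream's backpressure slows down the upstream source emitting the packages
  def run(packages: Source[OpenHubPackage, _], adapter: DataAdapter, parallelism: Int = 4)
         (implicit mat: Materializer): Future[Done] =
    packages
      .mapAsyncUnordered(parallelism)(adapter.write)
      .runWith(Sink.ignore)
}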
Converters and SerDe
A typical scenario in JCo applications is to register a destination provider and then use a destination object to retrieve a function object from the repository. This mutable function object contains all the metadata of the corresponding RFC function module, together with the state of parameter values and exceptions before and after calling the function module. In short, this object contains all the information we send to and get back from the Netweaver system when using function modules.
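In plain JCo this flow looks roughly like the following sketch; the destination name "BW_SYSTEM" is an assumption and needs to be resolvable through a previously registered DestinationDataProvider, and error handling is omitted:

import com.sap.conn.jco.JCoDestinationManager

object ClientCallSketch extends App {
  // Resolve the destination supplied by the registered DestinationDataProvider
  val destination = JCoDestinationManager.getDestination("BW_SYSTEM")

  // The repository builds a mutable function object from the RFC metadata
  val function = destination.getRepository.getFunction("BAPI_ODSO_GETLIST")

  // Execute the call; the parameter lists now hold the response state
  function.execute(destination)

  val odsList = function.getTableParameterList.getTable("ODSOBJECTLIST")
  for (i <- 0 until odsList.getNumRows) {
    odsList.setRow(i)
    println(odsList.getString("ODSOBJECT") + " - " + odsList.getString("TEXTLONG"))
  }
}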
In several cases you will need to serialize this object or parts of it, e.g. when sending data to Kafka or when storing RFC responses for use in unit tests. Another advantage of serializing function objects is that after storing a serialized RFC response you no longer need an active SAP connection to create function objects. This is an alternative to using JCo function templates, which are supported by i-OhJa too.
You can easily serialize a function object using native Java serialization, but this comes with several drawbacks, such as not being able to deserialize it after upgrading the JCo version. This was only one of the reasons for us to implement (de-)serialization for different kinds of data formats.
So far, we have implemented and used converters and SerDes for the following data formats:
- i-OhJa Scala wrappers and case classes from JCo Java classes:
For all the converters and SerDes implemented, i-OhJa provides implicit Scala conversion classes that just need to be imported in your application code. JCo is completely and transparently hidden behind the corresponding Scala wrappers and you do not have to worry about internal conversion and usage of JCo.
- JSON:
The JSON SerDe provides lossless bidirectional serialization and deserialization, holding data values and types in the same structure. We recommend JSON as the serialization format e.g. for RFC function templates, so that you can store the interface definitions of common RFC-enabled function modules or BAPIs provided by SAP Netweaver systems in a human-readable way.
- Avro 1.9.2:
Avro is widely used when it comes to serialization in big data systems like Kafka. It was our first choice for serializing data coming from SAP in the scenarios mentioned in the introduction. It supports all the data types we need, like deep structures, and separates the schema definition from the actual serialized data. i-OhJa can be used to apply lossless bidirectional serialization and deserialization using Avro. The generated schema definition can e.g. be included in the stream of data published to Kafka, stored in an AVSC file or sent to a schema registry service (see the sketch after this list).
- Protobuf v3:
The Protobuf SerDe uses a fixed schema, defined in a .proto file, that mirrors the type wrapper classes used to represent RFC requests and responses and all dependent types like records, tables and elementary types.
- CSV:
The structure of CSV files only allows storing two-dimensional data like tables, so currently only tables and records used in function modules can be serialized to CSV files.
- Spark:
Spark Structured Streaming uses its own schema definitions from the sql package. Data structures can be nested deeply in objects of the class sql.Row. Spark includes all the data types and schema structures necessary to convert all kinds of JCo types into Spark SQL types. i-OhJa includes corresponding conversion methods, which were used when implementing the data pipelines in our customers' projects.
- Kafka Connect:
Kafka Connect defines its own schema and data objects for internal use. This allows common data type converters supplied by Confluent to be applied to data streams created by different kinds of source connectors. In a POC we implemented a flexible Kafka Connect SAP Netweaver source connector, tested with the Debezium embedded engine, for use in the Confluent platform.
- Native object serialization:
One can use native Scala and Java object serialization for the wrapper classes of i-OhJa or the POJOs that can be extracted from wrapped data.
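To give a flavor of the Avro route mentioned above, here is a hand-written sketch using the plain Avro 1.9 API rather than the i-OhJa converters, with a schema that mirrors a few fields of the ODSOBJECTLIST rows from the code generation example:

import java.io.ByteArrayOutputStream
import org.apache.avro.SchemaBuilder
import org.apache.avro.generic.{GenericData, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.EncoderFactory

object AvroSerdeSketch extends App {
  // The schema stays separate from the serialized payload, e.g. published to a schema registry
  val schema = SchemaBuilder.record("Odsobjectlist").fields()
    .optionalString("odsobject")
    .optionalString("objvers")
    .optionalString("textlong")
    .endRecord()

  def serialize(odsobject: String, objvers: String, textlong: String): Array[Byte] = {
    val record: GenericRecord = new GenericData.Record(schema)
    record.put("odsobject", odsobject)
    record.put("objvers", objvers)
    record.put("textlong", textlong)

    val out = new ByteArrayOutputStream()
    val encoder = EncoderFactory.get().binaryEncoder(out, null)
    new GenericDatumWriter[GenericRecord](schema).write(record, encoder)
    encoder.flush()
    out.toByteArray // only the data; consumers resolve the schema separately
  }

  println(serialize("ZSALES_DSO", "A", "Sales DSO").length)
}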
UI
We provide two types of user interfaces for interaction with i-OhJa: a Command Line Interface and a simple server and client Web UI based on Twirl and Akka HTTP. Akka HTTP is used for building a reactive HTTP server, while Twirl allows generating HTML pages based on native Scala code. This is an easy way to provide direct access to all kinds of Scala-based functionality provided by i-OhJa that is wrapped around JCo.
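A minimal sketch of how such a page can be served with Akka HTTP (using the 10.2+ binding API) follows; the route, port and inline HTML are illustrative, and in the real UI a Twirl template renders the HTML from Scala objects:

import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.{ContentTypes, HttpEntity}
import akka.http.scaladsl.server.Directives._

object WebUiSketch extends App {
  implicit val system: ActorSystem = ActorSystem("i-ohja-ui")

  // Stand-in for a rendered Twirl template
  def serverStatusPage: String =
    "<html><body><h1>i-OhJa server</h1><p>registered services: ...</p></body></html>"

  val route =
    path("server") {
      get {
        complete(HttpEntity(ContentTypes.`text/html(UTF-8)`, serverStatusPage))
      }
    }

  Http().newServerAt("localhost", 8080).bind(route)
}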
The following screenshots show basic web pages demonstrating ways to monitor and access i-OhJa.
The server page is a simple page containing information about registered services and their RFC modules as well as running instances. In the image shown above you can see the progress of extracting an OpenHub request.
The i-OhJa client page gives an overview of artifacts available in a connected SAP Netweaver system like RFCs, ODPs, OpenHubs and so on.
The page for a SAP BW DataStoreObject gives an overview of the properties of a DSO and its available fields or InfoObjects. One can download the schema of the data in various formats or open a data preview for the active data of the DSO to see what the structure of the extracted data will look like.
Testing and mocks
When it comes to unit testing interface library functions like the ones contained in i-OhJa, we first have to provide server and client mocks to be able to execute local tests without establishing a connection to a SAP Netweaver system. This was achieved by extending the wrapped JCo classes like JCoDestination, DestinationDataProvider, ServerDataProvider, JCoServerFactory etc.
With the mocks in place, you just have to implement the behavior of the mocks based on RFC function calls. A simple ScalaTest spec for the BI client method that retrieves the list of OpenHub objects available in a SAP system looks like this:
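Since the actual i-OhJa class and method names are not shown here, the following is only a structural sketch with hypothetical names (MockDestination, BiClient, listOpenHubs) and a hypothetical trace file path:

import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers

// Minimal stand-ins so the sketch compiles; the real i-OhJa types and names differ
final case class OpenHub(name: String)
trait Destination
object MockDestination {
  // Would replay the RFC response traced into the given JSON file
  def fromJsonTrace(path: String): Destination = new Destination {}
}
final case class BiClient(destination: Destination) {
  // Would execute (or replay) the RFC call and map the result rows to OpenHub objects
  def listOpenHubs(): Seq[OpenHub] = Seq(OpenHub("ZOH_SALES"))
}

class OpenHubListSpec extends AnyFlatSpec with Matchers {

  "The BI client" should "list the OpenHub destinations from a traced RFC response" in {
    val destination = MockDestination.fromJsonTrace("src/test/resources/openhub_getlist.json")
    val openHubs = BiClient(destination).listOpenHubs()

    openHubs should not be empty
    openHubs.map(_.name) should contain ("ZOH_SALES")
  }
}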
The real test implementation uses a trace of an RFC call stored in a JSON file to retrieve the list of OpenHubs without establishing a connection to a SAP system.
For generating test data, one basically has two options:
- Create a function template object and use generators like the ones from ScalaCheck to fill the parameters of the created function objects with generated values (see the sketch after this list).
- Trace SAP RFC function calls once, serialize and store them in object files using a SerDe, and use these to replay a specific scenario locally without an active connection to a SAP system. This is the approach used in the example above.
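For the first option, a small ScalaCheck generator for a reduced version of the Odsobjectlist rows from the code generation example might look like this; the value ranges are illustrative:

import org.scalacheck.Gen

object OdsobjectlistGen {
  // Reduced, hypothetical version of the generated case class from the example above
  final case class Odsobjectlist(
      odsobject: Option[String] = None,
      objvers: Option[Char] = None,
      textlong: Option[String] = None)

  // Generate plausible field values: a technical name, an object version and a description
  val gen: Gen[Odsobjectlist] = for {
    odsobject <- Gen.option(Gen.listOfN(10, Gen.alphaUpperChar).map(_.mkString))
    objvers   <- Gen.option(Gen.oneOf('A', 'M', 'D'))
    textlong  <- Gen.option(Gen.alphaStr)
  } yield Odsobjectlist(odsobject, objvers, textlong)

  // Usage: fill a request table with generated rows
  def sampleRows(n: Int): Seq[Odsobjectlist] = Gen.listOfN(n, gen).sample.getOrElse(Nil)
}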
Side notes
- Before starting to work with JCo it is worth knowing that renaming the JCo jar leads to a runtime exception, because SAP doesn’t allow renaming this file:
Exception in thread "main" java.lang.ExceptionInInitializerError: JCo initialization failed with java.lang.ExceptionInInitializerError:
Illegal JCo archive "sapjco-3.1.2.jar". It is not allowed to rename or repackage the original archive "sapjco3.jar".
at com.sap.conn.jco.rt.MiddlewareJavaRfc.<clinit>(MiddlewareJavaRfc.java:165)
at com.sap.conn.jco.rt.DefaultJCoRuntime.initialize(DefaultJCoRuntime.java:78)
at com.sap.conn.jco.rt.JCoRuntimeFactory.<clinit>(JCoRuntimeFactory.java:23)
This is an issue when working with local artifact repositories that store artifacts using the Maven naming conventions. One workaround is to store the native libraries and the jar as a zip file and to configure local unzipping as part of your POM. Another solution is to provide the path to the native library when starting your application using the JVM parameter -Djco.library=".\sapjco-3.1.2.dll". If you do so, the jar needs to be stored at the same path as the native library, but you will be able to rename it.
- When running JCo in a Hortonworks HDP cluster we realized that isDestinationDataProviderRegistered() does not yield the expected results. It often returns false, but if you then try to register a destination provider you will get a runtime exception saying that a destination provider was already registered.
- An attentive reader might have noticed the default value ‘R’ for the object version in the importing parameter list of the generated code example for BAPI BAPI_ODSO_GETLIST. This seems to be a bug in JCo: it is not able to resolve parameter default values that are based on type pool constants like RS_C_OBJVERS-ACTIVE.
Summary
SAP JCo and the IDoc library provide basic APIs for flexible and extensive interaction with SAP Netweaver application servers. We have implemented i-OhJa as a Scala framework to be able to use SAP JCo in a functional, type-safe and easy way. Furthermore, it simplifies accessing and transforming data transmitted via RFC. We have successfully utilized it in a big data scenario using Apache Spark to integrate streaming pipelines with SAP Netweaver systems. This kind of integration is much more flexible than a simple JDBC driver with direct database access, and it allows combining any kind of services and processes available in Java and SAP Netweaver based systems.