Hello viewers! This article covers the integration of SAP Data Services 4.2 SP 10 and above with Apache Hadoop. From Data Services 4.2 Support Pack 10 onwards, we have the flexibility to skip installing the Data Services job server on a Hadoop name node: configuring WebHDFS eases the job for us!
You might be thinking that this is just another requirement and wondering why we need a new article for it. We recently received a demo request from a prospect to integrate SAP Data Services 4.2 SP 10 on a Windows box with Hadoop running on a Linux server. Until Data Services 4.2 Support Pack 10, the answer was straightforward: install the SAP Data Services job server components on one of the Hadoop nodes, no matter whether it is on Linux or Windows.
The reason behind this article is that I have not found good content (blog/document/guide) to help configure this. Here is how it works from a Data Services standpoint: connectivity can be established via WebHDFS File Locations as below.
WebHDFS should be configured, with its port enabled, on a Hadoop name node. Ideally it would be the same Hadoop system that communicates with Data Services for HDFS and Hive data transfer.
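As a reference point, WebHDFS is enabled on the name node through `hdfs-site.xml`. The snippet below is a minimal sketch; the HTTP address value is an assumption for illustration (the WebHDFS port is the name node's HTTP port, commonly 50070 on Hadoop 2.x and 9870 on Hadoop 3.x), so substitute your own host and port.

```xml
<!-- hdfs-site.xml (on the name node) — illustrative values -->
<configuration>
  <!-- Enable the WebHDFS REST API on the name node and data nodes -->
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <!-- The name node HTTP address also serves WebHDFS requests -->
  <property>
    <name>dfs.namenode.http-address</name>
    <value>namenode.example.com:50070</value>
  </property>
</configuration>
```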
The target file should use a flat-file format rather than an HDFS format. For more information, please follow the "Supplement for Hadoop" for Data Services 4.2 SP 10 onwards.
Creating New File Locations:
With WebHDFS as communication protocol:
Testing SAP Data Services Read:
A test file has been placed under the hdfs:// location.
The data in the text.txt test file is "helloworld".
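Under the hood, the File Location's read maps onto a WebHDFS REST call. The sketch below only builds the URL that a read (op=OPEN) of the test file would hit; the host, port, path, and user are assumptions for illustration, so substitute your name node's details.

```python
# Minimal sketch: the WebHDFS REST URL a read (op=OPEN) resolves to.
# Host, port, HDFS path, and user below are illustrative assumptions.

def webhdfs_url(host, port, hdfs_path, op, **params):
    """Build a WebHDFS v1 REST URL for the given operation."""
    query = "&".join([f"op={op}"] + [f"{k}={v}" for k, v in sorted(params.items())])
    return f"http://{host}:{port}/webhdfs/v1{hdfs_path}?{query}"

# Reading the test file placed under hdfs://
url = webhdfs_url("namenode.example.com", 50070, "/user/ds/text.txt",
                  "OPEN", **{"user.name": "hdfs"})
print(url)
# http://namenode.example.com:50070/webhdfs/v1/user/ds/text.txt?op=OPEN&user.name=hdfs
```

An HTTP GET on that URL returns the file contents ("helloworld" in our test), which is a quick way to verify WebHDFS connectivity outside Data Services.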
In the below test job, the source file "Location" in the file format points to WebHDFS_File_Location. Manually enter the "File name(s)" while defining the format.
The target file in this test job is a local directory on the Windows server, C:\Temp. Upon successful job execution, the file(s) will be transferred to that location.
Testing SAP Data Services Write:
Consider a pharma company that needs to cleanse its claims data and push it down into its data lake for further processing, for example data exploration by data scientists.
We created a Data Services job to load the end results into a PharmaClaims.csv file and write it to our target HDFS location. The target format should be a Data Services flat file, not HDFS; however, the files will still be placed under the hdfs:// location on the Hadoop node. For example, .csv or .txt will be the file extension in the file format as below.
We use the created File Location as the "Location" in the file format. Upon successful execution, the file is pushed down to the hdfs:// location as below.
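For reference, a WebHDFS write is a two-step CREATE: a PUT to the name node with no body, which answers with a 307 redirect to a data node, followed by a PUT of the file contents to that redirect location. The sketch below illustrates this flow; the host, port, path, and user are assumptions, and `put_file` naturally needs a live cluster to run.

```python
# Sketch of the two-step WebHDFS CREATE that pushes a file such as
# PharmaClaims.csv into HDFS. Host/port/path/user are illustrative.
import urllib.request
import urllib.error

def create_url(host, port, hdfs_path, user, overwrite=True):
    """Step-1 URL: PUT here with no body; expect a 307 redirect to a data node."""
    flag = "true" if overwrite else "false"
    return (f"http://{host}:{port}/webhdfs/v1{hdfs_path}"
            f"?op=CREATE&overwrite={flag}&user.name={user}")

def put_file(step1_url, data):
    """Step 2 (requires a live cluster): read the 307 Location header from the
    name node, then PUT the file contents to that data node URL."""
    req = urllib.request.Request(step1_url, method="PUT")
    try:
        urllib.request.urlopen(req)
    except urllib.error.HTTPError as e:
        if e.code != 307:
            raise
        datanode_url = e.headers["Location"]        # redirect target
        req2 = urllib.request.Request(datanode_url, data=data, method="PUT")
        urllib.request.urlopen(req2)                # 201 Created on success

print(create_url("namenode.example.com", 50070,
                 "/user/ds/PharmaClaims.csv", "hdfs"))
```

This is essentially what the File Location does for us behind the scenes, which is why no job server is needed on the Hadoop node for the write path.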
Lastly, without having a Data Services job server on the Hadoop name node, we can now move files to HDFS and Hive! Please go through the Supplement below for more information. The Supplement discusses the Data Services objects and processes related to accessing your Hadoop account for downloading and uploading data, and the processes for configuring these objects.
For more information, refer to the below Supplement for Hadoop.