Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
cancel
Showing results for 
Search instead for 
Did you mean: 
Vitaliy-R
Developer Advocate
Developer Advocate
1,611
I hope you haven't missed the announcement of SAP Data Intelligence, trial edition 3.1 posted by dimitri last week! Please do not miss as well exercises from SAP TechEd hands-on sessions that were published too. One of them describes steps to read and load data into SAP HANA.

In my post, I would like to share a slightly different approach: with no data, but control flowing between operators. I will use the same setup and scenario from my last post where I was scripting the import of multiple files into SAP HANA Cloud from S3 cloud storage.

In a nutshell, I want to do automate a load of multiple files stored in a single Amazon S3 bucket into the corresponding tables in the SAP HANA Cloud database. In my exercise, I work with 25 files generated for TPC-DS. Some of these files have significant sizes, but further optimization of their load is not in the scope for now.

The setup...


...is taken from the previous post, assuming the files are generated and stored in the S3 bucket, the PSE in the database is all configured, and we can use the same S3Reader user credentials (but do not try to copy/paste this key/secret, as I recreated them after the publication 🙂 ).

The graph...


... has at least 3 operators:

  1. The one to list all required files from the S3 bucket,

  2. A custom operator to build SQL statements to import data files + truncate, in case of reloads of this initial step to populate tables,

  3. HANA Client to execute SQL statements.



But I included two more operators to detect and terminate execution when the last IMPORT is done.

Btw, I used two different programming languages for custom operators -- JavaScript and Python -- not for any other reason than showing possibilities. The logic could (and in normal situations as well should) be done without multiplying variations in one solution.

List files


This operator works only with connections defined in the Connection Manager, so I had to configure the connection to the S3 bucket there first.


Please note:

  • RegEx-based patter in the filter; it would allow you to reload only selected files, e.g. /sf0001/data/init/\S*_dim\S* would reload only dimension files.

  • We are loading from S3, where "catalog structure" is a prefix of objects stored in the flat structure. Therefore /sf0001/data/init/ is a part of the name.


Convert the list into SQL statements


Just remember to replace the AWS region, the bucket name, and user credentials with your own 😉
$.setPortCallback("input",onInput);

function onInput(ctx,s) {
var msg = {};

var inbody = s.Body;
var inattributes = s.Attributes;

msg.Attributes = {};
msg.Attributes["message.lastBatch"]=inattributes["message.lastBatch"];

var theS3Obj = inattributes.file.path;
var tableName = (theS3Obj.substring(theS3Obj.lastIndexOf('/')+1).split('.'))[0];
msg.Body = "TRUNCATE TABLE TPCDS." + tableName + "; ";
msg.Body+= "IMPORT FROM CSV FILE 's3-eu-central-1://AKIAUJZDQQSWQ3BAMYWJ:QSlFbJGngCdQyp+Ft3fUw6S8g9iCMvmCCVFK0Jb6@tpcds4hdbc" + theS3Obj + "' INTO TPCDS." + tableName + " WITH FIELD DELIMITED BY '|' THREADS 2;"
msg.Attributes["sql_stmnt"]=msg.Body;

$.output(msg);
}

SAP HANA Client


Make sure you set a long enough time-out, so that IMPORT has enough time to finish without being terminated by the client. The value 0 means no time-out, which should not be used in the real situation.



The graph termination


Because we send two SQL statements (TRUNCATE and IMPORT) with every incoming message the client sends two outgoing messages -- once for each completed statement -- even if both are treated as a database transaction.
lastBatchesReceived=0

def on_input(msg):
global lastBatchesReceived
if 'message.lastBatch' in msg.attributes and msg.attributes['message.lastBatch'] == True: lastBatchesReceived+= 1
if lastBatchesReceived > 1: api.send("stopsig", lastBatchesReceived)

api.set_port_callback("sqlresult", on_input)

So, we need to have a counter two collect two messages with the 'message.lastBatch' equal to True before sending a signal to the Graph Termination operator.




It was an example -- kind of a prototype -- quick and dirty development. I will appreciate your comments on how you would improve this graph.

Enjoy the exploration of these trial editions!
-Vitaliy aka @Sygyzmundovych

 
3 Comments