Have you ever used the CREATE REFERENCE statement in SAP HANA smart data streaming? It allows you to combine incoming events with data from an external database. Used effectively, it gives you important context for the information you’re receiving and helps you make decisions by showing you the complete picture.
How to Use It
Suppose you run an online clothing store. Orders are processed as event data that flows into your smart data streaming project. With event data alone, you would only know how many of each item you’ve sold and your remaining inventory—important information, but not enough for you to make informed decisions.
Now, imagine you have existing customer data stored in a HANA table in an external database, including customer IDs, names, and addresses. You can build your streaming project to use the CREATE REFERENCE statement to query the external database for customer information every time an order is made. Now you know which of your customers buys what, how often, and from where, giving you a complete picture of what’s happening in your store.
With a bit of tinkering and creativity, you can also combine a CREATE REFERENCE statement with other tools in smart data streaming. Jeff Wootton
wrote a post showing how you can configure your streaming project to send e-mail alerts whenever a machine shuts down for an extended period of time. You can use the SMTP Output adapter to send automated e-mail for the event, and the CREATE REFERENCE statement populates it with information about the machine (such as machine ID and past temperatures). You can learn about this procedure
here.
How it Works
The basic syntax for a CREATE REFERENCE statement looks like this:
CREATE REFERENCE name
SCHEMA schema_clause
[PRIMARY KEY (column1, column2, ...)]
PROPERTIES service = 'service name',
source = 'table name'
[, sourceSchema = 'schema name' ]
[, maxReconnectAttempts = integer]
[, reconnectAttemptDelayMSec = integer]
[, exitOnConnectionError = true|false]
[, cachePolicy = 'NONE'|'ONACCESS']
[, maxCacheSizeMB = integer] [, maxCacheAge = interval];
The parameters below are mandatory:
Parameter |
Description |
CREATE REFERENCE name |
The name you give to the statement. |
schema_clause |
The schema for the join table. It should match the schema of the reference table. Refer to Datatype Mapping for the Database Adapter to map different datatypes to streaming. |
service name |
The name of the service being used to connect to the external database. It must be defined in the cluster. |
table name |
The name of the reference table. It should match the name the database table you’re querying. |
The PRIMARY KEY is technically optional, but should be specified whenever possible as omitting it affects the types of joins the reference can participate in. It should match the primary key of the table. Also note that schema name, which specifies the database schema of the source, is mandatory if the reference table has a primary key.
The other parameters in the statement are entirely optional and add functions like max reconnect attempts and cache policies. You can learn more about them in the
CREATE REFERENCE Statement topic in the
SAP HANA Smart Data Streaming: CCL Reference.
Example
Going back to our example with the online clothing store, you can easily use the CREATE REFERENCE statement to join incoming event data with customer data in your HANA table in the external database. Here is just one way you can do it:
CREATE REFERENCE customerRef
SCHEMA ( customerID integer, fullName string, address string )
PRIMARY KEY (customerID)
PROPERTIES service='databaseServiceName', source='customerTable',
sourceSchema='databaseSchema', cachePolicy='NONE',
maxCacheSizeMB=0, maxCacheAge=0 minutes;
CREATE INPUT STREAM orderStream
SCHEMA ( orderID integer, customerID integer, itemID integer);
CREATE OUTPUT STREAM orderWithCustomerInfoStream
SCHEMA ( orderID integer, customerName string, customerAddress string )
AS SELECT orderStream.orderID, customerRef.fullName, customerRef.address
FROM orderStream, customerRef
WHERE orderstream.customerID = customerRef.customerID;
The table, called customerTable, includes the customerID, full name, and addresses of all your customers. The primary key of the table is the customerID column. The example above creates a reference to the table so that as orders from customers—identified by their customer ID—stream in, the database is queried for customer information that is then joined with the order information in an output stream.
If you aren’t incorporating external database tables into your smart data streaming project yet, start thinking of ways you can now! You can find more ideas and information on using the CREATE REFERENCE statement in the
CREATE REFERENCE statement topic within the
SAP HANA Smart Data Streaming: CCL Reference.