Hi, I thought I would share my experience of streaming tweets with Java and inserting them into SAP HANA for text/token analysis. The idea is simple: apply the built-in text analysis capability of SAP HANA to some real-time information, and I found Twitter a good place to collect such data for understanding how the text analysis works. Below is a short implementation that I wanted to share with everyone. Several people and organizations have already implemented this, so I am only adding my own experience, learnings, and challenges here. Before you start implementing text analysis, please keep the following in mind.
- Which language you are going to write the code in. I used Java; you can use Python as well.
- How you will get real-time data, that is, whether you have access to an API that can provide it. The Twitter APIs provide this kind of real-time information; for example, you can analyze political, sports, technology, or geo-tagged tweets.
I opted to analyze tweets related to SAP HANA (#SAPHANA, #IoT, #SAP). These hashtags will be used later to fetch tweets through the Twitter API.
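As a small illustration of how several hashtags can be turned into one search term, here is a minimal sketch. It assumes the search query accepts Twitter's OR operator between terms; the class name HashtagQuery and its build method are hypothetical and are not part of the downloadable app.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the downloadable app.
class HashtagQuery {

    // Joins hashtags with the OR search operator,
    // adding a leading '#' where it is missing and skipping blank entries.
    static String build(List<String> tags) {
        List<String> parts = new ArrayList<>();
        for (String tag : tags) {
            String t = tag.trim();
            if (t.isEmpty()) {
                continue;
            }
            parts.add(t.startsWith("#") ? t : "#" + t);
        }
        return String.join(" OR ", parts);
    }

    public static void main(String[] args) {
        System.out.println(build(Arrays.asList("#SAPHANA", "IoT", "SAP")));
        // prints #SAPHANA OR #IoT OR #SAP
    }
}
```

The resulting string is what you would enter as the search term in the configuration step later on.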
You will be taken to the developer page at Twitter. Click Create New App and fill in the required information below.
Create your Twitter Application
The next step is to keep all the security tokens at hand for consuming the Twitter APIs. Below is a snapshot of my security tokens.
Now click Create Access Token.
Your access token will be generated successfully.
Download the latest version of Twitter4J to use in your project. Use the link below to download it.
http://twitter4j.org/en/index.html
Below is a snapshot of the latest Twitter4J library:
The Twitter4J libraries will be used later.
Install the SAP HANA Client if it is not already installed. Get it from the SAP Service Marketplace; it contains the JDBC library for accessing HANA from Java.
Go to Service Marketplace -> Software Downloads -> Installation and Upgrades -> Browse Our Download Catalog -> SAP In-Memory (SAP HANA) -> SAP HANA Platform and download the HANA Client.
Below is a snapshot of the HDB Client. The important thing to notice is that it must contain the JDBC library.
Install the HDB Client on your machine (check whether you need the 32-bit or 64-bit version before downloading).
Download the Twitter-analysis app here.
Once you are done with the above activities, open the Eclipse IDE and switch to the Java perspective. In the Package Explorer -> right-click -> Import.
Click Finish -> Your project will be imported into the Package Explorer.
Switch to the HANA Development perspective to create the table that will store the tweet information. Execute the commands below in the SAP HANA SQL Console.
SET SCHEMA "<YOUR_SCHEMA>";
CREATE COLUMN TABLE TWEETS(
"ID" BIGINT NOT NULL, -- tweet IDs are 64-bit values and do not fit in INTEGER
"USER_NAME" NVARCHAR(100),
"CREATED_AT" DATE,
"TEXT" NVARCHAR(140),
"HASH_TAGS" NVARCHAR(100),
PRIMARY KEY("ID"));
After creating the table in HANA, switch to the configuration folder and change the settings for HANA and Twitter connectivity. Open the Java configuration file and make the following changes:
1- If you are behind a proxy, set the proxy variable to true and enter the proxy details.
2- HANA database host, port, user, schema, and password.
3- The Twitter tokens received above, including the consumer keys and secret keys.
4- The search term you want to fetch from Twitter, such as #SAP or #SAPHANA.
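To make the HANA part of these settings more concrete, here is a minimal sketch of how host, port, and schema could be combined into a JDBC URL of the form jdbc:sap://host:port/?currentschema=SCHEMA. The property keys (hana.host, hana.port, hana.schema), the class name HanaJdbcUrl, and the sample values are assumptions for illustration; the app's own configuration file may name and store these settings differently.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper; the key names below are illustrative and are not
// taken from the downloadable app's configuration file.
class HanaJdbcUrl {

    // Returns the trimmed value for key, or fails fast if it is missing or blank.
    static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing setting: " + key);
        }
        return value.trim();
    }

    // Builds a HANA JDBC URL of the form jdbc:sap://host:port/?currentschema=SCHEMA.
    static String fromProperties(Properties props) {
        return "jdbc:sap://" + require(props, "hana.host")
                + ":" + require(props, "hana.port")
                + "/?currentschema=" + require(props, "hana.schema");
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // Placeholder values; replace them with your own system's details.
        props.load(new StringReader(
                "hana.host=localhost\nhana.port=30015\nhana.schema=MYSCHEMA\n"));
        System.out.println(fromProperties(props));
        // prints jdbc:sap://localhost:30015/?currentschema=MYSCHEMA
    }
}
```

A URL like this, together with the user and password, is what you would pass to DriverManager.getConnection once the HANA JDBC driver from the HANA Client is on the classpath.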
After updating the above details, open TwitterConnection.java and execute the file to test the connection to Twitter.
To test the connection to SAP HANA, open HDBConnection.java and execute the file.
Before executing TwitterSearch.java, configure the Twitter4J library properly. Otherwise the application will not run and you will encounter errors such as the source of a class not being found, which is why I am describing how to configure the build path for the Twitter4J JARs.
Right-click on the project.
Click Configure Build Path -> Java Build Path -> Add External JARs -> go to the libraries folder of Twitter4J -> select all JARs.
Make sure all the JARs are available in the libraries folder.
Click Apply; this makes all the classes available to your application. You can see all the JARs in the Referenced Libraries folder.
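If you want to confirm that the JARs really are on the classpath before running the application, a quick check along the following lines can help. This is only a sketch: ClasspathCheck is a hypothetical class, and the two class names it looks for (twitter4j.TwitterFactory from Twitter4J and com.sap.db.jdbc.Driver from the HANA Client's JDBC library) are what I would expect to find, so adjust them if your versions differ.

```java
// Hypothetical diagnostic for illustration; not part of the downloadable app.
class ClasspathCheck {

    // Returns true if the named class can be located (without initializing it).
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "twitter4j.TwitterFactory", // expected in the Twitter4J core JAR
            "com.sap.db.jdbc.Driver"    // expected in the HANA Client JDBC JAR
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Run it from the same Eclipse project so that it sees the same build path as the application.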
TweetDAO.java is used for inserting the tweet data into the HANA system; the SQL statement is prepared first and then executed.
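The prepare-then-execute pattern can be sketched roughly as follows. This is not the code of TweetDAO.java itself: the class TweetInsertSketch and its methods are hypothetical, the column names and widths come from the CREATE TABLE statement above, and values are truncated to the column widths as one possible way to avoid insert errors on long input. The ID is bound as a long because tweet IDs are 64-bit values, so the ID column needs to be wide enough to hold them.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Date;
import java.util.List;

// Hypothetical sketch of a DAO-style insert; not the app's actual TweetDAO.
class TweetInsertSketch {

    static final String INSERT_SQL =
            "INSERT INTO TWEETS (\"ID\", \"USER_NAME\", \"CREATED_AT\", \"TEXT\", \"HASH_TAGS\")"
            + " VALUES (?, ?, ?, ?, ?)";

    // Cuts a value down to the column width; null stays null.
    static String fit(String value, int maxLength) {
        if (value == null) {
            return null;
        }
        return value.length() <= maxLength ? value : value.substring(0, maxLength);
    }

    // Stores hashtags as one comma-separated string that fits HASH_TAGS (100).
    static String joinHashTags(List<String> tags) {
        return fit(String.join(",", tags), 100);
    }

    // Prepares the statement first, binds the values, then executes it.
    static void insert(Connection conn, long id, String userName, Date createdAt,
                       String text, List<String> hashTags) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setLong(1, id);
            ps.setString(2, fit(userName, 100));
            ps.setDate(3, new java.sql.Date(createdAt.getTime()));
            ps.setString(4, fit(text, 140));
            ps.setString(5, joinHashTags(hashTags));
            ps.executeUpdate();
        }
    }
}
```

Using a prepared statement with placeholders also means tweet text containing quotes cannot break the SQL.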
After completing all the configuration and code, it is time to invoke the Twitter API to fetch data from Twitter and insert the tweets into the HANA system. Execute the TwitterSearch.java file.
Go to the HANA system and run a SELECT on the "TWEETS" table.
Now leverage the text analysis capabilities of SAP HANA by creating a full-text index on the TWEETS table. Here is the syntax:
Create FullText Index "TWEETS_FTI" On "TWEETS"("TEXT")
TEXT ANALYSIS ON CONFIGURATION 'EXTRACTION_CORE';
When you execute the above command, a full-text index is created on the table and text analysis runs on the table's data. In addition, a $TA_TWEETS_FTI table is created, which contains the token information for the tweets table.
Below is the structure of the $TA_TWEETS_FTI table:
Now you can preview the data of $TA_TWEETS_FTI to get a better understanding of the text analysis done by SAP HANA.
Here is the analysis produced by the SAP HANA text analysis capability:
In the image above you can see that the search term #SAPHANA is highlighted and has the highest count in the table. You can now build your data model on this $TA_TWEETS_FTI table and apply different WHERE clauses for analysis, such as combinations of tweets about SAP HANA and IoT, or SAP HANA and Cloud.
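One way to express such a combination is a query that returns the IDs of tweets in which all of the given tokens occur. The sketch below only builds the SQL text with JDBC placeholders. It assumes the $TA_TWEETS_FTI table carries the source table's "ID" key and a "TA_TOKEN" column, so please verify that against the structure of your own table; the class name TokenCooccurrenceQuery is hypothetical.

```java
import java.util.Collections;

// Hypothetical query builder for illustration; column names are assumptions
// about the generated $TA table and should be checked against your system.
class TokenCooccurrenceQuery {

    // Builds a query that selects tweet IDs containing all tokenCount tokens.
    // The caller must bind tokenCount distinct, upper-case token values.
    static String build(int tokenCount) {
        if (tokenCount < 1) {
            throw new IllegalArgumentException("tokenCount must be at least 1");
        }
        String placeholders = String.join(", ", Collections.nCopies(tokenCount, "?"));
        return "SELECT \"ID\" FROM \"$TA_TWEETS_FTI\""
                + " WHERE UPPER(\"TA_TOKEN\") IN (" + placeholders + ")"
                + " GROUP BY \"ID\""
                + " HAVING COUNT(DISTINCT UPPER(\"TA_TOKEN\")) = " + tokenCount;
    }

    public static void main(String[] args) {
        // Query text for two tokens, for example #SAPHANA and #IOT.
        System.out.println(build(2));
    }
}
```

The HAVING clause is what turns "any of these tokens" into "all of these tokens": a tweet ID qualifies only if the number of distinct matching tokens equals the number of tokens you asked for.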
Queries and questions are most welcome.
Thanks.