Hi, I wanted to share what I learned while streaming tweets with Java and inserting them into SAP HANA for text and token analysis. The idea is simple and straightforward: apply the built-in text analysis capability of SAP HANA to real-time information. I found Twitter a good place to collect such information and to understand text analysis better. Below is a short implementation that I wanted to share with everyone. Several people and organizations have already implemented this, so I am only adding my own experience, learnings, and challenges here. Before you start implementing text analysis, please keep the following things in mind.
Which language will you write the code in? I used Java; you can use Python as well.
How will you get real-time data? You need access to an API that can provide it. The Twitter APIs are one such source. For example, you can analyze political, sports, technology, or geo-tagged tweets.
I chose to analyze tweets related to SAP HANA (#SAPHANA, #IoT, #SAP). These hashtags are used later to fetch tweets through the Twitter API.
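To make the hashtag step concrete, here is a minimal sketch of how the three hashtags could be combined into one search string. It assumes the search endpoint accepts `OR` between terms, as the standard Twitter search operators do. The class name `HashtagQuery` is hypothetical and not part of the original project.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: combines hashtags into a single Twitter search string.
final class HashtagQuery {

    // Joins the tags with " OR " and adds a leading '#' where it is missing.
    static String build(List<String> tags) {
        StringBuilder sb = new StringBuilder();
        for (String tag : tags) {
            String t = tag.trim();
            if (t.isEmpty()) {
                continue; // skip blank entries
            }
            if (sb.length() > 0) {
                sb.append(" OR ");
            }
            sb.append(t.startsWith("#") ? t : "#" + t);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: #SAPHANA OR #IoT OR #SAP
        System.out.println(build(Arrays.asList("#SAPHANA", "IoT", "#SAP")));
    }
}
```

The resulting string is what you would put into the search term setting later on.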
Once the above activities are done, open the Eclipse IDE, switch to the Java perspective, right-click in the Package Explorer, and choose Import.
Click Finish. Your project will be imported into the Package Explorer.
Switch to the HANA Development perspective to create the table that will store the tweet information. Execute the commands below in the SAP HANA SQL Console.
"TEXT" NVARCHAR (140),
"HASH_TAGS" NVARCHAR (100),
After creating the table in HANA, switch to the configuration folder and change the settings for HANA and Twitter connectivity. Open the Java configuration file and update the following:
1- If you are behind a proxy, set the proxy variable to true and enter the proxy details.
2- HANA database host, port, user, schema, and password.
3- The Twitter tokens received above, including the consumer keys and secret keys.
4- The search term you want to fetch from Twitter, such as #SAP or #SAPHANA.
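The settings listed above can be checked before anything connects. The sketch below assumes they live in a `java.util.Properties` object; every property key (`hana.host`, `twitter.searchTerm`, and so on) and the class name `ConfigCheck` are hypothetical names chosen for illustration, not the names used in the original configuration file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Sketch only: all property keys below are hypothetical names.
final class ConfigCheck {

    private static final String[] REQUIRED = {
        "hana.host", "hana.port", "hana.user", "hana.schema", "hana.password",
        "twitter.consumerKey", "twitter.consumerSecret",
        "twitter.accessToken", "twitter.accessTokenSecret",
        "twitter.searchTerm"
    };

    // Returns the keys that are absent or blank; proxy details are only
    // required when the proxy flag is set to true.
    static List<String> missingKeys(Properties p) {
        List<String> missing = new ArrayList<>();
        if (Boolean.parseBoolean(p.getProperty("proxy.enabled", "false"))) {
            addIfBlank(p, "proxy.host", missing);
            addIfBlank(p, "proxy.port", missing);
        }
        for (String key : REQUIRED) {
            addIfBlank(p, key, missing);
        }
        return missing;
    }

    private static void addIfBlank(Properties p, String key, List<String> missing) {
        String value = p.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            missing.add(key);
        }
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("hana.host", "localhost");
        System.out.println("Missing settings: " + missingKeys(p));
    }
}
```

Failing early on a missing setting is usually easier to debug than a connection error later in the run.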
After updating the above details, test both connections.
Test Connection to Twitter
Open TwitterConnection.java and execute the file.
Test Connection to SAP HANA
Open HDBConnection.java and execute the file.
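For reference, a HANA connection test over JDBC might look roughly like the sketch below. It is not the content of HDBConnection.java; the class and method names are hypothetical. It assumes the SAP HANA JDBC driver (ngdbc.jar) is on the classpath and uses the `jdbc:sap://host:port/?currentschema=SCHEMA` URL form.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical connection check, not the original HDBConnection.java.
final class HanaConnectionCheck {

    // Builds a HANA JDBC URL with the default schema set via currentschema.
    static String jdbcUrl(String host, int port, String schema) {
        return "jdbc:sap://" + host + ":" + port + "/?currentschema=" + schema;
    }

    // Returns true if a connection can be opened and validated within 5 seconds.
    // Without the HANA driver on the classpath this returns false.
    static boolean canConnect(String url, String user, String password) {
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            return conn.isValid(5);
        } catch (SQLException e) {
            System.err.println("Connection failed: " + e.getMessage());
            return false;
        }
    }
}
```

A call such as `HanaConnectionCheck.canConnect(HanaConnectionCheck.jdbcUrl(host, port, schema), user, password)` would then report whether the details from the configuration file work.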
Before executing the TwitterSearch.java file, configure the Twitter API libraries properly. Otherwise the application will not run and you will encounter errors such as the source of a class not being found, so I thought I would mention how to configure the build path for the Twitter API.
Right-click the project.
Click Configure Build Path -> Java Build Path -> Add External JARs -> go to the libraries folder of Twitter4j -> select all JARs.
Make sure all the JARs are available in the libraries folder.
Click Apply. This makes all the classes available to your application. You can see all the JARs in the Referenced Libraries folder.
TweetDAO.java is used to insert the tweet data into the HANA system. Here the SQL statement is prepared first and then executed.
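The prepare-then-execute pattern can be sketched as follows. This is not the original TweetDAO.java: the class name is hypothetical, and it only fills the two columns visible in the table definition, "TEXT" (140 characters) and "HASH_TAGS" (100 characters). If your table has further mandatory columns, the statement needs to be extended accordingly.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical sketch of a prepared insert, limited to the two known columns.
final class TweetInsertSketch {

    static final String INSERT_SQL =
        "INSERT INTO \"TWEETS\" (\"TEXT\", \"HASH_TAGS\") VALUES (?, ?)";

    // Cuts a value down to the column length so an overlong string
    // does not make the insert fail. Null stays null.
    static String fit(String value, int maxLen) {
        if (value == null) {
            return null;
        }
        return value.length() <= maxLen ? value : value.substring(0, maxLen);
    }

    // Prepares the statement, binds both parameters, and executes it.
    // Returns the number of inserted rows.
    static int insert(Connection conn, String text, String hashTags) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setString(1, fit(text, 140));
            ps.setString(2, fit(hashTags, 100));
            return ps.executeUpdate();
        }
    }
}
```

Binding values through `?` placeholders also avoids problems with quotes inside tweet text, which plain string concatenation would not handle.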
With all the configuration and code in place, it is time to invoke the Twitter API to fetch the data and insert the tweets into the HANA system. Execute the TwitterSearch.java file.
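One detail worth illustrating is how a value for the "HASH_TAGS" column could be derived from a tweet. The sketch below pulls hashtags out of the raw text with a regular expression. This is an assumption on my side: the original code may well take them from the tweet metadata returned by the API instead, and the class name `HashTagExtractor` is hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects the hashtags found in a tweet's text.
final class HashTagExtractor {

    // '#' followed by letters, digits, or underscores (ASCII only by default).
    private static final Pattern TAG = Pattern.compile("#(\\w+)");

    // Returns the distinct hashtags in order of first appearance, comma-separated.
    static String extract(String text) {
        Set<String> tags = new LinkedHashSet<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            tags.add("#" + m.group(1));
        }
        return String.join(",", tags);
    }

    public static void main(String[] args) {
        // Prints: #SAPHANA,#IoT
        System.out.println(extract("Loving #SAPHANA and #IoT with #SAPHANA"));
    }
}
```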
Go to the HANA system and run a SELECT on the "TWEETS" table.
Now leverage the text analysis capabilities of SAP HANA: create a full-text index on the TWEETS table. Here is the syntax for that.
CREATE FULLTEXT INDEX "TWEETS_FTI" ON "TWEETS"("TEXT")
TEXT ANALYSIS ON CONFIGURATION 'EXTRACTION_CORE';
When you execute the above command, a full-text index is created on the table and text analysis runs on the table's data. In addition, a $TA_TWEETS_FTI table is created, which contains the token information for the tweets.
Below is the structure of the $TA_TWEETS_FTI table -
Now you can preview the data of $TA_TWEETS_FTI to get a better understanding of the text analysis performed by SAP HANA.
So here is the analysis done by the SAP HANA text analysis capability -
In the image above you can see that the search term #SAPHANA is highlighted and has the highest count in the table. You can now build your data model on this $TA_TWEETS_FTI table and apply different WHERE clauses for analysis, such as combinations of tweets about SAP HANA & IoT or SAP HANA & Cloud.
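A token count like the one described above can also be produced with a plain aggregate query. The sketch below only builds the SQL text from Java. It assumes the standard $TA columns "TA_TOKEN" and "TA_TYPE", so please check them against the structure of your own $TA_TWEETS_FTI table. The class and method names are hypothetical.

```java
// Hypothetical helper that builds token count queries for a $TA table.
final class TokenCountQuery {

    // Counts how often each token occurs and returns the most frequent ones.
    static String topTokens(String taTable, int limit) {
        return "SELECT \"TA_TOKEN\", COUNT(*) AS \"TOKEN_COUNT\""
             + " FROM \"" + taTable + "\""
             + " GROUP BY \"TA_TOKEN\""
             + " ORDER BY \"TOKEN_COUNT\" DESC"
             + " LIMIT " + limit;
    }

    // Same count, restricted to one token type through a WHERE clause.
    // The type is left as a '?' placeholder to be bound with a PreparedStatement.
    static String topTokensOfType(String taTable, int limit) {
        return "SELECT \"TA_TOKEN\", COUNT(*) AS \"TOKEN_COUNT\""
             + " FROM \"" + taTable + "\""
             + " WHERE \"TA_TYPE\" = ?"
             + " GROUP BY \"TA_TOKEN\""
             + " ORDER BY \"TOKEN_COUNT\" DESC"
             + " LIMIT " + limit;
    }

    public static void main(String[] args) {
        System.out.println(topTokens("$TA_TWEETS_FTI", 10));
    }
}
```

The generated statement can be pasted into the SQL Console or executed over the same JDBC connection used for the inserts.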