How to create a developer workspace in HANA Studio.
How to create & share a project in HANA Studio
Run HANA Text Analysis on a table
With the release of HANA SPS07, many new features became available. One of the main ones is support for custom dictionaries in Text Analysis. By default, HANA comes with three configurations for text analysis:
Voice of Customer
One of the main issues you can come across while working with HANA Text Analysis is defining your own custom configurations for the Text Analysis engine to work with. The following sections show how to create your own custom dictionary so that you can benefit more from HANA's text analysis capabilities.
Assume that your company manufactures laptops and has recently launched a new laptop series. You want to know whether the consumers who bought the machines are facing any problems. They will very likely be tweeting, posting, and blogging about the product on social media.
You are now harvesting massive amounts of unstructured data from social media, blogs, forums, e-mails, and other channels. The main motivation is to understand how customers perceive the products (laptops). You may want early warning of product defects and shortfalls, and you may want to listen to channel-specific and market-specific customer concerns and delights.
With HANA SPS07, we can create custom dictionaries to detect occurrences of words, terms, or phrases that Text Analysis might miss when it runs without a custom dictionary.
Follow these steps to get started with custom dictionaries:
1. Create the source XML file
I have created some dummy data in a table with “ID” and “TEXT” columns.
User_tweets table structure
The #lenovo T540 laptop's latch are very loose.
my laptop's mic is too bad. It can't record any voice. will not be buying #lenovo in near future
LCD display is gone for my T520. Customer care too is pathetic.
T530 performance is awesome. Only problem I am facing is with microphone. 😞
The mycustomdict.xml file has the following structure:
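As a minimal sketch, a dictionary source file of this kind might look like the example below. It assumes the standard SAP text analysis dictionary schema (a `dictionary` root containing `entity_category`, `entity_name`, and `variant` elements). The category name `LAPTOP_COMPONENT`, the standard forms, and the variants are hypothetical, chosen only to match the sample tweets; they are not the author's actual file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch: category, standard forms and variants are assumptions -->
<dictionary xmlns="http://www.sap.com/ta/4.0">
  <!-- An entity category groups related terms under one type -->
  <entity_category name="LAPTOP_COMPONENT">
    <!-- standard_form is the normalized name reported for any matching variant -->
    <entity_name standard_form="Microphone">
      <variant name="mic"/>
      <variant name="microphone"/>
    </entity_name>
    <entity_name standard_form="Display">
      <variant name="LCD display"/>
    </entity_name>
    <entity_name standard_form="Latch">
      <variant name="latch"/>
    </entity_name>
  </entity_category>
</dictionary>
```

With entries like these, "mic" and "microphone" in different tweets would both be reported under the same standard form, which makes it easier to count complaints per component.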
After the compile command is executed, a file named mycustomdic.nc is generated in the
/<INSTALLATION_DIR>/<SID>/SYS/global/hdb/custom/config/lexicon/lang folder. The text analysis engine uses this file later.
3. Create custom HANA Text Analysis configuration file
After compiling the XML file, we need to create a custom text analysis configuration that refers to the compiled .nc file created in the previous step. The configuration file specifies the text analysis processing steps to be performed and the options to use for each step.
In HANA Studio, create a workspace, then create and share a project. Under this project, create a new file with the extension “hdbtextconfig”. Copy into it the entire contents of one of the predefined configurations delivered by SAP, as mentioned above; they are located in the HANA repository package “sap.hana.ta.config”. For this scenario, I copied the contents of the configuration file “EXTRACTION_CORE_VOICEOFCUSTOMER”.
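To make the new configuration refer to the compiled dictionary, the copied content needs an entry that names the .nc file. The fragment below is a hedged sketch of what that entry could look like. It assumes that the extraction analyzer section of the copied file accepts a `Dictionaries` string-list property, as in SAP's delivered configurations; verify the section and property names against the file you actually copied.

```xml
<!-- Fragment only: everything else copied from EXTRACTION_CORE_VOICEOFCUSTOMER stays unchanged -->
<configuration name="SAP.TextAnalysis.DocumentAnalysis.Extraction.ExtractionAnalyzer.TF" based-on="CommonSettings">
  <!-- Assumed property: lists the compiled custom dictionaries the engine should load -->
  <property name="Dictionaries" type="string-list">
    <string-list-value>mycustomdic.nc</string-list-value>
  </property>
</configuration>
```

The file name here must match the name of the compiled .nc file in the lexicon/lang folder exactly.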