In this blog, we will cover the SAP HANA Text Capabilities.
We are in world of google, where we are getting immediate results to our queries irrespective of the fact whether we know what we are looking for or not. Similar functionality is achieved using SAP HANA Full Text Search. TREX Engine/In-database Search engine which is one of the core component of HANA used for search enablement, text analysis and mining. Figure1 below shows supported text capabilities in HANA platform.
Figure1: HANA Platform Text Capabilities
To understand the core concept of SAP HANA Text Capabilities, we have divided the blog into two parts :
- “SAP HANA FULL TEXT SEARCH” to get an understanding of EXACT, LINGUISTIC and FUZZY Text Search”
- “TEXT ANALYSIS” and configurations available for text analysis. https://blogs.sap.com/2018/02/01/sap-hana-text-analysis/
SAP HANA Full Text Search
Exact, Fuzzy (Error Tolerant) and Linguistic Search
An important goal of SAP HANA is to provide powerful analysis, text indexing and search capabilities that support a state-of-the-art search experience. This includes full text search with advanced features such as free style search (without knowing the name of attributes in the database), linguistic search and fault-tolerant fuzzy search.
The main function of SAP HANA text search is performing fast searches for words that are contained in texts stored in database tables. This works not only for text stored in character fields but also for documents stored as BLOBs or CLOBs. The full text search provides you with the capability to identify natural language terms that satisfy a query and, optionally, to sort them by relevance (ranking) to the query. The most common type of search is to find the texts that contain the specified terms and return results in the order of their similarity to the specified terms.
SAP HANA text search is not limited to finding exact matches. It supports fuzzy search, which finds similar words. Fuzzy search is a fast and fault-tolerant search feature of SAP HANA. The basic concept behind the fault-tolerant search is that a database query returns records even if the search term (user input) contains additional or missing characters, or even spelling errors. Fuzzy search can be used in various applications -- for example, to trigger a fault-tolerant search in a structured database content, like a search for a product called 'coffe krisp biscuit' and you can also find 'Toffee Crisp Biscuits'.
Along with that, linguistic form (singular or plural form) of the word can also be obtained during search by using Linguistic search like search for go and you can also find “went” “going”.
Free Style Search is a search on multiple columns. We have listed couple of examples in below section covering all types of searches (Listed commands can be executed in SQL Console).
****************************************************************************
CREATE COLUMN TABLE
// Full Text Index created on column Company_name to get all search features and to improve // performance
// Implicit Full Text Index
CREATE COLUMN TABLE COMPANIES(
ID INTEGER PRIMARY KEY,
COMPANY_NAME SHORTTEXT(200) FUZZY SEARCH INDEX ON);
****************************************************************************
INSERT RECORDS
INSERT INTO COMPANIES VALUES (1, 'SAP');
INSERT INTO COMPANIES VALUES (2, 'SAP in Walldorf');
INSERT INTO COMPANIES VALUES (3, 'SAP AG');
INSERT INTO COMPANIES VALUES (4, 'ASAP Corp');
INSERT INTO COMPANIES VALUES (5, 'BSAP orp');
INSERT INTO COMPANIES VALUES (6, 'IBM Corp');
INSERT INTO COMPANIES VALUES (7, 'INTEL Corporation');
***************************************************************************
EXACT SEARCH
Select Commands:
// Search is not case sensitive
// “Contains” Predicate used to search table and views
// Below select statement is show you records where SAP AG (entire word) is present.
SELECT * FROM COMPANIES
WHERE CONTAINS (COMPANY_NAME, 'sap ag', EXACT);
Figure 2a shows the output of select statement.
Figure 2a: Result Set of Exact Search
// OR, AND Operators can also be used while performing search
// Below select statement will show the records where either “sap” or “corp” is present.
SELECT * FROM COMPANIES
WHERE CONTAINS (COMPANY_NAME, 'sap OR corp', EXACT);
Figure 2b shows the output of select statement.
Figure 2b: Result Set of Exact Search
************************************************************************
FUZZY SEARCH
Select Commands:
// 1.0 fault tolerant means strings are identical
// 0.0 fault tolerant means strings have nothing in common
// Higher the score more similar the string is
// Fault-tolerant of 0.8 is selected in this case.
// Here, the FUZZY() searches all the strings in the table with the search string and gives the best matching results having SCORE() greater than 0.8.
// Highlighted use to surround HTML bold text around the search term
SELECT SCORE() AS score, ID, HIGHLIGHTED(COMPANY_NAME) FROM COMPANIES
WHERE CONTAINS(COMPANY_NAME,'SAP', FUZZY(0.8))
ORDER BY score DESC;
Figure 3a shows the output of select statement.
Figure 3a: Result Set of Fuzzy Search
// Fault-tolerant of 0.8 in this case.
// Here, the FUZZY() searches all the strings in the table with the search string and gives the best matching results having SCORE() greater than 0.8.
// Notice the difference between search and compare in next examples. As the part of result it also includes the strings composed of two words. To make it simple, even if the search string is part of word, it will be shown in result set when you use “similarCalculationMode” as search.
SELECT SCORE() AS score, * FROM COMPANIES
WHERE CONTAINS(COMPANY_NAME,'SAP', FUZZY(0.8,'similarCalculationMode=search'))
ORDER BY score DESC;
Figure 3b shows the output of select statement.
Figure 3b: Result Set of Fuzzy Search
// Fault-tolerant of 0.8 in this case.
// Here, the FUZZY() compares all the strings in the table with the search string and gives the best matching results having SCORE() greater than 0.8
SELECT SCORE() AS score, * FROM COMPANIES
WHERE CONTAINS(COMPANY_NAME,'SAP',FUZZY(0.8,'similarCalculationMode=compare'))
ORDER BY score DESC;
Figure 3c shows the output of select statement.
Figure 3c: Result Set of Fuzzy Search
************************************************************************
LINGUISTIC SEARCH
// Create and Insert command
// Explicit Full Text Index
create column table ZLINGUISTIC
(
"LINGUISTIC_KEY" INTEGER,
"SPRAS" NVARCHAR(2),
"TEXT" NVARCHAR(1000)
);
INSERT INTO ZLINGUISTIC VALUES (1,'DE','ER GEHT');
INSERT INTO ZLINGUISTIC VALUES (1,'EN','HE IS GOING');
INSERT INTO ZLINGUISTIC VALUES (2,'EN','ER GEHT');
INSERT INTO ZLINGUISTIC VALUES (2,'DE','HE WENT');
CREATE FULLTEXT INDEX FTI_ZLINGUISTIC ON "ZLINGUISTIC" ("TEXT") FAST PREPROCESS OFF ASYNC ;
SELECT * FROM ZLINGUISTIC WHERE CONTAINS(TEXT, 'go', linguistic);
Figure 4 shows the output of select statement.
Figure 4: Result Set of Linguistic Search
************************************************************************
To summarize, SAP HANA text search is used to perform fast search for words that are contained in texts stored in database tables. In next blog we will discuss “TEXT ANALYSIS” and configurations available for text analysis.
For details on Full Text Index, go through the blog[
https://blogs.sap.com/2018/02/15/sap-hana-full-text-index/].
To create Custom Rule Set, refer to blog [
https://blogs.sap.com/2018/03/15/custom-rule-set-sap-hana-text-search/ ].