Technology Blogs by Members
This blog post is intended to help budding SAP HANA and MDG developers and consultants understand some of the SAP HANA search annotations for fuzzy search, which are also used in the matching feature of MDG Consolidation.
New consultants may not be familiar with the different annotations and the value ranges that can be used for fuzzy search. This post provides some insight into the value ranges applicable to HANA search annotations (mainly focusing on fuzzy search), along with their defaults and applicable data types.


MDG - Configure SAP HANA Fuzzy Searching


Ranking and Weightage

The first thing to note when it comes to matching or searching is the ranking or weight of a column, which is used to calculate the overall score.
It is important to note that only one of the two annotations, Weight or Ranking, can be used for a column during setup.
Weight is of type Decimal and usually ranges from 0.0 to 1.0.
Ranking, on the other hand, is not a number between 0.0 and 1.0 but one of #HIGH, #MEDIUM and #LOW, which correspond to the following weights:
('HIGH' = 1.0, 'MEDIUM' = 0.7, 'LOW' = 0.5)
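
To make the relationship between Ranking and Weight concrete, here is a minimal Python sketch. The ranking-to-weight mapping is taken from the values above; the function names and the weighted-average formula for the overall score are my own assumptions for illustration, not the documented HANA calculation.

```python
# Ranking values and their numeric weights, as listed above.
RANKING_WEIGHTS = {"HIGH": 1.0, "MEDIUM": 0.7, "LOW": 0.5}


def column_weight(ranking=None, weight=None):
    """Return the numeric weight of a column.

    Exactly one of `ranking` or `weight` may be given, mirroring the rule
    that only one of the two annotations can be used per column.
    """
    if (ranking is None) == (weight is None):
        raise ValueError("specify exactly one of ranking or weight")
    if ranking is not None:
        return RANKING_WEIGHTS[ranking.upper()]
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0.0 and 1.0")
    return weight


def overall_score(column_scores):
    """Combine (score, weight) pairs into one score.

    ASSUMPTION: a plain weighted average, used here only for illustration.
    """
    total_weight = sum(weight for _, weight in column_scores)
    return sum(score * weight for score, weight in column_scores) / total_weight
```

For example, `column_weight(ranking="medium")` resolves to 0.7, while passing both a ranking and a weight raises an error.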

Fuzzy Search

A fuzzy search is done by means of a fuzzy matching program, which returns a list of results based on likely relevance even though search argument words and spellings may not exactly match.
Fuzzy Search is a fast and fault-tolerant search feature for SAP HANA. A fuzzy search returns records even if the search term contains additional or missing characters or other types of spelling errors.
The default fuzziness threshold is 1, which means that an exact search is performed.
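
The effect of such a threshold can be sketched in a few lines of Python. This is only an illustration: it assumes the default of 1 refers to the fuzziness threshold, it uses `difflib.SequenceMatcher` from the standard library as a stand-in for the HANA fuzzy algorithm (which scores differently), and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity score between 0.0 and 1.0 (NOT the HANA algorithm)."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def fuzzy_filter(search_term, values, threshold=1.0):
    """Keep only the values whose similarity reaches the threshold.

    With the default threshold of 1.0, only values that are identical to the
    search term (ignoring case in this sketch) are returned.
    """
    return [v for v in values if similarity(search_term, v) >= threshold]
```

With the default threshold, a search for "waldorf" would not return "walldorf"; lowering the threshold to 0.8 lets the misspelled value through.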
When setting up matching with fuzzy search, understanding the following fuzzy search options will help a consultant or developer meet customer requirements better.

1) emptyScore - Defines how an empty value and a non-empty value match.

Range: 0.0 to 1.0
Default: None
Applies to Data Types: Text, String, Date

2) emptyMatchesNull - Returns null values if an empty value is searched.

Range: on, off, true, false
Default: off
Applies to Data Types: Text, String, Date

3) interScriptMatching - Activates fuzzy matching across different scripts (for example, Simplified Chinese and Pinyin).

Range: on, off, true, false
Default: off
Applies to Data Types: Text, String

At present, only Chinese characters are supported for inter-script matching.
When Chinese and Latin characters are compared with interScriptMatching=on, a Pinyin transcription is used to transcribe the sound of the Chinese characters into Latin script.

4) spellCheckFactor - Sets the score for strings that get a fuzzy score of 1.0 but are not fully equal.

Range: 0.0 to 1.0
Default: 0.9
Applies to Data Types: Text, String

There are two use cases for the spellCheckFactor option:

a) This option allows you to set the score for terms that are not fully equal but that would be a 100% match because of the internal character standardization used by the fuzzy search.
For example, the terms 'Café' and 'cafe' give a score of 1.0 although the terms are not equal. Some users might need to distinguish between the two terms.
The decision whether two terms are equal is based on the term representation stored in the column dictionary. The spellCheckFactor option therefore works differently on string and text columns.

b) The fuzzy search can return a 100% match for terms that are not identical but cannot be differentiated by the fuzzy-string-compare algorithm.
For example, the fuzzy search cannot differentiate between the terms 'abaca' and 'acaba'. In this case, spellCheckFactor can be used to avoid a score of 1.0.

If neither a) nor b) is needed by an application, you can set spellCheckFactor to 1.0 to disable the feature.
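
The idea behind spellCheckFactor can be sketched as follows. This is a simplified illustration in Python, not the HANA implementation: the `standardize` helper (accent stripping plus lower-casing) is my own assumption about what character standardization roughly does, and both function names are hypothetical.

```python
import unicodedata


def standardize(term):
    """Rough stand-in for character standardization (ASSUMPTION):
    strip accents and ignore case, so 'Café' becomes 'cafe'."""
    decomposed = unicodedata.normalize("NFKD", term)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_accents.casefold()


def apply_spell_check_factor(search_term, stored_term, fuzzy_score,
                             spell_check_factor=0.9):
    """Replace a fuzzy score of 1.0 by spellCheckFactor when the two terms
    are not fully equal; all other scores are returned unchanged."""
    if fuzzy_score == 1.0 and search_term != stored_term:
        return spell_check_factor
    return fuzzy_score
```

With the default of 0.9, 'Café' against 'cafe' would end up at 0.9 instead of 1.0, while setting the factor to 1.0 leaves the score untouched, which is the "disabled" case described above.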

5) abbreviationSimilarity - Activates abbreviation similarity and sets the score.

Range: 0.0 to 1.0
Default: 0.0
Applies to Data Types: Text

Example of the character standardization mentioned under spellCheckFactor:
Original term: Café
Standardized term: cafe

6) andSymmetric - Activates a symmetric AND content search

Range: on, off, true, false
Default: off
Applies to Data Types: Text

7) andThreshold - Activates a 'soft AND' and determines the percentage of the tokens that need to match.

Range: 0.0 to 1.0
Default: 1.0
Applies to Data Types: Text
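
A 'soft AND' can be pictured with the small Python sketch below. It assumes exact, case-insensitive token comparison and a hypothetical function name; the real search also applies fuzzy matching to each token, so treat this only as an illustration of the threshold idea.

```python
def soft_and_matches(search_tokens, record_tokens, and_threshold=1.0):
    """Return True if the share of search tokens found in the record
    reaches and_threshold (1.0 means every token must match)."""
    if not search_tokens:
        return False
    record = {token.casefold() for token in record_tokens}
    matched = sum(1 for token in search_tokens if token.casefold() in record)
    return matched / len(search_tokens) >= and_threshold
```

With three search tokens of which two are found, the record is rejected at the default of 1.0 but accepted at a threshold of 0.6.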

8) bestMatchingTokenWeight - Influences the score, shifts total score value between best token score values and root mean square of score values.

Range: 0.0 to 1.0
Default: 0.0
Applies to Data Types: Text

9) composeWords - The maximum number of consecutive words from user input to be composed (default value 1 means composition is disabled by default).

Range: 1 to 5
Default: 1
Applies to Data Types: Text


10) decomposeWords - The maximum number of words into which a word from the user input is decomposed (default value 1 means decomposition is disabled by default).

Range: 1 to 5
Default: 1
Applies to Data Types: Text


11) compoundWordWeight - Term mapping weight for (de)compositions from (de)composeWords.

Range: 0.0 to 1.0
Default: 0.9
Applies to Data Types: Text

12) considerNonMatchingTokens - Influences the score, defines the number of terms used for score calculation.

Range: max, min, all, input, table
Default: max
Applies to Data Types: Text

13) excessTokenWeight - Defines the weight of excess tokens to improve sort order.

Range: 0.0 to 1.0
Default: 1.0
Applies to Data Types: Text

14) minTextScore - Minimum score of a TEXT field; if this score is not reached, the record is not part of the result

Range: 0.0 to 1.0
Default: 0.0
Applies to Data Types: Text

15) phraseCheckFactor - The overall fuzzy score of a text column is multiplied with this value if the search terms do not appear in the correct order

Range: 0.0 to 1.0
Default: 1.0
Applies to Data Types: Text
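
The following Python sketch illustrates the multiplication described for phraseCheckFactor. The order check is deliberately simplified (it only compares the first position of each search term in the text), and the function name is hypothetical.

```python
def apply_phrase_check_factor(search_terms, text_tokens, fuzzy_score,
                              phrase_check_factor=1.0):
    """Multiply the score by phraseCheckFactor when the search terms do not
    appear in the text in the same order as in the search input."""
    positions = [text_tokens.index(t) for t in search_terms if t in text_tokens]
    in_order = positions == sorted(positions)
    return fuzzy_score if in_order else fuzzy_score * phrase_check_factor
```

With the default of 1.0 the order has no influence; a value such as 0.8 lowers the score of records where the terms appear in a different order.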

16) maxDateDistance - Specifies the allowed date distance when using fuzzy search on dates.

Range: 0 to 100
Default: 0
Applies to Data Types: Date
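
As a rough illustration of maxDateDistance, the sketch below treats the distance as a number of days, which is my assumption here, and only decides whether a stored date is still accepted. How the score itself decreases with the distance is not modelled, and the function name is hypothetical.

```python
from datetime import date


def date_within_distance(search_date, stored_date, max_date_distance=0):
    """Return True if the two dates are at most max_date_distance days apart.

    ASSUMPTION: the distance is counted in days. With the default of 0,
    only identical dates are accepted in this sketch.
    """
    if not 0 <= max_date_distance <= 100:
        raise ValueError("maxDateDistance must be between 0 and 100")
    return abs((stored_date - search_date).days) <= max_date_distance
```

For instance, two dates that lie two days apart are accepted with a distance of 2 but rejected with a distance of 1.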

17) similarCalculationMode - Defines how the score is calculated for a comparison of strings (or terms in a text column).

Range: search, compare, symmetricsearch, substringsearch, searchcompare, typeahead
Default: compare
Applies to Data Types: Text, String
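
Since the options above are typically combined into one comma-separated string of `name=value` pairs, a small helper that checks values against the ranges listed in this post can prevent typos. The Python sketch below is my own illustration: the helper name is hypothetical, the ranges are copied from this post, and the list of similarCalculationMode values is intentionally limited to a subset.

```python
# Numeric ranges as listed in this post: option name -> (minimum, maximum).
NUMERIC_RANGES = {
    "emptyScore": (0.0, 1.0),
    "spellCheckFactor": (0.0, 1.0),
    "abbreviationSimilarity": (0.0, 1.0),
    "andThreshold": (0.0, 1.0),
    "bestMatchingTokenWeight": (0.0, 1.0),
    "composeWords": (1, 5),
    "decomposeWords": (1, 5),
    "compoundWordWeight": (0.0, 1.0),
    "excessTokenWeight": (0.0, 1.0),
    "minTextScore": (0.0, 1.0),
    "phraseCheckFactor": (0.0, 1.0),
    "maxDateDistance": (0, 100),
}

ON_OFF = {"on", "off", "true", "false"}

# Options with a fixed set of allowed values.
ENUM_VALUES = {
    "emptyMatchesNull": ON_OFF,
    "interScriptMatching": ON_OFF,
    "andSymmetric": ON_OFF,
    "considerNonMatchingTokens": {"max", "min", "all", "input", "table"},
    # Subset only; further modes exist.
    "similarCalculationMode": {
        "search", "compare", "symmetricsearch", "substringsearch",
    },
}


def build_search_options(**options):
    """Validate the options and join them as 'name=value,name=value'."""
    parts = []
    for name, value in options.items():
        if name in NUMERIC_RANGES:
            low, high = NUMERIC_RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name} must be between {low} and {high}")
        elif name in ENUM_VALUES:
            if str(value).lower() not in ENUM_VALUES[name]:
                raise ValueError(f"{value!r} is not allowed for {name}")
        else:
            raise KeyError(f"unknown option: {name}")
        parts.append(f"{name}={value}")
    return ",".join(parts)
```

A call such as `build_search_options(spellCheckFactor=0.8, andThreshold=0.5)` produces one option string, while an out-of-range value such as `composeWords=6` is rejected before it ever reaches the system.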


Finally, this blog post is intended to serve as a reference point when setting up fuzzy search requirements. SAP developers and consultants usually get requirements for matching in MDG Consolidation or for creating HANA search views, where fuzzy search may be underutilized and not used to its full potential. When I started to explore this as a beginner, it took some time to understand what these annotations are, which values to use, and which combinations to use. Sometimes the less common annotations remain unused due to a lack of awareness or understanding, and the customer's needs are then not fully met. I thought this could be a good starting point for beginners and for people who do not yet have much understanding of fuzzy search. In future blog posts, I will dive deeper by working through some examples and sharing my learnings!

Well, that is all I wanted to share with you in this post. If you have any questions or thoughts, feel free to post them in the comments section, or ask directly in the SAP Community.

For a deeper dive into this topic, please read more on SAP HANA Search.
