cancel
Showing results for 
Search instead for 
Did you mean: 

Fuzzy Search Performance using Full-Text indexes

matma24
Participant
1,231

Hello!

Unfortunately I've ran into some issues with performance of the fuzzy search. I'm running a fuzzy search on a table with 9-10m records, table is not partitioned, full-text indexes are present (TEXT data type so they are created by default) and fuzzy search index on is turned.

create column table company_data
(
id nvarchar(50),
company_name nvarchar(1000),
company_name_without_legal_type TEXT FUZZY SEARCH INDEX ON,
company_legal_type nvarchar(50) FUZZY SEARCH INDEX ON
)
select score,
	id,
	company_name
from (
	select top 10 to_decimal(max(score()),3,2) as score,
		cd.id,
		cd.company_name,
		row_number() over (partition by cd.id order by max(score()) desc) as row_num
	from company_data cd
	where contains(cd.company_name_without_legal_type, 'SAP', fuzzy(0.8, 'andSymmetric=on,andThreshold=0.75,bestMatchingTokenWeight=0.5,textSearch=compare,minTextScore=0.7'))
		and contains(cd.company_legal_type, 'SE', fuzzy(1, 'similarCalculationMode=compare,emptyScore=1'))
	group by cd.id,
		cd.company_name
	order by score desc, cd.id desc
) 
where row_num = 1 and score >= 0.85;

This is an example call.

Data in company_data contains all historical data of every company so row_number() is needed to get 10best unique companies that matched.

After this data is retrieved I'm running some more queries but only this part is causing slow runtime, for some calls I get runtime of over 700ms which is quite large and in my opinion should not be as that slow. Of course I know andSymmetric and andThreshold does influence performance a lot but not making it few times slower than without those two parameters.

On the 1000 companies that I've tested - I've got mean of 170ms, 90% of requests are below 430ms and max_time of over 800ms.

Would be really grateful for all hints and ideas how I can speed it up (ideal speed is 90% of requests below 250ms).

Best regards,

Mateusz

View Entire Topic
0 Kudos

Hello matma24,

I'm facing also performance issues when performing the FUZZY search, did you managed to get any solution?

BR,

André

matma24
Participant
0 Kudos

Please post your code and some details about table DDL, data volume etc. Maybe I will be able to help.

0 Kudos

Hey Mark,

-> How I created the FULLTEXT INDEX: CREATE FULLTEXT INDEX FZSR_FULL_NAME on "SAPDAT"."/SIE/FL_2BP_FZSR" ( "FULL_NAME" ) FUZZY SEARCH INDEX ON FAST PREPROCESS ON TEXT MINING OFF TEXT ANALYSIS OFF SYNC;

-> Table /SIE/FL_2BP_FZSR has 4.6M records

-> Field FULL_NAME has 160 chars

->SQL Sample:

SELECT DISTINCT SCORE() AS SCORE, MANDT, PARTNER, COUNTRY, NATION, BU_GROUP, BPEXT, ORGID, Z_ORGID_A, POST_CODE1, POST_CODE2, REGION, PO_BOX, FULL_NAME, LEGAL_FORM, NAME_MINUS_LFORM, FULL_STREET, CITY1, PO_BOX_LOC, TYPE, DUNS, XBLCK FROM "SAPDAT"."/SIE/FL_2BP_FZSR"

WHERE NATION = 'I'

AND COUNTRY = 'US'

AND( CONTAINS( FULL_NAME ,' MOHAMMADI ' ,FUZZY( 0.001 ,' interScriptMatching=on, similarCalculationMode=compare, textSearch=compare, spellCheckFactor=1.0, andThreshold=0.00, returnAll=on, andSymmetric=on '),WEIGHT( 1.00 ) ) )

AND SCORE() > '0.10 '

and MANDT = '100'

ORDER BY SCORE DESC , PARTNER ASC LIMIT 100

Normaly this query takes 4 seconds. That's too much for our mass processing processes.

Any tip to speed up this query?

Thank you.

BR,

André

jerryjanda
Community Manager
Community Manager

Hi, asousa84

Please stop posting new questions as answers to an existing question. The correct approach would have been to leave a comment or create a new question.

Since Mark has chosen to help you, we're going to leave this thread, but I ask that you post your responses as comments to your original "answer," not as new answers, so as not to confuse other members who might be searching for a solution to this problem (forcing them to scroll through a thread with a bunch of answers that aren't answers at all).

Kind regards,

--Jerry

Moderation Lead