Solr free-text search: why does a fuzzy match score higher than an exact match?

Former Member

Hi,

We are using the B2B Accelerator and are seeing some strange sorting in the results when searching for "ford": products without "ford" but with "cord" get a higher Solr score (max 996.4704) than the products that actually contain "ford" (max 926.636). Can anyone explain the following scores? The first two products in the list are the two products that are in stock, because we sort on the in-stock flag first. (We don't use ean, manufacturerName or keywords, and the codes are numbers only.)

 {
   "responseHeader": {
     "status": 0,
     "QTime": 10,
     "params": {
       "q": "(ean_string:(ford^200.0 OR ford*^100.0)) OR (code_string:(ford^180.0 OR ford*^90.0)) OR (name_text_en:(ford^100.0 OR ford*^50.0 OR ford~^25.0)) OR (manufacturerName_text:(ford^80.0 OR ford*^40.0 OR ford~^20.0)) OR (keywords_text_en:(ford^40.0 OR ford*^20.0 OR ford~^10.0))",
       "indent": "true",
       "fl": "name_text_en, instockFlag_boolean, score",
       "start": "0",
       "fq": "(((catalogId:\"autoverProductCatalog\") AND (catalogVersion:Online)))",
       "sort": "inStockFlag_boolean desc,score desc",
       "rows": "20",
       "wt": "json",
       "_": "1446049871206"
     }
   },
   "response": {
     "numFound": 8363,
     "start": 0,
     "maxScore": 996.4704,
     "docs": [
       {
         "name_text_en": "2401850200 FORD GALAXY MPV 95-06 WS GN MODIFIED MIR BRACKET SDR (95-99)",
         "score": 542.94055
       },
       {
         "name_text_en": "2402651120 FORD FOCUS 3D HBK/2D CC/4D SED/5D BRK/5D HBK 04-11 WS GN BL VIN TC ABSORBING MODIFIED HARDWARE MIR BRACKET SDR",
         "score": 358.08783
       },
       {
         "name_text_en": "9102010100 CORD RUNNER SHORT",
         "score": 996.4704
       },
       {
         "name_text_en": "8751783030 FORD 655D 93- FQR GN",
         "score": 926.636
       },
       {
         "name_text_en": "2451003570 FORD CARGO 81-93 FVR BR",
         "score": 897.44885
       },
       {
         "name_text_en": "9102010120 CORD RUNNER 9MTRS",
         "score": 871.9116
       },
       {
         "name_text_en": "9102010121 REPLACEMENT CORD 9MTRS",
         "score": 871.9116
       },
       {
         "name_text_en": "9102010123 REPLACEMENT CORD 100MTRS",
         "score": 871.9116
       },
       {
         "name_text_en": "9102010124 REPLACEMENT CORD 100MTRS",
         "score": 871.9116
       },
       {
         "name_text_en": "9102010125 REPLACEMENT CORD 100MTRS",
         "score": 871.9116
       },
       {
         "name_text_en": "2451004010 FORD CARGO 81-93 FDR CL",
         "score": 835.01294
       },
       {
         "name_text_en": "2451004070 FORD CARGO 81-93 FDR BR",
         "score": 835.01294
       },
       {
         "name_text_en": "9201138152 CUTTER BLADE HOOK FORM FEIN 38MM 2 PCS",
         "score": 812.5509
       },
       {
         "name_text_en": "2401855500 FORD GALAXY MPV 95-06 DFS",
         "score": 789.41077
       },
       {
         "name_text_en": "2450050000 FORD A MODEL 74-77 WS CL",
         "score": 789.41077
       },
       {
         "name_text_en": "2450060000 FORD TRANSCONTINENTAL 75-85 WS CL",
         "score": 789.41077
       },
       {
         "name_text_en": "2450060200 FORD TRANSCONTINENTAL 75-85 WS GN",
         "score": 789.41077
       },
       {
         "name_text_en": "2450060600 FORD TRANSCONTINENTAL 75-85 WS BR",
         "score": 789.41077
       },
       {
         "name_text_en": "2450064000 FORD TRANSCONTINENTAL 75-85 FDL CL",
         "score": 789.41077
       },
       {
         "name_text_en": "2451000000 FORD CARGO 81-93 WS CL",
         "score": 789.41077
       }
     ]
   }
 }

Answers (1)

Former Member

Hi,

To diagnose query behavior, you can enable query debugging with the debugQuery query parameter.

We would expect products whose fields contain "Ford" to get the top scores. Can you run the query above with debugQuery=on?

You can then analyse the debug output, which gives a mathematical breakdown of each component of the score.
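A minimal sketch of building such a request in Python, assuming a hypothetical Solr endpoint (`http://localhost:8983/solr/your_core/select`) and a shortened version of the query for readability. The extra parameter `debug.explain.structured=true` asks Solr to return each score explanation as nested JSON instead of a block of text, which is easier to inspect programmatically.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace host and core name with your own.
SOLR_SELECT = "http://localhost:8983/solr/your_core/select"

def debug_query_url(q: str, fq: str, rows: int = 20) -> str:
    """Build a /select URL that asks Solr to explain each document's score."""
    params = {
        "q": q,
        "fq": fq,
        "fl": "name_text_en,score",
        "rows": rows,
        "wt": "json",
        "debugQuery": "on",                  # adds a "debug" section with score explanations
        "debug.explain.structured": "true",  # explanations as nested JSON instead of text
    }
    return SOLR_SELECT + "?" + urlencode(params)

url = debug_query_url(
    q="name_text_en:(ford^100.0 OR ford*^50.0 OR ford~^25.0)",
    fq="catalogVersion:Online",
)
print(url)
```

Leaving out the `inStockFlag_boolean` sort while debugging makes the result order follow the score alone, which is easier to compare against the explanations.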

Check how much boost the term "CORD" receives compared to "FORD".
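With `debug.explain.structured=true`, each explanation comes back as a tree of nodes with `match`, `value`, `description` and `details` keys. The sketch below walks such a tree and collects the per-term `weight(...)` nodes for one field, so the "cord" and "ford" contributions can be compared side by side. The sample tree and its numbers are made up for illustration, and matching on the `weight(field:` prefix is an assumption about how your Solr/Lucene version words its descriptions.

```python
from typing import Iterator

def term_weights(node: dict, field: str) -> Iterator[tuple[str, float]]:
    """Yield (description, value) for every 'weight(field:...)' node in an explain tree."""
    desc = node.get("description", "")
    if desc.startswith(f"weight({field}:"):
        yield desc, float(node["value"])
    for child in node.get("details", []):
        yield from term_weights(child, field)

# Made-up explanation fragment for illustration only; real trees come from debugQuery.
sample = {
    "match": True, "value": 3.0, "description": "sum of:",
    "details": [
        {"match": True, "value": 2.5,
         "description": "weight(name_text_en:cord^18.75 in 42) [DefaultSimilarity], result of:",
         "details": []},
        {"match": True, "value": 0.5,
         "description": "weight(manufacturerName_text:cord^15.0 in 42) [DefaultSimilarity], result of:",
         "details": []},
    ],
}

for desc, value in term_weights(sample, "name_text_en"):
    print(f"{value:8.3f}  {desc}")
```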

Fuzzy matching gives higher weights to closer matches. However, there are other factors that can pull the final score in the other direction. Check the fieldNorm for both terms, "Ford" and "Cord": the product with the higher score is likely to have the greater fieldNorm.

The fieldNorm is based on the number of terms in the field: the fewer the terms, the higher the norm. For example:

9102010121 REPLACEMENT CORD 9MTRS - 4 terms

2450064000 FORD TRANSCONTINENTAL 75-85 FDL CL - 6 terms (7 if the analyzer splits "75-85" into two tokens)
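A rough sketch of the length normalization used by Lucene's classic DefaultSimilarity, where lengthNorm is 1/sqrt(number of terms) when no index-time boost is set. The tokenizer here is a hypothetical stand-in (split on whitespace and hyphens), so it counts "75-85" as two terms; your field's real analyzer may differ. Lucene also stores the norm in a single byte, so the fieldNorm shown in the debug output is a coarser value than the one computed here.

```python
import math
import re

def approx_terms(text: str) -> list[str]:
    # Hypothetical stand-in for the name_text_en analyzer: split on whitespace and hyphens.
    return [t for t in re.split(r"[\s\-]+", text) if t]

def length_norm(text: str) -> float:
    # Classic lengthNorm: 1 / sqrt(number of terms), ignoring index-time boosts.
    return 1.0 / math.sqrt(len(approx_terms(text)))

NAMES = (
    "9102010121 REPLACEMENT CORD 9MTRS",
    "2450064000 FORD TRANSCONTINENTAL 75-85 FDL CL",
)
for name in NAMES:
    print(len(approx_terms(name)), round(length_norm(name), 3), name)
```

The shorter "CORD" name ends up with the larger norm, so each of its matching terms is multiplied by a bigger factor.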

Another factor is TF/IDF. If the IDF of "cord" is higher than that of "ford", a document that matches "cord" through the fuzzy clause can score higher than a document that matches "ford", even though "ford" is the exact match.

With IDF, matches on rarer terms count more than matches on common terms.
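To see how a rare fuzzy-expanded term can outweigh a common exact term, here is a simplified sketch of Lucene's classic TF-IDF formula, where each matching term contributes tf * idf^2 * boost * norm (coord and queryNorm omitted). The document frequencies below are hypothetical, since the real ones are not known here, and the sketch ignores the extra down-weighting Lucene applies to fuzzy expansions by edit distance as well as the other clauses that also match the "ford" documents.

```python
import math

def idf(num_docs: int, doc_freq: int) -> float:
    """Classic (DefaultSimilarity) idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def clause_contribution(boost: float, term_idf: float, norm: float, freq: int = 1) -> float:
    """One term's share of the classic score: tf * idf^2 * boost * norm.

    coord and queryNorm are left out; queryNorm is the same for every
    document in a given query, so it does not change the ranking.
    """
    return math.sqrt(freq) * term_idf ** 2 * boost * norm

# Hypothetical index statistics: "ford" is common, "cord" is rare.
NUM_DOCS = 20_000
IDF_FORD = idf(NUM_DOCS, 5_000)
IDF_CORD = idf(NUM_DOCS, 10)

# Norms taken from a 7-term FORD name and a 4-term CORD name.
ford = clause_contribution(boost=100.0, term_idf=IDF_FORD, norm=1 / math.sqrt(7))
cord = clause_contribution(boost=25.0, term_idf=IDF_CORD, norm=1 / math.sqrt(4))
print(f"exact 'ford' clause (^100): {ford:.1f}")
print(f"fuzzy 'cord' clause (^25):  {cord:.1f}")
```

Because idf enters the classic score squared, a modest gap in document frequency can outweigh a 4x difference in query boost.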

Think of IDF as a measure of uniqueness: it helps identify what makes a given product special. It is based on how many documents in the index contain a term, not on how often users search for that term.

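To fix this, you can enable omitNorms on the search field in the schema (this requires a reindex). However, that might reduce scoring effectiveness for other queries. Note that omitNorms only removes length normalization; it does not change the IDF effect.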
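A small sketch of what omitNorms changes, using the same simplified classic TF-IDF contribution (idf^2 * boost * norm) and hypothetical document frequencies. With norms omitted, every document's norm is effectively 1.0, so only idf and boost remain. With these made-up numbers, omitting norms narrows the gap between the fuzzy "cord" clause and the exact "ford" clause but does not close it, because the IDF difference dominates; real statistics may behave differently.

```python
import math

def idf(num_docs: int, doc_freq: int) -> float:
    # Classic (DefaultSimilarity) idf: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical statistics: "ford" is common, "cord" is rare.
NUM_DOCS = 20_000
IDF_FORD = idf(NUM_DOCS, 5_000)
IDF_CORD = idf(NUM_DOCS, 10)

def compare(ford_norm: float, cord_norm: float) -> tuple[float, float]:
    """Return (exact ford^100 contribution, fuzzy cord^25 contribution)."""
    ford = IDF_FORD ** 2 * 100.0 * ford_norm
    cord = IDF_CORD ** 2 * 25.0 * cord_norm
    return ford, cord

# With norms: 7-term FORD name vs 4-term CORD name.
# With omitNorms="true": every norm is 1.0.
for label, norms in (("with norms", (1 / math.sqrt(7), 0.5)), ("omitNorms", (1.0, 1.0))):
    ford, cord = compare(*norms)
    print(f"{label:10}  ford={ford:7.1f}  cord={cord:7.1f}  cord/ford={cord / ford:.2f}")
```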

Regards,

Reena