Prevent Solr search terms from being split (JA & ZH)

fionnziegler2
Explorer
0 Kudos

Hi,

Does any Solr expert know how to prevent mixed numbers and text from being split for ZH and JA?

The search term "33test33" shouldn't be split into "33", "test", and "33".

Using the default config:

<fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
      <tokenizer class="solr.JapaneseTokenizerFactory" mode="search" />
      <filter class="solr.JapaneseBaseFormFilterFactory" />
      <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="1.txt"/>
      <filter class="solr.CJKWidthFilterFactory" />
      <!--filter class="solr.ManagedStopFilterFactory" managed="ja" /-->
      <!-- <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ja.txt" /> -->
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="1.txt"/>
      <filter class="solr.LowerCaseFilterFactory" />
      <!-- <filter class="solr.KeywordRepeatFilterFactory" /> -->
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
      <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4" />
      <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
   </analyzer>
   <analyzer type="query">
      <tokenizer class="solr.JapaneseTokenizerFactory" mode="search" />
      <filter class="solr.JapaneseBaseFormFilterFactory" />
      <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt" />
      <filter class="solr.CJKWidthFilterFactory" />
      <filter class="solr.ManagedSynonymGraphFilterFactory" managed="ja" />
      <filter class="solr.ManagedStopFilterFactory" managed="ja" />
      <!-- <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ja.txt" /> -->
      <filter class="solr.LowerCaseFilterFactory" />
      <!-- <filter class="solr.KeywordRepeatFilterFactory" /> -->
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
      <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4" />
      <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
   </analyzer>
</fieldType>

Accepted Solutions (0)

Answers (1)

mansurarisoy
Contributor
0 Kudos

You may have better luck searching for, or asking, this question on a more relevant forum such as Stack Overflow, since Solr is not a core component of SAP Commerce.

That said, I think your problem stems from the tokenizer used for Japanese (solr.JapaneseTokenizerFactory). The Japanese tokenizer has several configuration attributes (https://solr.apache.org/guide/8_4/language-analysis.html#japanese-tokenizer) that may help you achieve what you need, but I am not sure.
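
As a rough sketch of what those attributes look like in the field type above: the fragment below sets mode, discardPunctuation, userDictionary and userDictionaryEncoding, which are the tokenizer attributes listed in the Solr guide. The dictionary path lang/userdict_ja.txt and the sample entry are assumptions for illustration, and I have not verified that a user dictionary entry actually keeps "33test33" together, so check the result in the Analysis screen of the Solr Admin UI before relying on it.

```xml
<!-- Sketch only: the same tokenizer line would have to be used in both the
     index and the query analyzer, followed by a full reindex. -->
<tokenizer class="solr.JapaneseTokenizerFactory"
           mode="normal"
           discardPunctuation="true"
           userDictionary="lang/userdict_ja.txt"
           userDictionaryEncoding="UTF-8" />

<!-- Hypothetical entry in lang/userdict_ja.txt (assumed path).
     Format: surface form,segmentation,readings,part-of-speech tag
     33test33,33test33,33test33,カスタム名詞 -->
```

A user dictionary only covers the terms listed in it, so this would not be a general fix for arbitrary alphanumeric strings.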

More general information about tokenizers can be found at https://solr.apache.org/guide/8_4/tokenizers.html

You can switch the documentation to the Solr version you are using.
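
For example, one of the general tokenizers on that page is solr.WhitespaceTokenizerFactory, which splits only on whitespace and would therefore leave "33test33" as a single token. The sketch below is a separate field type for code-like values; the name text_code is hypothetical, and how such a field would be wired into the SAP Commerce index configuration is not shown. It is not a replacement for text_ja, because Japanese text without spaces would end up as one long token.

```xml
<!-- Sketch only: "text_code" is a hypothetical field type name. -->
<fieldType name="text_code" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
      <!-- Splits on whitespace only, so "33test33" stays one token -->
      <tokenizer class="solr.WhitespaceTokenizerFactory" />
      <filter class="solr.LowerCaseFilterFactory" />
   </analyzer>
</fieldType>
```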

Hope this helps.