on 2022 Jan 25 8:45 AM
Hi,
Does any Solr expert know how to prevent numbers and text from being split in ZH and JA?
The search term "33test33" shouldn't be split into "33", "test", "33".
Using the default config:
<fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.JapaneseTokenizerFactory" mode="search" />
    <filter class="solr.JapaneseBaseFormFilterFactory" />
    <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="1.txt"/>
    <filter class="solr.CJKWidthFilterFactory" />
    <!--filter class="solr.ManagedStopFilterFactory" managed="ja" /-->
    <!-- <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ja.txt" /> -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="1.txt"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- <filter class="solr.KeywordRepeatFilterFactory" /> -->
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
    <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.JapaneseTokenizerFactory" mode="search" />
    <filter class="solr.JapaneseBaseFormFilterFactory" />
    <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt" />
    <filter class="solr.CJKWidthFilterFactory" />
    <filter class="solr.ManagedSynonymGraphFilterFactory" managed="ja" />
    <filter class="solr.ManagedStopFilterFactory" managed="ja" />
    <!-- <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ja.txt" /> -->
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- <filter class="solr.KeywordRepeatFilterFactory" /> -->
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
    <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldType>
You may get a better answer by searching or asking in a more specialized forum such as Stack Overflow, since Solr is not a core component of SAP Commerce.
That said, I think your problem stems from the tokenizer used for Japanese (solr.JapaneseTokenizerFactory). It has several configuration attributes (https://solr.apache.org/guide/8_4/language-analysis.html#japanese-tokenizer) that may help you achieve what you need, but I am not sure.
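As one illustration of those attributes, the tokenizer accepts a `userDictionary` file, and an entry in it can ask the tokenizer to keep a known term as a single token. The sketch below is untested and rests on assumptions: the file name `lang/userdict_ja.txt` is hypothetical, the entry covers only the literal term `33test33` (it is not a general rule for mixed digit/letter strings), and the same attribute would need to be set on both the index and the query analyzer.

```xml
<!-- Sketch only: lang/userdict_ja.txt is an assumed file name, not part of the original config -->
<tokenizer class="solr.JapaneseTokenizerFactory" mode="search"
           userDictionary="lang/userdict_ja.txt" userDictionaryEncoding="UTF-8" />

<!--
  Assumed content of lang/userdict_ja.txt, one entry per line in the format
  surface,segmentation,readings,part-of-speech:

  33test33,33test33,33test33,カスタム名詞
-->
```

Because the tokenizer runs before `solr.LowerCaseFilterFactory`, the entry presumably has to match the text as it is written. After changing the index analyzer, the affected documents need to be reindexed, and the Analysis screen in the Solr Admin UI is a convenient place to check how a term is actually tokenized.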
More general information about tokenizers can be found at https://solr.apache.org/guide/8_4/tokenizers.html
Both links point to the 8.4 guide; you can switch to the documentation for the Solr version you are using.
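If dictionary-based segmentation is not required, one alternative among the general tokenizers is `solr.StandardTokenizerFactory`, which follows Unicode word-break rules and keeps a mixed run of ASCII letters and digits such as `33test33` together. The sketch below is modeled on the `text_cjk` field type shipped in Solr's default configset. It swaps morphological analysis for CJK bigrams, so Japanese and Chinese matching will behave differently from the original `text_ja` chain, and the synonym, stop word, and stemming filters are left out. The field type name `text_cjk_alnum` is made up for this example.

```xml
<!-- Sketch only: the name text_cjk_alnum is hypothetical; the chain follows Solr's stock text_cjk type -->
<fieldType name="text_cjk_alnum" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Does not break between adjacent letters and digits, so 33test33 stays one token -->
    <tokenizer class="solr.StandardTokenizerFactory" />
    <!-- Normalize full-width/half-width forms before bigrams are built -->
    <filter class="solr.CJKWidthFilterFactory" />
    <!-- Lowercase any non-CJK text -->
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- Index CJK text as overlapping two-character tokens -->
    <filter class="solr.CJKBigramFilterFactory" />
  </analyzer>
</fieldType>
```

Whether the loss of morphological analysis is acceptable depends on the search quality you need for Japanese and Chinese text, so it is worth comparing both chains on the Solr Admin Analysis screen before switching.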
Hope this helps.