CRM and CX Blog Posts by SAP
PawelPot

Introduction 

This article highlights specific situations and issues to overcome during the Solr upgrade and migration process that is part of migration projects to Commerce Cloud v2. It is based on a project that moved from Solr Standalone 8.11 to Solr Cloud 9.x. 

 

High memory demand and many product indexes 

Description 

The example setup contained more than 60 product indexes, each with up to 150,000 documents. The on-premises setup required increasing amounts of RAM, which was not sustainable in the long run, and the default cloud environment did not have enough resources to cover the RAM that Solr required. 

Resolutions 

The first approach should always be to optimize usage of the current resources. If that is not enough, consider scaling up Solr's resources. In our case, support proposed a few ideas for reducing RAM consumption and accelerating the indexation process. 

Suggester change 

The first idea was to switch the suggesters to FSTLookupFactory. That suggester is expected to build more slowly but consume less RAM. Support noted that this does not hold in every case, and in ours the result was the opposite of what we expected. 

How to apply that change? 

Change the configuration in solrconfig.xml as per below example: 

From: 

 <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>

To: 

 <str name="lookupImpl">FSTLookupFactory</str> 

Disclaimer: the link below and other links may redirect you to SAP knowledge pages that require an SAP Partner or Customer account. 

KBA with steps: 

https://me.sap.com/notes/2823614/E 

Apache Solr documentation mentions: 

https://solr.apache.org/guide/solr/latest/query-guide/suggester.html 
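For context, the lookupImpl line lives inside a suggester definition in solrconfig.xml. The fragment below is a minimal sketch of such a definition; the component name, dictionary implementation, and field names are illustrative assumptions, and only the lookupImpl entry is the actual change discussed here:

```xml
<!-- Sketch of a suggester definition; names and fields are example assumptions -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">exampleSuggester</str>
    <!-- the change described above: use the FST-based lookup -->
    <str name="lookupImpl">FSTLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">name_text_en</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>
```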

 

Index replications change 

Solr had 4 replicas on the target cloud environment, so to lower RAM usage there was a temporary idea to use fewer product index replicas while keeping indexes distributed across pods, so that no Solr pod would exceed its RAM limit. This idea was tested; it can be used to prepare the environment before a scale-up, but it is not recommended as the main resolution, because it is not predictable whether Solr distributes indexes equally. The first time we tried it, we had to remove some replicas manually to keep the distribution even. 
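One way to remove a specific replica manually is the Collections API DELETEREPLICA action; the host, collection, shard, and replica names below are illustrative assumptions:

```text
http://<solr-host>:8983/solr/admin/collections?action=DELETEREPLICA&collection=master_example_Product&shard=shard1&replica=core_node5
```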

Scale up of Solr service 

This example environment finally needed 16 GiB of RAM per Solr service pod to avoid hitting OOM (Out of Memory) errors again. Under load it could reach around 15 GiB of RAM; without indexations running, with full replicas (1 replica of each index per pod), and without application traffic, usage was around 10 to 13 GiB of RAM. 

 

Customizations on Solr indexes 

Description 

The customer had customized the index creation request logic to derive the schema name from the index name. This was overlooked during the upgrade, so the first attempt to keep the behavior the same was to implement multiple schemas in the way described in the documentation. That took a lot of time and ultimately caused more issues with the Solr service on cloud. The changes made to adapt Solr's behavior are described in Resolution 1 and Resolution 2; the second approach presents a better way to handle many schema files, with an example customization. 

Resolution 1 

Moving from a standalone setup to a cloud environment required significant changes in how we managed schemas. We had to ensure that every schema.xml file was correctly configured and uploaded to ZooKeeper, which Solr Cloud uses as the base for collections. The main issue was that ZooKeeper took the whole configuration folder and used the default schema.xml name for every index that used that configset. So, following the Solr documentation on preparing multiple configurations, we prepared the Solr Cloud configuration in the code repository. This introduced a significant number of duplicated files: from one folder with around 300 files, of which around 170 were schemas, we ended up with a configsets folder containing around 170 folders, each with more than 100 files. Below you can see a screenshot of the project structure, followed by a listing of a single configset folder. This resolution caused many out-of-disk issues on Solr and ZooKeeper pods, as well as Solr RAM OOM issues. It is not recommended if the configurations differ only in the schema file. 

[Screenshot: project structure of the configsets folder in the code repository]

 

 

EXAMPLE_DocumentIndex (104 files) 
|-- clustering/carrot2 
| |-- kmeans-attributes.xml 
| |-- lingo-attributes.xml 
| `-- stc-attributes.xml 
|-- lang (47 files) 
|-- velocity (36 files) 
|-- xslt (5 files) 
|-- admin-extra.html 
|-- admin-extra.menu-bottom.html 
|-- admin-extra.menu-top.html 
|-- currency.xml 
|-- elevate.xml 
|-- mapping-FoldToASCII.txt 
|-- mapping-ISOLatin1Accent.txt 
|-- protwords.txt 
|-- solrconfig.xml 
|-- spellings.txt 
|-- stopwords.txt 
|-- synonyms.txt 
`-- update-script.js 
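For reference, uploading a single configset folder like the one above to ZooKeeper can be sketched with the bin/solr tool; the ZooKeeper host and local path are illustrative assumptions:

```text
bin/solr zk upconfig -z <zk-host>:2181 -n EXAMPLE_DocumentIndex -d /path/to/configsets/EXAMPLE_DocumentIndex
```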

 

 

 

Resolution 2 

The second approach was the right solution, as we finally caught that the customization was in the Solr search provider code. Using different schema names is possible through the Solr Core Admin API, which provides fields to configure the schema file name used for indexation. The code listings below show the Create Index request configurations for Solr Standalone and Solr Cloud. For the possible fields, refer to the documentation at https://solr.apache.org/guide/solr/latest/configuration-guide/coreadmin-api.html. 

The Standalone version has a dedicated method to set the schema name. It also supports modifying other fields, such as the core node name or the data directory. 

 

 

public class ExampleSolrStandaloneSearchProvider extends SolrStandaloneSearchProvider {

    private String separator;

    @Override
    protected List<ClusterNodeResponse<CoreAdminResponse>> doCreateIndex(final Index index, final CachedSolrClient solrClient, final List<String> nodes)
    {
        final String indexName = index.getName();
        final String configSet = resolveConfigSet(index);

        final CoreAdminRequest.Create request = new CoreAdminRequest.Create();
        request.setCoreName(indexName);
        request.setConfigSet(configSet);
        // If the default configset (used for product and document indexations) is selected, override the default schema name
        if (DEFAULT_CONFIGSET_VALUE.equals(configSet)) {
            final String schema = "schema_" + indexName.substring(0, indexName.lastIndexOf(separator)) + ".xml";
            request.setSchemaName(schema);
        }

        return clusterRequest(request, solrClient, null, nodes);
    }

    public String getSeparator() {
        return separator;
    }

    public void setSeparator(final String separator) {
        this.separator = separator;
    }
}

 

 

 

In Cloud mode the change is nearly identical to Standalone, but the schema field is set as a property in key-value form. 

 

 

public class ExampleSolrCloudSearchProvider extends SolrCloudSearchProvider {

    private String separator;

    @Override
    protected CloudResponse<CollectionAdminResponse> doCreateIndex(Index index, CachedSolrClient solrClient) throws SolrServiceException {

        final String indexName = index.getName();
        final SystemInfo systemInfo = loadSystemInfo(solrClient);
        final String configSet = resolveConfigSet(index);
        final Integer numShards = resolveNumShards(index);
        final Integer replicationFactor = resolveReplicationFactor(index);
        final Boolean autoAddReplicas = resolveAutoAddReplicas(index);
        final CreateCollectionRequest request = new CreateCollectionRequest(index.getName(), configSet, numShards, 
                replicationFactor);
        request.setWaitForFinalState(true);
        if (systemInfo.getMajorVersion() != null && systemInfo.getMajorVersion() <= SOLR_8_MAJOR_VERSION)
        {
            request.withParam("autoAddReplicas", String.valueOf(autoAddReplicas));
        }
        // If the default configset (used for product and document indexations) is selected, override the default schema name
        if (DEFAULT_CONFIGSET_VALUE.equals(configSet)) {
            final String schema = "schema_" + indexName.substring(0, indexName.lastIndexOf(separator)) + ".xml";
            request.withProperty("schema", schema);
        }

        return cloudRequest(request, solrClient, null);
    }
    
    public String getSeparator() {
        return separator;
    }

    @Required
    public void setSeparator(final String separator) {
        this.separator = separator;
    }
}
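To make the naming convention shared by both listings concrete, here is a minimal, self-contained sketch of the schema-name derivation; the example index names and the "_" separator are illustrative assumptions:

```java
// Hypothetical helper mirroring the schema-name logic from the providers above.
// The "schema_<prefix>.xml" pattern and separator are assumptions based on the listings.
public class ExampleSchemaNaming {

    // Strips the trailing qualifier (e.g. "_flip"/"_flop") and builds the schema file name
    public static String schemaFor(final String indexName, final String separator) {
        return "schema_" + indexName.substring(0, indexName.lastIndexOf(separator)) + ".xml";
    }

    public static void main(final String[] args) {
        // e.g. "master_electronics_Product_flip" -> "schema_master_electronics_Product.xml"
        System.out.println(schemaFor("master_electronics_Product_flip", "_"));
    }
}
```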

 

 

 

Final considerations on migrating customized Solr 

After handling all these issues, we are sharing our whole adaptation and migration journey here. If you find it helpful, check the other Solr optimizations below; these can be critical for reaching peak performance. We also want to draw attention to how quite minor changes can reduce effort, and what happens if we miss them. 

 

Optimization for Solr indexation speed 

The most common causes of indexation slowness are: 

  • Committing after every update request 
  • Sending one document at a time in each update request instead of batching them 
  • Only using one thread/connection to index 

To combat this, we can use the recommended configuration: 

  • Number of threads: 2x the CPU cores 
  • Batch size: 1000  
  • Commit mode: after index 
  • solr.autoCommit.maxTime=60000  
  • solr.autoSoftCommit.maxTime=300000 

 

Number of threads: 2x the CPU cores - Instead of running a single processing thread, the system is configured to run twice as many threads as there are physical (or logical) CPU cores, to help maximize CPU utilization and reduce latency. 

Batch size: 1000 - Documents (or data items) are sent to Solr in groups of 1000 rather than one at a time. Processing 1000 documents in one go usually improves throughput and overall indexing performance. Too large a batch might consume too much memory, while too small a batch might make the process inefficient. 

Commit mode: after index - This setting means that a commit operation (which makes changes durable and, depending on the type, visible for search) is performed after the indexing process is complete (or at the end of a batch), rather than after every document or at fixed time intervals. 

solr.autoCommit.maxTime=60000 - Solr is configured to automatically perform a “hard commit” at least once every 60,000 milliseconds (i.e., every 1 minute) if no manual commit occurs. 

solr.autoSoftCommit.maxTime=300000 - Solr is set to perform an automatic “soft commit” every 300,000 milliseconds (i.e., every 5 minutes) if no soft commit has been triggered by other means. 

The auto commit settings act as a safety net: even if you are not explicitly committing (for example, when using commit mode “after index”), Solr will not go too long without making your data durable. 
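For reference, these two properties typically feed into the updateHandler section of solrconfig.xml. The fragment below is a sketch assuming the standard property placeholders:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit: flushes to disk; openSearcher=false avoids reopening searchers -->
  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commit: makes recent changes visible to searches -->
  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:300000}</maxTime>
  </autoSoftCommit>
</updateHandler>
```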

 

The number of threads, batch size, and commit mode can be configured using the Backoffice: 

[Screenshots: indexer thread count, batch size, and commit mode configuration in the Backoffice]

And the autoCommit time can be set in the local.properties / hcscommons, with the following entries: 

  • solr.autoCommit.maxTime=60000  
  • solr.autoSoftCommit.maxTime=300000 

 

High latency due to low disk throughput 

In scenarios where performance is hindered by disk throughput, it is possible to increase overall performance by adjusting memory usage. 

Recommended actions to reduce disk throughput: 

  • Reduce cache sizes 
  • Disable the least crucial caches 
  • Decrease the JVM heap size 
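The cache-related actions above are applied in the query section of solrconfig.xml. The fragment below is a sketch; the cache names match Solr's standard caches, but the size values are illustrative, not recommendations:

```xml
<query>
  <!-- smaller caches lower heap pressure; size values here are illustrative -->
  <filterCache class="solr.CaffeineCache" size="256" initialSize="256" autowarmCount="0"/>
  <queryResultCache class="solr.CaffeineCache" size="128" initialSize="128" autowarmCount="0"/>
  <documentCache class="solr.CaffeineCache" size="256" initialSize="256" autowarmCount="0"/>
</query>
```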

 

Other helpful resources: