Optimizing Solr Indexing and Handling Interruptions in Hybris

When dealing with large-scale product data in Solr indexing, the process can often become complex and inefficient, especially when data is pulled from multiple independent tables. A common mistake during this process is the repeated use of flexible search queries within the resolver for each product iteration. This not only slows down the indexing process but also places an unnecessary load on the database. Additionally, handling interruptions or errors during large data loads can further complicate the process, leading to inefficiencies if not managed correctly.

This article discusses best practices for optimizing Solr indexing and handling interruptions efficiently in Hybris.

Scenario 1: Efficient Data Retrieval During Solr Indexing

When indexing large volumes of product data, each product iteration often requires data from different independent tables. A frequent error is to call flexible search queries repeatedly within the resolver. This approach is inefficient, as it requires a database hit for every product iteration and significantly slows down the process. To address this, we can use the IndexerBatchListener interface and the AbstractValueResolver class provided by Hybris.

Step 1: Implementing the IndexerBatchListener Interface

To optimize data retrieval, create a new class that implements the IndexerBatchListener interface. This lets us execute the flexible search query once, before the batch is indexed, fetching all the necessary data in a single query instead of once per product.

public class CustomBatchListener implements IndexerBatchListener {

    private FlexibleSearchService flexibleSearchService; // injected via Spring

    @Override
    public void beforeBatch(final IndexerBatchContext context) throws IndexerException {
        // Collect the PKs of the products contained in this batch
        final List<PK> pks = context.getItems().stream()
                .map(ItemModel::getPk)
                .collect(Collectors.toList());

        // One flexible search query for the whole batch instead of one per product;
        // CustomModel is a placeholder for your own item type that references a product
        final FlexibleSearchQuery query = new FlexibleSearchQuery(
                "SELECT {pk} FROM {CustomModel} WHERE {product} IN (?pks)");
        query.addQueryParameter("pks", pks);
        final List<CustomModel> rows = flexibleSearchService.<CustomModel>search(query).getResult();

        // Group the rows by product PK so each resolver call can look up its own data,
        // and store the result in the IndexerBatchContext attributes map
        final Map<PK, List<CustomModel>> dataByProduct = rows.stream()
                .collect(Collectors.groupingBy(row -> row.getProduct().getPk()));
        context.getAttributes().put("productData", dataByProduct);
    }

    @Override
    public void afterBatch(final IndexerBatchContext context) throws IndexerException {
        // Additional logic after the batch, if needed
    }

    @Override
    public void afterBatchError(final IndexerBatchContext context) throws IndexerException {
        // Handle errors that occurred during the batch
    }
}

  • beforeBatch: Fetch all necessary product data in a single query, group it by product PK, and store it in the IndexerBatchContext attributes map.
  • afterBatch: Execute any additional logic after the batch indexing is completed.
  • afterBatchError: Handle errors that occur during the batch indexing process.

Step 2: Extending the AbstractValueResolver Class

Next, extend the AbstractValueResolver class to load and manipulate the data fetched during the beforeBatch step.

public class CustomValueResolver extends AbstractValueResolver<ProductModel, List<CustomModel>, Object> {

    @Override
    protected List<CustomModel> loadData(final IndexerBatchContext batchContext,
            final Collection<IndexedProperty> indexedProperties, final ProductModel product)
            throws FieldValueProviderException {
        // Access the data stored in the batch context's attributes map by the listener
        final Map<PK, List<CustomModel>> dataByProduct =
                (Map<PK, List<CustomModel>>) batchContext.getAttributes().get("productData");

        // Return only the rows belonging to the current product
        return dataByProduct == null
                ? Collections.emptyList()
                : dataByProduct.getOrDefault(product.getPk(), Collections.emptyList());
    }

    @Override
    protected void addFieldValues(final InputDocument document, final IndexerBatchContext batchContext,
            final IndexedProperty indexedProperty, final ProductModel product,
            final ValueResolverContext<List<CustomModel>, Object> resolverContext)
            throws FieldValueProviderException {
        // The data returned by loadData is available from the resolver context
        final List<CustomModel> rows = resolverContext.getData();
        // Add field values to the Solr document based on the processed data,
        // e.g. document.addField(indexedProperty, someValue)
    }
}

  • loadData: Look up the data stored in the IndexerBatchContext for the current product; the framework passes the returned value to addFieldValues through the resolver context.
  • addFieldValues: Add the processed data to the Solr document.

Step 3: Listener Configuration

Register the CustomBatchListener as a Spring bean and add its bean id to the listeners attribute of the SolrIndexedType configuration, as sketched below. This ensures that the listener is properly triggered during the indexing process.
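
A minimal configuration sketch in ImpEx, assuming the listener is registered as a Spring bean with the id customBatchListener and attached to an indexed type named Product; both names are placeholders for your own setup:

INSERT_UPDATE SolrIndexedType; identifier[unique=true]; listeners
; Product ; customBatchListener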

By implementing this approach, you minimize the number of database calls, leading to a more efficient and scalable Solr indexing process.

Scenario 2: Handling Errors and Interruptions During Large Data Loads

During large data loads, errors or interruptions can cause the Solr indexing process to restart from scratch, which is highly inefficient. To avoid this, a custom attribute at the product level, such as indexRequired, can be introduced to track whether a product needs re-indexing. This attribute is then used to ensure that only products requiring re-indexing are processed when the operation is resumed.

Step 1: Creating a Custom Attribute

Add a custom attribute indexRequired to the product model. This boolean attribute will indicate whether a product needs to be re-indexed. Initially, it can be set to true for all products.
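
A minimal items.xml sketch for the attribute, assuming it is added to the existing Product type; the qualifier and default value are the only assumptions here:

<itemtype code="Product" autocreate="false" generate="false">
    <attributes>
        <attribute qualifier="indexRequired" type="java.lang.Boolean">
            <description>True while the product still needs to be (re-)indexed</description>
            <defaultvalue>Boolean.TRUE</defaultvalue>
            <persistence type="property"/>
        </attribute>
    </attributes>
</itemtype>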

Step 2: Modifying the Indexing Process

Modify the indexing process so that only products with indexRequired set to true are picked up, and reset the flag once a batch has been indexed successfully.

public class CustomBatchListener implements IndexerBatchListener {

    @Override
    public void beforeBatch(final IndexerBatchContext context) throws IndexerException {
        // Additional logic before the batch, if needed
    }

    @Override
    public void afterBatch(final IndexerBatchContext context) throws IndexerException {
        // Collect the codes of the products that were just indexed successfully
        final Set<String> codes = context.getItems().stream()
                .filter(ProductModel.class::isInstance)
                .map(item -> ((ProductModel) item).getCode())
                .collect(Collectors.toSet());
        if (codes.isEmpty()) {
            return;
        }
        final String productCodes = codes.stream().collect(Collectors.joining("','", "'", "'"));

        // Set the indexRequired flag to false after successful indexing; a direct SQL
        // update avoids loading and saving every model through the service layer
        final String updateSql = "UPDATE products SET p_indexrequired = 0 WHERE p_code IN (" + productCodes + ")";
        executeSqlQuery(updateSql);
    }

    @Override
    public void afterBatchError(final IndexerBatchContext context) throws IndexerException {
        // Handle errors here, possibly logging and retrying; the flag stays true,
        // so the affected products are picked up again on the next run
    }
}

  • afterBatch: Set the indexRequired flag to false after successful indexing, so that an interrupted run can resume without re-indexing these products.

Step 3: Using PrepareInterceptor to Handle the indexRequired Field

A PrepareInterceptor can be used to automatically set the indexRequired attribute to true whenever a significant change occurs in a product.

public class ProductIndexRequiredPrepareInterceptor implements PrepareInterceptor<ProductModel> {

    @Override
    public void onPrepare(final ProductModel productModel, final InterceptorContext ctx) throws InterceptorException {
        // Flag the product for re-indexing whenever it is modified, but not when the
        // only change is the flag itself, which would immediately undo a reset;
        // INDEXREQUIRED is the generated constant for the custom attribute
        if (ctx.isModified(productModel) && !ctx.isModified(productModel, ProductModel.INDEXREQUIRED)) {
            productModel.setIndexRequired(true);
        }
    }
}
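
For the interceptor to fire, it must be registered through an interceptorMapping bean in the extension's Spring configuration. A minimal sketch, assuming the class above lives in the com.example package (adjust the package and bean ids to your project):

<bean id="productIndexRequiredPrepareInterceptor" class="com.example.ProductIndexRequiredPrepareInterceptor"/>

<bean id="productIndexRequiredInterceptorMapping" class="de.hybris.platform.servicelayer.interceptor.impl.InterceptorMapping">
    <property name="interceptor" ref="productIndexRequiredPrepareInterceptor"/>
    <property name="typeCode" value="Product"/>
</bean>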

Step 4: Required Indexing Condition

Ensure that the indexRequired attribute is included in both the full and update indexing queries, so that only flagged products are processed; a sketch follows.
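
A sketch of such an indexing query, configured on the SolrIndexedType; it assumes the indexRequired attribute defined above and may need adjusting for your catalog version filters:

SELECT {pk} FROM {Product} WHERE {indexRequired} = 1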

Performance Consideration: Using Direct SQL Queries

To minimize performance overhead, direct SQL updates are used instead of flexible search and the model service when updating the indexRequired flag in bulk, for example when resetting it after a batch. Note that direct SQL bypasses the service layer, so interceptors do not fire and cached models may need to be refreshed.
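
The executeSqlQuery helper called in the listener above is not a platform API; a minimal sketch of it using Spring's JdbcTemplate, assuming the jdbcTemplate bean is wired against the Hybris data source:

public class DirectSqlHelper {

    private JdbcTemplate jdbcTemplate; // injected via Spring, bound to the Hybris data source

    public void executeSqlQuery(final String sql) {
        // Runs the statement directly against the database, bypassing the service
        // layer: no interceptors fire and the model cache is not updated
        jdbcTemplate.update(sql);
    }

    public void setJdbcTemplate(final JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }
}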

Conclusion

Optimizing Solr indexing and efficiently handling interruptions in Hybris is crucial for maintaining performance and scalability in large data environments. By implementing the strategies outlined in this article—using IndexerBatchListener and AbstractValueResolver for efficient data retrieval, and tracking indexing status with a custom attribute—you can significantly improve the performance of your Solr indexing process. Additionally, careful use of PrepareInterceptor and direct SQL queries ensures that the system remains robust and efficient, even during large data loads and in the face of potential interruptions.