Introduction
Data migration is the activity of moving data between systems. Commerce (B2B/B2C) implementations require data migration from various source systems into the commerce system, and application stability depends on the accuracy of that data. This post explains the data migration process and how to validate data accuracy as part of it. The focus is on migrating large-scale data sets from one system to another (asynchronous data transfer).
What is Data?
Data in commerce implementations is categorized as follows:
- Product Master Data – Source: ERP or PIM systems
  - Products / Materials
  - Price
  - Promotions
  - Product references
  - Product enrichment
- Enrichment – Source: legacy systems, content management systems
  - Media files
  - Product descriptions
- Customer Master – Source: ERP or legacy systems
  - Customer
  - Profile information
  - Payments
  - Addresses
- Payment – Source: ERP, legacy, or payment gateway
- Order Master – Source: ERP or legacy systems
  - Orders
  - Returns
  - Order status
- Any other custom objects – Source: ERP or legacy systems
Commerce Data Migration Process
Data migration is an incremental process that moves through scope definition, build, and load phases. The table below summarizes the stages:
| Stage | Activities |
| --- | --- |
| Initiation | Scope definition; strategy and approach; commerce data model |
| Data Requirements / Profiling | Source-to-target field mapping; profiling rules based on commerce logic; generate profiling report from the source system |
| Data Migration Build | Develop the data migration pipeline; cleanse data anomalies in the source system |
| Data Extraction | Extract all source entities and fields required for target data migration; filter records that are in migration scope |
| Data Transformation | Tailor the source data to accurately fit target structures |
| Data Cleansing / Enrichment | Enrich data that is not available in source systems |
| Data Loads | Load all converted and validated data |
Initiation: Data Migration Scope
This step defines the objects and attributes in the migration scope. The following information is captured in a scope definition:
- Objects / Types
- Data rules to migrate the objects (status, duration, migration criteria)
- Exclusion rules
- Profiling Requirements from commerce systems
Sample scope definition sheet
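To make the scope concrete, each entry can be captured in a structured form. Below is a minimal sketch in Python; the objects, rules, and profiling requirements are hypothetical examples, not taken from a real project.

```python
# Hypothetical scope definition entries; the objects, rules, and profiling
# requirements are illustrative, not taken from a real project.
MIGRATION_SCOPE = [
    {
        "object": "Customer",
        "migration_rule": "active accounts with at least one order in the last 24 months",
        "exclusion_rule": "guest accounts and accounts flagged as test data",
        "profiling_requirements": ["unique email", "valid country code", "default billing address"],
    },
    {
        "object": "Order",
        "migration_rule": "orders created in the last 12 months",
        "exclusion_rule": "carts and abandoned checkouts",
        "profiling_requirements": ["customer reference exists", "order status in the allowed set"],
    },
]
```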
Initiation: Data Migration Strategy
This activity defines the strategy for data loads, including how the data is migrated: one-time or phased.
When an existing system is replaced with commerce, customer data must be migrated from the legacy system to the new one. Based on the criticality and complexity of the migration, teams follow different customer deployment options.
The following criteria help teams define the customer deployment strategy:
- Volume
- Complexity
- Scope of retiring old system
The following details are captured as part of strategy sessions:
- Data migration strategy – Phased migration or one-time load
- Strategy for initial load
- Strategy for the delta load (a simple delta-selection sketch follows this list)
- Support for multiple systems in case of a phased migration approach
- Strategy for handling in-flight data
- Strategy to handle data failures
- Mock data load strategy
- Test data load strategy
- Production data load strategy
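In a phased approach, the delta load is usually driven by a change marker such as a last-modified timestamp on the source records. The sketch below assumes such a `modified_at` field exists; the field name and record shape are assumptions for illustration.

```python
from datetime import datetime, timezone

def select_delta(records, last_successful_load: datetime):
    """Pick only records changed since the previous load (delta load).

    Assumes each extracted record carries a 'modified_at' timestamp;
    the field name and record shape are assumptions for illustration.
    """
    return [r for r in records if r["modified_at"] > last_successful_load]

# Example: only the record modified after the last load is selected.
records = [
    {"id": "C001", "modified_at": datetime(2023, 1, 10, tzinfo=timezone.utc)},
    {"id": "C002", "modified_at": datetime(2023, 3, 5, tzinfo=timezone.utc)},
]
print(select_delta(records, datetime(2023, 2, 1, tzinfo=timezone.utc)))
```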
Data Migration Build: Environment Readiness
Define the environment strategy for testing, staging, and production loads. In many cases, the source data is held in a staging area so that profiling rules can be run and the data can be enriched before it is imported into the commerce system.
A typical data migration flows from Source System (data extract) -> Staging Database (cleanse and enrich) -> Target System (load and profile); a minimal sketch of this flow follows the list below.
The environment structure should support the following:
- Staging Data
- Generate Load Scripts
- Generate Mock Loads
- Test Data Loads
- Source System
- Target System
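As a rough illustration of the Source -> Staging -> Target flow referenced above, the sketch below walks a few in-memory records through extract, stage, profile, and load steps. In a real project the staging step would be a database and the load step would use the platform's import tooling; all records and rules here are made up.

```python
# Minimal in-memory sketch of Source (extract) -> Staging (cleanse/profile) -> Target (load).
# The records, scope rule, and profiling rule are illustrative only.
source_records = [
    {"customer_id": "C001", "email": "a@example.com", "status": "ACTIVE"},
    {"customer_id": "C002", "email": "",              "status": "ACTIVE"},
    {"customer_id": "C003", "email": "c@example.com", "status": "CLOSED"},
]

# Extract: keep only records that are in migration scope (active customers).
staged = [r for r in source_records if r["status"] == "ACTIVE"]

# Profile in the staging area: flag records that would fail a target-side rule.
flagged = [r["customer_id"] for r in staged if not r["email"]]

# Load: only clean records move on to the target system.
load_ready = [r for r in staged if r["customer_id"] not in flagged]
print(f"staged={len(staged)}, flagged={flagged}, ready to load={len(load_ready)}")
```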
Data Migration Build: Data Mapping
Define the data mapping between the source and commerce systems based on the defined scope. This information is critical for building extract scripts from the source system. For each entity, create the source-to-destination system mapping and collect the following information:
- Table
- Attribute
- Type
- Length
- Description
- Required / Optional
- Default Value
- Possible Values
- Restrictions
- Hardcoded values
- Impex header – static header
- Comments
- Cross-reference table
- Transformation Rules
- Criteria to accept the data (Password criteria)
- Inclusion and exclusion rule
- Cross-reference tables
  - Table name
  - Column name
  - Possible values
- Sequencing of data and elements
Example:
- Table: Cart
- Attribute: id, name, product, quantity
- Type: String, Number, Date
- Length: 10
- Description:
- Required / Optional: Required
- Default Value: 10
- Possible Values: P1,P2
- Restrictions:
- Hardcoded values
- Impex header – static header
- Comments
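The same Cart example can be captured in a structured form so that extract and transform scripts can read the mapping directly. The sketch below is one possible shape for such an entry; the values simply mirror the illustrative example above.

```python
# One source-to-target mapping entry, mirroring the Cart example above.
# A real mapping sheet would hold one such entry per attribute and table.
CART_MAPPING = {
    "target_table": "Cart",
    "attributes": {
        "id":       {"type": "String", "length": 10, "required": True,  "default": None},
        "name":     {"type": "String", "length": 10, "required": False, "default": None},
        "product":  {"type": "String", "length": 10, "required": True,  "possible_values": ["P1", "P2"]},
        "quantity": {"type": "Number", "length": 10, "required": True,  "default": 10},
    },
    # Transformation and cross-reference rules would also hang off the entry.
    "transformation_rules": ["trim whitespace", "map legacy product codes via cross-reference table"],
}
```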
Workshop agenda and deliverables:
- Map Legacy attributes from Source to Commerce
- Map cross-reference legacy attribute values with Hybris values
- Identify and document transformation/conversion rules to be applied during data migration
Data Migration Build: Data Profiling
Profiling defines a ruleset to measure data accuracy against the target system's needs.
Generate the profiling reports based on the rules defined in the mapping document. These reports describe data quality against the commerce business requirements and technical limitations (the data conditions). Issues are categorized with different priorities based on criticality. Based on the profiling report, the data team performs the following activities:
- Correct / Cleanse the data in the source system
- If the data cannot be corrected, define explicit handling in the commerce system to address the data cleansing issues.
Perform the following steps and collect the information in profiling sessions (a small scripted profiling check is sketched after this list):
- Define profiling rules
  - Check the data accuracy of each data element based on the mapping document
  - Uniqueness and duplicate records
  - Invalid data – invalid email addresses, SSNs, phone numbers
  - Invalid addresses or postal codes
- Validate data based on mapping rules
  - Missing default values
  - Missing reference objects (billing and shipping profiles)
  - Missing payment information
  - Inclusion rule requirements
  - Exclusion rule requirements
- Attribute rules
  - Type definition
  - Null or blank check
  - Criteria to accept the data
  - Restrictions
  - Special characters
  - Boundary conditions of the data
  - Extra records beyond what is expected
  - Duplicate unique identifiers
  - Correct login format
  - Invalid payment information
- Validate the data based on business rules
  - Boundary conditions not met – for example, only one cart should exist per customer
  - Elements with missing required fields
  - Data that must support a business rule – for example, a date must be Jan 01
  - Requirements from the destination system
- Define a profiling report and identify the issues
- Define a cleansing plan and fix the issues
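Many of the checks above can be scripted against the staged data. The sketch below shows a few of them (duplicate identifiers, missing required fields, invalid email format) in Python; the field names and the simple email pattern are assumptions for illustration.

```python
import re
from collections import Counter

EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def profile_customers(records):
    """Return a simple profiling report; field names and rules are illustrative."""
    report = {"duplicate_ids": [], "missing_required": [], "invalid_email": []}

    # Uniqueness / duplicate records on the unique identifier.
    counts = Counter(r.get("customer_id") for r in records)
    report["duplicate_ids"] = [cid for cid, n in counts.items() if cid and n > 1]

    for r in records:
        # Elements with missing required fields.
        if not r.get("customer_id") or not r.get("email"):
            report["missing_required"].append(r.get("customer_id", "<missing id>"))
        # Invalid data -- e.g. a malformed email address.
        elif not EMAIL_PATTERN.match(r["email"]):
            report["invalid_email"].append(r["customer_id"])
    return report

# Example: a duplicate id plus a malformed email in the second record.
print(profile_customers([
    {"customer_id": "C001", "email": "a@example.com"},
    {"customer_id": "C001", "email": "not-an-email"},
]))
```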
Data Cleansing
Data cleansing is the activity of correcting data in the source system based on the profiling report. Define a clear action plan to correct the data in the source (production) system. The data should be corrected before user acceptance testing so that UAT runs against correct data. Perform cleansing multiple times before the data is loaded into commerce to avoid last-minute data changes in the source system.
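Where a correction is mechanical, it can be scripted and re-run on each cleansing pass. The example below is a hypothetical rule that normalizes phone numbers to digits only; real cleansing rules come out of the profiling report.

```python
import re

def cleanse_phone(raw: str) -> str:
    """Normalize a phone number to digits only -- a hypothetical cleansing rule."""
    return re.sub(r"\D", "", raw or "")

assert cleanse_phone("(555) 123-4567") == "5551234567"
```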
Data Extraction and Transformation: Data Loads
- Define a data load plan for each element
- Develop load scripts based on the data migration scope and mapping requirements
- Make sure profiling scripts (FlexibleSearch queries) are available in the commerce system; they help the team validate accuracy after data is loaded into commerce (a post-load count check is sketched after this list)
- Mock loads
  - Mock loads are test loads in test environments. They can be incremental based on the development plan; not all objects need to be loaded at once during development. Plan multiple mock loads (initial and delta) in test environments with a full set of data so that all scenarios can be tested.
  - Generate the mock load files from the source system
  - Mock loads can be generated using the load scripts to validate the functionality of the scripts
  - Validate the mock loads and test the functionality
  - Load live data in test environments
- Production loads
  - Generate the actual load files from the source system
  - Validate the live loads and test the functionality
  - Perform the production data loads
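Post-load validation usually starts with record-count reconciliation between the staging area and the commerce system (on the commerce side, the counts typically come from the FlexibleSearch queries mentioned above). The sketch below compares per-entity counts; how the counts are produced is left to the project, and the numbers are made up.

```python
def reconcile_counts(staging_counts: dict, target_counts: dict) -> dict:
    """Report entities whose record counts differ between staging and target after a load."""
    return {
        entity: {"staging": staging_counts.get(entity, 0), "target": target_counts.get(entity, 0)}
        for entity in set(staging_counts) | set(target_counts)
        if staging_counts.get(entity, 0) != target_counts.get(entity, 0)
    }

# Example: a mismatch on Customer shows up in the report; Order reconciles cleanly.
print(reconcile_counts({"Customer": 1200, "Order": 5400}, {"Customer": 1198, "Order": 5400}))
```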
Conclusion
Data migration into commerce systems is a critical part of an implementation. Make sure to have a clear plan for scope definition, profiling, cleansing, test loads, and production loads. A few things to consider:
- Define Data Migration Scope
- Create Mapping documents between source and target
- Perform mock loads in test environments multiple times
- Generate the profiling report from production
- Define a clear action plan to correct data issues in the source system (cleansing)
- Practice data migration (onboarding) multiple times in testing phases
- Define a clean delta load process
- Use the staging DB so that the production system is not impacted