As systems integrators, we find that modernising legacy systems is one of the most challenging problems we face. Data migration from a legacy BW to BW/4HANA or Datasphere is a scenario we visit quite often. In such cases, transformations on data may lead to transformations on views, functionality and user interfaces. While SAP provides standard architecture to execute it, these projects can become very large due to the huge data footprint, the quality of the data and, sometimes, the need to consolidate many legacy applications into one. This blog looks into the scope of transformations on data models and their corresponding schemata.

 

Data migration hardly exists in isolation; it impacts the overall system migration and upgrade project. Traditional ETL methods are not efficient and need heavy manual intervention for data cleansing and mapping. We need to establish links between the abstract models of the legacy and target applications by extending connections in abstract interpretation: two data model spaces at different levels of abstraction are connected by a pair of abstraction and concretisation functions that translate between the two models. If we can establish this model with the help of #GenAI, we can then let standard approaches handle the physical data migration of extraction and loading. We are looking into the possibility of a general refinement scheme for migration and transformation. This requires linking high-level specifications to executable code in a way that lets practitioners systematically verify properties of the data migration.
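
To make the pair of translation functions a little more tangible, here is a minimal Python sketch. Everything in it is an illustrative assumption (the LegacyField and AbstractField structures, the field names and the concept map are invented for this sketch, not SAP metadata objects): an abstraction function lifts a concrete legacy layout into the abstract model space, and a concretisation-style check tests whether a proposed target layout still respects that abstract model before any data is moved.

```python
from dataclasses import dataclass

# Illustrative, simplified model spaces; these structures and field names are
# assumptions made for this sketch, not SAP metadata objects.
@dataclass(frozen=True)
class LegacyField:
    name: str
    sql_type: str
    nullable: bool

@dataclass(frozen=True)
class AbstractField:
    concept: str      # business concept, e.g. "customer_id"
    required: bool

def abstract(fields: list[LegacyField], concept_map: dict[str, str]) -> set[AbstractField]:
    """Abstraction: lift a concrete legacy layout into the abstract model space."""
    return {
        AbstractField(concept=concept_map[f.name], required=not f.nullable)
        for f in fields
        if f.name in concept_map
    }

def refines(abstract_model: set[AbstractField], target_layout: dict[str, bool]) -> bool:
    """Concretisation-style check: does a proposed target layout (concept -> required)
    cover every concept of the abstract model with at least the same strictness?"""
    return all(
        a.concept in target_layout and (target_layout[a.concept] or not a.required)
        for a in abstract_model
    )

if __name__ == "__main__":
    legacy = [LegacyField("KUNNR", "CHAR(10)", nullable=False),
              LegacyField("NAME1", "CHAR(35)", nullable=True)]
    # A concept map like this could be proposed by a #GenAI model and reviewed by a human.
    concept_map = {"KUNNR": "customer_id", "NAME1": "customer_name"}
    model = abstract(legacy, concept_map)
    proposed_target = {"customer_id": True, "customer_name": False}
    print("target layout refines the abstract model:", refines(model, proposed_target))
```

If the check fails, the proposed mapping is refined and re-checked, which is exactly the kind of property verification the refinement scheme is meant to support.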

 

There are many solutions, but there is enough white space for a new idea that explores the three key questions all data migration projects need to answer.

  • How can we control data quality within the data migration process? (A minimal rule-check sketch follows this list.)
  • How can we keep track of inconsistencies between the legacy data sources and the target system's specifications and interrelated data?
  • How can we compare the legacy data sources with the new data semantics and data integrity models of the target system?
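
As a sketch of the kind of checks the first two questions imply (the rules, field names and sample records below are assumptions made purely for illustration), data-quality rules can be evaluated inside the migration process itself and every inconsistency logged against the record that caused it, rather than silently loaded:

```python
from typing import Callable

# Illustrative quality rules for a staged record; field names, rules and sample
# data are assumptions made for this sketch.
Rule = Callable[[dict], bool]

rules: dict[str, Rule] = {
    "customer_id is present":       lambda r: bool(r.get("customer_id")),
    "country is an ISO-2 code":     lambda r: isinstance(r.get("country"), str) and len(r["country"]) == 2,
    "credit_limit is non-negative": lambda r: (r.get("credit_limit") or 0) >= 0,
}

def validate(records: list[dict]) -> list[tuple[int, str]]:
    """Return (record index, violated rule) pairs so inconsistencies are tracked, not silently loaded."""
    violations = []
    for i, record in enumerate(records):
        for name, rule in rules.items():
            if not rule(record):
                violations.append((i, name))
    return violations

if __name__ == "__main__":
    staging = [
        {"customer_id": "C001", "country": "DE",  "credit_limit": 5000},
        {"customer_id": "",     "country": "GER", "credit_limit": -10},
    ]
    for idx, rule_name in validate(staging):
        print(f"record {idx}: violates '{rule_name}'")
```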

 

The traditional process of data migration involves three stages: Extract, Transform, and Load (ETL). ETL in an operational environment is simpler because the rules are already established; in a data migration project it is much more complicated. We need to revisit how each of the ETL processes is defined.
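
A minimal orchestration skeleton for these redefined stages might look like the Python sketch below; the stage functions are deliberately stubbed placeholders, to be filled by the #ML-assisted extraction, the #GenAI-assisted transformation and the standard SAP loading tools discussed in the following sections.

```python
from typing import Callable, Iterable

Record = dict
Rule = Callable[[Record], Record]

def extract(source: str) -> Iterable[Record]:
    """Stage 1: pull raw records from a legacy source (stubbed here with one sample record)."""
    yield {"KUNNR": "0000012345", "NAME1": " ACME GmbH "}

def transform(records: Iterable[Record], rules: list[Rule]) -> Iterable[Record]:
    """Stage 2: apply the (proposed and reviewed) transformation rules to each staged record."""
    for record in records:
        for rule in rules:
            record = rule(record)
        yield record

def load(records: Iterable[Record]) -> int:
    """Stage 3: hand the cleansed records to the standard loader (stubbed as a simple count)."""
    return sum(1 for _ in records)

if __name__ == "__main__":
    trim_strings: Rule = lambda r: {k: v.strip() if isinstance(v, str) else v for k, v in r.items()}
    loaded = load(transform(extract("LEGACY_BW"), [trim_strings]))
    print(f"{loaded} record(s) staged and loaded")
```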

 

What do we Extract? A legacy kernel (or shell) is first “extracted”. We use machine learning (#ML) to understand the high-level abstraction, the data relationships and the schema. When heterogeneous data sources from one legacy system, or from multiple systems, need to be consolidated, indexing is necessary so the output can be consumed by the transformation. Data extraction rules are then defined leveraging #GenAI models.
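
As one illustration of the kind of lightweight profiling that could feed this abstraction and indexing step (the value-overlap heuristic, the threshold and the sample tables are assumptions for this sketch, not a shipped algorithm), candidate relationships between legacy tables can be inferred from column value overlap and handed to the transformation step as an index:

```python
def value_overlap(a: list, b: list) -> float:
    """Jaccard overlap of two column value sets, used as a crude relationship signal."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def infer_relationships(tables: dict[str, dict[str, list]], threshold: float = 0.5):
    """Return candidate (table.column, table.column) pairs whose value sets overlap strongly."""
    columns = [(t, c, vals) for t, cols in tables.items() for c, vals in cols.items()]
    candidates = []
    for i, (t1, c1, v1) in enumerate(columns):
        for t2, c2, v2 in columns[i + 1:]:
            if t1 != t2 and value_overlap(v1, v2) >= threshold:
                candidates.append((f"{t1}.{c1}", f"{t2}.{c2}"))
    return candidates

if __name__ == "__main__":
    legacy_tables = {  # tiny illustrative sample, not real extraction output
        "SALES":    {"CUSTOMER": ["C1", "C2", "C3"], "AMOUNT": [10, 20, 30]},
        "CUSTOMER": {"ID": ["C1", "C2", "C4"],       "NAME": ["A", "B", "C"]},
    }
    print(infer_relationships(legacy_tables))   # -> [('SALES.CUSTOMER', 'CUSTOMER.ID')]
```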

What do we Transform? This is done in a staging environment. We then “transform” the legacy kernels into a new kernel by specifying migration transformations that validate, cleanse and map the data. #GenAI models consume the abstraction model of the legacy system and have knowledge of the target modern system; the data models at the various layers of mapping, relationship and integrity have to be managed by them. The data is stored in structured staging tables and, once ready to be uploaded, is consumed by the loader.
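
One way to keep such #GenAI-proposed transformations reviewable is to express them as a declarative mapping specification that is only executed against the staging tables after a human has approved it. The sketch below assumes invented field names and cleansing functions purely for illustration:

```python
# Declarative mapping specification: target field -> (legacy field, cleansing function).
# In the proposed engine such a spec would be drafted by a #GenAI model and reviewed
# by a human before being executed against the staging tables; the field names and
# cleansing steps here are invented for illustration.
mapping_spec = {
    "customer_id":   ("KUNNR", lambda v: v.lstrip("0")),
    "customer_name": ("NAME1", lambda v: v.strip().title()),
    "country":       ("LAND1", lambda v: v.upper()),
}

def transform_record(legacy_record: dict) -> dict:
    """Apply the mapping specification to one staged legacy record."""
    target = {}
    for target_field, (legacy_field, cleanse) in mapping_spec.items():
        raw = legacy_record.get(legacy_field)
        target[target_field] = cleanse(raw) if raw is not None else None
    return target

if __name__ == "__main__":
    staged = {"KUNNR": "0000012345", "NAME1": " acme gmbh ", "LAND1": "de"}
    print(transform_record(staged))
    # -> {'customer_id': '12345', 'customer_name': 'Acme Gmbh', 'country': 'DE'}
```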

What do we Load? As “loading” the new kernel into the target data sources is often straightforward, we will use the standard methods provided by SAP for loading the data. There are data packet compression and other efficiencies that can be looked at, but these sit mostly at the transport layer, not the application layer.

 

In order theory, a Galois connection is a particular correspondence between two partially ordered sets (posets). This blog will not go deep into the theory. The principle captures the relationship between an abstract data set and the concrete data it describes, and that relationship is exactly what interests us in the transformation between a loosely defined data abstraction and the concrete data set it needs to handle. Galois connections would thus be a handy tool for developing the transformation rules using #GenAI and #ML capabilities. Below is a pictorial representation of this intelligent transformation engine.

Figure: Representation of the data flow in the transformation function (BI transformation image.png)
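
For reference only, and without going deeper into the theory, the defining property of a Galois connection between a concrete poset $(C, \le_C)$ of data sets and an abstract poset $(A, \le_A)$ of data models, with an abstraction function $\alpha: C \to A$ and a concretisation function $\gamma: A \to C$, is

$$ \alpha(c) \le_A a \iff c \le_C \gamma(a) \qquad \text{for all } c \in C,\ a \in A. $$

In the migration setting, this is the property that lets us reason about the concrete legacy data purely in terms of the abstract model that #GenAI and #ML help us construct.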

 

This article is an idea for building an intelligent transformation engine that leverages the power of #GenAI and #ML. It is not a solution but an architectural model shared with the enterprise architect peer group for your inputs. I would like to hear from the community about ideas and solutions you might have found to automate data transformation rule discovery and execution.