Data Migration: A Fact of Life

Implementing a new application, consolidating applications and data after a merger or acquisition, or converting legacy data to an on-premises data center or to the cloud is a major IT change initiative for any organization, undertaken to standardize business processes, automate back-office operations, and achieve the associated business benefits and ROI. The success of such a program depends on data migration, also known as data conversion.

Data-migration projects are unique because they are built for a single execution: all migration work is retired after go-live, whereas other projects are supported and maintained post-implementation.

According to market research, the data-migration market is expected to grow from $5.14 billion in 2016 to $11.49 billion by 2022.

Data-migration projects are risky and have a tendency to fail. Research suggests that 67% of data-migration projects overrun in time or cost, or fail outright. Return on investment is at risk if the data-migration implementation is delayed.

Why do data-migration projects fail?

  • Lack of the right solution, expertise, and approach to data migration
  • Lack of understanding of the legacy systems and their business processes and operations
  • Delays in following up on and finalizing business, functional, and technical specifications
  • Starting development early without environment planning and design
  • Missing data-quality components
  • Invalid cross-references for data conversion and missing application configurations
  • Lack of thorough data validation
  • Poor performance and load-balancing estimates
  • Missed activities in the actual cutover plan
  • Poor visibility into errors and reconciliation handling

Proposed Solutions

Organizations can minimize delays and risks by adopting a customized data-migration approach: leveraging relevant tools and technologies, looking for opportunities to automate mundane tasks, planning several mock-conversion runs to address data quality and conversion issues well before the actual cutover, and applying a set of best practices for a successful migration.

Data-migration approach – A six-step process can be followed, with a focus on data quality (a minimal code sketch of such a pipeline follows the list):

  1. Analysis & Discovery: Analyze the source systems to understand data content, quality, structure, and relationships. Review existing documentation, or reverse-engineer where documentation is missing.
  2. Extract & Profile: Extract master and transactional data from the diverse source systems and profile it to understand data patterns.
  3. Cleanse: Cleanse the data based on business rules.
  4. Transform & Validate: Transform the data into the target application's structure and apply business validation checks.
  5. Load: Load the data into the target system, with exception handling for rejected records.
  6. Reconciliation: Reconcile to ensure the data was loaded correctly and completely, producing a post-load report with error details.
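A minimal sketch of how these six steps can be wired together, using Python and pandas; the file paths, column names, and cleansing rules below are hypothetical placeholders, not a prescription for any particular toolset:

```python
import pandas as pd

def extract(source_path: str) -> pd.DataFrame:
    """Step 2 (Extract): pull master/transactional data from a source extract."""
    return pd.read_csv(source_path)

def profile(df: pd.DataFrame) -> None:
    """Step 2 (Profile): surface data patterns and null counts."""
    print(df.describe(include="all"))
    print("Nulls per column:\n", df.isna().sum())

def cleanse(df: pd.DataFrame) -> pd.DataFrame:
    """Step 3: apply business cleansing rules (examples only)."""
    df = df.drop_duplicates()
    df["country"] = df["country"].str.strip().str.upper()  # normalize values
    return df

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Step 4: map source columns onto the target application's structure."""
    return df.rename(columns={"cust_id": "customer_id", "cust_nm": "customer_name"})

def validate(df: pd.DataFrame) -> pd.DataFrame:
    """Step 4: apply business validation checks; sideline rejects for analysis."""
    df[df["customer_id"].isna()].to_csv("rejected_records.csv", index=False)
    return df[df["customer_id"].notna()]

def load(df: pd.DataFrame, target_path: str) -> None:
    """Step 5: load into the target (a file here; a database or API in practice)."""
    df.to_csv(target_path, index=False)

def reconcile(source_count: int, target_path: str) -> None:
    """Step 6: confirm the load is complete and report any gap."""
    loaded = len(pd.read_csv(target_path))
    print(f"source={source_count}, loaded={loaded}, gap={source_count - loaded}")

raw = extract("legacy_customers.csv")
profile(raw)
staged = validate(transform(cleanse(raw)))
load(staged, "target_customers.csv")
reconcile(len(raw), "target_customers.csv")
```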

Tools & technologies

ETL (extract, transform, load) middleware tools offer these key advantages:

  • A GUI for designing data flows, with built-in transformations that minimize hand coding.
  • Connectivity to a wide range of structured and unstructured data sources.
  • A single administration environment for code management, scheduling, security, and user management.

Automation – Explore opportunities to leverage automation tools such as Tosca, Selenium, and RPA to automate time-consuming testing areas and mock conversion runs (multiple mock conversion runs are required before go-live); a small example follows.
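As a flavor of what UI-level automation can look like, here is a minimal Selenium sketch that spot-checks one migrated record in a target web application; the URL, record ID, element locator, and expected value are all hypothetical:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

# Hypothetical spot-check: verify that a migrated customer record is
# visible and correct in the target application's UI.
driver = webdriver.Chrome()
try:
    driver.get("https://target-app.example.com/customers/1001")
    name = driver.find_element(By.ID, "customer-name").text
    assert name == "ACME CORP", f"unexpected name: {name}"
    print("record 1001 migrated correctly")
finally:
    driver.quit()
```

In practice, a check like this would be generated for a sample of records after each mock conversion run rather than written by hand.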

Data Migration Framework – Explore opportunities to develop a data-conversion framework for a simplified, consistent, and cost-effective approach that accelerates the data-migration process through its various phases. Building such a framework also eases the data validation and reconciliation process. Below is an example of the high-level architecture of a Data Migration Framework built on Azure, followed by a simplified code sketch.

[Figure: high-level architecture of a Data Migration Framework built on Azure]
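At its core, such a framework is a configuration-driven pipeline runner: each entity to migrate is described declaratively, and one generic routine executes the phases for all of them. A simplified, cloud-agnostic sketch (entity names and mappings are illustrative; an Azure implementation would typically replace the file I/O with services such as Azure Data Factory, Data Lake Storage, and Azure SQL):

```python
import pandas as pd

# Illustrative configuration: one entry per entity to migrate.
ENTITIES = {
    "customers": {
        "source": "extracts/customers.csv",
        "target": "staging/customers.csv",
        "column_map": {"cust_id": "customer_id", "cust_nm": "customer_name"},
        "required": ["customer_id"],
    },
    # ... more entities
}

def migrate_entity(name: str, cfg: dict) -> dict:
    df = pd.read_csv(cfg["source"])              # extract
    df = df.rename(columns=cfg["column_map"])    # transform
    valid = df.dropna(subset=cfg["required"])    # validate
    valid.to_csv(cfg["target"], index=False)     # load
    return {"entity": name, "extracted": len(df),
            "loaded": len(valid), "rejected": len(df) - len(valid)}

for entity, cfg in ENTITIES.items():
    print(migrate_entity(entity, cfg))
```

The benefit of this shape is that adding a new entity means adding configuration, not writing new pipeline code, which keeps the mock conversion runs consistent and repeatable.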

Data Migration Toolkit – Build a centralized repository where the team can find best practices, industry-standard templates for unit testing, code review, test cases and scenarios, requirement-gathering questionnaires, quality processes, and assets and accelerators. Explore opportunities to automate unit testing, code review, test-case creation, mock conversion runs, source data validation, rejected-record analysis, data reconciliation, and error reporting, and make these artifacts available to the team in the data-migration toolkit.
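Of these, the data reconciliation and error report is one of the easiest artifacts to automate. A minimal sketch that compares source and target extracts on a business key; the file and column names are hypothetical:

```python
import pandas as pd

def reconciliation_report(source_csv: str, target_csv: str, key: str) -> pd.DataFrame:
    """Compare source and target extracts on a business key and report gaps."""
    src = pd.read_csv(source_csv)
    tgt = pd.read_csv(target_csv)
    merged = src.merge(tgt, on=key, how="left",
                       indicator=True, suffixes=("_src", "_tgt"))
    missing = merged[merged["_merge"] == "left_only"]
    print(f"source: {len(src)} rows, target: {len(tgt)} rows, "
          f"missing in target: {len(missing)}")
    return missing  # input for rejected-record analysis

# Hypothetical usage:
# missing = reconciliation_report("customers_src.csv", "customers_tgt.csv", "customer_id")
# missing.to_csv("reconciliation_errors.csv", index=False)
```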

Best Practices / Key things to keep in mind

i.  Data Migration Strategies – Decide on the business cutover strategy during the planning phase: either “Big Bang Migration” or “Incremental in Phases”.

Advantages of a big-bang approach: there is no need to run the legacy (old) and new systems simultaneously, and after a successful migration the legacy system can be shut down.

Disadvantages: a long migration window may impact the business; the team must stay attentive for hours or days at a stretch to complete the data transfer; and the migration risks overrunning while unexpected issues are fixed. Synchronization may not be a concern, but a fallback strategy can be challenging if issues are found after the migration.

Because of the risks associated with the big-bang approach, organizations are increasingly adopting the lower-risk approach of incremental migration in phases.

ii.   Data Migration Scope

Structured and Unstructured data – Understand the requirements for structured versus unstructured data migration, do the corresponding volumetric analysis, and plan the migration architecture and solution accordingly. For example, migrating documents and their associated metadata can be a time-consuming exercise and can delay the overall migration because of slow document transfer between the source and target servers.

Volumetric analysis – Prepare an inventory of all sources with their data structure types and expected data volumes, and capture any non-functional requirements. The goal is to identify in advance all the factors that can influence the migration activities, and to plan optimization and performance-tuning exercises accordingly.

Data on Cloud – Understand the requirements for transferring data from on-premises to the cloud: organizations increasingly ask to migrate all of their data, or at least non-critical business data, to the cloud to meet speed-to-market, scalability, and security requirements. Look for optimal, cost-effective network connectivity and data transfer services from on-premises to the cloud when migrating large, petabyte-scale datasets.
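A back-of-the-envelope calculation shows why connectivity planning matters at this scale (link speeds are illustrative):

```python
# Rough transfer-time estimate for 1 PB over different network links.
PETABYTE_BITS = 1e15 * 8  # 1 petabyte expressed in bits

for label, bits_per_sec in [("100 Mbps", 1e8), ("1 Gbps", 1e9), ("10 Gbps", 1e10)]:
    days = PETABYTE_BITS / bits_per_sec / 86400
    print(f"{label}: ~{days:.0f} days per petabyte")
```

This prints roughly 926, 93, and 9 days respectively; at these rates, dedicated links or offline transfer appliances are often the more practical option for the initial bulk load.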

iii.  Mock conversions – Introduce sufficient testing cycles and mock conversion runs to identify data conversion issues, data issues, and data-cleansing needs, and address them well ahead of the actual cutover. Set a pass percentage for each testing cycle and log all defects in an appropriate tool so they can be tracked from fix through retest to pass, ensuring a defined quality process is followed.
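The pass-percentage gate itself can be a trivial automated check; a sketch, where the threshold and counts are hypothetical and would come from the test-management tool in practice:

```python
# Hypothetical quality gate for a mock-conversion testing cycle.
def mock_run_passes(total_cases: int, passed_cases: int, threshold_pct: float = 95.0) -> bool:
    pass_rate = 100.0 * passed_cases / total_cases
    print(f"pass rate: {pass_rate:.1f}% (threshold {threshold_pct}%)")
    return pass_rate >= threshold_pct

# e.g. mock_run_passes(1200, 1152) prints 96.0% and returns True
```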

iv.   Data Quality – Improve data quality beforehand, not during the migration. Plan to profile the data to identify the key data-quality anomalies listed below and illustrated in the diagram, and address them by applying cleansing rules; a profiling sketch follows the diagram.

  • Different or inconsistent standards in structure, format, or values
  • Missing data, default values, spelling errors, and data in the wrong fields
  • Duplicate data, buried information, data anomalies, etc.

[Figure: common data-quality anomalies]
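A minimal pandas profiling sketch that surfaces these anomaly types; the file and column names are illustrative:

```python
import pandas as pd

df = pd.read_csv("legacy_customers.csv")

# Missing data and default values.
print("Missing data per column:\n", df.isna().sum())
print("Placeholder birth dates:", (df["birth_date"] == "1900-01-01").sum())

# Duplicate data.
print("Exact duplicate rows:", df.duplicated().sum())

# Inconsistent standards: the same value in different cases or formats.
print("Distinct country spellings:\n",
      df["country"].str.strip().str.lower().value_counts())
```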

v.  Data masking – Identify requirements for masking sensitive data such as PHI and PII, and write the masking program before starting unit tests, system tests, or mock conversion runs.
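A minimal masking sketch using a salted one-way hash, so that masked values stay consistent across mock runs; the column list and salt are hypothetical and would come from the project's data-classification exercise:

```python
import hashlib
import pandas as pd

PII_COLUMNS = ["ssn", "email", "phone"]   # hypothetical classification output
SALT = "project-specific-secret"          # keep outside source control in practice

def mask_value(value) -> str:
    """One-way hash keeps masked data consistent across mock runs."""
    return hashlib.sha256((SALT + str(value)).encode()).hexdigest()[:12]

def mask_pii(df: pd.DataFrame) -> pd.DataFrame:
    for col in PII_COLUMNS:
        if col in df.columns:
            df[col] = df[col].map(mask_value)
    return df

# e.g. masked = mask_pii(pd.read_csv("patients.csv"))
```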

vi.  Key things to keep in mind

[Figure: key things to keep in mind]

Future direction

A data-migration program opens the door to an ongoing data quality and data governance program.

Conclusion

Data-migration projects are unique and risky, with a tendency to fail or to overrun in time or cost. It is therefore important to define the scope of the migration carefully, plan the business cutover strategy, adopt the six-step data-migration process matured over time, and identify opportunities to automate mundane tasks and mock migration runs. Together, these practices deliver a successful data migration, and with it the business benefits and ROI.
