Data Migration Playbook: How to Move Terabytes Without Losing Your Mind
Your CTO says "we're migrating to the cloud by Q4." Your data team finds 47 source systems, 3 different date formats, and customer records that haven't been deduplicated since 2014. Here's the playbook that actually works.
Why 83% of Migrations Run Over Schedule
Data migration is the most underestimated project in enterprise IT. It sounds simple — move data from System A to System B. But the reality involves data quality battles, schema transformations, dependency mapping, parallel validation, and organizational politics about who "owns" the data being moved.
Bloor Research found that 83% of data migrations exceed their original timeline, with an average overrun of 68%. The cost overruns are even worse, especially when "quick" migrations turn into 18-month initiatives.
The 7-Phase Migration Framework
We've executed 30+ data migrations across industries. This framework has kept 85% of our projects on time and on budget:
Phase 1: Data Profiling & Discovery (Weeks 1-3)
This phase prevents 60% of migration failures. Before you move a single byte, you need to understand what you're moving:
- Source inventory: Catalog every source system, table, column, and data type. You'll find systems nobody remembers connecting.
- Data quality assessment: Measure completeness, accuracy, consistency, and uniqueness. Expect to find NULL rates of 15-40% in non-critical fields.
- Volume analysis: Row counts, growth rates, storage sizes. This determines your migration approach (batch vs. streaming) and infrastructure sizing.
- Dependency mapping: Which systems consume this data? Which reports break if a column changes? Who are the data owners?
Every day invested in data profiling saves 3-5 days in later phases. The organizations that rush past profiling are invariably the ones filing emergency change requests in Week 8 when they discover their customer table has 7 different date formats.
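A data quality assessment of the kind listed above can start very small. The sketch below is illustrative only: it assumes rows arrive as Python dictionaries, and the function name `profile_rows` and the sample columns are hypothetical. It measures the null rate, distinct count, and observed value types per column.

```python
from collections import Counter


def is_missing(value):
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and not value.strip())


def profile_rows(rows, columns):
    """Return null rate, distinct count, and value types for each column."""
    total = len(rows)
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if not is_missing(v)]
        report[col] = {
            "null_rate": (total - len(present)) / total if total else 0.0,
            "distinct": len(set(present)),
            "types": dict(Counter(type(v).__name__ for v in present)),
        }
    return report


# Hypothetical sample standing in for an extract from a source system.
sample = [
    {"id": 1, "email": "a@example.com", "signup": "2014-03-01"},
    {"id": 2, "email": "", "signup": "03/01/2014"},
    {"id": 3, "email": None, "signup": "2014-03-01"},
    {"id": 4, "email": "a@example.com", "signup": None},
]
report = profile_rows(sample, ["id", "email", "signup"])
```

On real tables you would likely run the equivalent aggregation in SQL or Spark rather than pull rows into memory, but the metrics are the same.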
Phase 2: Migration Design (Weeks 3-5)
With profiling complete, design your migration architecture:
- ETL vs. ELT approach: Transform before loading (ETL) gives cleaner results but is slower. Load then transform (ELT) is faster but requires powerful target infrastructure.
- Big-bang vs. phased: Big-bang migrates everything in a single cutover weekend. Phased migrates by domain (customers first, then orders, then inventory). Phased is safer; big-bang is faster.
- Schema mapping: Source-to-target column mapping, data type conversions, business rule transformations. This document will be your bible for the next 3 months.
- Validation strategy: Row count reconciliation, checksum validation, business rule testing, parallel run criteria.
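One way to keep the schema mapping reviewable is to store it as data next to the code that applies it. The following is a minimal sketch, not a prescribed design: the source column names (`CUST_NO`, `NAME`, `CREATED`, `BAL`), the target names, and the assumed `MM/DD/YYYY` source date format are all hypothetical.

```python
from datetime import datetime
from decimal import Decimal

# Hypothetical mapping: target column -> (source column, converter).
SCHEMA_MAP = {
    "customer_id": ("CUST_NO", int),
    "full_name": ("NAME", lambda s: s.strip().title()),
    # Assumes the source stores dates as MM/DD/YYYY text.
    "created_at": ("CREATED", lambda s: datetime.strptime(s, "%m/%d/%Y").date().isoformat()),
    "balance": ("BAL", Decimal),
}


def map_row(source_row):
    """Apply SCHEMA_MAP to one row, collecting errors instead of raising."""
    target, errors = {}, []
    for target_col, (source_col, convert) in SCHEMA_MAP.items():
        try:
            target[target_col] = convert(source_row[source_col])
        except (KeyError, ValueError, TypeError, AttributeError, ArithmeticError) as exc:
            # Keep the row but record which mapping failed and why.
            target[target_col] = None
            errors.append(f"{source_col} -> {target_col}: {exc!r}")
    return target, errors
```

Collecting errors per row, rather than failing on the first bad value, makes it easier to feed the rejects back into the cleansing phase.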
Phase 3: Data Cleansing (Weeks 5-8)
Clean data before you move it, not after. This phase typically takes 40% longer than planned — budget accordingly:
- Deduplication: Customer records, vendor records, product catalogs. Use fuzzy matching — exact matching misses 30% of duplicates.
- Standardization: Addresses, phone numbers, dates, currencies. Use standard formats (ISO 8601 for dates, E.164 for phones).
- Enrichment: Fill in missing data where possible — geocoding addresses, adding industry codes, normalizing company names.
- Archival decisions: Do you really need 15 years of transaction history in the new system? Moving less data = faster migration.
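The standardization and deduplication steps above can be sketched with the Python standard library alone. This is an illustration under stated assumptions: the list of date formats, the `0.85` similarity threshold, and the helper names are hypothetical, and `difflib.SequenceMatcher` stands in for whatever fuzzy-matching method you actually choose.

```python
import re
from datetime import datetime
from difflib import SequenceMatcher

# Assumed formats found during profiling; order decides ambiguous inputs.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y"]


def to_iso_date(text):
    """Return an ISO 8601 date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_name(name):
    """Lowercase, drop punctuation, and collapse whitespace before comparing."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())


def likely_duplicates(names, threshold=0.85):
    """Return (i, j, score) for pairs whose normalized names look alike."""
    normalized = [normalize_name(n) for n in names]
    pairs = []
    for i in range(len(normalized)):
        for j in range(i + 1, len(normalized)):
            score = SequenceMatcher(None, normalized[i], normalized[j]).ratio()
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs
```

The pairwise loop is quadratic, so on large tables you would typically compare only within blocks (for example, records sharing a postal code). Flagged pairs are candidates for review, not automatic merges.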
Phases 4-7: Build, Test, Execute, Validate
Phase 4: Build (Weeks 8-12)
Build your migration pipelines. Use code, not GUIs — pipelines built in code (Python, dbt, Spark) are version-controlled, testable, and repeatable. GUIs create ungovernable point-and-click configurations that can't be peer-reviewed.
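As a rough sketch of what "repeatable" can mean in code, the example below copies a table in keyed batches and can be re-run without creating duplicates. It uses in-memory SQLite purely for illustration; the `customers` table, its columns, and the function name `copy_in_batches` are assumptions, and it presumes positive integer primary keys.

```python
import sqlite3


def copy_in_batches(source, target, batch_size=1000):
    """Copy customers in primary-key order; re-running overwrites, not duplicates."""
    last_id, copied = 0, 0  # assumes ids are positive integers
    while True:
        rows = source.execute(
            "SELECT id, email FROM customers WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return copied
        # Example transformation step: trim and lowercase emails.
        cleaned = [(rid, email.strip().lower() if email else None) for rid, email in rows]
        with target:  # each batch commits as a single transaction
            target.executemany(
                "INSERT OR REPLACE INTO customers (id, email) VALUES (?, ?)", cleaned
            )
        last_id = rows[-1][0]
        copied += len(rows)


# Hypothetical source and target databases.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for conn in (source, target):
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
source.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, " A@Example.com "), (2, None), (3, "b@example.com")],
)
source.commit()

first_run = copy_in_batches(source, target, batch_size=2)
second_run = copy_in_batches(source, target, batch_size=2)  # safe to repeat
```

Because the function takes connections as arguments and has no hidden state, it can be exercised in a unit test and reviewed in a pull request like any other code.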
Phase 5: Test (Weeks 12-14)
Run at least 3 full migration dress rehearsals. Each rehearsal should be identical to production — same data volumes, same transformation logic, same validation checks. Document every failure and fix.
Phase 6: Execute (Week 15)
The actual migration. If you've done 3+ dress rehearsals, this should be the most boring phase. Boring is good. Every surprise on migration day is a profiling failure from Week 1.
Phase 7: Validate & Hypercare (Weeks 15-18)
Post-migration validation: row count reconciliation, business rule testing, user acceptance, parallel runs of critical reports comparing old vs. new data. Budget 3-4 weeks of hypercare for production issues.
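Row count reconciliation and checksum validation can be combined into a single comparison. The sketch below is one possible approach, not a standard: it hashes each row with SHA-256 and sums the hashes so that row order does not matter. The function names and the canonical text encoding are assumptions.

```python
import hashlib


def table_fingerprint(rows):
    """Return (row_count, checksum); the checksum ignores row order."""
    count, checksum = 0, 0
    for row in rows:
        # Canonical text form: values joined by a unit separator, None marked as \N.
        canonical = "\x1f".join("\\N" if v is None else str(v) for v in row)
        row_hash = hashlib.sha256(canonical.encode("utf-8")).digest()
        # Summing (rather than XOR-ing) keeps duplicate rows from cancelling out.
        checksum = (checksum + int.from_bytes(row_hash[:8], "big")) % 2**64
        count += 1
    return count, checksum


def reconcile(source_rows, target_rows):
    """Compare two row sets by count and by order-independent checksum."""
    source_count, source_sum = table_fingerprint(source_rows)
    target_count, target_sum = table_fingerprint(target_rows)
    return {
        "source_count": source_count,
        "target_count": target_count,
        "counts_match": source_count == target_count,
        "checksums_match": source_sum == target_sum,
    }
```

For the checksums to agree, both sides must render values identically (for instance, `1` versus `1.0`, or differing timestamp precision), so apply the same canonical formatting to source and target before hashing.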
Migration Tooling Guide
| Tool | Best For | Cost |
|---|---|---|
| Azure Data Factory | Cloud-to-cloud, Microsoft ecosystem | Pay-per-pipeline-run |
| AWS DMS | Database-to-database, CDC support | Instance-based pricing |
| Fivetran | SaaS-to-warehouse, 300+ connectors | $1/credit (~row) |
| dbt | Transformation logic, testing, docs | Free (Core) / $100/mo (Cloud) |
| Apache Spark | Large-scale transformations (TB+) | Compute costs only |
| Custom Python | Complex business logic, edge cases | Developer time |
The Verdict
Data migration is a discipline, not a task. The organizations that treat it as a weekend project end up with corrupt data, broken reports, and angry stakeholders Monday morning. The ones that follow a rigorous framework — profile first, cleanse before moving, test 3x, validate obsessively — finish on time and with data they can trust.
The most important meeting in any migration isn't kick-off day. It's the data profiling review in Week 2 — because that's where you discover the real scope of what you're doing.
Planning a Data Migration?
We've migrated 30+ organizations to modern platforms. Let us profile your data and build your migration plan.