AWS Data Migration — Best Practices

Srinivas Mahakud
Nov 28, 2020

Cloud migration is steadily taking hold across the IT industry. An enterprise can perform several types of cloud migration; one common model is the transfer of data and applications from a local, on-premises data center to the public cloud. This post focuses on some best practices for data migration.

Driving Factors for Migration

  • Data Centre Consolidation
  • Digital Transformation
  • Facility or Real Estate Decisions
  • Large Scale Compute Intensive Workloads
  • Cost Reduction
  • Colocation or Outsourcing Contract Changes
  • Bring more agility to your build and operating model
  • Achieve Operational Efficiency
  • Bring more security to the workload environment

Data Migration Best Practices

#1 Know your data
Choosing the right tool for the job is paramount

VM Migration => CloudEndure Migration
Database Migration => AWS Database Migration Service (DMS)
File Data Migration => AWS DataSync, AWS Snowball, etc.

#2 Migrate Virtual Machines with CloudEndure
Enterprises looking to quickly rehost a large number of machines to AWS can use CloudEndure Migration without worrying about compatibility or performance disruption. CloudEndure continuously replicates any application or database from any source (a data center or another cloud) to AWS. It performs continuous, block-level replication of your source machines into a staging area in your AWS account without causing any downtime.

CloudEndure Virtual Machine Migration

#3 Understand Network Bandwidth
Online transfer using AWS DataSync, AWS DMS, or CloudEndure requires good network bandwidth. If the amount of data to transfer is too large, or if your network bandwidth is insufficient to move the data in the desired timeframe, then we need to look at offline transfer services such as Snowball. It is important to note that physical bandwidth is not always equal to available bandwidth: network connections to the cloud are often shared by many applications within the organization. So it is essential to know how much usable bandwidth is available and to decide between online and offline data transfer accordingly.

Estimated time to transfer data based on usable network bandwidth
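To make the online versus offline decision concrete, here is a minimal sketch of the arithmetic. The function name and the default 80% utilization figure are assumptions for illustration, not AWS guidance; plug in whatever share of the link your migration can realistically use.

```python
def estimate_transfer_days(data_tb: float, link_mbps: float,
                           utilization: float = 0.8) -> float:
    """Estimate the days needed to move data_tb terabytes over a network link.

    link_mbps is the nominal link speed in megabits per second.
    utilization is the fraction of that link the migration may use
    (0.8 is an assumed placeholder, not a recommendation).
    """
    if data_tb < 0 or link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("expected data_tb >= 0, link_mbps > 0, 0 < utilization <= 1")
    data_bits = data_tb * 1e12 * 8               # decimal terabytes -> bits
    usable_bps = link_mbps * 1e6 * utilization   # usable bits per second
    seconds = data_bits / usable_bps
    return seconds / 86_400                      # seconds -> days


if __name__ == "__main__":
    # 100 TB over a 1 Gbps link at 80% utilization: roughly 11.6 days
    print(f"{estimate_transfer_days(100, 1000, 0.8):.1f} days")
```

If the result is longer than the migration window, that is a signal to look at an offline option or at more bandwidth.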

#4 Migrate Database with AWS Database Migration Service (DMS)
AWS Database Migration Service helps in migrating databases to AWS quickly and securely. The source database remains fully operational during the migration, minimizing downtime for applications that rely on it. AWS DMS can migrate your data to and from the most widely used commercial and open-source databases. It supports both homogeneous migrations (same database engine on both sides) and heterogeneous migrations between different database platforms. The following are some of the key offerings from AWS DMS.

AWS Database Migration Service Offerings

#5 Know your Data Profile
We need to know both the total amount of data, measured in anything from MBs to PBs, and the number of files to be migrated. There is a big difference between transferring 100 TB of data in 1,000 files and 100 TB of data in 1 million files.

We are much more likely to saturate our network bandwidth when transferring large files than when transferring small files measured in KBs. We should have a good understanding of how many small, average-size, and large files need to be transferred. The I/O workload differs greatly with file size, and this can significantly change the approach to data migration.
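A data profile like this can be gathered with a short script. The sketch below walks a directory tree and counts files per size bucket; the 1 MiB and 1 GiB thresholds are hypothetical defaults chosen for illustration, so adjust them to your workload.

```python
import os


def profile_tree(root: str,
                 small_max: int = 1024 * 1024,         # assumed: <= 1 MiB is "small"
                 large_min: int = 1024 * 1024 * 1024   # assumed: >= 1 GiB is "large"
                 ) -> dict:
    """Count files and bytes under root, grouped into small/medium/large."""
    profile = {"files": 0, "total_bytes": 0,
               "small": 0, "medium": 0, "large": 0, "unreadable": 0}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # e.g. broken symlink or permission problem
                profile["unreadable"] += 1
                continue
            profile["files"] += 1
            profile["total_bytes"] += size
            if size <= small_max:
                profile["small"] += 1
            elif size >= large_min:
                profile["large"] += 1
            else:
                profile["medium"] += 1
    return profile
```

Calling `profile_tree("/data/to/migrate")` (a placeholder path) returns a dictionary that shows at a glance whether the dataset is dominated by many small files or a few large ones.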

  • AWS Snowball is best suited to transfer large files. We can also batch small files together to transfer them effectively.
  • In the case of AWS DataSync, there is no need to batch small files together, as the DataSync agent deployed in the source environment takes care of optimizing the transfer over the network.
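As an illustration of batching small files before an offline transfer, the following sketch packs files into uncompressed tar archives of a bounded size. The function name, the archive naming scheme, and the 1 GiB default are assumptions made for this example; check the Snowball documentation for the archive formats and size limits that apply to your device.

```python
import os
import tarfile


def batch_small_files(paths, out_dir, base_dir, max_batch_bytes=1024 ** 3):
    """Pack files into tar archives of roughly max_batch_bytes each.

    paths           -- iterable of file paths to pack
    out_dir         -- directory where the archives are written
    base_dir        -- archive member names are stored relative to this directory
    max_batch_bytes -- assumed size limit per archive (hypothetical default)
    Returns the list of archive paths that were created.
    """
    os.makedirs(out_dir, exist_ok=True)
    archives = []
    tar = None
    batch_bytes = 0
    try:
        for path in paths:
            size = os.path.getsize(path)
            # Start a new archive when adding this file would exceed the limit.
            if tar is None or (batch_bytes > 0 and batch_bytes + size > max_batch_bytes):
                if tar is not None:
                    tar.close()
                archive_path = os.path.join(out_dir, f"batch-{len(archives):05d}.tar")
                tar = tarfile.open(archive_path, "w")  # "w" writes an uncompressed tar
                archives.append(archive_path)
                batch_bytes = 0
            tar.add(path, arcname=os.path.relpath(path, base_dir))
            batch_bytes += size
    finally:
        if tar is not None:
            tar.close()
    return archives
```

A file larger than the limit still gets an archive of its own, so nothing is silently skipped.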

#6 Assess Operational Impact of Migration
We need to assess what impact the migration will have on our normal operational activities. For example:

  • If we are using Snowball, we need to order and track the physical units, integrate them into our on-premises infrastructure, and deploy workstations to manage the data transfer.
  • When we are using Snowball Edge with AWS Database Migration Service, we need to manage access to our database and monitor the DMS process and the tools performing the database migration itself.
  • We also need to plan for one or more proofs of concept to validate our assumptions about the migration process, including the tools, configuration, and performance of the transfer.
  • On the DataSync side, our network plays a big role in how well the migration goes.

All of this requires an additional level of planning and operational management to ensure that the migration goes smoothly.

#7 Preserving Metadata
Do we need to preserve the metadata as part of our data transfer? Metadata is information about your data and is frequently stored with the data itself.

Metadata Information

  • File Ownership
  • Permissions
  • Timestamps
  • File System attributes

Preserving metadata helps with data protection. Services such as DataSync and CloudEndure can preserve metadata when used to migrate data to AWS.
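To see what is at stake, it can help to snapshot the metadata of a few source files before the migration and compare it afterwards. The sketch below is one way to do that on a POSIX filesystem; the function names and the set of compared fields are assumptions for illustration and are not part of any AWS tool.

```python
import os
import stat


def capture_metadata(path: str) -> dict:
    """Record ownership, permissions, timestamps, and size for one file."""
    st = os.lstat(path)  # lstat: describe a symlink itself, not its target
    return {
        "uid": st.st_uid,                               # file ownership (user)
        "gid": st.st_gid,                               # file ownership (group)
        "permissions": oct(stat.S_IMODE(st.st_mode)),   # e.g. '0o640'
        "mtime": st.st_mtime,                           # last modification time
        "size": st.st_size,
    }


def metadata_differences(source: dict, target: dict,
                         fields=("uid", "gid", "permissions", "mtime")) -> list:
    """Return the names of the fields whose values differ."""
    return [name for name in fields if source.get(name) != target.get(name)]
```

An empty list from `metadata_differences` means the chosen attributes survived the transfer for that file.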

#8 Validate your Assumptions
Running a test before we start the full migration is critical for validating our plan and the assumptions we have made. A test should transfer a subset of the data and finish in a reasonable amount of time, leaving room to make adjustments and course-correct. A good test run should validate the following:

  • Verify you can read and batch data
  • Verify source performance
  • Verify the network works as expected from both a connectivity and a performance perspective
  • Verify Service configuration and other settings
  • Validate estimated time of migration
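The last check can be done with simple arithmetic on the test run's numbers. The sketch below extrapolates linearly from a sample transfer, once by bytes and once by file count, and keeps the slower of the two estimates. Linear scaling is an assumption, and the function name is hypothetical; real transfers may not scale this cleanly.

```python
def extrapolate_hours(sample_bytes: float, sample_seconds: float, total_bytes: float,
                      sample_files: int = None, total_files: int = None) -> float:
    """Estimate full-migration hours from a test run, assuming linear scaling.

    If file counts are given, a second estimate based on files per second
    is computed and the larger (more conservative) estimate is returned.
    """
    if sample_bytes <= 0 or sample_seconds <= 0:
        raise ValueError("sample_bytes and sample_seconds must be positive")
    bytes_per_second = sample_bytes / sample_seconds
    estimates = [total_bytes / bytes_per_second]
    if sample_files and total_files:
        files_per_second = sample_files / sample_seconds
        estimates.append(total_files / files_per_second)
    return max(estimates) / 3600  # seconds -> hours


if __name__ == "__main__":
    # Test run: 50 GB and 10,000 files in 1,000 seconds.
    # Full data set: 5 TB and 10 million files.
    hours = extrapolate_hours(50e9, 1000, 5e12, sample_files=10_000, total_files=10_000_000)
    print(f"about {hours:.0f} hours")
```

In this made-up example the file count, not the byte count, dominates the estimate, which is the kind of surprise a test run is meant to surface.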

#9 Verify data migration
Verification ensures that migrated data matches the source. For example, DataSync performs checksums on all data in flight, which protects against any data corruption that could occur in the network itself.

  • Write your own validation scripts to match the source data with transferred data.
  • Make sure to estimate the time for verifying data.
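A validation script of this kind can be quite small. The sketch below compares SHA-256 checksums of every file under a source directory with the file at the same relative path under a target directory. It assumes both sides are reachable as local or mounted filesystems, which will not hold for every destination (for object storage you would compare against checksums obtained another way); the function names are made up for this example.

```python
import hashlib
import os


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(source_root: str, target_root: str):
    """Return (missing, mismatched): relative paths absent from or different in the target."""
    missing, mismatched = [], []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, source_root)
            dst = os.path.join(target_root, rel)
            if not os.path.isfile(dst):
                missing.append(rel)
            elif sha256_of_file(src) != sha256_of_file(dst):
                mismatched.append(rel)
    return sorted(missing), sorted(mismatched)
```

Because every file is read in full on both sides, hashing can take as long as the copy itself, so its running time belongs in the migration estimate.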

Links Referred:

AWS Data Migration
https://aws.amazon.com/cloud-data-migration/
AWS DMS
https://aws.amazon.com/dms/
AWS DataSync
https://aws.amazon.com/datasync
AWS Snow Family
https://aws.amazon.com/snow
CloudEndure Migration
https://aws.amazon.com/cloudendure-migration

