Database Migration With ETL

Published on Aug 27, 2020

Operating a database migration is the most stressful thing you can do in the lifecycle of a web application. An application can break, but losing data is unacceptable: you need a process to prevent any data loss while minimizing service interruption.

Whether you want to change your database system, reinvent your data schema, or switch to a new provider, you'll want to follow a carefully thought-out Extract-Transform-Load (ETL) process.

Let me give you an example of how I do it at Cowriters.

The Extract step consists in configuring a remote connection to the MySQL database and download the data. The problem is that I'm dealing with hundreds of megabytes of data, so I need something called data staging to transport the data little by little. It's a bit like delivering goods: if you use a cargo and said cargo sinks, you lose everything. With data staging, you transport the goods with an army of trucks, from HQ to local warehouses: the probability to lose your products is much lower, and you can recover the lost good much more effectively. In Cowriter's case, each table is downloaded separately to local JSON files containing a few thousand rows each. If the transfer were to fail at some point, each packet of data could be recovered precisely.

The Transform phase is about reading these JSON files, mapping each field to the new database according to the desired schema, and cleaning the data we don't need. Depending on the amount of processing we need, it can also be useful to divide the data in small manageable packets. 

Last but not least, we need to load the data in the target database. At this point, our data is ready to be used by our application but we have to prepare the database and tune it: indexes, foreign keys, full-text search... you name it. If nothing breaks during the incremental upload, you're good to go!

When it comes to my data, I have trust issues regarding third-party tools, so I use homemade scripts to perform all these tasks to optimize the process.