Mike Ferguson

By Mike Ferguson

November 2023

Upcoming events by this speaker:

November 24, 2023 Online live streaming:
Building a Competitive Data Strategy for a Data-Driven Enterprise

December 4-5, 2023 Online live streaming:
Data Warehouse Modernisation

December 6, 2023 Online live streaming:
Data Catalogs

Challenges of Migrating Your Data Warehouse to The Cloud

For many years Europe has been the most sceptical region in the world in terms of Its reluctance to adopt cloud computing. However, in recent years that has changed with many companies now aggressively adopting the cloud both for transaction systems and analytical systems. in particular the momentum behind analytical systems has been significant especially when it comes to both data science and also the migration of data warehouses to the cloud. It is this latter topic that I want to address in this short article.

Almost every major database management system (DBMS) supporting data warehouses on premises now also runs on the cloud. So, the choice is huge – from the major cloud service providers like AWS, Microsoft, and Google, to many others including Oracle, IBM, Teradata, Snowflake and Exasol. A large number of companies either have already migrated their data warehouse or are in the process of planning and implementing data warehouse migration.

However, this is not a simple exercise. Some data warehouses go back perhaps 20 years and so a huge amount of work has taken place on those systems and very significant changes over such a long period of time. As a result, migrating your data warehouse to the cloud needs very careful thought. There are three major stages to migrating data warehouses to the cloud. These are:
⦁ Pre-migration planning
⦁ The actual migration itself
⦁ Post migration activity

Pre- migration planning involves, the business case (why do it?), scope (e.g., data mart at a time, entire data warehouse), forming a migration team, training, deciding on any changes needed, selecting technologies, setting up your cloud environment and all defining what tests need to be done.

The actual migration involves:
⦁ Schema migration
⦁ Data migration
⦁ ETL processing and loading migration
⦁ Users, roles, and groups migration
⦁ Access security privileges migration
⦁ DW operations jobs / scripts
⦁ Migration of dashboards and reports
⦁ Migration of any in-database machine learning models

Post migration activity involves parallel running with the existing data warehouse, testing, performance benchmarks, post migration modernization and integration with other analytical workloads that may already be running on the cloud. There’s a lot to think about. Without doubt, planning is the key to success. A critical step is to do a detailed assessment of your existing data warehouse looking at everything from data models of your data warehouse and your data marts, ETL dataflows, dependencies (e.g., stored procedures), data quality, data redundancy and maintenance difficulty. You may also want to assess existing technologies in use.

Out of that assessment you will form opinions as to whether or not a straightforward ‘As is’ migration is required, whether some alterations are needed, or whether you feel your data warehouse is at the end of its life and a complete redesign is needed with just a migration of the data in your existing system. Without doubt, there’s always temptation in a data warehouse migration to make changes. However, care is needed here because the more you change the greater the risk to your data warehouse migration project. If you do need to make changes, a good question to ask is:

“Should I make changes before the migration occurs, after the migration occurs or during the migration?”.

Clearly the more you attempt during the migration itself, the greater the risk to the overall project. In my opinion, it’s better to make changes before or after the migration and minimise changes made during the migration. Nevertheless, you will definitely make some changes. This may include a change of DBMS. This change on its own will result in a number of challenges. For example, compared to your existing data warehouse you should expect to see table type differences, data type differences, integrity constraint differences, database privilege differences and of course SQL differences.

Some key questions to ask when planning a data warehouse migration include:
⦁ Are people trained and do they have skills in the target cloud and data warehouse DBMS?
⦁ How do you intend to prevent disruption to users during the DW migration?
⦁ Are you planning to introduce more agility into your data warehouse and if so, how?
⦁ Are you planning to change the DW data model design, e.g., Kimball to Data Vault?
⦁ Are you planning to change any underlying technology, e.g., DBMS, ETL tool, BI tools…?
⦁ What workloads and processes can you migrate with minimal effort?
⦁ Which processes have issues today and would benefit from re-engineering?
⦁ What workloads are outdated and require a complete overhaul?
⦁ What do you need to migrate and in what order should you do it in?

You also have a number of choices when it comes to your actual data warehouse migration strategy. The three main options are
⦁ Lift and shift your data warehouse “As is”
⦁ Simplify your existing data warehouse and then migrate it
⦁ Complete redesign on a cloud analytical DBMS

A key objective during data warehouse migration is to find a way to minimise the number of things that need to be migrated in order to speed up the overall migration project. Therefore, identifying redundant and unused data is well worth doing. For data warehouses that are 10, 15, 20 years old, you would be surprised how much data redundancy my exist. If you do have redundant data it is worth asking the question “Why migrate data that has never been used in the last two or three years?”. One thing you can do here is archive that data but keep it online and then migrate the rest of your data warehouse data to the cloud.

When it comes to schema migration, obviously if you do not change the DBMS, then migration should be relatively straight forward assuming you’re not drastically redesigning your data model. For example, migrating from a data warehouse with a Kimball design from Oracle running on-premises to Oracle Autonomous Data Warehouse running on Oracle Cloud. In this case, the tools to migrate your data warehouse should be available from the DBMS vendor (Oracle in this case). However, if you are changing the DBMS, you will need to look at many things when it comes to schema migration including data types, data compression ratios, database indexes, stored procedures, database views etc. Existing indexes can be a good indication of performance options but of course it depends on the DBMS you are migrating to. Some cloud DBMSs do not have indexes e.g., Snowflake.

Data migration also has its challenges bearing in mind that data warehouses hold a historical data that could go back years. So naturally, you need to ask some questions here also. For example:
⦁ What is the current data size?
⦁ Is the data compressed in the existing data warehouse and marts?
⦁ What is the data size likely to be when you decompress it on export?
⦁ Do you have spare disk space in your data centre to temporarily store exported decompressed data?
⦁ Do you need to get approval to make this spare disk space available in your data centre?
⦁ How many un-used tables exist in your existing data warehouse and marts?
⦁ What is your network bandwidth and is it sufficient to move your data?

Those are just some of the questions. There is a lot more than this to think about when migrating a data warehouse to the cloud. What about metadata, security, users, user groups etc.? If you are thinking about migrating your data warehouse to the cloud for the first time or are about to undertake such a project and need some practical guidelines, please join me on my Migrating to a Cloud Data Warehouse class in October.