Lean Data Architectures to Minimize Data Copying

From Data-by-copying to Data-on-demand


There was a time when people visited a record store to buy a copy of an album to listen to at home, or went to a video store to rent a DVD so they could watch a movie at home. Not anymore: music and video are streamed. People no longer listen to or watch copies. Music-by-copying has been replaced by music-on-demand, and video-by-copying by video-on-demand.

Unfortunately, the world of data has remained unchanged. Data is still copied, often several times, before it is even consumed. Many organizations are light-years away from data architectures that support data-on-demand.

Most data architectures are duplication-heavy: the same data is stored many times over. For example, data about a specific customer can be stored in a transactional system, a staging area, a Data Warehouse, several Data Marts, and a Data Lake. Even within one database, data can be stored multiple times to support different data consumers. Additionally, redundant copies of the data are stored in development and test environments. Business users copy data as well, for example from central databases to private files and spreadsheets. On top of that, current data infrastructures consist of Data Lakes, Data Hubs, Data Warehouses, and Data Marts, all of which contain overlapping data.

In addition to these intra-organizational forms of data copying, massive inter-organizational copying takes place. When organizations exchange data with each other, the receiving organizations store the data in their own systems, creating even more copies of the data.

It is time for lean data architectures that minimize the copying of data. The advantages are manifold; for example, such an architecture is more flexible and improves productivity and maintenance.

During this seminar, Rick van der Lans explains how to design a lean data architecture and which solutions and technologies are available to develop one. He discusses design guidelines for zero-copy and single-copy data architectures, compares them with duplication-heavy architectures, and shows how to minimize intra- and inter-organizational copying. The impact on existing Data Warehouse, Data Lake, and Data Hub architectures is presented. Together, this gives a complete picture of designing lean data architectures in real-life projects.

What you will learn

  • How to design lean data integration architectures, illustrated with examples
  • What the real drawbacks of creating too many copies of the data are, including higher data latency, complex data synchronization, more complex data security and privacy, and higher development and maintenance costs
  • How new database, integration, and cloud technology can help to design lean data architectures that contain less copied data
  • What the effect is of applying data minimization to Data Warehouse and Data Lake architectures
  • How to design the data in single-copy solutions
  • What the 1:1+ approach for data architectures means
  • How to replace managed-file-transfer solutions with data-on-demand solutions, and how to reduce inter-organizational data flows
  • How to design data architectures from the perspective of data processing specifications and not data stores

Main Topics

  • Unrestrained Copying of Data
  • Justifying Lean Data Architectures
  • New Technologies Enabling Lean Data Architectures
  • Design Guidelines for Lean Data Architectures
  • Minimizing Inter-organizational Copying of Data
  • Transforming Current Data Architectures to Lean Architectures




02 Nov 2023


Online event