Developing a Bi-Modal Logical Data Warehouse Architecture Using Data Virtualization
The Expiration Date of the Traditional Data Warehouse Architecture – Every technology, every architecture, and every design technique has an expiration date, and this is especially true in the world of information technology. It would be inconceivable for assembler languages, hierarchical databases, and waterfall design techniques to still be used to develop the complex systems of today. Business intelligence (BI) is no exception.
The heart of most current BI systems is formed by a traditional data warehouse architecture initially designed to support classic forms of reporting. For such systems, aspects such as improved governance, high-quality data, and stability play a key role. To offer such qualities, BI systems are accompanied by a rigid development, operation, and management process.
The traditional data warehouse architecture has had a great run for over twenty-five years and has served countless organizations well. But for many organizations it has passed its expiration date. Because of its rigid architecture, many new BI requirements are hard to implement with the existing BI systems. For example, BI systems have to support new forms of reporting and analytics, such as self-service BI, investigative analytics, and data science; external users, such as online customers, partners, and suppliers; new storage technologies, such as Hadoop and NoSQL; external data sources, such as social media data and open data; and large quantities of data. In addition, the time to market for reports has to be shortened.
The Big Business Intelligence Dilemma – Organizations know that their current BI system can’t be thrown away, because the existing reporting workload has to keep working. But how should they implement the new BI workload and integrate it with the existing system? Many organizations struggle with this dilemma. Currently, organizations try to solve this problem by developing many analytical islands, and, because so few specifications are shared from island to island, business analysts are constantly reinventing the wheel. Plus, it’s close to impossible to guarantee report consistency across the classic reports and the new BI workload.
How can old and new forms of BI be supported by the same system? This big BI dilemma must be solved. New BI architectures are needed that support the traditional, somewhat rigid style of BI development alongside the new development style introduced by the new BI requirements.
Bi-Modal Architectures – In 2014, Gartner introduced the bi-modal concept. The term “bi-modal” refers to two modes of IT development. Mode 1 is the classic style of development, in which every system must be reliable, predictable, and correct. Systems must be formally tested, governed, and managed; they must be auditable; and so on. Mode 2 corresponds to the more agile development styles that focus on speed and agility, and support experimentation, flexibility, and self-service analytics. According to a Gartner survey, 45% of CIOs state that they currently operate in Mode 2, and Gartner predicts that by 2017 75% of all IT organizations will operate bi-modally, meaning that they will support both modes.
The terms Mode 1 and Mode 2 clearly correspond, respectively, to the development of classic reports and to the development style used by the new BI forms. In BI, classic reports (Mode 1) must be reliable, predictable, tested, governed, reproducible, etc., whereas the new BI forms (Mode 2) are more experimental, flexible, and self-service oriented. The challenge, for most organizations, is how to transform their current BI system into a modern BI system that supports both development modes. In other words, they must transform their uni-modal BI system into a bi-modal one.
Requirements for Bi-Modal BI Systems – Bi-modal BI systems must support the following requirements:
- All the classic requirements to support Mode 1, such as those for performance, stability, availability, data consistency, and correctness.
- The ability to access big data stored in new big data storage technologies, such as Hadoop and NoSQL.
- The ability to access open data sources and other external data sources.
- The ability to access any type of data: unstructured, semi-structured, fully structured, and all formats, including JSON, Excel, XML, and comma-separated.
- The ability to make all the data available to a wide range of users, from the ones working with standard reports (Mode 1), all the way up to the data scientists (Mode 2).
- The ability to industrialize a Mode 2 development into a Mode 1 development.
- Support for self-service data preparation and report development.
- Support for fast development to shorten the time to market of new reports and new forms of analytics.
- The ability to share metadata specifications across all the reporting and analytical modules.
- The ability to help Mode 2 developers to find data through advanced discovery capabilities.
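The requirement to access data in any format can be illustrated with a small sketch. It is only an illustration, not a feature of any particular product: the function names (`rows_from_csv`, `rows_from_json`, `rows_from_xml`, `read_rows`), the sample data, and the simple `<rows><row>...</row></rows>` XML layout are all assumptions made for this example. The point is that three differently formatted sources end up as the same list of rows, so a data consumer does not need to know the original format.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def rows_from_csv(text):
    """Parse comma-separated text with a header line into row dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def rows_from_json(text):
    """Parse a JSON array of flat objects into row dicts (values as strings)."""
    return [{key: str(value) for key, value in obj.items()}
            for obj in json.loads(text)]


def rows_from_xml(text):
    """Parse an assumed <rows><row><col>value</col></row></rows> layout."""
    root = ET.fromstring(text)
    return [{child.tag: (child.text or "") for child in row}
            for row in root.findall("row")]


# Hypothetical registry mapping a format name to its reader.
READERS = {"csv": rows_from_csv, "json": rows_from_json, "xml": rows_from_xml}


def read_rows(fmt, text):
    """Return rows in one common shape, whatever the source format is."""
    return READERS[fmt](text)


# The same (made-up) customer record in three formats.
csv_text = "customer,region\nacme,EU\n"
json_text = '[{"customer": "acme", "region": "EU"}]'
xml_text = "<rows><row><customer>acme</customer><region>EU</region></row></rows>"

csv_rows = read_rows("csv", csv_text)
json_rows = read_rows("json", json_text)
xml_rows = read_rows("xml", xml_text)
```

All three calls yield the same row structure, which is the property a bi-modal BI system needs before reports and analytics can share one view of the data.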
The Logical Data Warehouse Architecture – One new architecture suitable for developing bi-modal BI systems is the logical data warehouse architecture. It’s an agile architecture for developing BI systems in which data consumers and data stores are decoupled from each other; see the figure below. The logical data warehouse architecture presents all the data stored in a heterogeneous set of data stores as a single logical database. Data consumers don’t have to be aware of where and how the data is stored; all of the details of data storage are hidden from the data consumers. They should not have to know or care whether the data they’re using comes from a data mart, a data warehouse, or even a production database. They should not have to be aware that data from multiple data stores has to be joined, nor should they have to know whether they are accessing a SQL database, a Hadoop cluster, a NoSQL database, a web service, or simply one or more flat files. The structure of the data stores is hidden as well; data consumers only see the data in a way that’s convenient for them, and they only see data that is relevant to their task. This is all achieved by decoupling data consumers from data stores.
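To make the idea of a single logical database more concrete, here is a minimal sketch of a virtual view that joins data from two different stores. Everything in it is assumed for illustration: the `orders` table in an in-memory SQLite database, the `CUSTOMERS_JSON` document standing in for a second store, and the `revenue_per_customer` view. A real data virtualization server would define such a view declaratively and optimize the join; this sketch only shows that the consumer sees one table and not two stores.

```python
import json
import sqlite3

# Store 1 (assumed): orders kept in a SQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 100.0), (1, 50.0), (2, 75.0)],
)

# Store 2 (assumed): customers delivered as a JSON document,
# for example by a web service or a NoSQL database.
CUSTOMERS_JSON = '[{"id": 1, "name": "acme"}, {"id": 2, "name": "globex"}]'


def revenue_per_customer():
    """Virtual view: joins both stores and returns one logical table."""
    names = {c["id"]: c["name"] for c in json.loads(CUSTOMERS_JSON)}
    totals = conn.execute(
        "SELECT customer_id, SUM(amount) FROM orders "
        "GROUP BY customer_id ORDER BY customer_id"
    ).fetchall()
    return [{"customer": names[cid], "revenue": total} for cid, total in totals]


# The data consumer only calls the view; it never touches SQL or JSON itself.
view = revenue_per_customer()
```

The consumer receives rows with `customer` and `revenue` columns and has no way to tell that a join across two storage technologies took place.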
The primary goal of decoupling is to achieve a higher level of flexibility. For example, changes made to the data stores don’t automatically mean that reports must be changed as well, and vice versa. Likewise, replacing one data store technology with another is easier when that data store is “hidden” behind the logical data warehouse architecture. In this architecture, adopting big data is relatively easy, access to real-time data is less complex for data consumers, and dealing with cloud-based data becomes simpler.
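The flexibility gained from decoupling can be sketched as follows. This is a toy model under stated assumptions, not the API of a data virtualization product: the classes `SqlSource`, `CsvSource`, and `LogicalWarehouse`, the `customers` table, and the report function `eu_customers` are all hypothetical names. The sketch shows a report that keeps working unchanged after the store behind its logical table is replaced.

```python
import csv
import io
import sqlite3


class SqlSource:
    """Rows come from a SQL table (here an in-memory SQLite database)."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table

    def rows(self):
        cur = self.conn.execute(f"SELECT * FROM {self.table}")
        cols = [d[0] for d in cur.description]
        return [dict(zip(cols, rec)) for rec in cur.fetchall()]


class CsvSource:
    """Rows come from comma-separated text, standing in for a flat file."""

    def __init__(self, text):
        self.text = text

    def rows(self):
        return [dict(rec) for rec in csv.DictReader(io.StringIO(self.text))]


class LogicalWarehouse:
    """Maps logical table names to whichever store currently holds the data."""

    def __init__(self):
        self._tables = {}

    def register(self, name, source):
        self._tables[name] = source

    def query(self, name, **filters):
        return [
            row for row in self._tables[name].rows()
            if all(str(row.get(k)) == str(v) for k, v in filters.items())
        ]


def eu_customers(warehouse):
    """A report (data consumer): it only knows the logical table name."""
    return sorted(r["name"] for r in warehouse.query("customers", region="EU"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)", [("acme", "EU"), ("globex", "US")]
)

warehouse = LogicalWarehouse()
warehouse.register("customers", SqlSource(conn, "customers"))
before = eu_customers(warehouse)

# Replace the data store; the report code is not touched.
warehouse.register("customers", CsvSource("name,region\nacme,EU\nglobex,US\n"))
after = eu_customers(warehouse)
```

Only the `register` call changes when the store is replaced. The report function is identical before and after, which is the practical meaning of "changes made to the data stores don’t automatically mean that reports must be changed."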
BI systems with this architecture can exhibit the same robustness as the traditional data warehouse for the standard forms of reporting. In addition, they are better suited to supporting new data sources, such as big data and open data; they can more easily handle new data-storage technologies, such as Hadoop and NoSQL; they are a better match for the dynamic world of self-service BI; they simplify support for investigative analytics and data science; and they speed up development and ease maintenance with fewer resources.
Several technologies enable the development of a BI system built on a logical data warehouse architecture, such as in-memory database servers and data grid technology. However, data virtualization technology offers the best fit, as it supports almost all the functionality required to develop, run, secure, and manage a logical data warehouse. Data virtualization provides features for data security, scalability, performance, development, the reuse of metadata, discovery and search, big-data access, etc. But most importantly, it offers a comprehensive abstraction layer that decouples data consumers from data stores. For a more detailed description of data virtualization technology, see Data Virtualization for Business Intelligence Systems.
Closing Remark – The big BI dilemma is a real problem for many organizations. Unfortunately, there is no supernatural tool that can magically turn an existing BI system into a bi-modal BI system. No such silver bullet exists. The solution must be found in setting up BI systems with a different architecture. The logical data warehouse architecture is by definition bi-modal-ready. In addition, data virtualization technology has reached the level of maturity required to develop a logical data warehouse architecture today. In other words, nothing stands in your way on your logical data warehouse journey.