Is an Enterprise Data Warehouse Still Required for Business Intelligence?
The number of business intelligence (BI) solutions appearing in the marketplace continues to grow. Most of these solutions still employ the services of a traditional enterprise data warehouse, but an increasing number do not.
In some operational BI applications, for example, event data volumes and/or the need for fast action times may prevent the data from being persisted in a data warehouse before it is analyzed. These latter applications do not replace the enterprise data warehouse, but often work in conjunction with it – the results from the analytical processing may be stored in the enterprise data warehouse, for example.
As the BI industry evolves, so too does the role of the traditional enterprise data warehouse. The title of this article is deliberately provocative. My objective is to encourage BI professionals to consider the role of the data warehouse in new BI projects. In the past, the enterprise data warehouse has been the cornerstone for such projects, but I believe in many situations this is no longer true.
A Historical Perspective
Essentially there are three main types of IT processing involved in running the business: business transaction processing, business intelligence processing, and business collaboration processing. Business transaction applications run day-to-day business operations, while BI applications analyze those operations with the objective of optimizing and improving them. Collaboration systems enable business users to share information and expertise about business operations.
Prior to the introduction of the concept of business intelligence, most companies analyzed their business operations using decision support applications that queried and reported directly on data stored in business transaction databases. There were several problems with this approach. The five key ones are, 1) the data was not usually in a suitable form for reporting, 2) the data often had quality issues, 3) decision support processing degraded business transaction performance, 4) data was often dispersed across many different systems, and 5) there was a general lack of historical information.
Data warehousing was introduced to help solve these data and performance issues. While there is no question that data warehousing helped improve business decision making, it is important to realize, nevertheless, that it was introduced primarily to solve design issues in business transaction systems, and also for performance reasons.
The emergence of BI and business performance management (BPM) applications and tools further enhanced business decision making by giving business users simpler interfaces, improved data analysis features, and the ability to compare actual and planned performance. Although BI and BPM applications typically process data in a data warehouse, this is only because of the five issues outlined earlier concerning direct access business transaction data. If these issues could be resolved then there would be no need for a data warehouse.
Traditional Business Intelligence
Traditionally business intelligence has been used for many years for strategic and tactical decision making. This type of processing involves intensive analytical processing of historical and summarized data managed in an enterprise data warehouse. Data performance issues caused by centralizing data in an enterprise data warehouse have led to the creation of data marts, which solve performance problems by spreading the BI processing across multiple data stores.
The problem with data marts is that organizations often build them directly from business transaction databases, rather than the enterprise data warehouse. This is because it is often quicker and easier to build a data mart than to incorporate additional data into the enterprise data warehouse and then build the data mart from the data warehouse. Another problem is that many organizations have more than one “enterprise” data warehouse. Multiple disconnected data warehouses and data marts leads to data consistency issues, which data warehousing was supposed to solve in the first place.
There are many articles written about the problems of building dozens of independent data marts directly from operational data, and I will not discuss these problems here. It is worth pointing out, however, that a data mart solves most of the same issues addressed by an enterprise data warehouse with the exception that decisions sourced from multiple data marts may be inconsistent in the same way that decisions based on multiple business transaction databases can be inconsistent. This multi-source issue could be mitigated by first integrating the business transaction data, for example, into an operational data store or master data store. The issue of historical data would have to be addressed, but this is solvable.
Historical Data and Current Data
The distinction between current data and historical data should be easy to define, but it is not. Data in a business transaction data store is usually current, while data in a data warehouse is usually considered to be historical. The issue here is, “What is meant by current and historical?” Let’s look at an example.
If I have multiple telephone accounts with a telephone company, then (in simplistic terms) the company will have a record showing my customer data, and a record for each of my accounts listing the telephone number, account balance, billing data, and so forth. When I make a telephone call, send a text message, or access the Internet, I create a call detail record (CDR) that can be collected with other CDRs to analyze customer calling patterns, detect fraud, etc.
At any given moment in time the customer and account records will show the latest information about my current status. This can be considered to be current data. As this data is updated with new address and account data, the old data may be captured into a data warehouse for analysis purposes. The capturing process may be done at particular moments in a time (a snapshot), or continuously, depending on how the data will be analyzed. Regardless, the data in the data warehouse is historical.
The CDR data is a different situation. In general, once I make a phone call, the CDR for that call never changes. When a telephone company captures the CDR into a data store for analysis, is the data current or historical data? The answer is it is current data, because, even though the data ages over time, it is always the current version of the data. What name do we give the CDR data store? I suspect most people would call it a data mart. Is this the correct term? Does it really matter what we call it, other than the fact some people have the need to give things labels? Regardless, we most probably don’t want to keep the CDR data in an enterprise data warehouse. It is useful, however, to keep a historical record of the CDR for analysis results because this shows trends and patterns over time.
If the CDR information is used to detect fraud, then the quicker the analysis can be done the faster fraud can be detected. The BI application can process the data in-flight as it flows through the system, or can analyze it in a persistent data store that is updated continuously with the CDR data. Data mining of past CDR records can help set the business rules up for doing this detection. This fraud detection application is a good example of an operational BI application.
Operational Business Intelligence
There are a growing number of operational business applications similar to those described above for CDR analysis. The business benefit of these applications is that they can help companies become more agile by analyzing data during intra-day operations. The ultimate example of this type of processing is algorithmic trading, which employs event analytics to optimize trading operations. The response time of these analyses are a fraction of a second. In this type of processing, it is not feasible, or even necessary, to store the huge volumes of data involved in a data warehouse. The results of BI processing, however, may be kept for future use.
The model for many operational BI applications is capture data, analyze data, persist results, i.e., they analyze data before persisting it. This is different from the traditional BI model of capture data, persist data, analyze data, persist results. BI applications that analyze web traffic and business activity on commercial web sites to track buying trends, optimize prices, etc., is another example of operational BI. Although these applications analyze data in-flight, they may also use historical data in a data warehouse to assist in the analysis.
Why is this discussion important? The main reason is at present business intelligence is synonymous with data warehousing. This thinking is wrong and needs to be changed. Data warehousing is a component of BI, but BI may employ data in other data stores. In some cases a BI application may not even use data managed in a data warehouse. The tight connection between BI and data warehousing is causing terms such as virtual data warehousing to be used to describe other types of BI processing. These terms are unnecessary and just confuse everybody.
Another issue is that people have forgotten that data warehousing was created to overcome deficiencies in business transaction systems. Many of these issues are now solvable. My concern is data warehousing has become a system in its own right and companies are now extending the data warehouse into other application areas such as master data management and content management. This is completely the wrong direction and must be argued against.
The bottom line is that data warehousing is still an important component of business intelligence, but it is no longer the foundation on which all BI projects have to be built.