Integrating Master Data Management and BI (part II)

Having discussed Master Data Management (MDM) in the first part of this article, this second part looks at how MDM and business intelligence (BI) fit together.

How Do MDM and BI Fit Together?

The first thing to point out is that master data is represented as dimensions in BI systems; it is not associated with facts (i.e. transactions). The introduction of a master data management system within the enterprise should therefore have a positive impact on BI systems. For example, in an MDM system the attribute data names and data definitions used to describe master data entities are likely to be the standard data names and data definitions for the enterprise. These master data definitions are often referred to as a shared business vocabulary (SBV) for the enterprise; the SBV is therefore master metadata. We can take advantage of a master data SBV in a BI system by enforcing the reuse of the same data definitions across all dimensional models, cubes and BI tool business views, driving consistency across dimensional data (Figure 8). Adopting a master data SBV in this way can only improve the understanding of the data presented in BI system reports, OLAP analyses, dashboards and scorecards. It also contributes to meeting compliance demands and to the perception of trusted BI.
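To make the SBV idea concrete, here is a minimal sketch (in Python, with entirely illustrative attribute names and definitions, not from any specific MDM product) of how an enterprise vocabulary could be used to check that a dimensional model reuses the standard data names:

```python
# Sketch: a shared business vocabulary (SBV) as master metadata, used to
# check that a dimensional model reuses the enterprise-standard attribute
# names. All names and definitions here are illustrative.

SBV = {
    "customer_id":   "Unique enterprise identifier for a customer",
    "customer_name": "Legal name of the customer",
    "product_code":  "Enterprise-standard product identifier",
}

def nonstandard_attributes(model_columns):
    """Return columns in a dimensional model that are not in the SBV."""
    return [c for c in model_columns if c not in SBV]

# A cube designer using 'cust_nm' instead of the standard name is flagged:
print(nonstandard_attributes(["customer_id", "cust_nm"]))  # → ['cust_nm']
```

A check like this, run against every dimensional model, cube and business view definition, is one way the SBV can be enforced rather than merely documented.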

In addition to consistent metadata, the arrival of an MDM system in the enterprise can also affect data integration in a BI system (commonly referred to as data warehousing or ETL processing). If you do not have an MDM system today, your BI system is probably based on a classic data warehouse architecture in which master data is fractured across multiple data stores in different line-of-business operational systems. To create integrated dimensional data in a BI system, you therefore typically use a data integration tool to integrate the disparate master data maintained in multiple operational systems and so build dimensions. In that sense, your BI system is probably the only place today where you consolidate and integrate master data (Figure 9).


Figure 9 – A Classic BI System Architecture
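As a rough illustration of this classic architecture, the sketch below shows BI-side ETL doing the master data integration itself: matching and merging customer records from two hypothetical line-of-business systems (the record layouts and the match rule are invented for illustration):

```python
# Sketch: classic BI-side master data integration. Two operational systems
# hold overlapping customer records; the ETL job matches them on a natural
# key (email here) and merges them into a single dimension record.

crm_customers = [
    {"id": "C-17", "name": "Acme Ltd", "email": "info@acme.example"},
]
billing_customers = [
    {"id": "9904", "name": "ACME LIMITED", "email": "info@acme.example"},
]

def build_customer_dimension(*sources):
    """Merge customer records from several sources, matching on email."""
    dimension = {}
    for source in sources:
        for rec in source:
            key = rec["email"].lower()
            row = dimension.setdefault(key, {"source_ids": [], "name": None})
            row["source_ids"].append(rec["id"])
            row["name"] = row["name"] or rec["name"]  # crude survivorship rule
    return dimension

dim = build_customer_dimension(crm_customers, billing_customers)
print(dim["info@acme.example"]["source_ids"])  # → ['C-17', '9904']
```

Every BI system that lacks an MDM source ends up re-implementing matching and survivorship logic like this in its ETL jobs, which is exactly the duplication the article goes on to question.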

It is therefore no wonder that we confuse the concepts of a master data hub and a data warehouse when both integrate master data. This raises the question of why you need an MDM system when you already have a data warehouse. For me this is the wrong question. The question should be: why are we doing master data integration in a BI system at all? Should the master data not already be integrated and treated as a data source by the data integration tools used in a BI system? In my opinion this would be better, because master data has to be supplied to more than just a BI system; it needs to be supplied to both operational and BI systems. Master data can be supplied to BI system data integration tools in at least three ways:

  • By using the integration and matching services of SOA-enabled data quality software to supply master data directly into a data warehouse or to the ETL jobs populating data warehouses
  • By using a registry-based MDM solution to create a master data virtual data source that can be accessed by a BI system data integration tool
  • By using a built or bought MDM data hub as a persistent data source to a BI system data integration tool

DataFlux is an example of a vendor that can support the first of these options. Figure 10 shows the registry MDM option. This can be supported by Purisma, for example, who supply a connector for Informatica PowerCenter through which master data can be provisioned to the Informatica data integration tool for the creation of dimensional data. PowerCenter can apply Purisma-determined global IDs to cross-system customer data to create integrated customer data records in a BI system. Purisma also have a partnership with Business Objects, as part of the Business Objects Enterprise Information Management (EIM) initiative, to pass master data to Business Objects Data Integrator. Registry MDM systems could also provision master data changes to data integration tools for slowly changing dimension processing.
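To illustrate the registry idea in the abstract (the sketch below is generic Python, not Purisma's or Informatica's actual API), a registry holds no customer attributes itself, only a crosswalk from source-system keys to global IDs, which an ETL tool can use to stamp cross-system records:

```python
# Sketch: a registry-style MDM system maintains a crosswalk from
# (source system, local key) to a global ID. An ETL tool uses the
# crosswalk to stamp records from different systems with the same
# global ID. The crosswalk and record layouts are illustrative.

crosswalk = {
    ("CRM", "C-17"): "GID-001",
    ("BILLING", "9904"): "GID-001",   # same real-world customer
    ("CRM", "C-18"): "GID-002",
}

def stamp_global_ids(records):
    """Attach the registry-assigned global ID to each source record."""
    return [
        {**rec, "global_id": crosswalk[(rec["system"], rec["local_id"])]}
        for rec in records
    ]

rows = stamp_global_ids([
    {"system": "CRM", "local_id": "C-17", "revenue": 100},
    {"system": "BILLING", "local_id": "9904", "revenue": 250},
])
# Both rows now carry global_id 'GID-001', so the BI system can
# aggregate them as a single customer without doing its own matching.
```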

To some extent, the first two of these options should simplify classic ETL processing in BI systems, since master data integration work no longer has to be handled in ETL jobs. Figure 11 shows how an MDM data hub can be used as a data source to BI systems. Doing this would significantly simplify the building of BI systems and improve the quality of dimensional data, because the source of the dimension data already contains high-quality, integrated master data. There is a much larger impact of this approach, however, in that it means we are separating dimension data from fact (transaction) data. Potentially, all you then need to do is combine master data with transaction data to generate data marts with conformed (common) dimensions. This is very much like the approach taken by the Kalido MDM solution. In fact, MDM systems like Kalido could potentially negate the need for the 'one huge single database' approach to an enterprise data warehouse if the data marts all use conformed dimensions. To be clear, the enterprise warehouse then becomes a collection of data marts (possibly deployed on a single server), all of which share dimension data. It is a subtle change: the concept of the enterprise warehouse is still alive and well, but it is created by combining master data and transaction data into a set of linked data marts that share dimension data.


Figure 11
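The simplification this brings can be sketched as follows: with integrated master data already available from the hub, building a mart reduces to joining transaction (fact) data to the shared dimension on the master key (all keys, names and figures below are illustrative):

```python
# Sketch: combining hub-supplied master data (the dimension) with
# transaction data (the facts) on a shared master key.

product_master = {  # supplied by the MDM hub, already integrated
    "P1": {"name": "Widget", "category": "Hardware"},
    "P2": {"name": "Gadget", "category": "Electronics"},
}

sales_facts = [  # transaction data from an operational source
    {"product_id": "P1", "amount": 120.0},
    {"product_id": "P2", "amount": 80.0},
    {"product_id": "P1", "amount": 40.0},
]

def sales_by_category(facts, master):
    """Aggregate facts by a dimension attribute taken from master data."""
    totals = {}
    for fact in facts:
        category = master[fact["product_id"]]["category"]
        totals[category] = totals.get(category, 0.0) + fact["amount"]
    return totals

print(sales_by_category(sales_facts, product_master))
# → {'Hardware': 160.0, 'Electronics': 80.0}
```

Because every mart joins to the same `product_master`, the dimension is conformed by construction; no mart has to rebuild it.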

MDM can also strengthen BI analysis and reporting. For example, in the statutory and regulatory reporting needed for compliance, consistent master data is needed to ensure reports are correct across one or more BI systems. Accurate master data also prevents reconciliation problems across BI systems, cubes and reports. In addition, if an MDM system supports central control of hierarchies, changes to those hierarchies can easily be propagated across multiple dimensional models in multiple line-of-business BI and operational systems. MDM vendors like Hyperion, Stratature and Siperian all support master data hierarchy management, including tracking the change history of hierarchies. This means that multiple versions of hierarchies can be created, and therefore multiple versions of dimensions can be re-created for more accurate historical reporting and analysis. Cubes can also be automatically rebuilt with different versions of hierarchies for more accurate performance analysis. Figure 12 shows an example of product master data. If a product that existed in 2005 was moved out of its 2005 product category into a different category in 2006, changes to the product master data hierarchy are needed. These changes need to be tracked and propagated to all necessary systems. With respect to BI, this allows us to report correctly on product sales and compare 2005 results with 2006 results by accounting for the product category that applied in each year. MDM systems can therefore automate the re-creation of earlier versions of a product hierarchy for more accurate BI reporting.


Figure 12
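The Figure 12 scenario can be sketched as a versioned hierarchy lookup: the category in effect in each year is recorded, so historical reports use the right version (the product, categories and amounts below are invented for illustration):

```python
# Sketch: versioned product hierarchies. A product's category is recorded
# per year, so historical reporting uses the hierarchy version that was
# in effect at the time of each transaction.

hierarchy_versions = {
    2005: {"P1": "Power Tools"},
    2006: {"P1": "Garden Tools"},   # P1 was re-categorised in 2006
}

sales = [
    {"product_id": "P1", "year": 2005, "amount": 500.0},
    {"product_id": "P1", "year": 2006, "amount": 700.0},
]

def sales_by_year_and_category(facts):
    """Report sales using the hierarchy version in effect in each year."""
    report = {}
    for fact in facts:
        category = hierarchy_versions[fact["year"]][fact["product_id"]]
        key = (fact["year"], category)
        report[key] = report.get(key, 0.0) + fact["amount"]
    return report

print(sales_by_year_and_category(sales))
# → {(2005, 'Power Tools'): 500.0, (2006, 'Garden Tools'): 700.0}
```

Without the 2005 version of the hierarchy, the 2005 sales would be misreported under "Garden Tools", which is exactly the historical-accuracy problem centrally managed hierarchies solve.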

In summary, then, it could be said that master data management strengthens DW/BI systems by:

  • Providing master metadata for use in dimensional data models and cubes
  • Providing high quality master data as a trusted data source to ETL processing
  • Providing federated views of master data across disparate systems for reporting
  • Tracking versions of hierarchies over time
  • Automating the re-creation of different versions of a dimension in a cube or star schema to reflect changes in hierarchies
  • Providing trusted data for reporting and analysis

MDM systems will feed master data changes to both operational and analytical systems. It is unlikely, however, that master data and data warehouse dimensional data will be the same physical data instance, because master data is typically normalised to support OLTP processing while dimensions are typically de-normalised to support dimensional analysis and simplify reporting. Companies will probably move towards full enterprise master data management over a number of years and may never go the whole way to a single system of entry. If you are buying MDM solutions, one possible route is to start with a registry-based MDM system, then move to a data hub and finally to enterprise MDM. As for building MDM systems versus buying them: if your organisation has many bespoke point solutions, consider purchasing a data hub MDM system and moving towards enterprise MDM over time. If the majority of your core applications already use a common operational database, it may be better to build an MDM system that reuses the core business entity master data in that data store. This data store could then become a source to the data warehouse.
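As a small illustration of the normalised-versus-denormalised point (the tables and columns below are hypothetical), feeding a dimension from master data typically means flattening several normalised master tables into one wide dimension row:

```python
# Sketch: master data held normalised for OLTP (customers reference
# segments by key) is flattened into a single denormalised dimension
# record for BI. Table and column names are hypothetical.

customers = {"C-17": {"name": "Acme Ltd", "segment_id": "S1"}}
segments = {"S1": {"segment_name": "Enterprise"}}

def denormalise_customer(customer_id):
    """Flatten normalised master tables into one dimension record."""
    cust = customers[customer_id]
    seg = segments[cust["segment_id"]]
    return {
        "customer_id": customer_id,
        "customer_name": cust["name"],
        "segment_name": seg["segment_name"],  # embedded, not referenced
    }

print(denormalise_customer("C-17"))
# → {'customer_id': 'C-17', 'customer_name': 'Acme Ltd', 'segment_name': 'Enterprise'}
```

This is why the MDM store and the warehouse dimension remain distinct physical instances even when the MDM hub is the dimension's source.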

Mike Ferguson
November 2019

Upcoming events by this speaker:


In recent years, software development practices have changed dramatically. Gone are the old ‘batch style’ ways of building monolithic software products with all the functionality built into one huge application and a new release every eighteen months.

Today we have a new agile software development approach with software made up of components (microservices) running in containers managed by technologies such as Kubernetes and deployed in serverless computing environments like the cloud. The impact on the software market has been dramatic. Component based software development combined with agile, lower cost, faster continuous development / continuous delivery methods has resulted in new software and new functionality being delivered at a pace. You just have to look at the data and analytics market over the last five years. We have seen an explosion of new software in the market including new types of database products, new data management tools and analytical tools. It’s been a technology deluge happening at a pace that many companies buying software can barely keep up with.

Many would argue that the pace has quickened so much that many companies can no longer keep up with it. There are lessons to be learned from what has happened in the software world, especially when you apply them to an aspiration executives have told me about many times: they want to become data driven. They want the business to be driven by the insights produced from analysing data. The problem is that they want to be data driven now. The expectation is that, despite the number of data sources and data types rapidly increasing and so many different types of data store now being used both on-premises and on multiple clouds, it should still be easy to achieve this. So much so that executives are approving investments in cloud, data management and machine learning technologies as a matter of urgency.

The problem is that with business units now in control of spend, many companies have ended up buying multiple overlapping technologies in a rush to deliver value, without necessarily knowing what workloads are best suited to what technologies, all before creating any kind of data architecture that could deliver the business value they want. The result is technology overspend, silos, slow speed of development, no metadata sharing and, in some cases, the wrong technology for the workload. Using NoSQL document databases as a data warehouse, for example, is just a poor choice.

It would be much better if companies designed the data architecture first, to enable them to become data driven and deliver the business value they need, and then selected technologies that can work together and be integrated to bring the end-to-end data architecture to life. In addition, if we look at what has happened in the software market, the question is: could we do the same with data and analytics?

In other words, could we create a common data architecture and combine component-based data and analytics development with agile, lower cost, faster continuous development / continuous delivery methods, so that new data, new analytics, new insights and new business value are delivered at a pace? Could this be the way to deliver data-driven?

The answer, in my opinion, is yes. We need to build data products (assets) and analytical assets (e.g. predictive models, prescriptive models, BI reports, dashboards, stories etc.) that form the components we assemble to deliver value. Becoming data-driven should itself be a continuous development / continuous delivery process. To do this we need some people building data and analytical components while others consume them, assemble them and drive value with them.

So what is required if you want to become data driven? Companies need to:

  • Establish a common business vocabulary of common data names and definitions for logical data entities. This is critical to being able to build, trust and share reusable data products. It also helps people understand what data means.
  • Make use of common data fabric software, as opposed to everyone using different tools in silos to prepare and analyze data. A stronger approach is to rationalize on common data fabric software that can connect to both on-premises and cloud-based data stores and that allows you to create pipelines to prepare and analyze data.
  • Establish a multi-purpose information supply chain. A critical success factor here is that companies organize themselves to become data driven. That means establishing an enterprise information supply chain through which to produce business-ready data products that can be published in a data marketplace, from where information consumers can shop for data. An enterprise information supply chain is similar to a manufacturing production process in which information producers curate business-ready data products for information consumers to find and use. It is a continuous development process. The whole point is to produce trusted, commonly understood, business-ready data products that can be re-used in multiple places. The benefit is that we save information consumers considerable time because they do not have to prepare all the data they need from scratch. Instead we create ready-made data products, or assets, that are already fit for consumption: customer data, product data and orders data, for example. Having data 'ready to go' should therefore shorten the time to value and reduce costs. It is also important to recognize the role of the data lake in an information supply chain. A data lake is too valuable to be restricted to just data science; it needs to be multi-purpose, producing data assets that help build master data in MDM systems, data warehouses and the data needed in data science.
  • Build data and analytical pipelines into the supply chain. Component-based development of data pipelines is needed to accelerate delivery of business-ready data in an enterprise information supply chain. It should also be possible to add analytics and visualizations to a pipeline even if they are developed in other tools. This means it must be possible to add data products, analytics and visualizations to a marketplace (catalog) to maximize their reuse. If this occurs, the time to value will get shorter with every new project as more business-ready data, analytics and other artifacts become available.
  • Publish trusted data and analytical assets as services in a marketplace for consumption across the enterprise. An enterprise data marketplace is a catalog inside the enterprise where people can go to shop for business-ready data and analytical assets. Establishing this means you can introduce publish-and-subscribe operations into an information supply chain to speed up delivery even more, because reusable data and analytical components give every project a jump start. By focusing on value, an information supply chain can be used to create business-ready data, predictive analytics and pre-built decision services to automate actions in the digital enterprise. Information consumers can shop for and find ready-made data and analytics in the marketplace to help them deliver value. It is this availability of re-usable business-ready data, predictive analytics and pre-built decision services that enables mass contribution to common business goals across the enterprise; in other words, it helps the business to become a self-propelling data-driven business.
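The publish-and-subscribe marketplace idea above can be sketched as a toy in-memory catalog (this is a generic illustration, not any vendor's product; all names are invented):

```python
# Sketch: a minimal data marketplace with publish, subscribe and
# shop (search) operations for business-ready data products.

class DataMarketplace:
    def __init__(self):
        self.products = {}       # product name -> metadata
        self.subscribers = {}    # product name -> list of callbacks

    def publish(self, name, description, owner):
        """Register a business-ready data product in the catalog."""
        self.products[name] = {"description": description, "owner": owner}
        for notify in self.subscribers.get(name, []):
            notify(name)         # tell consumers the product was (re)published

    def subscribe(self, name, callback):
        """Be notified whenever a data product is (re)published."""
        self.subscribers.setdefault(name, []).append(callback)

    def shop(self, keyword):
        """Let information consumers search the catalog."""
        return [n for n, meta in self.products.items()
                if keyword.lower() in (n + meta["description"]).lower()]

market = DataMarketplace()
market.subscribe("customer_360", lambda name: print(f"{name} updated"))
market.publish("customer_360", "Integrated customer master data", "MDM team")
print(market.shop("customer"))  # → ['customer_360']
```

The subscription callback is what gives downstream projects their "jump start": consumers are told as soon as a refreshed data product is available rather than rebuilding it themselves.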

Data architecture is critical to becoming a data driven enterprise. We shall be talking about this and all its related topics at the International Big Data Conference in Rome in December 2019. I hope you can join us.