Integrating Master Data Management and BI (part I)
Over the last six months the hype surrounding master data management (MDM) has been overwhelming at times. Yet it seems that there is still a lot of confusion around as to what MDM is and how it integrates with various core operational and analytical systems across the enterprise. This two-part article will focus primarily on the integration of MDM with business intelligence (BI).
In the first part I will look at what MDM is about and the types of MDM systems while the second part of the article looks in particular at the question of how master data management can integrate with BI systems. I will also look at what kind of impact the introduction of an MDM system will have on existing data warehousing and business intelligence practices.
What is MDM?
Without wanting to labour the point about what MDM is, I nevertheless feel compelled to state my understanding of this relatively new technology area before I get into discussing integration of it with BI. This is done mainly for the sake of clarity.Master data management is a set of processes, policies, services, and technologies used to create, maintain and manage data associated with a company’s core business entities as a system of record (SOR) for the enterprise. Core entities include Customer, Supplier, Employee, Asset etc. Note that master data management is not associated with transaction data such as orders for example.
Whether you build your own MDM system or buy a MDM solution from a MDM vendor in the marketplace, it should meet a number of key requirements. These include the ability to:
- Define and maintain metadata for master data entities in a repository
- Acquire, clean, de-duplicate and integration master data into a central master data store
- Offer a common set of shared master data services for applications, processes and portals to invoke to access and maintain master data entities i.e. system of entry (SOE) MDM services
- Manage master data hierarchies including a history of hierarchy changes and hierarchy versions
- Manage the synchronisation of changes to master data to all operational and analytical systems that use complete sets or subsets of this data
By defining and maintaining metadata in the above list, I mean that an enterprise wide standard set of data names and data definitions should exist to describe each master data business entity. In addition, mappings from disparate system holding ‘fractured’ master data to standard master data definitions should also be defined. This last point in some cases can be a daunting task and cries out for tooling to help us automate the discovery of subsets of master data that are ‘out there’ and to discover the relationships between these data so as to determine all the disparate definitions for the same thing. The good news here is that we are now seeing the emergence of such tooling from vendors like DataFlux, IBM, Informatica and Sypherlink.
Why Do We Need MDM Systems?
One of the most common questions I run into from my clients is why is it that we need yet another type of system? Why is MDM is so important? One of the key reasons for getting control of master data is to improve business performance. Many companies today now realize that their master data is heavily duplicated and fractured across multiple line of business operational systems. If you simply take order entry and billing as two basic systems of any company you realize right away that customer data, for example, is highly likely to be stored in both systems. But which one is the master? Is it possible for example to create new customers and maintain the same customer data in both of these systems? Also, how does data associated with core business entities in each of these systems remain synchronized as changes occur?
If you have subsets (some attributes) of a core business entity master data resident in multiple systems you are probably well aware of the problems not least of which is the issue of identity mania. Every system has a different identifier for the same master data. Take Retail Banking as an example. Figure 1 shows a picture of yours truly as an example of a customer that holds a VISA credit card, a VISA debit card, a Master Card credit card, a mortgage and a loan. To top all that, assume that I am also an employee at the Bank. Looking at this, you don’t have to be a rocket scientist to realize immediately that a customer is likely to have a different identifier in each of the systems holding data about them including in this case all the product oriented line of business systems (where a customer holds that product) and in the HR system. Therefore one thing that is clear is that an MDM system needs to support a Global Identifier (sometimes referred to as a surrogate key). Sound familiar? I can already hear you saying, “yeah, yeah….been there done that in my data warehouse years ago……”. Hold that thought for a while and I’ll come back to it.
Continuing on, it is also clear that all the disparate identifiers need to be mapped to the global identifier to allow a single version of a customer to be built. Of course you quickly realise that this is not enough to solve the problem. This is because it is not just the identifiers that differ across systems housing subsets of master data. The problem is that ALL of the attribute data definitions and code sets may be different (Figure 2).
What you actually need is a common set of data names and data definitions for all attributes describing each core business master data entity. In addition, to get a single view of a master data entity you need to have global identifiers, a master shared business vocabulary (SBV), AND know all the data mappings from disparate systems to common master data definitions (Figure 3).
For those of you well practiced at data warehousing, I would assume that you still recognize this as something very familiar and something that you in fact most probably already do in the course of building dimension data in a data warehousing system. However, even if you have integrated customer data and product data in your data warehouse, MDM goes much wider that this – it is an enterprise wide issue. The real problem is still back in operational systems. Given that master data is often heavily fractured in operational systems, and also that it is typically maintained in more than one line of business operational system of entry (SOE), the problem of master data synchronization often resembles the proverbial ‘spaghetti ball’. We’ve all seen a spaghetti ball slide that took forever to figure out and draw and that makes you realize what a mess you are in! In this maze of interfaces you realize that data is flowing in all kinds of directions between systems, primarily just to keep it synchronized. It happens in batch, via application messaging and of course via the main culprit – Excel spreadsheets attached to emails! I have a feeling this latter point may have struck a chord with a few of you. Yes folks Excel mania is still out there alive and well.Even if we replaced all the batch interfaces with electronic messaging interfaces using popular messaging products (as many companies have tried over the years) it still does not fix the problem which is of course that there is no master data hub, no common master data store where a complete system of record (SOR) exists for each core business entity.Therefore many companies are now (if they haven’t done so already) looking to create master data stores by capturing, cleaning and integrating subsets of master data from disparate line of business operational systems (Figure 4) using tools from an enterprise data management suite .
Again MDM data hubs can be built or bought. Either way, companies still struggle with the problem of multiple systems of entry (SOEs). It would be a lot easier if multiple SOEs could share common services to access and maintain common master data (Fig 5).
Even though Figure 5 seems an obvious thing to do, the problem is that this may require applications to be changed to invoke master data services if master data is consolidated into a master data store. While this could be done for custom built applications it is more difficult to achieve for packaged applications. This problem can be solved if integrated master data is maintained via a single portal based system of entry and master data changes are propagated to other systems that need it (Figure 6).
This works very well in a services oriented architecture (SOA) and I have already discussed this in Figure 9 in my article entitled Information Integration and Master Data Management which is also available on the B-EYE-Network.So assuming we understand all this and can see the reasons for needing an MDM solution, the question is how can this be created? There are of course two main approaches – build or buy. Companies are likely to build when they already have a data store and/or dominant system that already holds a significant set of integrated master data that is shared by at least some of a company’s main operational systems. This may avoid the creation of yet another data store which is often an issue that several of my clients have raised. Alternatively you could buy MDM technology in the marketplace to more rapidly deploy an MDM solution. There are of course different types of MDM system that you can buy. These include:
- Registry based MDM systems
- Hub based MDM systems
- Enterprise MDM systems
The differences between these types of system are as briefly outlined below.
Registry based MDM systems use Global Identifiers and data federation to create a virtual master data system of record (SOR). Therefore master data elements in disparate underlying systems are assembled on-demand in real-time to dynamically produce integrated master data records. These systems are remarkably similar to EII tools in many respects and in some cases partnerships with EII vendors exist to enable this virtual view of master data. They are also similar to data quality products that can provide on demand matching services to match data to create single integrated versions of master data records. It is no surprise therefore to see data quality vendors like DataFlux moving in on the MDM market to compete with Registry based MDM vendors. Registry based MDM systems support several data sources and also often have a web services interface and therefore offer on-demand master data integration services. On-demand master data can be requested by
- Applications and portals
- BI reporting and analysis tools
- ETL tools that need master data to populate dimensions in a data warehouse or data mart
Registry based MDM systems also use data quality services to clean and validate master data. Finally, operational systems still remain the systems of entry (SOEs) for master data in the registry approach. Examples of Registry based vendors include Initiate and Purisma.Hub based MDM systems (Figure 7) propagate master data changes between disparate operational systems and can persist a complete integrated set of master data in a master data store as a system of record (SOR). This means that master data and master data changes can be integrated into a MDM hub. Each hub contains master data for a single master data entity. Some hub based MDM systems also provide federated master data integration across the MDM hub and other systems to create a virtual SOR as per the registry approach. This is valuable if the hub is not a complete SOR. Once again, with a hub based MDM system, the line of business operational systems still remain the systems of entry (SOEs).
Figure 7
Enterprise Master Data Management solutions are more comprehensive than MDM data hubs. These systems are top of the range in that they are both the system of entry (SOE) and the system of record (SOR) for master data. They offer master data integration and master data synchronization as well as a fully integrated complete set of master data in a master data store. The master data store supports multiple master data entities and changes are logged and propagated to other operational applications and data warehouses. In addition Enterprise MDM systems go beyond operational master data by integrating operational master data, related master BI and related unstructured content associated with an instance of a core entity e.g. a customer, a product etc. All entities are defined in a shared business vocabulary (SBV) of common enterprise wide data names and data definitions. In addition, data quality services are offered to validate and clean master data. Common web services exist for access and maintenance of master data and changes are tracked. Finally a history of metadata changes is also maintained.Vendors in the MDM Hub and Enterprise MDM markets include:
- FullTilt Perfect Product Suite
- Hyperion MDM
- I2 PIM
- IBM WebSphere Product Center and Customer Center
- Kalido 8M
- ObjectRiver MDM
- Oracle Customer and PIM data hubs and Sunopsis AIP
- SAP NetWeaver MDM
- Siperian Hub, Master Reference Manager, Hierarchy Manager and Activity Manager
- Stratature Enterprise Dimension Manager
- Teradata MDM
In the second part of this article I will discuss how MDM and BI Systems can fit together and how the use of an MDM system will speed up BI development.