Managed Meta Data Environment (MME): A Complete Walkthrough (II Part)
Meta Data Integration Layer
The meta data integration layer (Figure 6) takes the various sources of meta data, integrates them, and load it into the meta data repository. This approach differs slightly from the common techniques used to load data into a data warehouse, as the data warehouse clearly separates the transformation (what we call integration) process from the load process. In a MME these steps are combined because, unlike a data warehouse, the volume of meta data is not nearly that of data warehousing data. As a general rule the MMEs holds between 5-20 gigabytes of meta data; however, as MME’s are looking to target data audit related meta data then storage can grow into the 20-75 gigabyte range and over the next few years you will see some MME’s reach the terabyte range.
The specific steps in this process depend on whether you are building a custom process or if you are using a meta data integration tool to assist your effort. If you decide to use a meta data integration tool, the specific tool selection can also greatly impact this process.
Meta Data Repository
A meta data repository is a fancy name for a database designed to gather, retain, and disseminate meta data. The meta data repository (Figure 7) is responsible for the cataloging and persistent physical storage of the meta data.
The Meta Data Repository should be generic, integrated, current and historical. Generic means that the physical meta model looks to store meta data by meta data subject area as opposed to application-specific. For example, a generic meta model will have an attribute named “DATABASE_PHYS_NAME” that will hold the physical database names within the company. A meta model that is application-specific would name this same attribute “ORACLE_PHYS_NAME”. The problem with application-specific meta models is that meta data subject areas change. To return to our example, today Oracle may be our company’s database standard. Tomorrow we may switch the standard to SQL Server for cost or compatibility advantages. This situation would cause needless additional changes to the change to the physical meta model4.
A Meta Data Repository also provides an integrated view of the enterprise’s major meta data subject areas. The repository should allow the user to view all entities within the company, and not just entities loaded in Oracle or entities that are just in the customer relationship management (CRM) applications.
Third, the Meta Data Repository contains current and future meta data, meaning that the meta data is periodically updated to reflect the current and future technical and business environment. Keep in mind that a Meta Data Repository is constantly being updated and it needs to be, in order to be truly valuable.
Lastly, meta data repositories are historical. A good repository will hold historical views of the meta data, even as it changes over time. This allows a corporation to understand how their business has changed over time. This is especially critical if the MME is supporting an application that contains historical data, like a data warehouse or a CRM application. For example, if the business meta data definition for “customer” is “anyone that has purchased a product from our company within one of our stores or through our catalog”. A year later a new distribution channel is added to the strategy. The company constructs a Web site to allow customers to order our products. At that point in time, the business meta data definition for customer would be modified to “anyone that has purchased a product from our company within one of our stores, through our mail order catalog or through the web”. A good Meta Data Repository stores both of these definitions because they both have validity, depending on what data you are analyzing (and the age of that data). Lastly, it is strongly recommended that you implement your Meta Data Repository component on an open, relational database platform, as opposed, to a proprietary database engine.
Meta Data Management Layer
The Meta Data Management Layer provides systematic management of the Meta Data Repository and the other MME components (see Figure 8). As with other layers, the approach to this component greatly differs whether a meta data integration tool is used or if the entire MME is custom built. If an enterprise meta data integration tool is used for the construction of the MME, than a meta data management interface is most likely built within the product. This is almost never the case; however, if it is not built in the product, than you would be doing a custom build. The Meta Data Management Layer performs the following functions:
- Archive
- Backup
- Database modifications
- Database tuning
- Environment management
- Job scheduling
- Load statistics
- Purging
- Query statistics
- Query and report generation
- Recovery
- Security processes
- Source mapping and movement
- User interface management
- Versioning
Archive
The archive function allows the meta data architect to set the criteria or event that triggers the MME archiving process. It should be able to archive all of the meta data in the Meta Data Repository and the related Meta Data Marts, and it should allow for specific meta data tables to be archived when necessary.
Backup
Backup functionality is often confused with archiving. Archiving targets the storage of older and less needed versions of the MME, while backups refer to the process of making sure the current MME is stored in a separate database so that if the production version of the MME is corrupted or if any of its components fail, a backup version can be brought online. Often the backup strategy is implemented at a hardware level through the use of disk mirroring. Best practices in this area include storing the copy in a different physical location than the production version of the MME.
Database Modifications
Since the meta model is implemented in an open, relational database, often tables and columns within the meta model need to be added, modified or deleted. The Meta Data Management Layer needs to not only assist in this process, but also track the changes that have been made to the MME.
Database Tuning
Tuning of the Meta Data Repository and its associated Meta Data Marts is a very important part of the Meta Data Management Layer. First, identifying indexes ensures that reports run efficiently. When analyzing physical meta model structures it is common to only see indexes on primary keys. This is typically a sign of poor or no indexing strategies.
Second, database tuning helps you identify and remove dormant meta data within the repository. A large MME that has been in production for a few years commonly has a good deal of dormant meta data. A sound MME will contain meta data that provides operational measurement statistics on the use of meta data in the MME to assist in the identification of dormant meta data.
Environment Management
Many meta data professionals make the mistake of believing that when they implement a MME that they are implementing and maintaining one system. In actuality they are building and maintaining three (possibly more) systems:
- Production
- Testing
- Development
The production version of the MME is the system that is in the “production environment” of an organization and is the version of the MME that the end users are accessing. The testing environment is the version used to test changes to system “bugs” found in the Production version of the MME. The development version of the MME is used to test future, major MME enhancements.
The names and number of MME environments differ based on the organizations internal IT standards; however, the three environments mentioned above are the most common. In any event, a good meta data management layer can handle whatever number of environments and names that are required. The environment management portion of the meta data management layer needs to organize and control the management and migration between these three system versions.
Job Scheduling
The program and process jobs that are executed to load the MME and to access the MME need to be scheduled and managed. The job scheduling portion of the Meta Data Management Layer is responsible for both event-based and batch triggered job scheduling.
Load Statistics
The meta data extraction and integration layers of the MME generate a great deal of valuable MME loading statistics. These MME load statistics need to be historically stored within the meta data repository portion of the MME. Examples of the most common type of load statistics include:
- How long did it take a particular process tread to run (clock time and CPU time)?
- How long did the entire meta data extraction and integration layers take to run (both clock and CPU time)?
- What errors were encountered in the meta data extraction and integration layers?
- What were the categories (e.g. informational, warning, severe, critical, etc.) of the errors that were logged?
- How many rows were inserted, changed, or deleted in each table of the meta model?
Purging
This part of the Meta Data Management Layer defines the criteria for MME purging requirements. Your MME’s purging specific purging requirements and criteria will be governed by its business requirements. As a general rule, meta data that is inaccurate or improperly loaded should be purged; all other meta data should be archived.
Query Statistics
Once the various Meta Data Delivery Layer reports and queries are generated, it is important to capture the user statistics associated with the access to these reports and queries. At a minimum, the Meta Data Management Layer needs to historically capture:
- Who is accessing the report or query?
- What report or query are they accessing?
- When are they accessing the report or query?
- How often are they accessing the report or query?
- How long do they access the report or query?
Query and Report Generation
The reports and queries used in the meta data delivery layer are defined and generated in the report generation section of the meta data management layer. How this is accomplished depends on whether a custom meta data delivery application is developed or a meta data access tool is implemented. This component also needs to manage any publish and subscribe capabilities that are required.
Recovery
There are many situations that can cause a company to have to recover or reload a previous version of their MME: hardware failure, database corruption, power outage, errors in the Meta Data Integration Layer. The recovery process needs to be tightly tied to the backup and archiving facilities of the Meta Data Management Layer. Recovery processes my be manually built or utilize existing recovery processes within the meta data integration tool. Both approaches need to be integrated into the organization’s existing application recovery processes.
Security Processes
The Security processes manage:
- Creation of new MME users
- Grouping of MME users
- Setting MME user privileges and profiles
- Firewalls/Infrastructure
- Password management
- User location verification
- Meta data masking (data level or access level)
This extensiveness of security processes is dependent on the business requirements of the MME. For instance, the security requirements around a Department of Defense or Federal Bureau of Investigation (FBI) MME are far greater than that of a bank.
Source Mapping and Movement
Meta data sources need to be mapped to the correct attributes and entities of the Meta Data Repository. This mapping and movement process needs to be captured and managed in the Meta Data Management Layer of the MME.
User Interface Management
This is the processes for the building, managing and maintaining the Web site (recommended user interface) that the end users will visit to navigate through the MME. Typically the view (version of the Web site) end users see depends on their security privileges and profiles. A business user would not be interested in looking at program code changes, so it makes sense to not have meta data reports or queries related to highly technical meta data available to the traditional business user through control of role access.
Versioning
As meta data is changed, added, and purged it will be important for the meta data management layer to historically track (or version) it. There are two common techniques for meta data versioning. The first is the use of date stamps. Date stamping — assigning a date to each row of your meta model entities — allows a firm to make whatever changes are necessary to the meta data, while maintaining its historical relevance.
The second technique is to assign version numbers to each row of the meta model. Version numbers are merely generated unique values (e.g. 1.1, 1.2, 1.a, etc.). Version numbers are more limiting than date stamping, which offers more meaning to MME users. Version numbers can be associated with specific dates and times; however, this adds additional complexity to the loading of the meta data and an additional join in the access queries and reports.
Another challenge of using versioning is capturing meta data that will become current at some point in time in the future. For example, a new physical table may be moved into the production environment at a future date. To handle these versioning situations an “effective dated rows” can be useful. Effective dated rows is a process that allows data to be entered into a group (table) for subsequent processing when the effective date becomes current. Effective dated data can be historical, current, or future. Below are the key effective dated rows concepts:
Effective date. The date on which the row of data becomes effective; the date it can be acted on.Effective status. Allows a application to select the appropriate effective dated rows when combined with the effective date (Domain value list: “Active” or “Inactive”).Current row. First row of data with an effective date equal to or less than the system date and an effective status of “active”. Only one row can be in this state.Future row. Meta data rows that have a date greater than the system date.Historical row. Rows of meta data that have an effective date less than the current row.
Table 1 illustrates the effective dated rows technique. In this example the current system date is January 20, 2004. The meta data row dated “2004-01-27” has an effective status of “Inactive”. However, once the current date reaches January 27, 2004 the meta data row dated “200401-18” will become a historical row and the row dated “2004-01-27” will have its “Effective Status” changed from “Inactive” to “Active”.
EFFECTIVE DATE | EFFECTIVE STATUS | EFFECTIVE STATUS COMMENTS |
2003-12-01 12:00:00 | Active | Historical row |
2004-01-18 12:00:00 | Active | Current row |
2004-01-27 12:00:00 | Inactive | Future row |
Table 1: Effective Dated Rows
Meta Data Marts
A Meta Data Mart is a database structure, usually sourced from a Meta Data Repository, designed for a homogenous meta data user group (see Figure 9). “Homogenous meta data user group” is a fancy term for a group of users with like needs.
There are two reasons why an MME may need to have meta data marts. First, a particular meta data user community may require meta data organized in a manner other than what is in the Meta Data Repository component. Second, an MME with a larger user base often experiences performance problems because of the number of table joins that are required for the meta data reports. In these situations it is best to create meta data mart(s) targeted specifically to meet those user’s needs. The Meta Data Marts will not experience the performance degradation because they will be modeled multi-dimensionally. In addition, a separate meta mart provides a buffer layer between the end users from the Meta Data Repository. This allows routine maintenance, upgrades, and backup and recovery to the repository without impacting the availability of the Meta Data Mart.
Meta Data Delivery Layer
The Meta Data Delivery Layer is the sixth and final component of the MME architecture. It delivers the meta data from the Meta Data Repository to the end users and any applications or tools that require meta data feeds to them (Figure 10)5.
The most common targets that require meta data from the MME are:
- Applications
- Data warehouses and data marts
- End users (business and technical)
- Messaging and transactions
- Meta data marts
- Software tools
- Third parties
- Web sites and e-commerce
Applications
Quite often applications (customer relationship management (CRM), enterprise resource planning (ERP), and so on) need to receive meta data from the MME for their own use. In these situations it is most common to have the meta data repository create an extract file that can be brought into the application. Typically the repository will generate a flat file and place it in a holding area that, when the application is ready, can read it in.
Data Warehouses and Data Marts
The Meta Data Delivery Layer data warehouses and their associated data marts (usually query and reporting are executed at the data mart level) are separate from applications because of some subtle differences in the use of the meta data. Figure 11 shows the data mart query and report bringing in meta data from the MME. Typically data marts are accessed via front-end tools (Business Objects, Cognos, Hyperion, Microstrategy, and so on). These tools generate SQL code. Since the Meta Data Repository component is stored on an open, relational database it is easy enough to “point” these tools at the repository and bring the meta data directly into the query/report (See Figure 11 for an example).
For some data warehousing applications where the data in the data mart(s) is voluminous or the number of end users is high, the overhead involved with going to a separate database may create too great a time delay for end users. These technical implementation issues can be remedied by loading the meta data directly into the data marts (Figure 12) during the marts load cycle.
End Users
The meta data delivery layer typically brings meta data directly to both business and technical end users. Usually this meta data is delivered directly to the user’s computer in the form of a document or spreadsheets or through a “thick” or “thin” client front-end meta data access tool.
Messaging and Transactions
As previously discussed, many companies use some form of messaging and transactions — whether EAI or XML — to transfer data from one system to another. Although most companies are not very advanced in their use and application of EAI or XML, these types of applications do utilize meta data. If companies continue to grow these applications, their need for meta data will continue to rise.
Meta Data Marts
As discussed earlier in this article, there are situations where meta data will be extracted from the repository and brought into a Meta Data Mart component. These Meta Data Marts are a database structure designed for a homogenous group of meta data users.
Software Tools
Sharing and interchange of meta data among various software tool’s internal meta data repositories is particularly desirable for global enterprises with dispersed teams trying to solve similar or related data analysis problems using an integrated computing approach 6. Hopefully, industry meta model standards like Common Warehouse Metamodel (CWM) and ISO11179 should make this effort easier. Today, most companies have to analyze the software tool’s meta models and then build technical processes that will share meta data between these tools. Anyone that has had the opportunity to build and maintain these types of processes will attest to how difficult it is.
Third Parties
Some MMEs need to send meta data to third parties; vendors or suppliers, customers, government agencies or regulatory bodies, and business partners (Figure 13). Typically when meta data is exchanged with these third parties, it is done through the generation of a flat file; however, more and more companies are beginning to use XML as transportation syntax.
Web sites and E-Commerce
Web sites and e-commerce applications also need meta data. Meta data-related Web sites are a very effective way to present meta data to the end users7. E-commerce — with its trend towards XML and its need to interact with customers and partners — will continue to trend towards needing meta data in their processes.
For professionals that have built an enterprise meta data repository they realize that it is some much more than just a database that holds meta data and pointers to meta data. Rather it is an entire environment. The purpose of the MME is to illustrate the major architecture components of that managed meta data environment.
4 See Chapters 4 – 8 of “Universal Meta Data Models” (David Marco & Michael Jennings, Wiley 2004) to see various physical meta models5 See Chapter 10 of “Building and Managing the Meta Data Repository” (David Marco, Wiley 2000) for a detailed discussion on meta data consumers and meta data delivery6 See Chapter 2 of “Building and Managing the Meta Data Repository” (David Marco, Wiley 2000) for a more detailed discussion on this topic7 See Chapter 3 of “Universal Meta Data Models” (David Marco & Michael Jennings, Wiley 2004) for several examples