By Mike Ferguson

June 2024

The Enterprise Data Marketplace – A New Mechanism for Governing Data Sharing

In recent years, data governance has risen steadily in priority. This is happening in enterprises of all sizes, both in Europe and around the world. There are several reasons for this. The first is data complexity caused by an increasing number of data silos holding enterprise data. Data now exists both inside and outside the firewall in multiple types of data store: on-premises, in multiple clouds, in software-as-a-service (SaaS) applications and at the edge. This includes data stored in cloud storage, cloud and on-premises relational database management systems (RDBMSs), cloud and on-premises NoSQL DBMSs, Hadoop Distributed File System (HDFS) and other file servers.

A second reason for the rise in priority of data governance is the amount of legislation around the world that businesses must comply with to govern data privacy and data security and to avoid data breaches. This legislation may be specific to a geographic region (e.g., GDPR in the European Union), a country (e.g., Italy, Germany, Australia, Japan etc.) or a state (e.g., the California Consumer Privacy Act in the US). Companies that operate internationally may need to comply with multiple data privacy laws associated with the different geographies they operate in. Also, industry governing bodies in specific countries are increasingly introducing industry-specific regulations.

When you look at data complexity, data privacy legislation and industry-specific regulations together, you can see why governance of data access security, data sharing and data usage has grown in importance. At the same time, data sharing is becoming critical to commercial success. The question is, how can you share data in a compliant way across a highly distributed data estate without increasing the risk of non-compliance?

The answer lies in a new type of technology: the data marketplace. The term data marketplace has emerged over the last few years, but what exactly is it? A data marketplace is a data catalog application that governs the publishing, sharing, consumption and use of ready-made, high-quality, compliant data and analytical products. By data catalog application we mean that the data marketplace is an application that uses an underlying data catalog to store metadata, which it makes available via the marketplace application. A data marketplace is a shop window for the enterprise that makes data products available as services. There are many examples of data products, including Customers, Suppliers, Orders, Shipments, Payments etc. The data attributes comprising each data product are described using common business data names that are fully documented in a business glossary. In addition, full metadata lineage exists explaining how each data product was created, who the owner is, who the data stewards are and who else is using it. Data products are also tagged and organised to make them easy to find, access, share and reuse across the enterprise. It is not just data products that can exist in a data marketplace. Analytical products such as BI reports, dashboards and machine learning models are increasingly also being published to encourage sharing.
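To make the metadata behind a data product more concrete, here is a minimal sketch in Python of what one catalog entry might hold. The class name, field names and example values are all hypothetical and chosen for illustration; they are not taken from any particular data catalog or marketplace product.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical catalog entry describing one data product."""
    name: str
    owner: str
    stewards: list[str]
    # Attribute name -> business glossary definition (empty if undocumented).
    attributes: dict[str, str]
    # Ordered steps describing how the data product was created.
    lineage: list[str]
    tags: set[str] = field(default_factory=set)
    consumers: set[str] = field(default_factory=set)

    def undocumented_attributes(self) -> list[str]:
        """Return attributes that still lack a glossary definition."""
        return sorted(a for a, d in self.attributes.items() if not d.strip())


# Illustrative values only.
customers = DataProduct(
    name="Customers",
    owner="sales-domain",
    stewards=["steward-a"],
    attributes={"customer_id": "Unique identifier of a customer", "segment": ""},
    lineage=["extract from CRM", "deduplicate", "publish"],
    tags={"master-data"},
)
print(customers.undocumented_attributes())
```

A check like `undocumented_attributes` is one simple way a marketplace could flag a data product whose attributes are not yet fully described in the business glossary before it is published.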

The operation of a data marketplace is shown below.

We are seeing different types of data marketplace appearing. Public data marketplaces are already here. Examples include the Snowflake Marketplace and the Databricks Marketplace. These contain pre-loaded, business-ready, public data products, such as market data products from financial data providers like Bloomberg, Standard & Poor's and many others. Public data marketplaces are governed by the vendor offering them, so there is no burden on businesses using this business-ready data except to agree to the terms and conditions before consuming it. It is the responsibility of the public data marketplace vendor to ensure data products are always kept up to date.

The other type of data marketplace is an internal, enterprise-wide data marketplace. This publishes internally produced, business-ready data products that are available for sharing around the enterprise and beyond. There is no reason why the same underlying technology cannot support both public and internal data marketplaces.

But this is more than just software that acts as a shop window for sharing business-ready data. It is software that offers the ability to fully govern the sharing of data and analytical products. That means it provides processes to govern:

  • The approval and publishing of data products available for consumption
  • Declaration of ownership of data and analytical products to be shared
  • The creation and maintenance of policies and terms of use that govern the sharing of data
  • The creation, routing, and collaborative approval of requests to access and consume data and analytical products by owners and other authorised personnel before allowing access
  • Consumer acceptance of terms of use governing the use of data and analytical products before allowing consumers access to the data or analytical product requested
  • Control of delivery of shared data products, especially if they contain personally identifiable data
  • Auditing, monitoring and usage tracking of shared data and analytical products
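
The governed sharing steps listed above can be sketched as a small workflow. This is an illustrative sketch only, assuming a simplified model in which a request must be approved by an authorised person and the consumer must accept the terms of use before delivery. The `AccessRequest` class, its statuses and the example names are hypothetical and do not represent any vendor's API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    REQUESTED = "requested"
    APPROVED = "approved"
    DENIED = "denied"
    DELIVERED = "delivered"


@dataclass
class AccessRequest:
    """Hypothetical request by a consumer to access one data product."""
    product: str
    consumer: str
    approvers: set[str]  # owners and other authorised personnel
    status: Status = Status.REQUESTED
    terms_accepted: bool = False
    audit_log: list[str] = field(default_factory=list)

    def decide(self, approver: str, approve: bool) -> None:
        """Approve or deny the request; only authorised approvers may decide."""
        if approver not in self.approvers:
            raise PermissionError(f"{approver} may not approve {self.product}")
        if self.status is not Status.REQUESTED:
            raise ValueError("request has already been decided")
        self.status = Status.APPROVED if approve else Status.DENIED
        self.audit_log.append(f"{approver}: {self.status.value}")

    def accept_terms(self) -> None:
        """Record that the consumer accepted the terms of use."""
        self.terms_accepted = True
        self.audit_log.append(f"{self.consumer}: accepted terms of use")

    def deliver(self) -> None:
        """Deliver only after approval and acceptance of the terms of use."""
        if self.status is not Status.APPROVED or not self.terms_accepted:
            raise PermissionError("approval and accepted terms are required first")
        self.status = Status.DELIVERED
        self.audit_log.append(f"{self.product}: delivered to {self.consumer}")


request = AccessRequest("Customers", "analyst-1", approvers={"sales-owner"})
request.decide("sales-owner", approve=True)
request.accept_terms()
request.deliver()
print(request.status.value)
print(request.audit_log)
```

Every state change appends to `audit_log`, which is the kind of record that auditing, monitoring and usage tracking would draw on. A real marketplace would also need routing to multiple approvers and policy checks for personally identifiable data, which this sketch leaves out.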

Governance of data sharing is critical for protecting sensitive data, upholding data sovereignty, and maintaining data access security and loss prevention.

The enterprise data marketplace should show users which trusted data products are available. This can be done by organising data and analytical products into related groups and by creating a taxonomy that enables faceted search across what is available for use. Typically, a data marketplace should allow you to:

  • Search for data products via a search box using business glossary terms, faceted search, and other filters, e.g., ratings or data products with anonymised sensitive data
  • Determine if a data or analytical product is high quality by viewing its quality score, the common vocabulary that explains what the data means, the lineage showing how it was created, and its consumer rating
  • Determine if data products containing sensitive data have been anonymised
  • Determine if it is of high business value using reviews and ratings
  • See recommendations of any other related data or analytical product that you should know about
  • Monitor and report on all data sharing approvals, denials, and deliveries
  • Monitor all data requests and shares for each consumer
  • Maintain an audit history, with reporting and analytics on data consumption patterns
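
As a rough illustration of the search capabilities above, the following sketch filters a tiny in-memory catalog by business glossary term, minimum rating and facet values, and then counts results per facet. The catalog entries, field names and function names are hypothetical; a real marketplace would run such queries against its underlying data catalog.

```python
from collections import Counter

# Hypothetical catalog entries with a few searchable facets.
CATALOG = [
    {"name": "Customers", "domain": "sales", "anonymised": True,
     "rating": 4.5, "glossary_terms": {"customer", "segment"}},
    {"name": "Orders", "domain": "sales", "anonymised": False,
     "rating": 4.0, "glossary_terms": {"order", "customer"}},
    {"name": "Shipments", "domain": "logistics", "anonymised": False,
     "rating": 3.5, "glossary_terms": {"shipment", "order"}},
]


def search(catalog, term=None, min_rating=0.0, **facets):
    """Return products matching a glossary term, a minimum rating
    and exact values for any facets passed as keyword arguments."""
    results = []
    for product in catalog:
        if term is not None and term.lower() not in product["glossary_terms"]:
            continue
        if product["rating"] < min_rating:
            continue
        if any(product.get(key) != value for key, value in facets.items()):
            continue
        results.append(product)
    return results


def facet_counts(products, facet):
    """Count how many products fall under each value of a facet."""
    return Counter(product[facet] for product in products)


hits = search(CATALOG, term="customer")
print([p["name"] for p in hits])
print(facet_counts(hits, "domain"))

# Narrow the same search to products with anonymised sensitive data.
anonymised_hits = search(CATALOG, term="customer", anonymised=True)
print([p["name"] for p in anonymised_hits])
```

The facet counts are what a marketplace user interface would typically display next to each filter, so that a consumer can see how many data products remain before narrowing the search further.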

Besides governing data sharing, the discussion of data products makes it clear that the data marketplace fits very well into the implementation of a Data Mesh. There, data producers in business domains can use data fabric software to create data integration pipelines that, when executed, produce data products that can be made available in the Data Mesh. In that sense, the Data Marketplace is the user interface through which data consumers can find, request access to, and consume data products in a Data Mesh.

If you would like to know more about Data Marketplaces and how they can be used both in a Data Mesh and in Data Governance, please join me on my Practical Guidelines for Implementing a Data Mesh, my Data Warehouse Modernisation and my Centralised Data Governance of a Distributed Data Landscape classes with Technology Transfer in April, May, and June.