Creating Re-Usable Data Products for Analytics
Data Lake vs. Lakehouse vs. Data Mesh
Most companies today are storing data and running applications in a hybrid multi-Cloud environment. Analytical systems, however, tend to be centralised and siloed: Data Warehouses and Data Marts for BI, Hadoop or Cloud storage Data Lakes for Data Science, and stand-alone streaming analytical systems for real-time analysis. These centralised systems rely on Data Engineers and Data Scientists working within each silo to ingest data from many different sources, then clean and integrate it for use in a specific analytical system or Machine Learning model.
There are many issues with this centralised, siloed approach, including multiple tools to prepare and integrate data, reinvention of data integration pipelines in each silo, and centralised data engineering teams that have a poor understanding of source data and cannot keep pace with business demands for new data. Master Data is also not well managed.
To address these issues, new Data Architectures have emerged that attempt to accelerate the creation of data for use in multiple analytical workloads. Data Mesh is a decentralised Data Architecture with domain-oriented data ownership and decentralised self-service data engineering to create a mesh of data products serving multiple analytical systems.
Data Lakes can also be used to create data products and can be integrated with Data Warehouses or Lakehouses, so that lower latency data products can be created once and used in streaming analytics, Business Intelligence, Data Science and other analytical workloads.
This 2-day class examines the strengths and weaknesses of Data Lakes, Data Mesh and Data Lakehouses, and looks at how multiple domain-oriented teams can use common data infrastructure software to create trusted, compliant, reusable data products in a Data Mesh or Data Lake for use in Data Warehouses, Data Lakehouses and Data Science to drive value.
The objective is to shorten time to value while also ensuring that data is correctly governed in a decentralised environment. The class also looks at the organisational implications of these architectures and at how to create shareable data products for Master Data Management and for use in multiple analytical workloads. Technologies discussed include Data Catalogs, self-service Data Integration, Data Fabric, DataOps, Data Warehouse Automation, Data Marketplaces and Data Governance platforms.
What you will learn
- Strengths and weaknesses of centralised Data Architectures used in Analytics
- The problems caused in existing analytical systems by a hybrid, multi-Cloud data landscape
- What is a Data Mesh, a Data Lake and a Data Lakehouse? What benefits do they offer?
- What are the principles, requirements, and challenges of implementing these approaches?
- How to organise to create data products in a decentralised environment so you avoid chaos
- The critical importance of a Data Catalog in understanding what data is available as a service
- How business glossaries can help ensure data products are understood and semantically linked
- An operating model for effective federated Data Governance
- What common data infrastructure software is required to operate and govern a Data Mesh, a Data Lake or a Data Lakehouse?
- An implementation methodology to produce ready-made, trusted, reusable Data Products
- Collaborative domain-oriented development of modular and distributed DataOps pipelines to create data products
- How a Data Catalog and automation software can be used to generate DataOps pipelines
- Managing data quality, privacy, access security, versioning, and the lifecycle of data products
- Publishing semantically linked data products in a data marketplace for others to consume and use
- Consuming data products in an MDM system
- Consuming and assembling data products in multiple analytical systems to shorten time to value
- What is a Data Mesh, a Data Lake and a Lakehouse? Why use them?
- Methodologies for creating Data Products
- Using a Business Glossary to define Data Products
- Standardising development and operations in a Data Mesh, Data Lake or Lakehouse
- Building DataOps Pipelines to create Multi-Purpose Data Products
- Implementing Federated Data Governance to produce and use compliant Data Products