By Mike Ferguson

November 2019

The Importance of Data Architecture in a Data-Driven Enterprise

In recent years, software development practices have changed dramatically. Gone are the old ‘batch style’ ways of building monolithic software products, with all the functionality built into one huge application and a new release every eighteen months.

Today we have a new agile software development approach, with software made up of components (microservices) running in containers managed by technologies such as Kubernetes and deployed in serverless computing environments like the cloud. The impact on the software market has been dramatic. Component-based software development, combined with agile, lower-cost, faster continuous development / continuous delivery methods, has resulted in new software and new functionality being delivered at pace. You just have to look at the data and analytics market over the last five years. We have seen an explosion of new software in the market, including new types of database products, new data management tools and new analytical tools. It has been a technology deluge happening at a pace that many companies buying software can barely keep up with.

Many would argue that the pace has quickened so much that many companies can no longer keep up with it. There are lessons to be learned from what has happened in the software world, especially when you apply them to an aspiration executives have expressed to me so many times: they want to become data-driven. They want the business to be driven by the insights produced from analysing data. The problem is that they want to be data-driven now. The expectation is that, despite the fact that the number of data sources and data types is rapidly increasing and so many different types of data store are now being used both on-premises and on multiple clouds, it should still be easy to achieve this. So much so that executives are approving investments in cloud, data management and machine learning technologies as a matter of urgency.

The problem is that, with business units now in control of spend, many companies have ended up buying multiple overlapping technologies in a rush to deliver value, without necessarily knowing which workloads are best suited to which technologies, and all before creating any kind of data architecture that could deliver the business value they want. The result is technology overspend, silos, slow development, no metadata sharing and, in some cases, the wrong technology for the workload. Using a NoSQL document database as a data warehouse, for example, is simply a poor choice.

It would be much better if companies designed the data architecture first, to enable them to become data-driven and deliver the business value they need, and then selected technologies that can work together and be integrated to bring the end-to-end data architecture to life. In addition, if we look at what has happened in the software market, the question is: could we do the same with data and analytics?

In other words, could we create a common data architecture and a component-based approach to data and analytics development that, combined with agile, lower-cost, faster continuous development / continuous delivery methods, delivers new data, new analytics, new insights and new business value at pace? Could this be the way to deliver a data-driven enterprise?

The answer, in my opinion, is yes. We need to build data products (assets) and analytical assets (e.g. predictive models, prescriptive models, BI reports, dashboards, stories etc.) that form the components we assemble to deliver value. Becoming data-driven should itself be a continuous development / continuous delivery process. To do this we need some people building data and analytical components while others consume them, assemble them and drive value with them.

So what is required if you want to become data-driven? Companies need to:

• Establish a common business vocabulary of common data names and definitions for logical data entities. This is critical to being able to build, trust and share reusable data products. It also helps people understand what data means (a minimal sketch of such a vocabulary entry appears after this list).

• Make use of common data fabric software, as opposed to everyone using different tools in silos to prepare and analyse data. A stronger approach is to rationalise and use common data fabric software that can connect to both on-premises and cloud-based data stores and that allows you to create pipelines to prepare and analyse data (see the second sketch after this list).

• Establish a Multi-Purpose Information Supply Chain. A critical success factor here is that companies should organise themselves to become data-driven. That means establishing an enterprise information supply chain through which to produce business-ready data products that can be published in a data marketplace, from where information consumers can shop for data. An enterprise information supply chain is similar to a manufacturing production process, in which information producers curate business-ready data products for information consumers to find and use. It is a continuous development process. The whole point is to produce trusted, commonly understood, business-ready data products that can be reused in multiple places (the third sketch after this list shows what a published data product might carry). The benefit is that we can save information consumers considerable amounts of time, because they do not have to prepare all the data they need from scratch. Instead we create ready-made data products, or assets, that are already fit for consumption; examples might be customer data, product data and orders data. Having data ‘ready to go’ should therefore shorten the time to value and reduce costs. It is also important to recognise the role of the data lake in an information supply chain. A data lake is too valuable to be restricted to just data science; it needs to be multi-purpose, so it can be used to produce data assets that help build master data in MDM systems, data warehouses and the data needed for data science.

• The Need For Data And Analytical Pipelines In A Supply Chain. Component-based development of data pipelines is needed to accelerate delivery of business-ready data in an enterprise information supply chain. It should also be possible to add analytics and visualisations to a pipeline even if they were developed in other tools. That means it must be possible to add data products, analytics and visualisations to a marketplace (catalog) to maximise their potential for reuse (see the fourth sketch after this list). If this occurs, the time to value will get shorter with every new project as more business-ready data, analytics and other artefacts become available.

• Publish trusted data and analytical assets as services in a marketplace for consumption across the enterprise. An enterprise data marketplace is a catalog inside the enterprise where people can go to shop for business-ready data and analytical assets. Establishing one means you can introduce publish-and-subscribe operations into an information supply chain to speed up delivery even more, because reusable data and analytical components give every project a jump start (the fifth sketch after this list illustrates this). By focusing on value, an information supply chain can be used to create business-ready data, predictive analytics and pre-built decision services to automate actions in the digital enterprise. Information consumers can shop for and find ready-made data and analytics in the data marketplace to help them deliver value. It is this availability of reusable business-ready data, predictive analytics and pre-built decision services that enables mass contribution to common business goals across the enterprise. In other words, it helps the business become a self-propelling, data-driven business.
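
To make the first point concrete, here is a minimal sketch, in Python, of what an entry in a common business vocabulary might record for one logical data entity. The structure, the field names and the example ‘Customer’ entry are illustrative assumptions, not a standard or any particular product’s model.

```python
# Illustrative sketch of a business vocabulary entry for a logical data entity.
# Field names and the example values are assumptions, not a standard.
from dataclasses import dataclass, field
from typing import List

@dataclass
class LogicalDataEntity:
    name: str                   # the agreed common data name, e.g. "Customer"
    definition: str             # the agreed business definition
    synonyms: List[str] = field(default_factory=list)  # names used in source systems
    steward: str = ""           # who is accountable for the definition

# One shared entry that producers and consumers across the enterprise agree on
customer = LogicalDataEntity(
    name="Customer",
    definition="A person or organisation that has bought, or may buy, our products.",
    synonyms=["client", "account holder"],
    steward="Customer Data Domain Team",
)
```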
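
The common data fabric point is about one pipeline abstraction spanning many data stores. The sketch below captures only the shape of the idea: a pipeline applies preparation steps to records drawn from any source behind a common interface. The connector and step names are hypothetical; real data fabric software supplies connectors to on-premises and cloud stores.

```python
# Sketch of a data fabric style pipeline: any source behind a common iterator
# interface, with reusable preparation steps applied in sequence.
from typing import Callable, Dict, Iterable, Iterator

Record = Dict[str, object]
Step = Callable[[Record], Record]

def pipeline(source: Iterable[Record], *steps: Step) -> Iterator[Record]:
    """Apply each preparation step, in order, to every record from the source."""
    for record in source:
        for step in steps:
            record = step(record)
        yield record

# Hypothetical source connector; a real fabric would connect to a database,
# file system or cloud object store here.
def on_prem_orders() -> Iterator[Record]:
    yield {"order_id": 1, "amount": "120.50"}

def standardise_amount(record: Record) -> Record:
    record["amount"] = float(record["amount"])  # enforce a common data type
    return record

prepared = list(pipeline(on_prem_orders(), standardise_amount))
```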
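
A business-ready data product in an information supply chain is more than the data itself; it needs enough descriptive metadata for a consumer to find it, trust it and reuse it. The fields below are one plausible minimum, chosen purely for illustration.

```python
# Illustrative descriptor for a "business ready" data product. The metadata
# fields are assumptions about what a consumer needs in order to trust it.
from dataclasses import dataclass
from typing import List

@dataclass
class DataProduct:
    name: str                  # e.g. "orders", named using the business vocabulary
    description: str           # business meaning of the data
    owner: str                 # the information producer accountable for it
    location: str              # where the curated data lives (hypothetical path)
    quality_checks: List[str]  # checks passed before publication
    version: str               # so consumers know when it changes

orders = DataProduct(
    name="orders",
    description="All confirmed customer orders, one row per order line.",
    owner="Order Management Domain Team",
    location="lake/curated/orders",
    quality_checks=["no duplicate order ids", "amounts are non-negative"],
    version="1.3.0",
)
```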
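
Component-based pipeline development depends on steps being published once and reassembled by later projects. In this sketch a simple registry stands in for the shared marketplace (catalog); the decorator-based API is an assumption made for brevity, not how any particular tool works.

```python
# Sketch of component-based development: pipeline steps are registered under
# common names so later projects can assemble them instead of rebuilding them.
# The registry is a stand-in for a shared catalog, not a real product API.
from typing import Callable, Dict

COMPONENT_REGISTRY: Dict[str, Callable] = {}

def component(name: str) -> Callable:
    """Publish a reusable pipeline step under a common, discoverable name."""
    def register(fn: Callable) -> Callable:
        COMPONENT_REGISTRY[name] = fn
        return fn
    return register

@component("mask_email")
def mask_email(record: dict) -> dict:
    record["email"] = "***"        # a simple data-preparation component
    return record

@component("order_value")
def order_value(record: dict) -> dict:
    record["value"] = record["quantity"] * record["unit_price"]  # an analytic step
    return record

# A later project discovers and reuses an existing component by name.
step = COMPONENT_REGISTRY["mask_email"]
```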
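
Finally, the publish-and-subscribe behaviour of an enterprise data marketplace can be sketched as follows: producers publish versions of data products and subscribed consumers are notified. The class and its methods are illustrative, not any vendor’s catalog API; a real marketplace would add search, entitlements and lineage.

```python
# Minimal publish/subscribe sketch for an enterprise data marketplace.
# Illustrative only; real marketplaces add search, entitlements, lineage, etc.
from collections import defaultdict
from typing import Callable, Dict, List

Notify = Callable[[str, str], None]

class DataMarketplace:
    def __init__(self) -> None:
        self.catalog: Dict[str, str] = {}                 # product name -> latest version
        self.subscribers: Dict[str, List[Notify]] = defaultdict(list)

    def publish(self, product: str, version: str) -> None:
        """An information producer publishes a new version of a data product."""
        self.catalog[product] = version
        for notify in self.subscribers[product]:
            notify(product, version)

    def subscribe(self, product: str, notify: Notify) -> None:
        """An information consumer asks to be told when a product changes."""
        self.subscribers[product].append(notify)

marketplace = DataMarketplace()
marketplace.subscribe("orders", lambda p, v: print(f"{p} updated to version {v}"))
marketplace.publish("orders", "1.3.0")   # prints: orders updated to version 1.3.0
```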

Data architecture is critical to becoming a data-driven enterprise. We shall be talking about this and all its related topics at the International Big Data Conference in Rome in December 2019. I hope you can join us.