December 2021
What It Looks Like When a Team Is Missing
Data teams need all of their parts in place to succeed. When one of the teams that make up the data teams is missing, the other teams suffer.
Often, organizations or team members don’t understand what’s happening when a team is missing. They blame themselves or technology for their perceived issues. Instead, the problems are much more profound and require more fundamental changes to fix. I wrote Data Teams to help management understand and uncover how successful data teams work.
The Data Teams
A common misconception with data teams is that you only need one team (data science) to do everything. The reality is that you need three different teams to be successful with big data. Each team complements the others and fills in their gaps.
The data science team is responsible for doing advanced analytics. This includes creating machine learning models.
The data engineering team is responsible for the software engineering and architecture required to run analytics at scale. Data engineers are the creators of the data products that the rest of the organization uses.
The operations team is responsible for keeping everything running smoothly. This day-to-day operational excellence makes it so business and end customers can rely on the system.
Given the need for all three teams, what does it look like when one is missing? Will you see your organization in this mirror?
Missing Data Engineering
Lacking the data engineering team means there isn’t the software engineering rigor that needs to be part of big data projects.
A lack of data engineers means that the data scientists are acting as data engineers. Data scientists lack the software engineering skills needed for large-scale projects. The lack of software engineering skills commonly manifests as the data scientist’s code and models being “extremely inefficient” and “not robust.” It also means that the data scientists made their own technology and architectural choices. These choices lack a sound engineering basis and often use the wrong technology for the job.
For the operations team, a lack of data engineers means the data science code isn’t operationally sound. The data scientists’ code has trouble at scale, takes far too long to run, or doesn’t run at all. The operations team spends its time plugging leaks caused by poor coding instead of improving operational efficiency.
Missing the data engineering team is the most common mistake organizations make. It happens because the industry put an undue focus on data scientists and didn’t fully understand their abilities. Even after a data engineering team is put in place, it will have to spend a significant amount of time dealing with the technical debt created by the data scientists.
Missing Operations
Data teams without an operations team create a system that customers – whether end customers or internal business customers – can’t rely on. Whenever these customers use the system, they’re wondering if the system will work this time or how long it will be down. The adoption of the system is severely inhibited because of the constant operational issues.
A lack of operations means the data engineers act as the operations engineers. Often, data engineers aren’t well-suited to being operations engineers. They will take longer and create a less-stable system than an operations engineer would. A lack of operations means the data engineer’s time is going into constant operations issues instead of writing code.
The notable exception to a lack of an operations team is an organization that practices DevOps. In these cases, the data engineering team is doing both software engineering and operations. Organizations looking to do DevOps with big data systems will need to understand the operational rigors placed on a team that does both the coding and the operations.
Poor operational excellence inhibits the data science team too. A model with poor uptime or performance means the model isn’t performing at the level expected. For example, the model may fail so often that the end customer rarely sees the model’s results, and the fallback option is always used.
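The fallback scenario above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `FallbackStats`, `predict_with_fallback`, `flaky_model`, and `default_answer` are hypothetical names invented for this example, not part of any real system or library. The sketch wraps a model call, serves a fallback when the model fails, and counts how often that happens.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FallbackStats:
    """Counts how often the model answered versus the fallback (hypothetical helper)."""
    model_calls: int = 0
    fallback_calls: int = 0

    @property
    def fallback_rate(self) -> float:
        total = self.model_calls + self.fallback_calls
        return self.fallback_calls / total if total else 0.0


def predict_with_fallback(
    model: Callable[[Any], Any],
    fallback: Callable[[Any], Any],
    features: Any,
    stats: FallbackStats,
) -> Any:
    """Return the model's result, or the fallback's result if the model raises."""
    try:
        result = model(features)
    except Exception:
        # The model is unavailable or failing, so the customer gets the fallback.
        stats.fallback_calls += 1
        return fallback(features)
    stats.model_calls += 1
    return result


def flaky_model(features: dict) -> str:
    """Stand-in for an unreliable model: it only works for every fourth request."""
    if features["id"] % 4 != 0:
        raise RuntimeError("model service unavailable")
    return "model recommendation"


def default_answer(features: dict) -> str:
    """Stand-in for the simple fallback option."""
    return "default recommendation"


if __name__ == "__main__":
    stats = FallbackStats()
    for request_id in range(8):
        predict_with_fallback(flaky_model, default_answer, {"id": request_id}, stats)
    print(f"fallback rate: {stats.fallback_rate:.0%}")
```

A fallback rate that stays high is the kind of signal an operations team would watch: the model may be accurate, but the customer rarely sees its results.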
A missing operations team is somewhat common. It usually comes from misunderstanding the continued need for operational excellence when dealing with big data, a need that grows with scale.
Missing Data Science
Data teams without a data science team are limited in the value and sheer complexity of analytics that the team can create. The missing data science team means that the data engineering team or a data analytics team is trying to handle advanced analytics.
While the data analytics team is not one of the data teams, its members may have the mathematical background to create advanced analytics. However, they will lack the programming and technical skills to carry it out effectively. Many data analysts have rudimentary SQL skills or rely on GUI analytics programs; the next level of analytics requires intermediate-level programming skills. The lack of programming skills will keep data analysts from doing data science.
Some data engineers will have a mathematical background, but the vast majority have a computer science background. This lack of a hardcore mathematical and statistical background limits the complexity of their analytical skills.
While missing a data science team is comparatively rare, I question the value of creating data teams – with all of their related expenses – without making sure the last layer of value creation – the data science team – is present.
What To Do?
You may have recognized your team or organization as missing one of the teams I’ve talked about. You may be experiencing the issues first-hand as the manager of a team or as the business customer who doesn’t have the right data products. I want you to know these problems don’t go away on their own, and it will take a concerted effort to fix the current issues as well as previous issues.
I invite you to read chapter 10, “Starting a Team,” of my latest book, Data Teams. This chapter goes through the step-by-step process of establishing one or all of the data teams. I also encourage you to read chapters 3-5, where I go in-depth into what data science, data engineering, and operations teams do and what the required skills are.
Should you want to accelerate the process or find you need more help, my company mentors other organizations on this journey to creating successful data teams.