By Jesse Anderson

September 2020

Big Data and Analytics in the COVID-19 Era

Big Data and Analytics are going to change in this COVID-19 era. I want to share with everyone the same messages that I’m giving my own clients. I hope this post will help data teams that aren’t living up to their potential start creating real value.

Focusing on Value Creation

Data teams need to stop looking for a problem to fit their data and start solving real business problems. The days of just storing data in the hope that the data team will eventually create value from it are gone (this shouldn’t have been acceptable in the first place). If a team isn’t creating any business value, management needs to go back to the business to see how existing and emerging problems can be solved with data.

We need to focus on creating models that optimize or improve efficiency. These models should save the company money or improve a process within it. Examples include models for pricing optimization, inventory management, ad spend, and customer acquisition. Some organizations have delayed deploying a model to production until it showed a sufficiently large improvement over the existing model, or over having no machine learning model at all. In the pre-COVID-19 era, this sort of delay was feasible; teams could wait until the improvements met the data science team’s approval. In the COVID-19 era, even marginal improvements could be the difference between a company staying afloat and layoffs. These models could drive improvements that weren’t critical before but are vital now.
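The shift in deployment thresholds can be sketched as a simple gate. The scores, lift values, and metric below are illustrative assumptions, not figures from any particular team:

```python
# Sketch of a deployment gate. Pre-COVID, teams could hold a new model back
# until it cleared a high bar over the incumbent; in the COVID-19 era even
# marginal lifts may be worth shipping. Scores and thresholds are assumed.

def should_deploy(new_score, incumbent_score, min_lift):
    """Deploy only if the new model beats the incumbent by at least min_lift."""
    return (new_score - incumbent_score) >= min_lift

incumbent, candidate = 0.81, 0.82  # e.g. a revenue-weighted accuracy (assumed)

print(should_deploy(candidate, incumbent, min_lift=0.05))   # pre-COVID bar: False
print(should_deploy(candidate, incumbent, min_lift=0.005))  # COVID-era bar: True
```

The only change between the two calls is the bar the business sets, which is a policy decision, not a modeling one.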

While deploying models is all well and good, the organization is still constrained by the quality of its data. If it hasn’t already, the data engineering team needs to create value through high-quality data products and the right infrastructure so that the data scientists can build meaningful models.

Implications of Working From Home

As part of many organizations’ responses to COVID-19, employees are starting to work from home. For most IT-related employees, that means working with code on a laptop. Data teams, however, deal with both code and data. They need to be cognizant of the security and the nature of the data that could end up on their laptops. This data could range from public datasets all the way to PII (Personally Identifiable Information).

If organizations do decide to allow data to be copied to laptops, processes must be put in place to prevent the uncontrolled spread of data, which could hurt the organization as much as the spread of COVID-19 itself. For example, employees should be told clearly that data and code can’t be copied onto a personal computer or laptop. Putting data on unsecured computers puts the organization at incredible risk. At a minimum, company data should be stored on encrypted disks protected by a strong key or password. Laptops should have antivirus software installed, kept up to date, and running. Because home internet connections often sit behind weak or non-existent firewalls, a software firewall should also be installed on each laptop.

Working from home can expose an inadequate data infrastructure. If a member of the organization feels the need to copy data locally, apart from test data, it could be a sign that the organization’s infrastructure wasn’t properly supporting its data engineers even when they worked in the office. Ideally, the path of least resistance should be to use the organization’s existing infrastructure because it is easier than copying data locally. For organizations without the right infrastructure, the path of least resistance, or even a requirement for getting the job done, is to download the data locally. Another reason data teams download data locally is to circumvent security measures they perceive as excessive or cumbersome. They might have to connect to a VPN, then SSH to another computer, then log in to another website just to get the data. The data engineering team should watch for these usage patterns and ask what they say about why staff are bypassing the infrastructure.
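One way to watch for these patterns is to aggregate export volumes out of access or audit logs. The log format, records, and threshold below are hypothetical; a real team would pull this from its warehouse’s audit logs or gateway access logs:

```python
from collections import defaultdict

# Hypothetical audit-log records: (user, action, bytes transferred).
ACCESS_LOG = [
    ("alice", "query",  2_000_000),
    ("alice", "export", 900_000_000),
    ("bob",   "query",  5_000_000),
    ("alice", "export", 1_200_000_000),
]

EXPORT_THRESHOLD_BYTES = 1_000_000_000  # 1 GB; tune to your environment

def flag_bulk_exporters(log, threshold=EXPORT_THRESHOLD_BYTES):
    """Return users whose total exported bytes suggest bulk local copies."""
    exported = defaultdict(int)
    for user, action, size in log:
        if action == "export":
            exported[user] += size
    return sorted(user for user, total in exported.items() if total >= threshold)

print(flag_bulk_exporters(ACCESS_LOG))  # ['alice']
```

A flagged user isn’t necessarily doing something wrong; the point is to start a conversation about why the shared infrastructure wasn’t the easier path.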

Changes in Models

The models in production will need to be retrained or have their parameters tweaked. Over the past decade, most models were trained on data from good-to-great economic conditions. In the COVID-19 era, they’ll need to be retrained for more pessimistic economic conditions.

Ideally, the economy will recover quickly. We’ll need to save the currently running models and revert to them once the economy recovers.
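A minimal sketch of that save-and-revert step, using a stand-in dict where a real team would archive a trained estimator along with its training metadata (the directory name and version labels are assumptions):

```python
import pickle
from pathlib import Path

# Archive the currently running model before deploying a retrained one,
# so it can be restored when the economy recovers.
MODEL_DIR = Path("model_archive")
MODEL_DIR.mkdir(exist_ok=True)

def save_model(model, version):
    with (MODEL_DIR / f"model_{version}.pkl").open("wb") as f:
        pickle.dump(model, f)

def load_model(version):
    with (MODEL_DIR / f"model_{version}.pkl").open("rb") as f:
        return pickle.load(f)

# Archive the pre-COVID model, then deploy a pessimistic retrain alongside it.
save_model({"trained_on": "2019 data", "growth": "optimistic"}, "2019q4")
save_model({"trained_on": "2020 data", "growth": "pessimistic"}, "2020q2")

# Once conditions recover, revert to the archived version.
print(load_model("2019q4")["growth"])  # optimistic
```

The key design point is that versions are immutable and labeled, so reverting is a lookup rather than a retraining project.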

Focus on Efficiency

For some organizations, minimizing spending on compute resources wasn’t a focus in the good times. I’ve seen as much as 50% of an organization’s cloud spend either underutilized or completely wasted. Organizations should look through their current usage to see what could be shut down or utilized better. They may need new processes to quickly identify who spun up a resource and what type of workload is running on a cluster.
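A back-of-the-envelope way to start is to flag resources whose utilization falls below a threshold and total their cost. The inventory, owner tags, costs, and the 10% idle cutoff below are all illustrative, not real pricing or telemetry:

```python
# Hypothetical resource inventory from a billing export: owner tag,
# monthly cost, and average CPU utilization.
resources = [
    {"owner": "etl-team", "monthly_cost": 4200.0, "avg_cpu": 0.62},
    {"owner": "unknown",  "monthly_cost": 1800.0, "avg_cpu": 0.03},
    {"owner": "ds-team",  "monthly_cost": 3100.0, "avg_cpu": 0.08},
]

IDLE_CPU = 0.10  # below this, treat the resource as underutilized

def wasted_spend(inventory, idle_cpu=IDLE_CPU):
    """Return (underutilized resources, their total monthly cost)."""
    flagged = [r for r in inventory if r["avg_cpu"] < idle_cpu]
    return flagged, sum(r["monthly_cost"] for r in flagged)

flagged, total = wasted_spend(resources)
print(total)  # 4900.0
```

Note the `"unknown"` owner tag: resources nobody claims are exactly the ones that new tagging processes are meant to eliminate.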

At some organizations, the move to the cloud has been put off. Moving to a cloud provider could allow efficiency gains that aren’t possible with an on-premises cluster. Some organizations don’t understand what they could save because they’ve looked at cloud efficiency purely from an IT perspective. From that perspective, most programs run 24/7 and can’t gain much efficiency. For analytics, demand can be spiky: during the workday the cluster is used heavily, and after the workday it sits virtually unused. Use cases like this are ripe for the efficiency gains that only the cloud can offer. In my experience, analytics and big data use cases can leverage cloud efficiencies more than the rest of the organization.
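The arithmetic behind that spiky-demand case is simple. The hourly rate and usage hours below are assumed for illustration: a cluster that is only busy ten hours a day saves roughly 58% when shut down off-hours compared with running around the clock:

```python
# Back-of-the-envelope arithmetic for a spiky analytics workload.
# The hourly rate and busy hours are assumptions, not real pricing.
hourly_rate = 12.0        # cost per cluster-hour (assumed)
busy_hours_per_day = 10   # heavy use during the workday, idle otherwise
days = 30

always_on_cost = hourly_rate * 24 * days                  # 8640.0
on_demand_cost = hourly_rate * busy_hours_per_day * days  # 3600.0
savings_pct = 100 * (1 - on_demand_cost / always_on_cost)

print(round(savings_pct, 1))  # 58.3
```

An on-premises cluster pays the always-on cost regardless of demand, which is why this gain is specific to the cloud.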

New technologies can bring both efficiency gains and losses. If an analytics job is inefficient and the user spends large amounts of idle time waiting for a query to finish, a new technology could completely change the efficiency of that person or the entire team. Conversely, operationalizing a new technology that doesn’t address a specific business need can cost the team efficiency. Unneeded technologies shouldn’t be added, and management may want to reevaluate the project roadmap to establish the real business need for each new technology. Operationalizing a technology could even lead to downtime and a loss of customer goodwill. The data engineering team should be cognizant of the potential pros and cons of adding new technologies.

Workforce Reductions

Some organizations may be forced to make the difficult choice of workforce reductions. Data teams tend to hold large amounts of tribal knowledge. In a layoff, the knowledge of how something works, or of known issues, can be lost when the person who held it is let go. Managers should take into account how well each data pipeline is documented and how well it is functioning.

In my forthcoming Data Teams book, I ask managers to think about a hypothetical situation in which the entire data team is fired. This scenario should be thought through in the best and worst of economic times. In the worst economic scenarios, this will be an actual discussion among high-level managers, although they may not find it necessary to lay off the entire team. In the best economic scenarios, the exercise provides a metric for the business value created by data teams.

In the book, I show that there are generally four levels of value created by data teams. Instead of asking the data teams how much value they create, I ask the business how much value the data teams have created for them. Here are the four general responses from the business:

  1. The data team is creating the most value. When cancellation is proposed, the business leaders give a vehement, “No way!” The business opposes any change to this lifeblood of data that’s creating incredible business value. A slight change, let alone removing the teams altogether, would affect their day-to-day use of data products and, by extension, their decision making. These projects and teams are creating extreme value for the business.
  2. The project is creating minimal value. The business leader’s reaction is “meh”. Their ambivalence shows that the business isn’t really using the data products on a day-to-day basis.
  3. A stagnated project that isn’t creating any value. The business’s reaction to a proposed cancellation is a snarky or pained “what project?” In such cases, managers promised the business that it could take advantage of data to make better decisions, but the data teams left this dream completely unrealized. The business never had anything delivered into its hands and couldn’t ever achieve any value.
  4. A project is in the planning stages. The business has been promised new features, analytics, and the fixing of previous can’ts. There is a huge amount of anticipation from the business to finally get what they’ve been asking for. Now it’s time for the data teams to deliver on these promises.

As you read through these scenarios, I invite you to take an honest look at the value created by your organization’s data teams. Any project that isn’t in scenario #1 isn’t living up to its potential and faces a high risk of cancellation or layoffs.