by Larissa Moss

September 2012

An Agile Approach to Enterprise Data Warehousing and Business Intelligence

There is unanimous agreement among Agile authors, experts, and practitioners that Agile software development methodologies work for small stand-alone systems with self-motivated developers and a participating user. However, there is considerable disagreement among the same experts about whether Agile can work for all types of projects. What about extremely large projects with dozens of people on the team? Or projects that are so complex or so regulated that they cannot be dissected into multiple smaller releases? What about highly interdependent projects or highly interdependent resources? What if the user won’t accept time and effort speculations one iteration at a time, but insists on estimates cast in concrete for the entire project? What if the user refuses to participate in project activities? All these complexities exist on enterprise data warehouse and business intelligence (EDW/BI) projects, which raises the question of whether an Agile approach can be used for them.

How is Agile different from Waterfall and Spiral methodologies?
But before we answer the question of whether an Agile approach can be used with EDW/BI, let’s first examine the major differences among these three categories of methodologies.

Waterfall methodologies were developed in the 1970s for managing operational systems projects. These methodologies are organized by phases that follow traditional engineering practices: planning, requirements, analysis, design, construction, and deployment. Each phase must be completed before the next phase can begin. The majority of development time is spent on paper, creating a requirements document, external design models, internal design specifications, and so on. Even with operational stovepipe systems, this type of methodology has been a problem: estimates are highly unreliable because each system is different, each project team is different, and each set of users is different. In addition, users don’t see their system until acceptance testing, at which time they frequently notice errors and omissions that have to be corrected with future enhancements.

Spiral methodologies became popular in the 1990s to support building large systems iteratively. These methodologies are popular in enterprise data warehousing where we build the EDW one BI application at a time. This type of methodology has an enterprise perspective. That means that spiral EDW methodologies have many additional tasks that need to be performed and some of these tasks involve stakeholders other than the primary user of the BI application. But, with the exception of developing the EDW in iterations, spiral methodologies basically still follow a waterfall approach within the iterations.

Agile methodologies started to become widely published and promoted around 2000-2001 by developers of operational systems. These methodologies do not recognize a service request for a new system to be the final set of requirements. Instead, the developers view the service request as a vision for a system that may or may not end up looking the same when it is finally delivered. With the participation of the user, the developers dissect the requirements into desired features, which are put on a product backlog. The user (not IT) controls the product backlog where he or she can add or remove features at will. The user is also responsible for prioritizing the features on the product backlog. The developers select a few features from the prioritized list for the first (or next) sprint (software release). Rather than come up with estimates that are cast in concrete, the developers speculate how long it might take to turn the selected features into working code based on what is known to them at that point in time. Progress in developing the requested features is measured by the number of features delivered and not by the number of tasks performed. When it becomes evident that, at the current rate of effort, the work will miss the deadline, the project is immediately re-scoped.
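The backlog mechanics described above can be illustrated with a small sketch. It is only an illustration under stated assumptions: the `Feature` class, the numeric `priority` field, and the fixed number of features per sprint are hypothetical and are not prescribed by any particular Agile methodology.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """One desired feature on the product backlog (hypothetical structure)."""
    name: str
    priority: int            # assumption: 1 = highest, assigned by the user
    delivered: bool = False


def select_for_sprint(backlog, count):
    """Return the `count` highest-priority features not yet delivered."""
    pending = [f for f in backlog if not f.delivered]
    return sorted(pending, key=lambda f: f.priority)[:count]


def progress(backlog):
    """Progress is counted in features delivered, not tasks performed."""
    return sum(1 for f in backlog if f.delivered), len(backlog)


# The user owns the backlog and may add, remove, or reprioritize features.
backlog = [
    Feature("sales by region report", priority=2),
    Feature("customer dimension", priority=1),
    Feature("ad hoc query screen", priority=3),
]
sprint = select_for_sprint(backlog, count=2)
for feature in sprint:
    feature.delivered = True     # pretend the sprint delivered both
print(progress(backlog))
```

Re-scoping in this picture simply means moving features from the current sprint back onto the backlog when the deadline is at risk.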

Scrum and XP
Two of the most popular Agile software development methodologies are Scrum and XP.
Scrum is a term borrowed from rugby, and XP stands for eXtreme Programming. The authors of these methodologies, as well as most other prominent Agile practitioners, are project managers and seasoned developers with decades of experience in developing stand-alone operational systems – most written with object-oriented code. They are not EDW/BI practitioners, and thus, Scrum and XP were not developed specifically for enterprise data warehousing. Writing software to create stand-alone operational systems does not require data integration efforts like data standardization, enterprise data modeling, business rules ratification by major business stakeholders, coordinated ETL data staging, common metadata, collectively architected (designed) databases, and so on. Instead, the basic premise behind Scrum and XP is to write and deliver quality software (code) in short prescribed intervals, but without any inherent focus on data standardization and data architecture from an enterprise perspective.

Can Agile be used for BI?
That brings us to the next question: Can Agile be used for BI? Well, that depends on what you call BI. There are a growing number of companies that boast of using Agile methodologies on BI projects. My research shows that most of those companies largely restrict their development effort to writing code for stand-alone BI applications. In other words, the BI application developers don’t deal with data standardization and integration – or at least not very effectively and rarely from an enterprise perspective. Many complain about dirty data negatively affecting their aggressive deadlines, evidently not realizing that cleaning up dirty data, standardizing data, and integrating data across the enterprise are – or should be – three key objectives of delivering BI. However, as long as the primary goal is to build separate BI solutions for individual users or departments, the popular Agile software development methodologies like Scrum or XP can certainly be made to work.
Some BI teams try to wait for the data to be ready in the EDW (placed there by a separate EDW team) before they develop selected BI features using Scrum or XP. Many companies using this approach have gone so far as to separate their BI team from their EDW team and have both teams report to different managers. This organizational change not only disrupts the cohesion of the total EDW/BI effort, but also creates an unfair competition and ill feelings between the two teams. I hear BI teams complain bitterly about their EDW team being too slow, and I hear the EDW teams complain bitterly about their BI team not understanding their data efforts and thus having unreasonable expectations about the speed at which cleansed, standardized, and integrated data can be loaded into the EDW. I also see many of the BI teams trying to force their counterpart EDW teams to adopt Scrum or XP. Most EDW teams resist, recognizing that their projects are data-intensive and not code-intensive, and that the prescribed Agile rules in Scrum and XP cannot work for them. Other EDW teams try to adhere to the strict rules of these Agile methodologies and fail.

Can Agile be used for EDW?
That brings us to the core question: Can Agile be used for enterprise data warehousing? Let’s first agree on what we mean by enterprise data warehousing.
If your definition of BI includes building or expanding the necessary EDW components and having that effort be part of every project that delivers BI applications, and if you want to apply an Agile method to building the entire end-to-end solution (including data cleansing, data standardization, enterprise data modeling, coordinated EDW ETL, and metadata repository), then – in my opinion – using the popular Agile software development methodologies Scrum and XP will not work. Remember that these methodologies were never designed for data-centric business integration projects. However, that does not mean that you cannot go Agile.

Extreme Scoping™
Shortly after publishing my Business Intelligence Roadmap methodology, I developed an EDW-specific Agile approach called Extreme Scoping™. It includes all the business integration activities that are so vital to EDW projects. Extreme Scoping™ uses all of the Agile principles that can be used on business integration projects and discards those Agile principles that don’t apply. It does not seek to replace the Agile coding methodologies Scrum and XP. Instead, it provides the necessary Agile EDW umbrella for the entire project effort, not just the coding.
Extreme Scoping™ has several distinct project planning steps, which are performed by a 4-5 member core team, not by a single project manager. The core team members start out by reviewing their EDW methodology and selecting the tasks for a preliminary work breakdown structure (WBS). Using this WBS as a guide, the core team members create a high-level project roadmap to give an understanding of the overall effort, resources, cost, schedule, risks, and assumptions for the entire new BI application. This is necessary in order to come up with the right number of software releases, the right sequence of those releases, the dependencies among the requirements, and thus, the deliverables and scope for each release. Without this crucial step, the process of breaking an application into software releases would be completely arbitrary.
Once the core team members are comfortable with the scope and sequence of the proposed software releases and are confident that each software release can be accomplished within the allotted time-box (deadline), they create a detailed project plan with weekly milestones for the first software release. Starting with the deadline and working backwards, the core team members determine how far along they must be the week before the deadline in order to make the deadline. Put another way, they determine in what state the project or deliverable must be the week before the deadline. They repeat this process by backing up another week and another week and so on. If backing up in this way takes them past the project start date, the core team members must determine whether the scope is too large for the release deadline or whether the activities between the milestones are overestimated.
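The backward-planning step can be sketched in a few lines of code. This is a minimal illustration, not part of the Extreme Scoping™ method itself: the function name `plan_backwards`, the example target states, and the dates are all hypothetical.

```python
from datetime import date, timedelta


def plan_backwards(start, deadline, states_latest_first):
    """Assign weekly milestones by working backwards from the deadline.

    `states_latest_first` lists the state the release must be in at the
    deadline, then one week earlier, and so on (hypothetical labels).
    Returns the milestones in calendar order and a flag that is False
    when backing up week by week passes the project start date.
    """
    milestones = []
    due = deadline
    for state in states_latest_first:
        milestones.append((due, state))
        due -= timedelta(weeks=1)
    fits = all(when >= start for when, _ in milestones)
    milestones.reverse()
    return milestones, fits


plan, fits = plan_backwards(
    start=date(2012, 9, 3),
    deadline=date(2012, 9, 28),
    states_latest_first=[
        "release deployed",
        "ETL and reports tested",
        "ETL and reports built",
        "data model approved",
    ],
)
for when, state in plan:
    print(when.isoformat(), state)
if not fits:
    print("Scope too large, or the activities are overestimated")
```

When the flag comes back False, the code cannot say which of the two causes applies; that judgment stays with the core team.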
After the project activities for the first software release are organized into weekly milestones, the core team members organize themselves into the appropriate number of work teams. Knowing the makeup of the work teams and knowing the weekly milestones, the core team members decide on the detailed tasks and task deliverables for each milestone, referring to the work breakdown structure they created earlier. They also decide which tasks and deliverables are assigned to which person on which work team. The detailed daily task assignments and task deliverables are documented on a white board, a flip chart, a spreadsheet, or other informal media, which can be modified quickly and easily. The core team members use this informal detailed project plan on a daily basis to guide the day-to-day work activities, manage the change control process during prototyping, and monitor the progress of the project. They do not use this detailed plan to report the project status to management. Instead, they create a short one-page Milestone Chart showing whether weekly milestones have been completed, delayed, or eliminated.
If the first software release was completed on time and without problems, the core team members can plan the second software release in the same manner. However, if there were problems with the first software release, such as underestimated tasks, incomplete deliverables, friction on the core team, constant adjustments to the scope, and so on, the core team members must review and adjust the high-level project roadmap produced in the first step. They must revisit their understanding of the overall effort, resources, cost, schedule, risks, and assumptions for the entire application. Then they must make the necessary adjustments to the remaining software releases. That can include changing the scope for the second software release, changing the number of software releases, reprioritizing and changing the sequence of the software releases, changing the deliverables for one or more software releases, changing the deadlines, or changing resources. Only then can the core team proceed with the detailed planning of the second software release.

In summary, Extreme Scoping™ is an EDW-specific Agile project planning process, which is based on my robust methodology Business Intelligence Roadmap. It uses all of the Agile principles that work for EDW/BI projects, and it does not force you to use other Agile principles that do not work for EDW/BI projects.