Open Source Business Intelligence: low cost, high value?
Every day, a growing number of organizations are attracted by the promise of Open Source (OS) software: low-cost full-featured solutions that will help drive down the total cost of ownership of an IT infrastructure. Software like Linux, OpenOffice, MySQL and Firefox are considered mainstream solutions nowadays and are being widely adopted. But how about solutions for Business Intelligence (BI) ? Are BI Suites like Pentaho or JasperSoft already mature enough to compete with the established proprietary vendors? Can MySQL be used as an analytical or a Data Warehouse database and if not, which alternatives are available? Which open source business intelligence tools are available for reporting, OLAP, analytics, corporate performance management? What are the costs and pitfalls that should be taken into account? All these question were answered during a two day Technology Transfer seminar in December. A short abstract is given in this article.
Open Source BI maturity
Answering the first question regarding the maturity of OS BI suites is easy: we don’t have to look any further than for instance Unionfidi S.C., one of Italy’s premier financial institutions, facilitating credit access to small- and medium-sized businesses. Unionfidi decided to replace all their existing BI and reporting solutions with an open source BI suite in 2006. Or take for example the Italian Ministry of Health which chose an open source BI suite to develop a new decision support system. Many organization, both private and public, are currently deploying OS BI solutions like JasperSoft, Pentaho or SpagoBI. These three leading OS BI suites offer a full range of BI capabilities, ranging from ETL to ad-hoc analysis and reporting. SpagoBI has the added benefit of being an Italian product, developed and supported by Engineering, one of the largest Italian system integrators. Although these suites share many common characteristics, there are some notable differences too. Both JasperSoft and Pentaho offer a Community (CE) and a Professional or Enterprise edition (PE), where the CE version is fully open source and PE offers additional, closed source components. For the PE version these companies offer various subscription contracts which not only cover additional features for dashboards and ad-hoc reporting, but also a very good level of support. In fact, Gartner Group, a well known international analyst firm, states in their most recent BI Magic Quadrant report that “both Jaspersoft and particularly Pentaho scored strongly on the customer support question – higher than any of the megavendors”. This does, however, not mean that these suites are fully on par with their closed source counterparts; the suites from e.g. IBM-Cognos, SAP-Business Objects, Microsoft, Oracle and SAS are still more feature rich and advanced than their open source competitors. But, as Gartner stated in a recent research paper: “while commercial vendors are still far ahead in their overall technology capabilities, open- source adoption increases, because it is often considered ‘good enough’ ”. Gartner therefore expects a five fold increase in open source BI adoption through 2012.
Integrating and querying data
A typical BI solution does not access data in a source system like an ERP package or a financial application directly but usually contains an intermediate database or data warehouse which is better optimized for analytical workloads such as ad-hoc reporting and analysys. In order to get data from various source systems into a central data warehouse, a data integration tool is needed. This class of tools is commonly referred to as ETL tools, short for Extract, Transform and Load. Again, there are many enterprise grade open source ETL tools available such as Kettle (part of the Pentaho BI suite), Talend (used within JasperSoft where it is called ‘JasperETL’), Jitterbit, Snaplogic or CloverETL. Since traditional closed source ETL tools tend to be very expensive and open source tools often offer a comparable feature set, this is an area where cost savings can be made very quickly and easily.
The cornerstone of a BI solution and critical to the success of a BI project is a high performing database. Many open source BI implementations start by using the well known MySQL database, only to find out a little later that is doesn’t offer the performance needed for answering typical BI queries. MySQL is an excellent database for transaction processing and perhaps smaller data warehouses or data marts, but not for serving as an enterprise data warehouse or analytical database. This is why open source vendors Infobright and Calpont have developed specialized database solutions that look and work like MySQL from the outside but have a completely different storage engine to support BI workloads. Kickfire takes this concept even further and has developed a special SQL chip for handling all the queries. Kickfire too is based on MySQL, offering a familiar interface and toolset for developers and DBA’s, while at the same time delivering stellar performance. Of course the choice in open source analytical databases is not limited to the MySQL based solutions; MonetDB, Ingres/VectorWise and LucidDB are also belong to the category of very fast analytical database solutions. What makes these databases so fast is their storage engine which is based on a column based format instead of the row based format used in more traditional databases like Oracle or SQL Server. That, combined with special indexing and compression techniques and sometimes efficient in memory processing makes these databases unbeatable when it comes to raw query performance. The importance of query performance shouldn’t be underestimated, as is shown in the yearly BI Survey conducted by renowned analyst Nigel Pendse. Year after year this research shows a strong causal relationship between query performance and user acceptance.
What about Excel?
Although many BI solutions are built around a complete suite or ‘stack’ consisting of a database, an ETL tool, a reporting and analysis tool and sometimes even data mining and statistical analysis tools, there is one toolset available that takes a little different approach. The idea behind this toolset is the fact that the most widely adopted tool for analysis and reporting is of course Excel! But as great and powerful it is, Excel has a couple of serious drawbacks too. The biggest drawbacks are the lack of a high performance centralized data store, limited security options and cumbersome version management. So what if there would be a tool that lets you use Excel as a budgeting, analysis and forecasting front end and at the same time takes care of the drawbacks of Excel? That’s exactly what Jedox Palo does. Palo (OLAP spelled backwards) offers an in memory multidimensional database on the server side, complemented with client tools for building databases, develop models and manage security. These client tools are all part of an Excel (or OpenOffice) add-in, resulting in a complete spreadsheet based BI solution. Getting data into the database can be done directly by entering it via a spreadsheet, or by loading it using PaloETL, the custom ETL solution developed by Jedox. The crown jewel of the suite is the Worksheet Server, which is capable of turning a PC based Excel spreadsheet into a powerful multi-user web application with all the bells and whistles of the original local spreadsheet.
It should be clear by now that open source BI is here to stay; there are many good products to choose from, support is generally excellent and most of the solutions mentioned in this article are easy to set up and use. Many organizations are able to save costs by implementing or switching to open source solutions, but what might be even more important: open source also means that the software can be tailored to exactly meet the requirements of an organization.