Technology Transfer - since 1986
Brief History of Data Engineering
In the beginning, there was Google. Google looked over the expanse of the growing internet and realized they’d need scalable systems. They created MapReduce and the Google File System (GFS), the design that Hadoop’s HDFS later followed. They published the GFS paper in 2003 and the MapReduce paper in 2004.
Doug Cutting took those papers and created Apache Hadoop in 2005.
Cloudera was founded in 2008, and Hortonworks followed in 2011. They were among the first companies to commercialize open source big data technologies, and they drove much of the marketing and adoption of Hadoop.
Hadoop was hard to program, so Apache Hive came along in 2010 to add SQL. Apache Pig had arrived in 2008, but it never saw as much adoption.
With an immutable file system like HDFS, we needed scalable databases for random reads and writes. Apache HBase came in 2007, and Apache Cassandra came in 2008. Along the way, whole categories of databases exploded, such as GPU, graph, JSON, column-oriented, MPP, and key-value.
Hadoop didn’t support real-time processing, so Apache Storm was open sourced in 2011 to fill that gap. It didn’t get wide adoption, as it was a bit early for real-time and the API was difficult to wield.
Apache Spark came in 2009 and gave us a unified batch and streaming engine. Its usage grew, and it eventually displaced Hadoop.
Apache Flink came in 2011 and gave us our first real streaming engine. It handled the stateful problems of real-time processing elegantly.
We lacked a scalable pub/sub system. Apache Kafka came in 2011 and gave the industry a much better way to move real-time data. Kafka has its architectural limitations, though, and Apache Pulsar was released in 2016 as an alternative.