By Chris Date

March 2013

Leonardo was Right!

Would you want to fly in an airplane that hadn’t been constructed in accordance with solid principles of mechanical engineering and aeronautical science?

Would you want to live or work in a high rise building that hadn’t been constructed in accordance with solid principles of architecture and material science?

Would you want to run your business on a database that hadn’t been constructed in accordance with solid principles of computing and database science?

Well, your answer to the first two of these questions is surely no. But what about the third one? Clearly the answer there ought to be no as well. To judge from current practice, however, all too often it’s yes. Why is this? What accounts for this state of affairs? Maybe it’s because all too many people are simply not aware that the pertinent scientific principles exist. By way of evidence for this suggestion, consider the following quotes. (They’re are all taken, with acknowledgments, from Fabian Pascal’s website, but I’ve omitted the precise sources in order to protect the guilty. I’ve also taken the liberty of editing some of them slightly for reasons of readability.)

“I have done an Entity-Attribute-Value database design before. The big advantage is that much of your logical model is stored as data rather than as schema, so changes to the logical model can be made without changing the schema. And if you write your [stored procedures] correctly, you won’t need to change them either. The drawback? VERY complicated SQL code. The main reason I used this was for a client that did not have a clear understanding of their own requirements, but needed a working application on a deadline.”

“[In my opinion] databases are supposed to be dumb storage, they are intended to manage the storing of data, not business logic … Let [the] database be dumb and [the] application smart, after all the application is a model of the business process, the database is just storage to support and expedite the data being pulled into business objects.”

“[Data] is best kept closest to its natural source rather than at the intersection of a row and tuple [sic!] of a database.”

“[The trouble with relational databases is that] you need to adapt the table structure every time you want more data …With RDF you can just pour data into your triple store, and then work with it immediately. This is much more agile.”

“SQL is inadequate to the task of fully utilizing the RELATIONAL MODEL because it doesn’t have any way to EXPRESS RELATIONSHIPS other than “table”. Isn’t this a limitation of the relational data model itself? I think only the entity-relationship model can distinguish between relationships and other data. Once you convert your model to the relational data model, you’ve lost that distinction. This is not an SQL-specific problem. At least this is what we learned at our university on a database course.”

“Question: I am a novice user of Microsoft Access. I typically do not use Primary Keys. Everything I read indicates that the primary key is used mainly to speed up the search of a database. I am currently just using two small tables which contain a common record ID. I am doing queries on these tables and I have not discovered any errors. Based on what I have seen I still obtain the correct information from my queries. Is it critical that I use a primary key or is [it just] a ‘nice to have’ with large databases?
Response: Establishing a primary key is never necessary even in a large database. There are advantages to having a primary key—mainly that Access will make sure that the field will contain only unique values. This will stop a user from accidentally entering a duplicate record. Access will also make sure that the primary key contains a value—again this is necessary to keep each record uniquely identified.”

“Relational databases are basically two-dimensional matrices with pointers … high-class spreadsheets. Object and XML databases are more likely to accept the architectures and data models you’re working with every day, and are more adaptable, flexible, and customizable: You’ll be able to apply extreme programming principles and add fields as you go, connect to other systems and maintain focus on your data model and your project, not that of the database and the best possible model that fits into that database. Before choosing to ignore this suggestion and go back to relational databases, struggling to fit your designs into its data model, consider XML and object databases. Import/export from/to RDBMSs is straightforward, and has been set up in real time to truly garner the design/maintenance benefits of OODBMSs while retaining the atomicity and existing relational tools without sacrificing data quality and currency.”

“Question: I never had to have database design explained or taught to me. It was just so intuitive, so obvious, so simple, that there was no other way you could do it. That’s my problem. My brain can’t even comprehend the idea of any other database structure design. I’m really looking forward to learning something new [viz., data warehouse], whose fundamental concepts I can’t even grasp right now…
Response: The first rule of data warehouse is: Forget everything you think you know about data modeling.”

“To deliver great reports fast without having to have a team of PhD statisticians pulling all-nighters, [we] just had to do one trivial, minor, insignificant thing. Scrap the whole ‘the database is at the heart of everything we do’ mentality. To quote [a certain] website: Stop architecting the database! Applications tied directly to relational databases are costly to develop and maintain and, while once considered state-of-the-art, are by today’s standards notoriously lacking in sophistication and adaptability.”

Now, I could spend a long time deconstructing these quotes in detail, but I don’t think that should be necessary; I think the overall message is pretty clear. To spell it out: There’s a vast amount of ignorance out there in the industry. Indeed, I have another quote that sums up my feelings in this matter pretty well. It’s from a letter in the July 2011 issue of Communications of the ACM (Volume 54, Number 7). Note: The writer is referring specifically to “object/relational mappings” and (another currently fashionable piece of nonsense) “NoSQL systems,” but I think his remarks are actually of wider applicability.

“Object Relational Mappings and NoSQL systems attempt to solve (through technical means) a nontechnical problem: reluctance of talented people to master the Relational Model, and thus benefit from its data consistency and logical inferencing capabilities. Rather than exploit it and demand more relational functionality from DBMS vendors, they seek to avoid and replace it, unwittingly advocating a return to the fragile, unreliable, illogical systems of the 1960s, minus the greenbar fanfold paper.”

The fact is, there’s a great deal of good theoretical science available for addressing “the database problem,” but there doesn’t seem to be much evidence that it’s widely used in practice. Indeed, many people seem deliberately to avoid that theory; there seems to be a widespread perception that if something is theoretical, then it can’t be practical. (“Don’t bother me with that theoretical stuff, I’m a practical person.”) My own position is exactly the opposite: namely, that if something isn’t based on solid theory, it’s likely to be very unpractical indeed (not least because it’ll probably be hard to understand). And I’d like to close this brief article with an appeal to Leonardo da Vinci, who wrote the following (some 500 years ago!):

Those who are enamored of practice without theory are like a pilot who goes into a ship without rudder or compass and never has any certainty where he is going. Practice should always be based upon a sound knowledge of theory.

Leonardo was right.