Please Model Your Databases

I'm just going to come out and say it. More than once, I've heard presumably experienced developers make claims to the contrary, and I could not state this in more absolute terms. If you don't take the time to model your database, then you have no business building databases in the first place. Everyone is better off if you leave this work to someone who knows what they are doing.

This does not mean that you should not do your best if you are a novice. It also doesn't mean that there is never a reason to use a system that makes it difficult or impossible to model your data properly (I'm looking at you, SharePoint). Nor does it mean that there is never a reason to denormalize your data.

But it does mean that if you have the means and opportunity to plan your database and your first inclination is to use some arcane black-box tool to do it for you, then stop. For the love of everything that is good and right in the world, please stop. Take your data to someone who knows what they are doing. You're probably going to end up doing this later, but by then they may not be able to save you from the damage you're almost certainly about to do to your project.

You see, planning databases is a skill you will need if you plan to create one. There is both a science and an art to creating robust and resilient databases. Since this is where your important data resides, and since your system is (presumably) just a means to transfer data between users, you should invest time to making sure it's built well instead of clumsily battering it around with a digital mallet.

Some people will read that statement and say "duh, Steve, thanks for pointing out the obvious." To this group, I say "thanks" because this article isn't directed at you (though I encourage you to read on anyway). If, however, you started thinking about your favorite black-box and how awesome it is that "it'all abstract away all of the work of modeling your database", then I am most certainly talking to you. This goes double if you can't explain and reproduce whatever algorithm that CMS uses to do this.

I say this because requirements change and systems evolve. The assumptions you make when you let your CMS generate your database structures for you are going to create problems, and these mistakes are going to bite whoever has to deal with them later — even if that person is you.

But {{ your favorite CMS }} has a really great abstraction layer...

I don't care. I really, genuinely, 100% could not care less. If you are taking advantage of a great abstraction layer to build a database that you planned out, then this article isn't about you. If you are taking advantage of a great abstraction layer because then you don't have to plan your database at all, then this is the equivalent of driving a dump-truck through your living room to drop off your new TV — because who cares how you do it, so long as your TV winds up in the right place?

Let's illustrate this point with an example. You are building a system to track attendance in a classroom. The customer wants to be able to identify which seat a student occupied on each day. You decide that the easiest way to do this is to number columns so that they're easy to parse into a grid in your code: "seat_x1_y1", "seat_x2_y1", "seat_x3_y1" and so on until all of the seats are accounted for. It's easy. It's intuitive. Anyone with even a passing knowledge of spreadsheets can figure out what each column represents. You explain this restriction to the customer, they accept the constraints, and you happily go on building your system around the assumption that everyone is on the same page.

But everyone is not on the same page. A week into the class, the teacher rearranges all of the desks from a 5x4 grid into staggered rows of 6 and 5 because that's most convenient for the class. Suddenly, your entire system is wrong in a way that simply cannot be reconciled with the layout of the classroom. Surely, after paying so much money for your system, it should be able to handle something as simple as moving two desks out of the classroom, right? Turns out that you never planned for change, and now it's burning you and ticking off the customer. So what now?

I'll just update the models in {{ your favorite CMS }} and it'll automatically...

Stop right there. That's exactly the kind of thought that got you into this mess in the first place. Let's say you restructure the database to accommodate the new setup. How are you going to differentiate between the old 6x4 data and the new 5-4 data? Your new database isn't going to be accurately representative of the old content it used to house. This is going to impact your ability to retrieve old content, which leaves you with three bad options.

  1. You can kick the can down the road and try to parse multiple data formats out of one poorly designed table. Good luck with that logic.
  2. You can kick the can down the road and try to compensate by creating new database models for each data set. Have fun maintaining a different set of functions for every variant you've had to build.
  3. You can call it a wash and accept that the old data is just going to be wrong or inaccessible. Let's be honest, though. This is only an option because you don't want to deal with #1 and #2, and you're going to spend the whole time hoping nobody else pitches those options while you're selling this one. ProTip: they will.

Alternatively, you could do what you should've done in the first place and take your content to a developer who knows what they're doing. Except now your database models — the same database models that (let's face it) are probably scattered throughout the entire project — are probably totally useless and need to be swapped out. Which means that you need to rebuild everything that touches those old database models. Good luck selling that idea to your boss, customer, and co-workers.

So please. Take some time and model your database properly the first time. Everyone will be glad that you did. If you don't have time in the project to do this, you're probably better off just storing JSON files directly on the disk.

Comments

Back to top