Looking back at my years in college studying Computer Science, I remember quite well all the classes about disparate technologies and programming languages. It seemed that our future was going to be turbulent and full of options, but one thing remained constant throughout all the courses: we used relational databases for data storage.
Relational databases were the only choice back then. Every company used them, and every software engineer had to know the ins and outs of tables, relationships, foreign keys, and indexes to have a successful career in the software business.
But the world is rapidly changing, and with it the utter dominance of relational databases has come to an end. There’s a new kid in town, or better said, an entire family of so-called NoSQL databases.
What’s wrong with our good ol’ relational model?
There is one thing that has drastically changed the state of enterprise applications: the Internet. It has grown far beyond what I could have imagined as a student into a truly global phenomenon, bringing more connected users every day who produce, store, and consume more and more data.
Unfortunately, all this data is usually disparate, unstructured, and hard to fit into the rigid schema constraints imposed by relational databases. On top of that, reading data that is structured relationally requires collecting bits and pieces from multiple tables to assemble the final result presented to the user. This process is slow, and although it works well for medium-size applications, with so many new users and so much more data, the performance implications are amplified to the point where alternative solutions are needed.
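To make the "bits and pieces from multiple tables" point concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. The users and posts tables, their columns, and the sample rows are all made up for illustration; the same profile is then shown as the single nested record a document-style store would keep.

```python
import sqlite3

# Hypothetical schema and sample data, kept in memory for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id),
        title TEXT
    );
    INSERT INTO users VALUES (1, 'Ana');
    INSERT INTO posts VALUES (10, 1, 'Hello');
    INSERT INTO posts VALUES (11, 1, 'Again');
""")

# Relational read: the profile is assembled from two tables with a join.
rows = conn.execute(
    """
    SELECT u.name, p.title
    FROM users u
    JOIN posts p ON p.user_id = u.id
    WHERE u.id = ?
    ORDER BY p.id
    """,
    (1,),
).fetchall()
profile_from_sql = {
    "name": rows[0][0],
    "posts": [title for _, title in rows],
}

# Document-style read: the same profile stored as one nested record,
# so a single lookup by key returns everything the page needs.
profile_document = {"name": "Ana", "posts": ["Hello", "Again"]}
```

With two tables the join is trivial, but every additional related table adds another join to the same read, which is the cost the paragraph above describes.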
When faced with these scalability problems, engineers used to vertically scale their infrastructure. Vertical scaling means throwing more hardware at the problem: the database server would get a better and faster CPU, more memory, and more hard drive space. As you might have guessed already, this type of scaling is not only limited, but also extremely expensive.
But the Internet doesn’t care and keeps growing. More and more users are online, accessing our application and producing more and more data, and at a certain point our database becomes the single bottleneck of the entire architecture. Setting up a server farm is a complicated process, and replication models to horizontally scale a relational database require too much money, too many trade-offs, and too much time from a team.
Making horizontal scaling a possibility
If, instead of having one super-powerful database server attending to our increasing number of users, we had multiple regular machines sharing the load, we would be able to scale indefinitely for a fraction of the price, right?
Right. But this means that we’d need to relax some of the schema constraints imposed by relational databases in order to easily shard our data into multiple partitions distributed across multiple servers. Breaking apart a strongly relational database into several chunks of data is not an easy task, especially because of the existing intricate web of relationships and constraints. What makes relational databases great is also their main drawback for scalability purposes.
Here is where NoSQL databases have something to offer. Among all their specific characteristics, they usually don’t impose a fixed schema on your data, and you can structure it in a simple way that lets the database engine partition it into multiple shards. This flexibility allows engineers to map any piece of information very easily without worrying about where that data will live or how it will be distributed.
Of course, this comes at a price: NoSQL databases lack much of the structure that makes relational databases easy to use and understand. This lack of structure often leads to duplication of data and to a less expressive query language. Like any other problem in computer science, whether we use NoSQL or stick with our relational database is all about context.
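As a rough illustration of how schema-free records can be spread over several machines, the sketch below hashes each record's key to pick a shard. It is not how any particular NoSQL engine works; the shard names, the `shard_for`, `put`, and `get` helpers, and the in-memory `cluster` dictionary are all assumptions made up for the example.

```python
import hashlib

# Hypothetical shard names standing in for separate database servers.
SHARDS = ["shard-0", "shard-1", "shard-2"]


def shard_for(key: str, shards=SHARDS) -> str:
    """Pick a shard from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def put(cluster: dict, key: str, document: dict) -> None:
    """Store a document on the shard that owns its key."""
    cluster.setdefault(shard_for(key), {})[key] = document


def get(cluster: dict, key: str):
    """Look up a document by asking only the shard that owns its key."""
    return cluster.get(shard_for(key), {}).get(key)


# The two documents have different fields; no shared schema is required.
cluster = {}
put(cluster, "user:1", {"name": "Ana", "posts": ["Hello"]})
put(cluster, "user:2", {"name": "Luis", "city": "Lima"})
```

Because each record is self-contained and addressed by a single key, no cross-server join is needed to read it back. Note that plain modulo placement moves most keys whenever the number of shards changes, which is why production systems tend to use schemes such as consistent hashing or range partitioning instead.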
The NoSQL database family
If you are reading this article, you are probably familiar with relational databases. No matter what vendor is behind your preferred engine, they all have a lot in common. The steps you followed yesterday to model your database are probably the same steps you’ll take tomorrow. Relational databases are very well known, and their similarities make working with them, and moving from one to another, a very straightforward process.
On the other hand, the NoSQL family is huge, with 150 different databases at the time of this writing. These databases are categorized into different groups based on their main characteristics and features. There's certainly something out there for every specific need that you might encounter with your particular application.
Why should you care about NoSQL?
There are multiple reasons, but I'm going to mention the three main ones from my point of view:
It's where the market is going. Take a look at some of the job postings from big technology companies and you'll see how demand for NoSQL databases is on the rise, and with more data being produced every day, the NoSQL family will become more and more useful.
Scalability for your own application. If you are building an application (or are planning to) and you are worried about handling a massive amount of information, start taking a look at the NoSQL world.
It will take you to a different dimension of Computer Science. If you want to increase your value as a professional, pick your tool and dive in. If your experience is anything like mine, you'll be in for an unforgettable ride through all sorts of technical challenges.
It will be fun for sure. If you haven't already, find the opportunity and start learning.