Keep an eye on these Next-Gen databases

Why RDBMS databases may soon be a thing of the past

Introduction

While I'm a sucker for a good Postgres instance, I do like to keep my finger on the pulse of new technologies to see what is out there. In this article, I'll play a bit of a devil's advocate against standard SQL databases while exploring some next-gen databases to see what they bring to the table.

While I'm not a DBA or any sort of database expert, I regularly work with various databases in different capacities as a backend engineer, so I will focus on how they're implemented and how they fit into different workflows.

Problems with RDBMSs (and SQL)

To start, we'll look at the issues with current Relational Database Management Systems (RDBMSs). While this term isn't as commonly used, it's pretty much synonymous with SQL. I like to think of the RDBMS as the book and SQL as the language it's written in. Today the main ones are MySQL, PostgreSQL, Oracle, and SQL Server. Let's look at what issues they have!

Not natively distributed

Standard SQL databases aren't distributed natively, but there have been some innovations to change this. Sharding came out of the need to spread a SQL database across multiple machines. This can be done for various reasons, but it begins to show some cracks in the benefits of SQL.

While there are extensions like Citus that turn Postgres into a distributed database, many people see this as a workaround rather than a real fix. Most of these extensions need specific work to make your database distributed, and you are then tied to the extension for your database setup to work. While it's not a bad setup for legacy databases, it seems like a hacky workaround if you're building something new and need the distributed functionality.

Not natively serverless

Although this ties in with being natively distributed, there's a difference. Distributed means the data is either partitioned across multiple instances or replicated across them. Serverless means there is no permanent server for the database. These work nicely together, as a serverless database can spin up more instances in a distributed way when load grows and scale back down as usage drops to save cost.
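To make the partitioned-versus-replicated distinction concrete, here's a minimal sketch in Python. The class names (PartitionedStore, ReplicatedStore) and the hash-based routing are my own illustration using plain dictionaries as stand-ins for database instances, not how any particular database does it.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard by hashing the key (stable across runs, unlike hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class PartitionedStore:
    """Each key lives on exactly one instance (partitioning / sharding)."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)


class ReplicatedStore:
    """Every instance holds a full copy of the data (replication)."""

    def __init__(self, num_replicas: int):
        self.replicas = [dict() for _ in range(num_replicas)]

    def put(self, key: str, value):
        # Write to every replica so any of them can serve reads.
        for replica in self.replicas:
            replica[key] = value

    def get(self, key: str, replica: int = 0):
        return self.replicas[replica].get(key)


partitioned = PartitionedStore(num_shards=3)
replicated = ReplicatedStore(num_replicas=3)
for user in ["ana", "ben", "cara", "dev"]:
    partitioned.put(user, {"name": user})
    replicated.put(user, {"name": user})
```

Partitioning spreads storage and write load, while replication buys read capacity and fault tolerance. Real systems usually combine both and have to deal with rebalancing and consistency, which this sketch ignores.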

There has been less support for retrofitting this onto existing SQL databases, and it was a main push for the creation of NoSQL (and eventually NewSQL) systems.

Relationships are second-tier citizens

Although this is a more specific issue with SQL databases, I think it will keep growing as the world becomes more interconnected. With the acceleration of network-based data like recommendation engines and distributed systems (web3), there may be a growing need for relationships to be core to databases.

While relationships are present in relational databases, many critics say they're treated as second-tier citizens. The object (the row) is the core of most of these databases, and following connections between objects means joining tables, which gets slow for more complex reads. This is the main argument for graph databases.
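Here's a rough sketch of that argument in Python, using a "friends of friends" lookup. The edge list stands in for a join table and the adjacency map stands in for how a graph store might keep each node's links next to it. The data and function names are made up for illustration, and real databases use indexes and query planners that this ignores.

```python
from collections import defaultdict

# A "follows" join table: (source, destination) rows.
follows = [("ana", "ben"), ("ben", "cara"), ("cara", "dev"), ("ana", "dev")]


def friends_of_friends_join(edges, start):
    """Relational style: each hop is another pass over the edge table."""
    first_hop = {dst for src, dst in edges if src == start}
    second_hop = {dst for src, dst in edges if src in first_hop}
    return second_hop - first_hop - {start}


def build_adjacency(edges):
    """Graph style: store each node's outgoing links alongside the node."""
    adjacency = defaultdict(set)
    for src, dst in edges:
        adjacency[src].add(dst)
    return adjacency


def friends_of_friends_graph(adjacency, start):
    """Each hop only touches the neighbours of the nodes already reached."""
    first_hop = adjacency.get(start, set())
    second_hop = set()
    for person in first_hop:
        second_hop |= adjacency.get(person, set())
    return second_hop - first_hop - {start}


adjacency = build_adjacency(follows)
print(sorted(friends_of_friends_join(follows, "ana")))      # ['cara']
print(sorted(friends_of_friends_graph(adjacency, "ana")))   # ['cara']
```

Both versions give the same answer. The difference is that the join version does work proportional to the whole table per hop, while the graph version does work proportional to the number of links it actually follows.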

Have issues handling unstructured data

SQL was built for structured data that fits nicely into strict table schemas. But what if your data isn't like that? The gap between SQL data structures and unstructured data has been growing for years and is a major issue. From key-value pairs to documents, there is a growing amount of data that doesn't fit into a typical SQL database.

There have been some additions, like the JSON (and JSONB) types in Postgres, but these are small steps toward allowing some unstructured data and they definitely don't feel like the main focus.

The Saviors

Now let's look at the new databases that were created to save us from all our problems! I've broken these up into NoSQL and NewSQL. While these are pretty generic categories (especially NoSQL), I'll specify how each one is unique in its benefits.

NoSQL

Even though it's a pretty generic term and grouping of databases, NoSQL typically refers to databases that are "non-relational". Ranging from Firestore to Redis, these databases have made incredible innovations in the way we store and access data around the world. While they're great, the main con is that they usually don't have relational operations like joins and can become quite cumbersome to use if relations are needed.

Firestore

I have a soft spot for Firestore as it's an inexpensive and easy-to-use database. It's a document store, which I like to think of as a group of searchable, filterable, and indexable JSON documents. With an incredibly open schema, you're able to quickly adapt to new business requirements while keeping things backwards compatible. Additionally, as it's a fully managed service (with great extensions), you're able to focus on your product and not worry about the database. Lastly, because it's serverless you only pay for what you use, and because it's distributed it can scale to handle an incredible amount of reads and writes.
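As a rough illustration of the "filterable JSON documents with an open schema" idea, here's a small Python sketch. It uses a plain list of dictionaries rather than the Firestore client, and the `where` helper, field names, and `default` parameter are all hypothetical. The point is just that documents in one collection can have different shapes and older ones keep working.

```python
# Documents in the same "collection" don't have to share a shape.
documents = [
    {"id": "u1", "name": "Ana", "plan": "pro"},
    {"id": "u2", "name": "Ben", "plan": "free", "referral_code": "XYZ"},
    {"id": "u3", "name": "Cara"},  # older document, written before "plan" existed
]


def where(docs, field, expected, default=None):
    """Return documents whose field equals `expected`.

    Documents that lack the field are treated as having `default`,
    which is one way to stay backwards compatible with older data.
    """
    return [doc for doc in docs if doc.get(field, default) == expected]


pro_users = where(documents, "plan", "pro")
free_users = where(documents, "plan", "free", default="free")
print([doc["id"] for doc in pro_users])   # ['u1']
print([doc["id"] for doc in free_users])  # ['u2', 'u3']
```

Adding the "referral_code" field to new documents needed no migration here, which is the flexibility I like. The trade-off is that the application code, not the database, has to cope with missing fields.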

Redis

I feel like Redis is typically left out of the NoSQL conversation, but I think it's a great database to highlight. Created by Salvatore Sanfilippo in 2009 to help with his startup's own scaling issues, it is an in-memory key-value store with optional persistence that's extremely fast and scalable. It's typically used as a database, cache, or message broker. Keeping data in memory makes reads blazing fast, which works well for caching or any set of data that is queried often.
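The caching use case usually follows a "cache-aside" pattern: check the cache first, and only hit the slower database on a miss. Here's a minimal sketch in Python. To keep it runnable without a server, TTLCache is a hypothetical in-memory stand-in for Redis, and `get_user` and `load_from_db` are names I made up for the example.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a key-value cache with expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock  # injectable so expiry can be tested

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)


def get_user(user_id, cache, load_from_db, ttl_seconds=60):
    """Cache-aside read: try the cache, fall back to the database on a miss."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(user_id)        # the slow path
    cache.set(key, value, ttl_seconds)   # populate for later reads
    return value
```

With a real Redis instance and the redis-py client, the same shape would likely use its `get` and `setex` calls plus serialization of the value, since Redis stores strings and bytes rather than Python objects.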

NewSQL

NewSQL is a newer term than NoSQL. It's typically characterized as mixing the cloud-native functionality of NoSQL with the transaction and consistency guarantees of SQL. Ideally, this would allow for a serverless, distributed, and relational database all in one.
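To show what "transaction and consistency guarantees" means in practice, here's a small Python sketch of an all-or-nothing transfer. It uses the standard library's sqlite3 module purely as a stand-in, since it runs anywhere. SQLite is not a NewSQL database, and the table and the `transfer` function are invented for the example. NewSQL systems aim to keep this same behaviour while the data is spread across many machines.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between accounts; either both updates apply or neither."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("ana", 100), ("ben", 20)])
conn.commit()

transfer(conn, "ana", "ben", 30)
print(balances(conn))  # {'ana': 70, 'ben': 50}

try:
    transfer(conn, "ana", "ben", 500)  # would overdraw, so it is rolled back
except ValueError:
    pass
print(balances(conn))  # {'ana': 70, 'ben': 50}
```

The failed transfer leaves no trace of the first UPDATE, which is exactly the guarantee many NoSQL stores give up (or limit) in exchange for easier scaling.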

CockroachDB

CockroachDB is a NewSQL database made by Cockroach Labs. Initially created as an open-source project by ex-Googlers in 2015, it moved to a source-available license (the Business Source License) in 2019 and started on its enterprise offerings. Self-described as "The open source, cloud-native distributed SQL database", it provides a distributed and fault-tolerant database that's powered by SQL (specifically Cockroach SQL). Although it's still a new technology, it's been widely adopted by a lot of big companies and has shown great success.

EdgeDB

Built by the same people behind MagicStack, EdgeDB is still very young but incredibly interesting. Its first stable release was in 2022, and it looks to have a lot more to show. Billed as "the first graph-relational database", it hopes to bridge the gap between SQL and graph databases. With strict schemas, built-in migrations, and its own query language (EdgeQL) rather than SQL, it acts like a relational database with graph database functionality. It allows for deep linking that doesn't require joins to fetch deeply connected data.

I'm extremely excited to see how well the underlying graph relations work and whether this paradigm will become more common in the future.

Summary

Thanks for reading! I hope this helped shine some light on some relatively new databases. If there are any interesting databases that I missed, feel free to drop a comment! I always love learning about new technologies.