Navigating the Maze: Evolving Projects and Database Dilemmas
Exploring how complexity creeps into growing projects and their database schemas, and strategies for keeping it under control.
Imagine this: A new project starts with a clear vision and well-structured code. Over time, it takes on a life of its own, growing into a web of features, requirements, and code modules. While this growth proves the project's success, it also increases its complexity, which can become overwhelming if not managed properly. You've heard this story before.
As all of us know, this is where the real engineering work begins: steering that complexity into a maintainable and scalable form. We need to ensure that the project grows linearly, or at least predictably, in its complexity. The project's relational database schema is often a prime battleground for this effort.
Relational databases have been the bedrock of software systems for decades, and their story is far from over. Like a well-oiled machine, they offer a structured and dependable way to store and retrieve data, the lifeblood of any application.
You've already noticed that we started talking about complexity. The sad part is that once an explicit schema starts expanding into a network of tables and relationships, a well-normalized database can evolve into a monolith that is cumbersome to query and expensive to maintain. Surprise! Instead of providing benefits, it starts to create problems, and the big risk is that this happens slowly and implicitly.
What issues can we have here?
- Query complexity: Business logic usually spans several entities, which increases the complexity of the SQL queries.
- Joins: And here we've got joins again. They become so numerous that they hurt performance and slow down development in general.
- ORM magic: Applications end up requiring "magical" ORM frameworks to manage the data, and I hope none of us really likes software magic.
- Schema changes: Modifying a complex structure of tables can be painful.
- Replication: The more tables you need to replicate to secondary servers, the more fun you have (sarcasm: it is more complex).
Just an example of an entity in payments (blurred).
This topic can get really hot in domains such as payment systems. But what can we do about it?
Solving the Puzzle
First, we must define our domain's boundaries clearly. Consider a payment system that facilitates transactions; the primary bounded context is the 'payment' module. I believe that payments can contain approximately ten distinct entities that collectively encapsulate an order entity. Clear boundaries are essential in any project, especially when dealing with complex puzzles. Those familiar with Domain-Driven Design (DDD) principles will notice their influence here.
Second, if we write down all entities in third normal form (3NF), we will most likely end up with three or four dozen tables. Managing such a schema can be depressing, and changing it is even more so. What can we do now? JSON is an effective alternative to a highly normalized database structure. Yes, store the whole thing as a JSON-serialized entity; a sketch of such a table follows below.
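To make this concrete, here is a minimal sketch of what such a table could look like, created via plain JDBC. The payment_order table and its columns (schema_version, revision, data) are my own illustration rather than a fixed recipe; the only essential idea is that the whole domain entity lives in a single JSONB column next to a thin envelope of metadata.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/** Creates the single "wide" table that stores the whole domain entity as JSONB. */
public final class PaymentOrderSchema {

    // Hypothetical DDL: table and column names are illustrative, not taken from the article.
    private static final String CREATE_TABLE = """
            CREATE TABLE IF NOT EXISTS payment_order (
                id             UUID        PRIMARY KEY,
                schema_version INT         NOT NULL,               -- version of the JSON layout
                revision       BIGINT      NOT NULL DEFAULT 0,     -- optimistic-locking counter
                created_at     TIMESTAMPTZ NOT NULL DEFAULT now(),
                data           JSONB       NOT NULL                -- the serialized domain entity
            )
            """;

    public static void createIfMissing(Connection connection) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            statement.execute(CREATE_TABLE);
        }
    }

    private PaymentOrderSchema() {
    }
}
```

Note that adding a field to the entity only changes the JSON inside data; in the common case, no ALTER TABLE is needed.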
What do we have here? At first glance, this may seem like a NoSQL approach, but it is still built on a relational database. Let's enumerate the benefits; there are many.
- Data is stored in JSON format. It is the most widely used serialization/deserialization format, supported by dozens of frameworks; Jackson, for example, handles polymorphism and a variety of other Java tricks and options.
- Today's databases support JSON natively. At the very least, decent databases such as Postgres offer "::json" and "::jsonb". In my opinion, Postgres provides the best JSON support available.
- Atomicity of change. Such a design forces everyone to think in domain logic, which creates clear boundaries between entities; that alone is a significant benefit. We still have transactions, but this approach makes them more obvious: a single row update changes the whole entity at once.
- It's easy to add audit capabilities to the entity.
- Encryption. This can be critical. A compact entity representation makes it possible to encrypt and decrypt the object as a whole, which matters in certain legal cases. On top of that, this approach gives a single-read operation for the domain entity; otherwise we would need to join and gather this data from multiple tables, so there is a clear performance advantage as well.
- With such straightforward table structures, there's no need for complex or magical ORM frameworks. Working with the entity is simple: read/deserialize, modify, and serialize/save (that said, any real system will still use some ORM).
- Even though the table stores a complete domain entity as a JSON blob, it is still a table. It retains the advantages of ACID properties and other relational database features, such as consistency guarantees. Few systems today can compete with relational databases in this regard.
- Optimistic locking is free. Just add a revision column, and that’s it. I don't even want to think about how to add an optimistic lock to an entity spread across two tables. Do you?
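To make that last point concrete, here is a hedged JDBC sketch of such an optimistic lock, reusing the hypothetical payment_order table and its revision column from the earlier sketch: the UPDATE succeeds only if the revision we originally read is still the one stored.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.UUID;

/** Optimistic locking sketch: the write succeeds only if the revision is still the one we read. */
public final class PaymentOrderStore {

    private static final String UPDATE_IF_UNCHANGED = """
            UPDATE payment_order
               SET data = ?::jsonb,
                   revision = revision + 1
             WHERE id = ?
               AND revision = ?
            """;

    /** Returns true if the entity was saved, false if a concurrent writer won the race. */
    public boolean save(Connection connection, UUID id, String json, long expectedRevision)
            throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(UPDATE_IF_UNCHANGED)) {
            statement.setString(1, json);
            statement.setObject(2, id);
            statement.setLong(3, expectedRevision);
            return statement.executeUpdate() == 1; // 0 rows means a stale revision
        }
    }
}
```

If save returns false, the caller simply reloads the entity, reapplies the change, and retries.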
Make It Work
However, this approach has a cost. What essential things do we need to do?
Schema Versioning
It is critical to store the schema version of the JSON structure in a separate column. For example, the orders table contains a schema column holding the current version of the document's structure. The goal is straightforward: keep the JSON structure under control. The structure of your object will inevitably change, sometimes dramatically and without backward compatibility. As a result, it is prudent to prepare simple tools for schema migration; they will prove invaluable.
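A minimal sketch of how that version column might be used on the read path, with made-up entity versions (v1 stored the amount and currency as a single string, v2 splits them): the stored version decides which deserialization and upgrade path runs.

```java
import com.fasterxml.jackson.databind.ObjectMapper;

/** Sketch: the stored schema version decides how the JSON document is deserialized and upgraded. */
public final class PaymentOrderReader {

    // Illustrative versions: v1 kept "amountWithCurrency" as one string, v2 splits it into two fields.
    record PaymentOrderV1(String id, String amountWithCurrency) { }
    record PaymentOrder(String id, String amount, String currency) { }

    private final ObjectMapper mapper = new ObjectMapper();

    public PaymentOrder read(int schemaVersion, String json) throws Exception {
        return switch (schemaVersion) {
            case 1  -> upgrade(mapper.readValue(json, PaymentOrderV1.class));
            case 2  -> mapper.readValue(json, PaymentOrder.class); // current layout
            default -> throw new IllegalStateException("Unknown schema version: " + schemaVersion);
        };
    }

    private PaymentOrder upgrade(PaymentOrderV1 v1) {
        String[] parts = v1.amountWithCurrency().split(" ");
        return new PaymentOrder(v1.id(), parts[0], parts[1]);
    }
}
```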
Building Backfilling Tools in Advance
Yes, this can be painful, but it is critical to develop backfilling code and tools that can convert data structures from one schema version to another. Ideally, these tools should be developed well in advance. The best practice to reduce complexity is to avoid keeping more than two versions of a document.
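One possible shape for such a backfilling tool, again with illustrative table, column, and field names: it walks over documents still stored under the old schema version, transforms the JSON with Jackson, and writes them back under the new version.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

/** Minimal backfill sketch: rewrites v1 documents as v2, one small batch at a time. */
public final class PaymentOrderBackfill {

    private final ObjectMapper mapper = new ObjectMapper();

    /** Returns the number of migrated rows; call repeatedly until it returns 0. */
    public int migrateBatch(Connection connection, int batchSize) throws Exception {
        String select = "SELECT id, data FROM payment_order WHERE schema_version = 1 LIMIT ?";
        String update = "UPDATE payment_order SET data = ?::jsonb, schema_version = 2 WHERE id = ?";

        int migrated = 0;
        try (PreparedStatement read = connection.prepareStatement(select);
             PreparedStatement write = connection.prepareStatement(update)) {
            read.setInt(1, batchSize);
            try (ResultSet rows = read.executeQuery()) {
                while (rows.next()) {
                    ObjectNode document = (ObjectNode) mapper.readTree(rows.getString("data"));

                    // v1 -> v2 transformation (illustrative): split one field into two.
                    String[] parts = document.get("amountWithCurrency").asText().split(" ");
                    document.put("amount", parts[0]).put("currency", parts[1]);
                    document.remove("amountWithCurrency");

                    write.setString(1, mapper.writeValueAsString(document));
                    write.setObject(2, rows.getObject("id"));
                    migrated += write.executeUpdate();
                }
            }
        }
        return migrated;
    }
}
```

Running this in small batches until it returns zero also makes the "no more than two live versions" rule easy to enforce.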
Indexes
Simply duplicate the searchable fields as regular columns in the table and create standard database indexes for them. This approach combines JSON's flexibility with record searchability while still letting us think in relational database terms: every queryable field is explicitly indexed, and queries against unindexed fields are ruled out by design.
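For example, sticking with the hypothetical payment_order table from earlier, a queryable field such as a customer identifier could be duplicated into its own indexed column, and every write would set that column together with the JSON blob:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.UUID;

/** Sketch: duplicate a queryable JSON field into a plain, indexed column. */
public final class PaymentOrderIndexes {

    // The customer_id column mirrors the customerId field inside the JSON (names are illustrative).
    private static final String ADD_COLUMN =
            "ALTER TABLE payment_order ADD COLUMN IF NOT EXISTS customer_id UUID";
    private static final String ADD_INDEX =
            "CREATE INDEX IF NOT EXISTS payment_order_customer_idx ON payment_order (customer_id)";

    public static void apply(Connection connection) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            statement.execute(ADD_COLUMN);
            statement.execute(ADD_INDEX);
        }
    }

    /** Every insert sets the duplicated column together with the JSON blob, so the two never diverge. */
    public static void insert(Connection connection, UUID id, UUID customerId, String json)
            throws SQLException {
        String sql = """
                INSERT INTO payment_order (id, schema_version, revision, customer_id, data)
                VALUES (?, 2, 0, ?, ?::jsonb)
                """;
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            statement.setObject(1, id);
            statement.setObject(2, customerId);
            statement.setString(3, json);
            statement.executeUpdate();
        }
    }
}
```

Queries by customer then hit the plain indexed column, while the JSON document remains the single source of truth for the entity itself.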
Deserialization/Serialization
Using a strong serialization framework is critical here. Make sure it supports backward compatibility and can handle polymorphic types and other nuances. Many frameworks provide these features, including Jackson (my preferred choice when using Java).
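As a quick illustration of the polymorphism point, here is a small, self-contained Jackson sketch with made-up payment-method types: a type discriminator is written into the JSON so that the correct subtype comes back on read.

```java
import com.fasterxml.jackson.annotation.JsonSubTypes;
import com.fasterxml.jackson.annotation.JsonTypeInfo;
import com.fasterxml.jackson.databind.ObjectMapper;

/** Sketch: Jackson writes a type discriminator into the JSON so the correct subtype is restored. */
public final class PolymorphismExample {

    @JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type")
    @JsonSubTypes({
            @JsonSubTypes.Type(value = CardPayment.class, name = "card"),
            @JsonSubTypes.Type(value = BankTransfer.class, name = "bank_transfer")
    })
    interface PaymentMethod { }

    // Made-up payment method types, just to show the round trip.
    record CardPayment(String maskedPan) implements PaymentMethod { }
    record BankTransfer(String iban) implements PaymentMethod { }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        String json = mapper.writeValueAsString(new CardPayment("****1234"));
        System.out.println(json); // e.g. {"type":"card","maskedPan":"****1234"}

        PaymentMethod restored = mapper.readValue(json, PaymentMethod.class);
        System.out.println(restored); // CardPayment[maskedPan=****1234]
    }
}
```

For backward compatibility, it also helps to configure the mapper to tolerate unknown fields (for example, by disabling DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES), so that older code can still read newer documents.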
It can be annoying to read and write the full JSON just to update a single field, but it can still be done.
Not all databases handle large blob-like values well during replication. Postgres looks good here, but other databases may behave differently.
Summary
As we wrap up our exploration, it may look like we are re-inventing NoSQL. I doubt that. With this approach, we still use a relational database, just in a smarter way.
By actively utilizing all the benefits of a relational database, such as joins, transactions, and locks, we’re not just managing data; we’re thinking in terms of domains. This isn’t just about choosing a database technology; it’s about adopting a mindset that prioritizes clear, logical structuring of data.
A relational database is at its best when it comes to guarantees of data persistence and consistency. I think almost no NoSQL store can offer the same guarantees; just try to find a good NoSQL database with a proper write-ahead log (WAL)!
This approach might look like a key-value database, but it is not. It is a traditional database that still needs to be thought of in terms of tables and the whole relational approach. We have just simplified the data model a little.
So, in short, we're not just re-inventing the wheel; we're reshaping it to better fit the problem and the evolving needs of development.