Moving past the monolith, Part 2 – Separating your data

This is part of a series of posts about Moving Past The Monolith. To start at the beginning, click here.


Separating your business logic code into modules is one thing, but in many cases, that is the easy part. The tough part is the data.

In most large applications, you can probably think of a few database tables that are at the core of the application, have many other tables linking to it, and are pretty much used everywhere.

While you probably won’t be able to achieve total separation of concerns in your data, there are things you do to separate things as much as you can. You will probably always have some shared tables, but knowing that you’re only sharing some of them is better than having to assuming that you’re sharing all of them.

Schemas

One simple approach is to create separate schemas in your database (assuming you’re in a database like SQL Server or Oracle that supports multiple schemas within a database). Schemas are a good way to keep things separate – you can assign permissions at the schema level, and they are a good indication of which code module owns a database table. Since the schemas are all in the same database, you can write queries across schemas if you want to (as long as you acknowledge that you’re introducing more coupling by doing so).

What about the shared things?

What do you do with the core tables that need to be shared across all modules? You still have options here.

  • You could acknowledge that every module needs to link to a given table, and just live with it
  • You could have one shared table that contains the primary key and shared data points, and then each schema has its own table which has columns that only it cares about
  • Each schema could have its own version of the shared data, with some batch process keeping the data from a master table in sync with copies of the data in other schemas

Here are some questions that you should ask:

  • Can you keep this table in a schema for a given module, or do you need the same data in multiple modules?
  • Could you easily move a schema and its tables into a completely separate database if you wanted to? What would happen if you did, and how hard would it be?
  • If you have multiple versions of the same data in different schemas, how will you keep them in sync? Does the benefit of having the separation outweigh the work it will take to keep the data in sync?
  • Does all of your data need to be completely up to date in real time, or can certain data be updated nightly or at some interval with a batch job?

There are a lot of trade-offs to consider here, and I imagine that most people will have some shared tables that are used throughout the application. That is fine, but what I think is important is that you consider the trade-offs that come with a shared table, and if you choose to accept that because it’s worth it, then go with it. But the default should be to try and keep table separated, and move them together intentionally.


Read the next post in this series, Think about how you share data.