A NoSQL Hypothesis

Software developers are notorious for falling in love with new, shiny things. I always try to use the right tool for the job, whether that’s new or old technology. But every now and then I find something that makes a whole lot of sense in my head and I can’t ignore it.

I’ve been learning about NoSQL databases for the last year or so, partly because there’s a lot of noise about them and I was curious, and partly because data access has always seemed harder than it should be. First I looked into MongoDB and then eventually RavenDB. I haven’t used either of them in production yet (although I work with people who have), which is why I titled this article a “hypothesis”: I don’t have any real-life stories to back up what I’m going to say.

In my research of NoSQL databases I noticed two particularly interesting things:
1) ORMs make data access with relational databases much easier in most cases than using stored procs for everything, but even with all of the experience I have with various ORMs, data access is still a pain.
2) Ayende Rahien was one of the main contributors to NHibernate for years, and he probably understands ORMs better than almost anyone in the world. And yet even he decided to create RavenDB and move to NoSQL. If the ORM expert decides that using an ORM is too hard, maybe he’s onto something.

Here’s how I see it: relational databases are really good for reporting and querying, but not so good for loading information in an application. NoSQL databases are really good for loading information in an application, but not so good for reporting and querying.

The application I’m working on is 2 years old now and we use SQL Server. We are starting to accumulate a lot of data, which is exposing performance problems: everything processes a little slower as the database grows. Our database is not good at loading information into the application (it takes a lot of queries to load the main screen because it has to query so many tables).
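To make the “so many tables” point concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The schema (orders, order_lines, order_notes) and the load_order helper are made up for illustration and are not our actual database; the point is only that one logical object spread over several tables costs at least one query per table to load.

```python
import sqlite3


def build_db():
    """Create a tiny in-memory database with a hypothetical normalized schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
        CREATE TABLE order_lines (order_id INTEGER, product TEXT, qty INTEGER);
        CREATE TABLE order_notes (order_id INTEGER, note TEXT);
        INSERT INTO orders VALUES (1, 'Acme');
        INSERT INTO order_lines VALUES (1, 'widget', 3);
        INSERT INTO order_lines VALUES (1, 'gadget', 1);
        INSERT INTO order_notes VALUES (1, 'rush delivery');
        """
    )
    return conn


def load_order(conn, order_id):
    """Assemble one order from its tables and count the queries it took."""
    queries = 0

    # Query 1: the parent row.
    row = conn.execute(
        "SELECT id, customer FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    queries += 1

    # Query 2: the first child collection.
    lines = conn.execute(
        "SELECT product, qty FROM order_lines WHERE order_id = ? ORDER BY product",
        (order_id,),
    ).fetchall()
    queries += 1

    # Query 3: the second child collection.
    notes = conn.execute(
        "SELECT note FROM order_notes WHERE order_id = ?", (order_id,)
    ).fetchall()
    queries += 1

    order = {
        "id": row[0],
        "customer": row[1],
        "lines": [{"product": p, "qty": q} for p, q in lines],
        "notes": [n for (n,) in notes],
    }
    return order, queries


order, queries = load_order(build_db(), 1)
print(queries, order)
```

Every additional child table adds another query (or another join), which is how a busy screen ends up needing a long list of round trips.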

We are also starting to move away from reporting against our application database and are moving more data into our business intelligence data warehouse, so that our reports can run faster and be written against a relational schema that is optimized for the types of reports that we tend to write.

So let’s recap.

1) Our relational database is not good at loading data into our application.
2) Our relational database is not going to be used for much reporting and querying because we’re going to move the data into another database for that purpose.

So with that in mind, why am I using a relational database? Shouldn’t I be optimizing for the loading and saving of the primary objects in my application since that’s my pain point? (I haven’t even started talking about the increase in developer productivity from being able to do more in the business layer and less in the data access layer and stored procs.)

I’m not saying that I’m going to switch my application from SQL Server to RavenDB, because I don’t know that it would be worth the time at this point. But if I were to do so, I would be able to delete a slew of complicated stored procedures and turn my data access layer into a really thin layer that essentially just passes a C# object to RavenDB. I would be able to load, modify, and save our primary object (and all its children) with 2 database calls instead of over 20 (in some cases). RavenDB also has built-in mechanisms for writing triggers that take updates from RavenDB and replicate them to a relational database.
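Here is a rough sketch of what “2 database calls” means. It is written in Python with a toy in-memory store, not RavenDB and not its API: InMemoryDocumentStore, add_line, and the "orders/1" id are all hypothetical names made up for illustration. The idea it shows is that when the parent object and its children live in one document, a load-modify-save cycle is one read and one write.

```python
import json


class InMemoryDocumentStore:
    """Toy stand-in for a document database (hypothetical, not RavenDB's API).

    Each document is stored whole as a JSON string, children included.
    """

    def __init__(self):
        self._docs = {}
        self.calls = 0  # counts simulated round trips to the database

    def load(self, doc_id):
        self.calls += 1
        return json.loads(self._docs[doc_id])

    def store(self, doc_id, document):
        self.calls += 1
        self._docs[doc_id] = json.dumps(document)


def add_line(store, order_id, product, qty):
    """Load an order with all its children, change it, and save it back."""
    order = store.load(order_id)  # call 1: the whole object graph
    order["lines"].append({"product": product, "qty": qty})
    store.store(order_id, order)  # call 2: the whole object graph
    return order


store = InMemoryDocumentStore()
store.store(
    "orders/1",
    {"customer": "Acme", "lines": [{"product": "widget", "qty": 3}], "notes": []},
)
store.calls = 0  # ignore the seeding write when counting

add_line(store, "orders/1", "gadget", 1)
print(store.calls)
```

The data access layer here is little more than load and store, which is the “thin layer” I have in mind. A real document database adds plenty this toy ignores, such as indexing, concurrency control, and transactions.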

When people start a new application, it’s just assumed that they will use a relational database. But why? Is a relational database really the best data store for every application? Given that we have good NoSQL alternatives, maybe it’s time we start evaluating all the options.

I’m sure it’s not all rainbows and unicorns, and in some ways you’re just trading one set of problems for another. Dealing with large amounts of data is never easy. But I’m seeing a lot more solutions than problems.