Task Parallel Library – multi-threading goes mainstream

The Task Parallel Library is nothing new (it came out with .NET 4), but recently I had to diagnose some slowness in my application and I thought that the TPL might be what I was looking for, so I dove in.

Doing work in multiple threads is nothing new. You’ve been able to do new Thread(...).Start() and ThreadPool.QueueUserWorkItem(...) for some time now. But if you’ve used either of those methods, you know that it’s not simple and it usually feels like more work than it’s worth.

This is where the Task Parallel Library is awesome — it makes multi-threading easy. This doesn’t mean that it’s trivial (you can still have deadlocks, race conditions, etc.), but now you can introduce multithreading into your application with minimal effort.

I’m guessing that many of you have had the same problem that I had. You have a screen which will load up all of the details about an account/customer/order/whatever your system deals with. If you use a relational database (which most of you probably do), this probably means that you go and run a series of SQL queries (using an ORM or not) to load up all of the data that you need on the screen.

In my case, I had several slow SQL queries that had to run (and by slow, I mean ~2 seconds). The queries weren’t all that slow, but if you do 20 SQL queries and each one waits until the previous one finishes, you end up with it taking 12 seconds to load all of the data. Not good.

So what I did was to do each call as a Task and let the framework do it as fast as it can with as many available threads as possible. This is really easy to do, actually, once you learn a few basics.

Here’s a sample of what my old code looked like:

dto.Enrollments = EnrollmentRepository.GetEnrollmentsForAccount(accountId);
dto.Usages = UsageRepository.GetUsagesForAccount(accountId);
dto.Invoices = InvoicesRepository.GetInvoicesForAccount(accountId);
return dto;

(Obviously, I’m simplifying the code a bit for the sake of this post, there were about 20 calls in the real code.)

Here’s what the code turned into:

var tasks = new List();
tasks.Add(Task.Run(() => dto.Enrollments = EnrollmentRepository.GetEnrollmentsForAccount(accountId)));
tasks.Add(Task.Run(() => dto.Usages = UsageRepository.GetUsagesForAccount(accountId)));
tasks.Add(Task.Run(() => dto.Invoices = InvoicesRepository.GetInvoicesForAccount(accountId)));
Task.WaitAll(tasks);
return dto;

I really only added two things. Task.Run() tells the framework to do an action on a new thread. Task.WaitAll() tells the framework that the execution of the code should wait at that point until all tasks are complete.

With this simple change, I was able to take the loading of an account from ~12 seconds down to 2 seconds!

This is not the first time that I’ve written code that does things on multiple threads. But this is the first time that I did it and it was this easy. This is important because now it’s easy enough to make me think about doing things asynchronously more often. In the JavaScript world, doing things asynchronously is the norm (Node.js basically forces you do write asynchronous server code). It feels a little weird at first, but it makes total sense. If I have this fancy computer that has lots of hyper-threaded cores, why wouldn’t I take advantage of it? That would be like having an assembly line with 8 workers but then having one person do all the work while the other seven sat around and watched.

How much can I actually do at once?

This got me curious. I know that I have a quad-core machine, and the cores are hyper-threaded so each core can do 2 threads at once. This means that my machine should be able to do 8 things at once. I decided to find out what would actually happen.

This code will queue up 50 tasks. Each task will sleep for 1 second and then tell me when it finished.

[TestMethod]
public void How_many_things_can_we_do_at_once()
{
    Func action = i =>
        {
            return (() =>
                {
                    Thread.Sleep(1000);
                    Console.WriteLine("Task {0} finished at {1}.", i, DateTime.Now.ToLongTimeString());
                });
        };

    var tasks = new List();
    for (var i = 0; i < 50; i++)
    {
        tasks.Add(Task.Run(action(i)));
    }

    Task.WaitAll(tasks.ToArray());
}

Here's the output of the test:

Task 1 finished at 10:10:52 PM.
Task 2 finished at 10:10:52 PM.
Task 3 finished at 10:10:52 PM.
Task 0 finished at 10:10:52 PM.
Task 4 finished at 10:10:52 PM.
Task 6 finished at 10:10:53 PM.
Task 5 finished at 10:10:53 PM.
Task 7 finished at 10:10:53 PM.
Task 8 finished at 10:10:53 PM.
Task 9 finished at 10:10:53 PM.
Task 10 finished at 10:10:53 PM.
Task 11 finished at 10:10:54 PM.
Task 12 finished at 10:10:54 PM.
Task 13 finished at 10:10:54 PM.
Task 14 finished at 10:10:54 PM.
Task 16 finished at 10:10:54 PM.
Task 15 finished at 10:10:54 PM.
Task 17 finished at 10:10:55 PM.
Task 19 finished at 10:10:55 PM.
Task 18 finished at 10:10:55 PM.
Task 20 finished at 10:10:55 PM.
Task 21 finished at 10:10:55 PM.
Task 22 finished at 10:10:55 PM.
Task 23 finished at 10:10:55 PM.
Task 24 finished at 10:10:56 PM.
Task 25 finished at 10:10:56 PM.
Task 26 finished at 10:10:56 PM.
Task 27 finished at 10:10:56 PM.
Task 28 finished at 10:10:56 PM.
Task 29 finished at 10:10:56 PM.
Task 30 finished at 10:10:56 PM.
Task 32 finished at 10:10:57 PM.
Task 31 finished at 10:10:57 PM.
Task 33 finished at 10:10:57 PM.
Task 34 finished at 10:10:57 PM.
Task 36 finished at 10:10:57 PM.
Task 35 finished at 10:10:57 PM.
Task 37 finished at 10:10:57 PM.
Task 38 finished at 10:10:57 PM.
Task 39 finished at 10:10:58 PM.
Task 40 finished at 10:10:58 PM.
Task 41 finished at 10:10:58 PM.
Task 42 finished at 10:10:58 PM.
Task 43 finished at 10:10:58 PM.
Task 45 finished at 10:10:58 PM.
Task 44 finished at 10:10:58 PM.
Task 46 finished at 10:10:58 PM.
Task 47 finished at 10:10:59 PM.
Task 49 finished at 10:10:59 PM.
Task 48 finished at 10:10:59 PM.

The test completed anywhere from 6-8 tasks per second in this test run. I did other runs where it never got above 6 tasks per second. It all depends on whatever else my machine is trying to do at the same time that is tying up the cores. The good thing is that the framework is taking care of creating the threads, figuring out when a thread can run, disposing of the thread when it's done, and all of that plumbing that I don't have to worry about.

Learn more

I found a really good series of blog posts that covered all kinds of different topics related to parallelism, threads, tasks, async/await, and everything related to it. It's definitely a new way of thinking for me but I'm starting to think of all kinds of ways that parallelism is going to change how I write code.