LINQ vs Loop Speed

There are always questions as to which is faster, plain old loops or LINQ select queries and the answer is always

depends

The biggest question is truly, what do you define as "faster"? Without arguing the definition too much I think it can be said that most developers define faster in this case as

Which is quicker to select/do something?

And to that end, I'm going to do simple experiment to demonstrate that the answer is never as easy or straightforward as expected.

Lets assume we have a range of numbers from 1 to 100,000,000. I want to fill a list with just the ones that are divisible by 3.

To that end, here are 3 methods for doing it, a For Loop, a For Each Loop and a LINQ Select:

  • First the list of numbers:

    • var numbers = Enumerable.Range(1, 100000000).ToList();
  • Then the For Each loop:

  • Then the For loop:

  • Lastly the LINQ query:

    • var divisibleBy3FromLinq = numbers.Where(n => n % 3 == 0);

I ran each one 100 times and averaged out their results to get:

  • For Each: 1,338
  • For: 1,596
  • LINQ: 0

Well that's a very impressive showing from LINQ, I guess we can stop here and say LINQ is the fastest, but we know that can't be right. Lets take a brief look at why we are getting such a false positive out of LINQ.

The types of each of our outputs are:

  • For Each: List<int>
  • For: List<int>
  • LINQ: IEnumerable<int>

That IEnumerable<int> is very important. LINQ is lazy and won't actually evaluate the statement until needed. Since we're not using divisibleBy3FromLinq anywhere then the numbers.Where(n => n % 3 == 0) is not evaluated, hence the impressive 0 time.

So lets evaluate it and get our actual calculation time:

  • For Each: 1,338
  • For: 1,596
  • LINQ: 1,877

Ouch, that's a surprisingly bad result. about half a second slower on average than a For Each loop. I won't get into the exact why of that, but suffice it to say LINQ has a lot going on under the hood. There are trade offs to using it, but thinking that it's "faster" because it's a one-liner vs a 7-liner (or 3 if you exclude curly braces) loop is just plain wrong.

The biggest benefits to using LINQ are readability (not that anything used as an example here is hard to read, but complicated statements can be much easier expressed in LINQ, usually) and lazy loading.

That being said, an average of less than half a second calculation difference over a hundred million numbers is fairly minor in the grand scheme of things. Obviously we need to write concise, efficient code, but if we're stressing over a 500 millisecond difference, then C# may not be the right language to be doing this in.

No comments:

Post a Comment