As I discussed in part one of this series, a C# program actually goes
through two compilers before the code runs. First the C# compiler
converts the code into "Common Intermediate Language", or "IL", which is
written into the assembly -- the .EXE or .DLL file of the
program. Then at run-time, immediately before a method is called for the
first time, the aptly-named "just in time" compiler -- the jitter, for
short -- transforms the IL into actual executable machine code. Because
computer programmers love to verb words, this process is called "jitting
a method".
The jitter was built to be fast, but it's not infinitely fast. You
should expect that the first time you run a method it will be
considerably slower; before it can run for the first time all its IL has
to be translated into machine code. And of course, if the methods it calls in turn are themselves being called for the first time, then they've got to be jitted as well.
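Incidentally, if you want to move that translation cost out of your timed region entirely, the runtime lets you ask for a method to be jitted ahead of time via RuntimeHelpers.PrepareMethod. Here's a minimal sketch; the Warmup class and PreJit method are names I've invented for illustration, and the reflection lookup assumes the named method is not overloaded:
using System;
using System.Runtime.CompilerServices;

static class Warmup
{
    // Force the jitter to compile a method now rather than on its first call.
    // Assumes the named method is public, static or instance, and unambiguous.
    public static void PreJit(Type type, string methodName)
    {
        RuntimeHelpers.PrepareMethod(type.GetMethod(methodName).MethodHandle);
    }
}
Calling something like Warmup.PreJit(typeof(SomeClass), "SomeMethod") -- hypothetical names, of course -- before you start the stopwatch means the timed calls measure only the running of the code, not its translation.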
But wait, it gets worse. Suppose you are calling a method for the first time, and it in turn creates an object from a library. If that library is being used for the first time in this process, not only does the code for the constructor have to be jitted, but before it can be jitted the library has to be found on disk, mapped into virtual memory, and analyzed for security problems; and any code in the static constructors of the newly loaded types has to be jitted and executed as well.
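You can watch the static-constructor part of that cost happen with a little experiment. In this sketch, ExpensiveLibrary is a made-up stand-in for a real library, and the Sleep stands in for whatever one-time initialization work such a library might actually do; the first call to DoWork pays the hundred milliseconds, the second pays almost nothing:
using System;
using System.Diagnostics;
using System.Threading;

static class ExpensiveLibrary
{
    static ExpensiveLibrary()
    {
        // Runs exactly once, when the type is first used in this process.
        Thread.Sleep(100); // stand-in for real one-time initialization work
    }
    public static int DoWork(int x)
    {
        return x + 1;
    }
}

class StaticConstructorDemo
{
    static void Main()
    {
        var stopwatch = Stopwatch.StartNew();
        ExpensiveLibrary.DoWork(1); // first use; triggers the static constructor
        stopwatch.Stop();
        Console.WriteLine("first call:  {0}", stopwatch.Elapsed);
        stopwatch.Reset();
        stopwatch.Start();
        ExpensiveLibrary.DoWork(2); // already initialized; just the addition
        stopwatch.Stop();
        Console.WriteLine("second call: {0}", stopwatch.Elapsed);
    }
}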
In short, calling code for the first time can be massively more
expensive than calling it for the second time. So this brings us to...
Mistake #6: Treating the first run as nothing special when measuring average performance.
In order to get a good result out of a benchmark test in a world with potentially expensive startup costs due to jitting code, loading libraries and calling static constructors, you've got to think carefully about what you're actually measuring.
If, for example, you are benchmarking for the specific purpose of analyzing startup costs, then you're going to want to make sure that you measure only the first run. If, on the other hand, you are benchmarking part of a service that is going to run millions of times over many days, and you wish to know the average time taken in typical usage, then the high cost of the first run is irrelevant and therefore shouldn't be part of the average.
Whether you include the first run in your timings or not is up to you; my point is, you need to be aware that the first run has potentially very different costs from the second.
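If you decide that the first run shouldn't count, the simplest technique is to call the code once, un-timed, before you start measuring. A minimal sketch, where MethodUnderTest is a hypothetical placeholder for whatever you're benchmarking:
// This un-timed "warm-up" call absorbs the one-time costs -- jitting,
// library loading, static construction -- so they don't pollute the timings.
MethodUnderTest();

var stopwatch = System.Diagnostics.Stopwatch.StartNew();
MethodUnderTest();
stopwatch.Stop();
Console.WriteLine(stopwatch.Elapsed);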
Let's illustrate the enormous effect that the jitter can have on the
first run of a program by modifying our example from last time. Let's
suppose we've written our own implementation of the standard "quicksort"
algorithm and we want to benchmark its performance:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
static class Extensions
{
    public static void Quicksort(this int[] items)
    {
        Quicksort(items, 0, items.Length - 1);
    }
    static void Quicksort(int[] items, int leftBound, int rightBound)
    {
        int left = leftBound;
        int right = rightBound;
        int pivotIndex = leftBound + (rightBound - leftBound) / 2;
        int pivot = items[pivotIndex];
        while (left <= right)
        {
            // Find an item greater than or equal to the pivot
            while (items[left] < pivot)
                left += 1;
            // Find an item less than or equal to the pivot
            while (pivot < items[right])
                right -= 1;
            // Swap them
            if (left <= right)
            {
                Swap(ref items[left], ref items[right]);
                left += 1;
                right -= 1;
            }
        }
        // Everything less than the pivot is now left of the pivot.
        // Sort this portion.
        if (leftBound < right)
            Quicksort(items, leftBound, right);
        // Same for the right.
        if (left < rightBound)
            Quicksort(items, left, rightBound);
    }
    private static void Swap(ref int x, ref int y)
    {
        int t = x;
        x = y;
        y = t;
    }
}
That's a pretty standard implementation of quicksort in C#. Now let's
make some modifications to our benchmark test of last time. Here we'll
make a random list, copy it to an array twice so that we know that both
runs will be operating on exactly the same data, and see if we can
deduce the cost of jitting those methods the first time they're called:
class P
{
    static void Main()
    {
        var original = new List<int>();
        const int size = 1000;
        var random = new Random();
        for (int i = 0; i < size; ++i)
            original.Add(random.Next());
        var arr1 = original.ToArray();
        var arr2 = original.ToArray();
        var stopwatch = new System.Diagnostics.Stopwatch();
        stopwatch.Start();
        arr1.Quicksort();
        stopwatch.Stop();
        Console.WriteLine(stopwatch.Elapsed);
        stopwatch.Reset();
        stopwatch.Start();
        arr2.Quicksort();
        stopwatch.Stop();
        Console.WriteLine(stopwatch.Elapsed);
    }
}
When I run this on my laptop (again, remembering to compile in release mode, and running without the debugger attached) a typical result is 3500 microseconds for the first run and 700 microseconds for the second; in other words, the first run typically took roughly five times as long as the second. The jitter must have taken about 2.8 milliseconds to find the IL and translate it into efficient machine code.
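Incidentally, two runs are a thin basis for a claim about averages. Here's a sketch of a more thorough version of the same benchmark; it reuses the original list from the program above, times many runs over identical fresh copies of the data, and averages everything after the first run:
const int runs = 100;
var timings = new TimeSpan[runs];
for (int run = 0; run < runs; ++run)
{
    var copy = original.ToArray(); // fresh, identical data for every run
    var stopwatch = System.Diagnostics.Stopwatch.StartNew();
    copy.Quicksort();
    stopwatch.Stop();
    timings[run] = stopwatch.Elapsed;
}
Console.WriteLine("first run: {0}", timings[0]);
// The steady-state average deliberately excludes the first, jitted run.
Console.WriteLine("steady state: {0} ms",
    timings.Skip(1).Average(t => t.TotalMilliseconds));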
Of course, that factor is relative, and I chose the array size somewhat arbitrarily. If we were sorting an array with a million elements, then the ~3 millisecond jit cost would be a barely noticeable bump. If we were sorting an array with twenty elements, then the jit cost wouldn't be five times worse; it could be a hundred times worse.
Moreover, it's important to note that different jitters give different results on different machines and in different versions of the .NET framework. The time taken to jit can vary greatly, as can the amount of optimization in the generated machine code. The jit compilers on the 32-bit Windows desktop, the 64-bit Windows desktop, Silverlight running on a Mac, and the "compact" jitter that runs when you have a C# program in XNA on the Xbox 360 all have potentially different performance characteristics. That's...
Mistake #7: Assuming that run-time characteristics in one environment tell you what behavior will be in a different environment.
Run your benchmarks in the actual environment that the code will run in; use machines with the same hardware and software that your customers will typically use when they run the code.
Next time in this series we'll take a look at how the garbage collector can affect performance benchmarks.