As I discussed in part one of this series, a C# program actually goes
through two compilers before the code runs. First the C# compiler
converts the code into "Common Intermediate Language", or "IL", which is
written into the assembly -- the .EXE or .DLL file of the
program. Then at run-time, immediately before a method is called for the
first time, the aptly-named "just in time" compiler -- the jitter, for
short -- transforms the IL into actual executable machine code. Because
computer programmers love to verb words, this process is called "jitting
a method".
The jitter was built to be fast, but it's not infinitely fast. You
should expect that the first time you run a method it will be
considerably slower; before it can run for the first time all its IL has
to be translated into machine code. And of course, if the methods it calls in turn are themselves being called for the first time, then they've got to be jitted as well.
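Incidentally, if you want to move that translation cost out of your timed region entirely, the runtime lets you ask for a method to be jitted ahead of time via RuntimeHelpers.PrepareMethod. Here's a minimal sketch; the Warmup class and PreJit method are names I've invented for illustration, and the reflection lookup assumes the named method is not overloaded:
using System;
using System.Runtime.CompilerServices;

static class Warmup
{
    // Force the jitter to compile a method now rather than on its first call.
    // Assumes the named method is public, static or instance, and unambiguous.
    public static void PreJit(Type type, string methodName)
    {
        RuntimeHelpers.PrepareMethod(type.GetMethod(methodName).MethodHandle);
    }
}
Calling something like Warmup.PreJit(typeof(SomeClass), "SomeMethod") -- hypothetical names, of course -- before you start the stopwatch means the timed calls measure only the running of the code, not its translation.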
But wait, it gets worse. Suppose you are calling a method for the first time, and it in turn creates an object from a library. If that library is being used for the first time in this process, not only does the code for the constructor have to be jitted, but before it can be jitted the library has to be found on disk, mapped into virtual memory, and analyzed for security problems; and any code in the static constructors of the newly loaded types has to be jitted and executed as well.
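You can watch the static-constructor part of that cost happen with a little experiment. In this sketch, ExpensiveLibrary is a made-up stand-in for a real library, and the Sleep stands in for whatever one-time initialization work such a library might actually do; the first call to DoWork pays the hundred milliseconds, the second pays almost nothing:
using System;
using System.Diagnostics;
using System.Threading;

static class ExpensiveLibrary
{
    static ExpensiveLibrary()
    {
        // Runs exactly once, when the type is first used in this process.
        Thread.Sleep(100); // stand-in for real one-time initialization work
    }
    public static int DoWork(int x)
    {
        return x + 1;
    }
}

class StaticConstructorDemo
{
    static void Main()
    {
        var stopwatch = Stopwatch.StartNew();
        ExpensiveLibrary.DoWork(1); // first use; triggers the static constructor
        stopwatch.Stop();
        Console.WriteLine("first call:  {0}", stopwatch.Elapsed);
        stopwatch.Reset();
        stopwatch.Start();
        ExpensiveLibrary.DoWork(2); // already initialized; just the addition
        stopwatch.Stop();
        Console.WriteLine("second call: {0}", stopwatch.Elapsed);
    }
}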
In short, calling code for the first time can be massively more
expensive than calling it for the second time. So this brings us to...
Mistake #6: Treating the first run as nothing special when measuring average performance.
In order to get a good result out of a benchmark test in a world with potentially expensive startup costs due to jitting code, loading libraries and calling static constructors, you've got to think carefully about what you're actually measuring.
If, for example, you are benchmarking for the specific purpose of analyzing startup costs, then you're going to want to make sure that you measure only the first run. If, on the other hand, you are benchmarking part of a service that is going to run millions of times over many days, and you wish to know the average time taken in typical usage, then the high cost of the first run is irrelevant and therefore shouldn't be part of the average.
Whether you include the first run in your timings or not is up to you; my point is, you need to be aware that the first run has potentially very different costs from the second.
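If you decide that the first run shouldn't count, the simplest technique is to call the code once, un-timed, before you start measuring. A minimal sketch, where MethodUnderTest is a hypothetical placeholder for whatever you're benchmarking:
// This un-timed "warm-up" call absorbs the one-time costs -- jitting,
// library loading, static construction -- so they don't pollute the timings.
MethodUnderTest();

var stopwatch = System.Diagnostics.Stopwatch.StartNew();
MethodUnderTest();
stopwatch.Stop();
Console.WriteLine(stopwatch.Elapsed);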
Let's illustrate the enormous effect that the jitter can have on the
first run of a program by modifying our example from last time. Let's
suppose we've written our own implementation of the standard "quicksort"
algorithm and we want to benchmark its performance:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
static class Extensions
{
    public static void Quicksort(this int[] items)
    {
        Quicksort(items, 0, items.Length - 1);
    }
    static void Quicksort(int[] items, int leftBound, int rightBound)
    {
        int left = leftBound;
        int right = rightBound;
        int pivotIndex = leftBound + (rightBound - leftBound) / 2;
        int pivot = items[pivotIndex];
        while (left <= right)
        {
            // Find an item greater than or equal to the pivot
            while (items[left] < pivot)
                left += 1;
            // Find an item less than or equal to the pivot
            while (pivot < items[right])
                right -= 1;
            // Swap them
            if (left <= right)
            {
                Swap(ref items[left], ref items[right]);
                left += 1;
                right -= 1;
            }
        }
        // Everything less than the pivot is now left of the pivot.
        // Sort this portion.
        if (leftBound < right)
            Quicksort(items, leftBound, right);
        // Same for the right.
        if (left < rightBound)
            Quicksort(items, left, rightBound);
    }
    private static void Swap(ref int x, ref int y)
    {
        int t = x;
        x = y;
        y = t;
    }
}
That's a pretty standard implementation of quicksort in C#. Now let's
make some modifications to our benchmark test of last time. Here we'll
make a random list, copy it to an array twice so that we know that both
runs will be operating on exactly the same data, and see if we can
deduce the cost of jitting those methods the first time they're called:
class P
{
    static void Main()
    {
        var original = new List<int>();
        const int size = 1000;
        var random = new Random();
        for (int i = 0; i < size; ++i)
            original.Add(random.Next());
        var arr1 = original.ToArray();
        var arr2 = original.ToArray();
        var stopwatch = new System.Diagnostics.Stopwatch();
        stopwatch.Start();
        arr1.Quicksort();
        stopwatch.Stop();
        Console.WriteLine(stopwatch.Elapsed);
        stopwatch.Reset();
        stopwatch.Start();
        arr2.Quicksort();
        stopwatch.Stop();
        Console.WriteLine(stopwatch.Elapsed);
    }
}
When I run this on my laptop (again, remembering to compile in release mode, and running without the debugger attached) a typical result is 3500 microseconds for the first run and 700 microseconds for the second; in other words, the first run typically took roughly five times as long as the second. The jitter must have taken about 2.8 milliseconds to find the IL and translate it into efficient machine code.
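Incidentally, two runs are a thin basis for a claim about averages. Here's a sketch of a more thorough version of the same benchmark; it reuses the original list from the program above, times many runs over identical fresh copies of the data, and averages everything after the first run:
const int runs = 100;
var timings = new TimeSpan[runs];
for (int run = 0; run < runs; ++run)
{
    var copy = original.ToArray(); // fresh, identical data for every run
    var stopwatch = System.Diagnostics.Stopwatch.StartNew();
    copy.Quicksort();
    stopwatch.Stop();
    timings[run] = stopwatch.Elapsed;
}
Console.WriteLine("first run: {0}", timings[0]);
// The steady-state average deliberately excludes the first, jitted run.
Console.WriteLine("steady state: {0} ms",
    timings.Skip(1).Average(t => t.TotalMilliseconds));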
Of course, that factor is relative, and I chose the array size somewhat arbitrarily. If we were sorting an array with a million elements, then the ~3 millisecond jit cost would be a barely noticeable bump. If we were sorting an array with twenty elements, then the jit cost wouldn't be five times worse; it could be a hundred times worse.
Moreover, it's important to note that different jitters give different results on different machines and in different versions of the .NET framework. The time taken to jit can vary greatly, as can the amount of optimization in the generated machine code. The jit compilers on the 32-bit Windows desktop, the 64-bit Windows desktop, Silverlight running on a Mac, and the "compact" jitter that runs when you have a C# program in XNA on the Xbox 360 all have potentially different performance characteristics. That's...
Mistake #7: Assuming that run-time characteristics in one environment tell you what behavior will be in a different environment.
Run your benchmarks in the actual environment that the code will run in; use machines with the same hardware and software that your customers will typically use when they run the code.
Next time in this series we'll take a look at how the garbage collector can affect performance benchmarks.