Wednesday, February 20, 2008

How To Go Slow: Summing Arrays



In this entry I'll talk about cache thrashing when iterating over large two-dimensional arrays.

Consider a square array of size N = 10000, and code that sums its elements:

for (int row = 0; row < N; ++row)
    for (int col = 0; col < N; ++col)
        sum += A[row, col];

Now compare it with the column-first version:

for (int col = 0; col < N; ++col)
    for (int row = 0; row < N; ++row)
        sum += A[row, col];


Do you think there is any difference between these two approaches?
The answer is yes: the difference is quite noticeable... in terms of performance.

The first approach takes about 1 second to complete, while the second takes nearly 14 seconds! How can this be, you may ask?

The answer is memory layout, caching and thrashing.

In .NET, much like in C++, two-dimensional arrays are stored row by row in contiguous memory. So if you traverse rows first, you access contiguous memory, which means the next data item you need will already be on its way up the memory hierarchy (prefetched into the cache from RAM, or even paged in from disk) before you ask for it.

But if you go through columns first, you will repeatedly read just one item from each row before jumping to the next row. As a result, your system's caching and lookahead mechanisms fail to have the items ready, and you waste a lot of time waiting for RAM, which is about three to five times slower than cache. It can get even worse: if the array is paged out, you end up waiting for the hard drive, which can be millions of times slower than RAM when accessed in large random jumps.

The moral: the less sequential your data access, the slower your program will run :)
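If you want to reproduce the effect yourself, here is a minimal benchmark sketch. All names are illustrative, and a smaller N is used so the demo finishes quickly; the gap grows dramatically as the array stops fitting in cache.

```csharp
using System;
using System.Diagnostics;

class RowVsColumn
{
    const int N = 2000; // smaller than 10000 so the demo runs quickly

    static long SumRowFirst(int[,] a)
    {
        long sum = 0;
        for (int row = 0; row < N; ++row)      // outer loop over rows:
            for (int col = 0; col < N; ++col)  // inner loop walks contiguous memory
                sum += a[row, col];
        return sum;
    }

    static long SumColumnFirst(int[,] a)
    {
        long sum = 0;
        for (int col = 0; col < N; ++col)      // outer loop over columns:
            for (int row = 0; row < N; ++row)  // inner loop jumps N ints each step
                sum += a[row, col];
        return sum;
    }

    static void Main()
    {
        var a = new int[N, N];
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j)
                a[i, j] = 1;

        var sw = Stopwatch.StartNew();
        long s1 = SumRowFirst(a);
        sw.Stop();
        Console.WriteLine("row-first:    {0} ms (sum = {1})", sw.ElapsedMilliseconds, s1);

        sw = Stopwatch.StartNew();
        long s2 = SumColumnFirst(a);
        sw.Stop();
        Console.WriteLine("column-first: {0} ms (sum = {1})", sw.ElapsedMilliseconds, s2);
    }
}
```

Both loops compute the same sum; only the traversal order differs, so any timing difference you see is purely a memory-access effect.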

Friday, February 01, 2008

When string.ToLower() is Evil


Do you know how evil string.ToLower() can sometimes be?

Let me explain...

Very often I see code similar to this:

void DoBadAction(string val)
{
    if (val.ToLower() == "somevalue")
    {
        // do something
    }
}

The code above can make string comparisons up to four times slower.

A better way to do this kind of case-insensitive comparison is the string.Equals(...) overload that takes a StringComparison:

void DoGoodAction(string val)
{
    if (val.Equals("someValue", StringComparison.OrdinalIgnoreCase))
    {
        // do something
    }
}

Why is that? The reason lies in a peculiarity of the string type: it is immutable.

Since strings are immutable, string.ToLower() always returns a new string instance, generating an extra string allocation (and a full copy of the characters) on every call.
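You can observe the extra allocation directly. A small sketch (the string literals here are just examples):

```csharp
using System;

class ToLowerAllocates
{
    static void Main()
    {
        string original = "SomeValue";

        // ToLower() must produce a brand-new string object,
        // because the original can never be modified in place.
        string lowered = original.ToLower();
        Console.WriteLine(ReferenceEquals(original, lowered)); // False

        // Equals with a StringComparison compares the strings as-is:
        // no temporary lower-cased copy is ever created.
        bool match = original.Equals("somevalue", StringComparison.OrdinalIgnoreCase);
        Console.WriteLine(match); // True
    }
}
```

In a tight loop, that throwaway copy from ToLower() adds allocation and garbage-collection pressure on top of the comparison itself, which is where most of the slowdown comes from.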

Detailed information about string.Equals with StringComparison enumeration can be found here.
Other performance related tips and tricks can be found here.