Monday, November 03, 2008

Bit Flags: The Simple Way


From time to time we face the need (or, for some of us, the opportunity) to mess with bit fields. As we all know, bytes consist of bits, and a bit can have one of two values: "0" or "1".

Using this knowledge we can store several flags in a single byte. One byte can efficiently hold eight one-bit flags (1 byte has 8 bits). If we use an int, which has four bytes, we can store 32 bit flags (4*8=32).

So, let's define 8 bit flags. We should get the following picture.

00000001 - 1-st flag
00000010 - 2-nd flag
00000100 - 3-rd flag
00001000 - 4-th flag
00010000 - 5-th flag
00100000 - 6-th flag
01000000 - 7-th flag
10000000 - 8-th flag

There are two ways to define these flags in C#. Here is the first: convert each binary number into its hexadecimal representation (decimal can also be used). The result looks like this:
[Flags]
public enum FirstApproach : byte
{
    First = 0x01,
    Second = 0x02,
    Third = 0x04,
    Fourth = 0x08,
    Fifth = 0x10,
    Sixth = 0x20,
    Seventh = 0x40,
    Eighth = 0x80,
}
We obtained a rather compact definition. But anyone who cannot quickly convert a binary number (01000000) into hex (0x40) will have to use some special tool like calc.exe :). I consider this way a little bit tiresome.

Here is the second approach: first we define a "base" enum that holds the bit positions.
//bit positions, not flags, so no [Flags] attribute here
public enum DefBase : byte
{
    First = 0,
    Second = 1,
    Third = 2,
    Fourth = 3,
    Fifth = 4,
    Sixth = 5,
    Seventh = 6,
    Eighth = 7,
}
Then, using the "base", we can define our flags:
[Flags]
public enum SecondApproach : byte
{
    First = 1 << (byte)DefBase.First,
    Second = 1 << (byte)DefBase.Second,
    Third = 1 << (byte)DefBase.Third,
    Fourth = 1 << (byte)DefBase.Fourth,
    Fifth = 1 << (byte)DefBase.Fifth,
    Sixth = 1 << (byte)DefBase.Sixth,
    Seventh = 1 << (byte)DefBase.Seventh,
    Eighth = 1 << (byte)DefBase.Eighth,
}
The second approach does not involve conversion from binary to hex. We can also reuse DefBase to create other sets of bit flags. Defining the next bit flag is no longer a pain.

If an application has a lot of bit flag declarations, the second approach is more useful because it saves the time otherwise spent on additional tools.
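Once flags are defined, they are combined and tested with bitwise operators. Here is a minimal sketch of that, assuming a made-up Permissions enum and a FlagOps helper class (both names are hypothetical and exist only for this illustration):

```csharp
using System;

[Flags]
public enum Permissions : byte   // hypothetical flag set, for illustration only
{
    None    = 0,
    Read    = 1 << 0,
    Write   = 1 << 1,
    Execute = 1 << 2,
}

public static class FlagOps
{
    // Turn a flag on with bitwise OR.
    public static Permissions Set(Permissions value, Permissions flag)
    {
        return value | flag;
    }

    // Turn a flag off by AND-ing with the complement of the flag.
    public static Permissions Clear(Permissions value, Permissions flag)
    {
        return value & ~flag;
    }

    // A flag is set when AND-ing with it leaves the flag intact.
    public static bool IsSet(Permissions value, Permissions flag)
    {
        return (value & flag) == flag;
    }
}
```

For example, FlagOps.Set(Permissions.Read, Permissions.Execute) yields a value in which both the Read and Execute bits are on, and FlagOps.IsSet can then be used to check either of them.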

Thursday, October 30, 2008

Handling Windows Operating System Version Mess



An operating system (OS), like any other software, has a version. So do the new OSes from Microsoft.

Sometimes the OS version is crucial when developing installation software. Some products work on XP and Vista but not on Windows Server 2003. MSI (Windows Installer) has special properties called VersionNT and WindowsBuild for checking the OS version.

Quite logical, eh?

The official documentation gives a reference table with OS versions and build numbers.

Using the information from that table we can define the following WiX condition:

<Condition Message='This software requires the Windows Server
2003 to operate correctly'><![CDATA[VersionNT = "502"]]>
</Condition>
At the bottom of this table we can see a strange thing: Windows Vista SP1 and Windows Server 2008 have the same VersionNT and WindowsBuild numbers.

...
System                              VersionNT  WindowsBuild  ServicePackLevel
Windows Vista                       600        6000          Not applicable
Windows Vista Service Pack 1 (SP1)  600        6001          1
Windows Server 2008                 600        6001          Not applicable
...


So, at this time, to determine whether your installer is running on Windows Server 2008 you have to rely on the ServicePackLevel property. That is bad, very bad, because when Microsoft releases a service pack for Server 2008 your WiX condition will no longer tell the truth...

Nonetheless, here's how to include a Windows Server 2008 launch condition in a WiX script:
<Condition Message='This software requires the Windows Server
2003 or 2008 to operate correctly'><![CDATA[Installed OR (VersionNT = 502
OR (VersionNT = 600 AND MsiNTProductType > 1))]]>
</Condition>

Update: in the launch condition above, the MsiNTProductType property is used to tell a server from a workstation.

Friday, July 18, 2008

"Using" magic or working with type aliases



Generics in C# allow us to specify and construct rather complex types. For instance, we can create a dictionary that maps an Id number to a name:

Dictionary<int, string> idNameMapping;
We can create an even more complex data structure, where an Id number is mapped to another dictionary:
Dictionary<int, Dictionary<string, int>> complexIdMapping;
Nothing can stop us if we want to put other types into a dictionary (or any generic type) declaration.

You've probably noticed already that the more types we put into a declaration, the clumsier it becomes. I'm not even mentioning the typing effort :).
Here's how the "using" keyword can help us reduce typing and introduce some level of self-documentation into the code.
using System.Collections.Generic;
using Item = System.Collections.Generic.Dictionary<string, string>;

namespace Sample
{
    //an alias declared inside a namespace can build on aliases from the outer scope
    using Records = Dictionary<int, Item>;

    public class Controller
    {
        Records recordDictionary = new Records();
    }
}
Isn't that cool? Instead of typing a lot of nested types with all those < and > we get "normal" looking type names.


Friday, June 27, 2008

The basics of secure data exchange under TCP


Exchanging data in plain text is very convenient and easy to implement, but what can you do to prevent eavesdropping, tampering, and forgery of the messages you send back and forth? Here's where secure communication comes into play. At present the most common method is Transport Layer Security (TLS) or its predecessor, Secure Sockets Layer (SSL). On the web you can see secure data exchange in action when browsing sites with the HTTPS prefix.

In the .NET Framework secure communication can be done with the SslStream class. It can use both the TLS and SSL protocols.

For authentication TLS and SSL rely on public key infrastructure (PKI), which requires certificates.

Here's a nice explanation of how to create a certificate using the makecert utility.

After reading that blog post and doing what it says we should end up with 2 installed certificates. They're depicted in the picture below.

The certificate we'll use is "vadmyst-enc".

SslStream gives you the look and feel of a common .NET stream.

So, what are the basic steps to start secure communication with SslStream?
Very often the communication happens between a server (e.g. a web server) and a client (e.g. a browser).
Here are the steps for the server side:

  • Start listening on a specific address and port

  • When a connection is accepted, wrap the obtained NetworkStream with an SslStream

  • Call SslStream::AuthenticateAsServer

  • Start doing I/O (in our case that's a basic echo server)

In code it looks like this:
TcpListener listener = new TcpListener(ipEndpoint);
listener.Start(5);
TcpClient tcpClient = listener.AcceptTcpClient();

SslStream secureStream = new SslStream(tcpClient.GetStream(), false);

secureStream.AuthenticateAsServer(serverCertificate);
//use anonymous delegate for simplicity
ThreadPool.QueueUserWorkItem(new WaitCallback(delegate(object unused)
{
    //simple echo server
    byte[] tempBuffer = new byte[1024];
    int read = 0;
    try
    {
        while ((read = secureStream.Read(tempBuffer, 0, tempBuffer.Length)) > 0)
        {
            secureStream.Write(tempBuffer, 0, read);
        }
    }
    finally
    {
        secureStream.Close();
    }
}), null);
serverCertificate is obtained from the certificate store on the local machine:
X509Store store = new X509Store(StoreName.My, StoreLocation.LocalMachine);
store.Open(OpenFlags.ReadOnly);
//stays null when no certificate with a matching subject is installed
X509Certificate serverCertificate = null;
foreach (X509Certificate2 certificate in store.Certificates)
{
    if (certificate.Subject.Contains("vadmyst-enc"))
    {
        serverCertificate = certificate;
        break;
    }
}
store.Close();
For simplicity's sake this post will not cover client certificates and client authentication. The client will only authenticate the server.
The steps required on the client side:
  • Open a TCP connection to the remote server

  • Wrap the obtained NetworkStream with an SslStream instance

  • Call SslStream::AuthenticateAsClient

  • Begin doing the I/O

The source code below demonstrates a basic TCP client that transfers data in a secure way.
TcpClient client = new TcpClient();
client.Connect(endPoint);
SslStream sslStream = new SslStream(client.GetStream(), false);
sslStream.AuthenticateAsClient("vadmyst-enc");

byte[] plaintext = new byte[5*1024 + 35];
byte[] validation = new byte[plaintext.Length];

RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
rng.GetNonZeroBytes(plaintext);

sslStream.Write(plaintext);
sslStream.Flush();

int read = 0;
int totalRead = 0;
while ((read = sslStream.Read(validation, totalRead,
    validation.Length - totalRead)) > 0)
{
    totalRead += read;
    if (totalRead == plaintext.Length)
        break; //we've received all sent data
}
//check received data
for (int i = 0; i < plaintext.Length; i++)
{
    if (validation[i] != plaintext[i])
        throw new InvalidDataException("Data is not the same");
}
sslStream.Close();
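Since the client only authenticates the server, it can be useful to see how the client decides whether the server certificate is acceptable. The sketch below assumes a hypothetical ServerValidation class; its method has the signature of the RemoteCertificateValidationCallback delegate and simply rejects any certificate the framework reports a policy error for. Treat it as an illustration, not as a complete validation policy.

```csharp
using System;
using System.Net.Security;
using System.Security.Cryptography.X509Certificates;

public static class ServerValidation   // hypothetical helper, not part of the original sample
{
    // Matches the RemoteCertificateValidationCallback delegate signature.
    public static bool ValidateServerCertificate(object sender,
        X509Certificate certificate, X509Chain chain,
        SslPolicyErrors sslPolicyErrors)
    {
        if (sslPolicyErrors == SslPolicyErrors.None)
            return true; // chain is trusted and the name matches

        // Report the problem and refuse the connection.
        Console.WriteLine("Certificate error: {0}", sslPolicyErrors);
        return false;
    }
}
```

The callback can be passed to the SslStream constructor that takes a validation callback, e.g. new SslStream(client.GetStream(), false, ServerValidation.ValidateServerCertificate), before calling AuthenticateAsClient.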

SslStream appeared in version 2.0 of the .NET Framework. As you can see, doing secure communication with it is very easy. However, a number of situations require additional coding: client authentication using client certificates, or using other algorithms for secure I/O. I will cover these advanced topics in upcoming posts. Stay tuned!

Tuesday, June 17, 2008

Hashing in .NET (cryptography related battle tactics)




Those who think I'm going to talk about hashish or hash browns are totally wrong. (By the way, I do like hash browns as well as this great Japanese liquor ;) )

I will be talking about hashing as it relates to cryptography and security. Hashing can be described as the process of getting a small digital "fingerprint" from any kind of data. Those interested in general information can get it here.

In .NET, security and cryptography related classes live in the System.Security.Cryptography namespace. Our hero of the day is the SHA1 algorithm, implemented by the SHA1Managed class. In the .NET cryptography model this class derives from the abstract class SHA1. The same, by the way, is true for other hash algorithms, e.g. MD5. Both SHA1 and MD5 inherit from the HashAlgorithm class. It is very likely that any new hashing algorithm added to the .NET Framework will inherit from HashAlgorithm as well.

There are three ways to calculate a hash value for some data.

  1. Use ComputeHash method

  2. Use TransformBlock/TransformFinalBlock directly

  3. Use CryptoStream

I'll show how to use each of these approaches. Let's assume we have some application data:

RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
byte[] data = new byte[5 * 4096 + 320];
//fill data array with arbitrary data
rng.GetNonZeroBytes(data);
//initialize HashAlgorithm instance
SHA1Managed sha = new SHA1Managed();

The first way:

byte[] hash1 = sha.ComputeHash(data);

It is very straightforward and simple: pass in the data and get the hash back. But this method is not suitable when the hash has to be calculated over several byte arrays or when the data is very large (e.g. the hash of a big binary file).
This leads us to the second way:

int offset = 0;
int blockSize = 64;
//reset algorithm internal state
sha.Initialize();
while (data.Length - offset >= blockSize)
{
    offset += sha.TransformBlock(data, offset, blockSize,
        data, offset);
}
sha.TransformFinalBlock(data, offset, data.Length - offset);
//get calculated hash value
byte[] hash2 = sha.Hash;

This way is much more flexible: we can reuse the HashAlgorithm instance (via the Initialize method) and calculate hash values for large data objects.
However, to hash a file we still have to write additional code that reads chunks from the file and passes them to the TransformBlock method.

Finally, the third way:

//reuse SHA1Managed instance
sha.Initialize();
MemoryStream memoryStream = new MemoryStream(data, false);
CryptoStream cryptoStream = new CryptoStream(memoryStream, sha,
    CryptoStreamMode.Read);
//temporary array used by the CryptoStream to store
//data chunk for which hash calculation was performed
byte[] temp = new byte[16];
while (cryptoStream.Read(temp, 0, temp.Length) > 0) { }
cryptoStream.Close();
byte[] hash3 = sha.Hash;

Isn't it beautiful? CryptoStream can read from any Stream object. Thus calculating the hash of a large file isn't a problem: just pass a FileStream to the CryptoStream constructor!
Under the hood CryptoStream uses TransformBlock/TransformFinalBlock, so the third way is derived from the second one.
CryptoStream links data streams to cryptographic transformations: it can be chained with any object that implements Stream, so the streamed output of one object can be fed into the input of another.

The first approach is good when you calculate hash values only from time to time.
The second and third are best when a large part of your application's work involves hash calculations (like using cryptography in network I/O).
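To convince yourself that the three ways are interchangeable, they can be wrapped into methods and run over the same data. The sketch below is only an illustration: the HashComparison class and its method names are made up for this example, and it uses SHA1.Create() instead of SHA1Managed so that each method gets its own algorithm instance.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class HashComparison   // hypothetical helper for this illustration
{
    // Way 1: one call over the whole array.
    public static byte[] ViaComputeHash(byte[] data)
    {
        using (SHA1 sha = SHA1.Create())
            return sha.ComputeHash(data);
    }

    // Way 2: feed the data block by block, then finish with the tail.
    public static byte[] ViaTransformBlock(byte[] data, int blockSize)
    {
        using (SHA1 sha = SHA1.Create())
        {
            int offset = 0;
            while (data.Length - offset >= blockSize)
                offset += sha.TransformBlock(data, offset, blockSize, data, offset);
            sha.TransformFinalBlock(data, offset, data.Length - offset);
            return sha.Hash;
        }
    }

    // Way 3: let CryptoStream pull the data from any readable stream.
    public static byte[] ViaCryptoStream(Stream source)
    {
        using (SHA1 sha = SHA1.Create())
        {
            using (CryptoStream cs = new CryptoStream(source, sha, CryptoStreamMode.Read))
            {
                byte[] temp = new byte[4096];
                while (cs.Read(temp, 0, temp.Length) > 0) { }
            }
            return sha.Hash;
        }
    }
}
```

Passing the same byte array to the first two methods, and a MemoryStream or FileStream over the same bytes to the third, should produce identical 20-byte hashes.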

Wednesday, May 28, 2008

Seeking CSS Enlightenment

Nowadays it is hard to find a software developer or web designer who doesn't know what CSS is all about.

I must say that until recently I didn't realize how much potential CSS has. I found a place on the web that helps people become Enlightened with the power of CSS.

Please welcome the Zen Garden.

In the CSS garden the layout of the same markup and content is changed just by applying different styles.

Tuesday, May 06, 2008

Sample code for TCP server using completion ports

As promised in my previous post, here is sample code of a TCP server that receives variable length messages in a specific format. The data transfer protocol implies that each network message consists of a prefix holding the body size, followed by the body itself.
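To make the message format concrete, here is a rough sketch of how a sender could build such a message. It assumes a 4-byte prefix written with BitConverter, which matches how the server reads it with BitConverter.ToInt32; the MessageFraming class itself is hypothetical and not part of the server code.

```csharp
using System;

public static class MessageFraming   // hypothetical sender-side helper
{
    public const int PrefixSize = 4;

    // Builds prefix + body: 4 bytes holding the body length, then the body.
    public static byte[] Frame(byte[] body)
    {
        if (body == null)
            throw new ArgumentNullException("body");

        // Same byte order that BitConverter.ToInt32 expects on the receiving side
        // (assumes both peers run on machines with the same endianness).
        byte[] prefix = BitConverter.GetBytes(body.Length);

        byte[] message = new byte[PrefixSize + body.Length];
        Buffer.BlockCopy(prefix, 0, message, 0, PrefixSize);
        Buffer.BlockCopy(body, 0, message, PrefixSize, body.Length);
        return message;
    }
}
```

The resulting array can be written to the socket in one Send call; the receiver first reads the 4 prefix bytes and then knows how many body bytes to expect.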

The server code starts by defining the application state object:

/// <summary>
/// Server state holds current state of the client socket
/// </summary>
class AsyncServerState
{
public byte[] Buffer = new byte[512]; //buffer for network i/o
public int DataSize = 0; //data size to be received by the server

//flag that indicates whether prefix was received
public bool DataSizeReceived = false;

public MemoryStream Data = new MemoryStream(); //place where data is stored
public SocketAsyncEventArgs ReadEventArgs = new SocketAsyncEventArgs();
public Socket Client;
}

To preserve application state between async operations, SocketAsyncEventArgs.UserToken is used.
/// <summary>
/// Async server sample, demonstrates usage of XxxAsync methods
/// </summary>
class AsyncServer
{
Socket listeningSocket;
List<byte[]> messages = new List<byte[]>();
const int PrefixSize = 4;

SocketAsyncEventArgs acceptEvtArgs;

public AsyncServer()
{
this.listeningSocket = new Socket(AddressFamily.InterNetwork,
SocketType.Stream, ProtocolType.Tcp);
this.acceptEvtArgs = new SocketAsyncEventArgs();
}

public void Start(IPEndPoint listeningAddress)
{
acceptEvtArgs.Completed += new EventHandler<SocketAsyncEventArgs>(
Accept_Completed);

listeningSocket.Bind(listeningAddress);
listeningSocket.Listen(1);

ProcessAccept(acceptEvtArgs);
}

/// <summary>
/// Accept completion handler
/// </summary>
void Accept_Completed(object sender, SocketAsyncEventArgs e)
{
if (e.SocketError == SocketError.Success)
{
Socket client = e.AcceptSocket;
AsyncServerState state = new AsyncServerState();
state.ReadEventArgs.AcceptSocket = client;
state.ReadEventArgs.Completed += new EventHandler<SocketAsyncEventArgs>(
IO_Completed);
state.ReadEventArgs.UserToken = state;
state.Client = client;
state.ReadEventArgs.SetBuffer(state.Buffer, 0, state.Buffer.Length);

if (!client.ReceiveAsync(state.ReadEventArgs))
{ //call completed synchronously
ProcessReceive(state.ReadEventArgs);
}
}
ProcessAccept(e);
}

private void ProcessAccept(SocketAsyncEventArgs e)
{
e.AcceptSocket = null;
if (!listeningSocket.AcceptAsync(acceptEvtArgs))
{ //operation completed synchronously
Accept_Completed(null, acceptEvtArgs);
}
}

/// <summary>
/// Generic I/O completion handler
/// </summary>
void IO_Completed(object sender, SocketAsyncEventArgs e)
{
switch (e.LastOperation)
{
case SocketAsyncOperation.Receive:
ProcessReceive(e);
break;
case SocketAsyncOperation.Send:
ProcessSend(e);
break;
default:
throw new NotImplementedException("The code will "
+"handle only receive and send operations");
}
}

/// <summary>
/// In future will process server send operations
/// </summary>
private void ProcessSend(SocketAsyncEventArgs e) { }

/// <summary>
/// Implements server receive logic
/// </summary>
private void ProcessReceive(SocketAsyncEventArgs e)
{
//single message can be received using several receive operation
AsyncServerState state = e.UserToken as AsyncServerState;

if (e.BytesTransferred <= 0 || e.SocketError != SocketError.Success)
{
CloseConnection(e);
return; //nothing to process on a closed or failed connection
}

int dataRead = e.BytesTransferred;
int dataOffset = 0;
int restOfData = 0;

while (dataRead > 0)
{
if (!state.DataSizeReceived)
{
//there is already some data in the buffer
if (state.Data.Length > 0)
{
//this read may deliver only part of the remaining prefix bytes
restOfData = Math.Min(PrefixSize - (int)state.Data.Length, dataRead);
state.Data.Write(state.Buffer, dataOffset, restOfData);
dataRead -= restOfData;
dataOffset += restOfData;
}
else if (dataRead >= PrefixSize)
{ //store whole data size prefix
state.Data.Write(state.Buffer, dataOffset, PrefixSize);
dataRead -= PrefixSize;
dataOffset += PrefixSize;
}
else
{ // store only part of the size prefix
state.Data.Write(state.Buffer, dataOffset, dataRead);
dataOffset += dataRead;
dataRead = 0;
}

if (state.Data.Length == PrefixSize)
{ //we received data size prefix
state.DataSize = BitConverter.ToInt32(state.Data.GetBuffer(), 0);
state.DataSizeReceived = true;

state.Data.Position = 0;
state.Data.SetLength(0);
}
else
{ //we received just part of the headers information
//issue another read
if (!state.Client.ReceiveAsync(state.ReadEventArgs))
ProcessReceive(state.ReadEventArgs);
return;
}
}

//at this point we know the size of the pending data
if ((state.Data.Length + dataRead) >= state.DataSize)
{ //we have all the data for this message

restOfData = state.DataSize - (int)state.Data.Length;

state.Data.Write(state.Buffer, dataOffset, restOfData);
Console.WriteLine("Data message received. Size: {0}",
state.DataSize);

dataOffset += restOfData;
dataRead -= restOfData;

state.Data.SetLength(0);
state.Data.Position = 0;
state.DataSizeReceived = false;
state.DataSize = 0;

if (dataRead == 0)
{
if (!state.Client.ReceiveAsync(state.ReadEventArgs))
ProcessReceive(state.ReadEventArgs);
return;
}
else
continue;
}
else
{ //there is still data pending, store what we've
//received and issue another BeginReceive
state.Data.Write(state.Buffer, dataOffset, dataRead);

if (!state.Client.ReceiveAsync(state.ReadEventArgs))
ProcessReceive(state.ReadEventArgs);

dataRead = 0;
}
}
}

private void CloseConnection(SocketAsyncEventArgs e)
{
AsyncServerState state = e.UserToken as AsyncServerState;

try
{
state.Client.Shutdown(SocketShutdown.Send);
}
catch (Exception) { }

state.Client.Close();
}
}

The code sample above gives a basic idea of how the completion port asynchronous pattern can be used in TCP server development.

High performance TCP server using completion ports

Completion ports were first introduced in Windows NT 3.5. This technology makes simultaneous asynchronous I/O possible and extremely effective. When building high performance network software one has to think about an effective threading model. Having too many or too few server threads in the system can result in poor server performance.


The goal of a server is to incur as few context switches as possible by having its threads avoid unnecessary blocking, while at the same time maximizing parallelism by using multiple threads. The ideal is for there to be a thread actively servicing a client request on every processor and for those threads not to block if there are additional requests waiting when they complete a request. For this to work correctly however, there must be a way for the application to activate another thread when one processing a client request blocks on I/O (like when it reads from a file as part of the processing). (read more...)

An attentive reader may object that I/O completion ports (IOCP) are not directly available in .NET. Well, that was true until SP1 of .NET 2.0. Starting with .NET 2.0 SP1 this marvelous technology can be accessed through the following Socket class methods:
  • AcceptAsync
  • ConnectAsync
  • DisconnectAsync
  • ReceiveAsync
  • SendAsync
  • and other XxxAsync methods
In one of my previous posts, about receiving variable length messages, I used the BeginXXX/EndXXX asynchronous approach. The main drawback of this approach is the repeated allocation and synchronization of objects during high-volume asynchronous socket I/O. That is because the BeginXXX/EndXXX design pattern currently implemented by the System.Net.Sockets.Socket class requires a System.IAsyncResult object to be allocated for each asynchronous socket operation.

The completion port approach, on the other hand, avoids these problems altogether. Asynchronous operations are described by instances of SocketAsyncEventArgs. These objects can be reused by the application; moreover, the application can create as many SocketAsyncEventArgs objects as it needs to perform well under sustained load.

The pattern for performing an asynchronous socket operation with this class consists of the following steps:
  1. Allocate a new SocketAsyncEventArgs context object, or get a free one from an application defined pool
  2. Set properties on the context object to the operation about to be performed (the completion callback method, the data buffer, the offset into the buffer, and the maximum amount of data to transfer, for example).
  3. Call the appropriate socket method (XxxAsync) to initiate the asynchronous operation
  4. If the asynchronous socket method (XxxAsync) returns true, in the callback, query the context properties for completion status
  5. If the asynchronous socket method (XxxAsync) returns false, the operation completed synchronously. The context properties may be queried for the operation result
  6. Reuse the context for another operation, put it back in the pool, or discard it
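Steps 1 and 6 mention an application defined pool of context objects. A minimal sketch of such a pool could look like the code below; the SocketAsyncEventArgsPool class and its members are assumptions made for this illustration, and a real server would probably also pre-allocate buffers for the pooled objects.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Sockets;

// Hypothetical application-defined pool; names and API are illustrative only.
public sealed class SocketAsyncEventArgsPool
{
    private readonly Stack<SocketAsyncEventArgs> free = new Stack<SocketAsyncEventArgs>();
    private readonly object sync = new object();

    // Step 1: take a free context, or allocate one when the pool is empty.
    public SocketAsyncEventArgs Rent()
    {
        lock (sync)
        {
            if (free.Count > 0)
                return free.Pop();
        }
        return new SocketAsyncEventArgs();
    }

    // Step 6: hand the context back so a later operation can reuse it.
    public void Return(SocketAsyncEventArgs args)
    {
        if (args == null)
            throw new ArgumentNullException("args");

        // Drop references to the finished operation before pooling.
        args.AcceptSocket = null;
        args.UserToken = null;

        lock (sync)
        {
            free.Push(args);
        }
    }

    // Number of contexts currently waiting in the pool.
    public int Count
    {
        get { lock (sync) { return free.Count; } }
    }
}
```

A connection handler would call Rent() before issuing an XxxAsync operation and Return() once the completion callback has finished with the context.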
In How to Transfer Fixed Sized Data With Async Sockets I presented server code that uses the BeginXXX/EndXXX model for handling received data. In the next post I'll show how that code can be rewritten to use the IOCP server model.
Sample code of IOCP based TCP server

Sunday, April 20, 2008

What is peeking in TCP and why should it be avoided

A typical network receive operation via sockets looks like this:

socket.Receive(userBuffer, userBuffer.Length, SocketFlags.None);

In the code above the received data is moved from the internal Winsock buffers into the buffer specified by the user. The next time socket.Receive is called, another chunk of the TCP stream will be moved into userBuffer.



On the other hand, there is a way to receive that does not move data from the system buffer to the user buffer but copies it instead. This is called peeking.


The following code peeks at data in the system buffer:
socket.Receive(userBuffer, userBuffer.Length, SocketFlags.Peek);

Every time Receive is called this way, it copies the very same data from the internal (system) buffer into the user buffer.

One may ask why we need peeking at all and why it is present in the sockets API.

Peeking was introduced to preserve compatibility with Unix BSD sockets.
So, where can one use peeking? The answer is nowhere: do not use it at all when doing network I/O. Peeking is very inefficient and must be avoided.
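A common reason people reach for peeking is to check whether a whole message has already arrived. A plain receive loop that keeps reading until the expected number of bytes is collected does the same job without copying the data twice. The sketch below assumes a blocking socket; the SocketReads class and the ReceiveExactly name are hypothetical.

```csharp
using System;
using System.Net.Sockets;

public static class SocketReads   // hypothetical helper for this illustration
{
    // Receives exactly 'count' bytes into 'buffer'.
    // Returns false if the remote peer closed the connection first.
    public static bool ReceiveExactly(Socket socket, byte[] buffer, int count)
    {
        int total = 0;
        while (total < count)
        {
            int read = socket.Receive(buffer, total, count - total, SocketFlags.None);
            if (read == 0)
                return false; // connection closed by the remote peer
            total += read;
        }
        return true;
    }
}
```

Each byte is moved out of the system buffer exactly once, and the caller still gets the "do I have the complete message?" answer from the return value.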

Monday, April 14, 2008

Change socket send and receive timeout

Sometimes synchronous Socket I/O methods like Send or Receive take too long to complete or to produce an error (exception). This default behavior can be changed using socket options, namely SocketOptionName.SendTimeout and SocketOptionName.ReceiveTimeout.

Here's a code sample showing how to get and set these socket options:


//get socket receive timeout (in milliseconds)
int receiveTimeout = (int)socket.GetSocketOption(SocketOptionLevel.Socket,
    SocketOptionName.ReceiveTimeout);

//set socket receive timeout; 0 means wait indefinitely
int newReceiveTimeout = 0;
socket.SetSocketOption(SocketOptionLevel.Socket,
    SocketOptionName.ReceiveTimeout, newReceiveTimeout);

Nearly the same code works for the SendTimeout socket option:

//get socket send timeout (in milliseconds)
int sendTimeout = (int)socket.GetSocketOption(SocketOptionLevel.Socket,
    SocketOptionName.SendTimeout);

//set socket send timeout; 0 means wait indefinitely
int newSendTimeout = 0;
socket.SetSocketOption(SocketOptionLevel.Socket,
    SocketOptionName.SendTimeout, newSendTimeout);


When you do not use synchronous I/O but use BeginXXX/EndXXX or non-blocking sockets instead, you can let the system handle long lasting I/O. You no longer need to handle timeouts, since your code is not waiting for the I/O to complete.

If you still want to wait for an asynchronous operation to complete, you can use the AsyncWaitHandle property of the IAsyncResult interface.

IAsyncResult result = socket.BeginReceive(/*parameters come here*/);
//here we can wait for 5 minutes for completion of receive operation
result.AsyncWaitHandle.WaitOne(5*60*1000, false);
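Going back to the synchronous case: the Socket class also exposes the two timeout options as the ReceiveTimeout and SendTimeout properties, and a blocking call that runs out of time fails with a SocketException. The sketch below shows one possible way to use them; the SocketTimeouts class and its method names are made up for this example.

```csharp
using System;
using System.Net.Sockets;

public static class SocketTimeouts   // hypothetical helper for this illustration
{
    // Applies the same timeout (in milliseconds) to blocking Send and Receive calls.
    public static void Apply(Socket socket, int milliseconds)
    {
        socket.ReceiveTimeout = milliseconds;
        socket.SendTimeout = milliseconds;
    }

    // Returns the number of bytes read, or -1 if the receive timed out.
    public static int TryReceive(Socket socket, byte[] buffer)
    {
        try
        {
            return socket.Receive(buffer, 0, buffer.Length, SocketFlags.None);
        }
        catch (SocketException ex)
        {
            if (ex.SocketErrorCode == SocketError.TimedOut)
                return -1;
            throw; // any other socket error is not a timeout
        }
    }
}
```

Whether the connection is still usable after a timed out call depends on the application protocol, so a caller may prefer to close the socket when TryReceive returns -1.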

Thursday, April 10, 2008

Proper way to close TCP socket

This question arises very often in developer communities.
For simplicity I'll talk about synchronous sockets.

Generally, the procedure is like this:

  1. Finish sending data

  2. Call Socket.Shutdown with SocketShutdown.Send parameter

  3. Loop on Receive until it returns 0 or fails with an exception

  4. Call Close()

Here's a small sample in pseudo code that is very similar to C# :)

void CloseConnection(Socket socket)
{
    socket.Send(/*last data of the connection*/);
    socket.Shutdown(SocketShutdown.Send);

    try
    {
        int read = 0;
        while ((read = socket.Receive(/*application data buffers*/)) > 0)
        { }
    }
    catch
    {
        //ignore
    }
    socket.Close();
}

If the first and third steps are skipped, data loss can happen.

Things become more complicated when using asynchronous sockets.
To prevent data loss while closing the connection, termination logic can be added to the data exchange protocol.
For example, it can be some kind of termination message (depending on the data protocol). When a peer receives a message of this kind, it can proceed with the connection termination logic described above.

Another way is to put the socket into blocking mode (if it was in non-blocking mode) and close the connection in the way described above.

When designing a network application one also has to think about its connection closing procedure. The purpose of a proper connection close is to prevent data loss.

Sunday, March 23, 2008

Part 2: How to Transfer Variable Length Messages With Async Sockets

In my previous post about transferring data asynchronously I talked about designing a small data exchange protocol to transfer a message over the network. All I/O was done asynchronously using the BeginXXX/EndXXX pattern. The code in that post handled a single message only. However, in the real world it rarely happens that only one message is transferred over a single connection. It is more common for a peer to receive several messages.



The data exchange protocol consists of messages prefixed by their size. The size prefix has a fixed length.

Prefixing data with its size is the cornerstone of the simple data transfer protocol introduced in the previous example. There is no problem transferring multiple messages over a single connection: the remote peer should have no trouble telling separate messages apart in the data stream.
This post provides a code sample showing how to read multiple messages from the network.

What to expect when multiple messages arrive at the server?
While dealing with multiple messages one has to remember that a receive operation can return an arbitrary number of bytes read from the network. Typically that number is anywhere from 0 to the buffer length specified in the Receive or BeginReceive call.

Our data exchange format is illustrated in the image above.
After receiving a number of bytes the peer code should be able to tell what part of the message it has just received: is it part of the size prefix or part of the message body?
Sometimes several messages can be received in one Receive call.

Let's see what situations we can encounter while processing incoming data:
- received data contains only the data size prefix
- received data contains part of the data size prefix
- received data contains the prefix and part of the data
- received data contains the prefix, the message data and part of the prefix of the next message
- received data contains the prefix, the message, the prefix of the next message and part of its body.



When developing data processing code one has to expect all of the scenarios listed above to happen.
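Stripped of the socket calls, handling these scenarios boils down to accumulating bytes and cutting complete messages out of them. The sketch below shows that idea in isolation. It assumes a 4-byte size prefix readable with BitConverter.ToInt32; the MessageAssembler class is hypothetical, and it copies the pending bytes on every call, which favors clarity over efficiency.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: accumulates raw chunks and yields complete message bodies.
public sealed class MessageAssembler
{
    private const int PrefixSize = 4;
    private readonly MemoryStream pending = new MemoryStream();

    // Feed 'count' bytes from 'chunk'; returns every message completed by this chunk.
    public List<byte[]> Append(byte[] chunk, int count)
    {
        pending.Write(chunk, 0, count);
        List<byte[]> messages = new List<byte[]>();
        byte[] all = pending.ToArray();
        int offset = 0;

        // Cut out messages while a whole prefix and a whole body are available.
        while (all.Length - offset >= PrefixSize)
        {
            int bodySize = BitConverter.ToInt32(all, offset);
            if (bodySize < 0)
                throw new InvalidDataException("Negative message size");
            if (all.Length - offset - PrefixSize < bodySize)
                break; // body not fully received yet

            byte[] body = new byte[bodySize];
            Buffer.BlockCopy(all, offset + PrefixSize, body, 0, bodySize);
            messages.Add(body);
            offset += PrefixSize + bodySize;
        }

        // Keep only the unconsumed tail (partial prefix or partial body).
        pending.SetLength(0);
        pending.Write(all, offset, all.Length - offset);
        return messages;
    }
}
```

A receive callback would call Append(state.Buffer, dataRead) after every read and process whatever messages come back, regardless of where the network happened to split the stream.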

Here's the server code that handles the conditions described above. I present only the server callback function here. The client sending code can be obtained from the previous post.


private void ServerReadCallback(IAsyncResult ar)
{
try
{
ServerState state = (ServerState)ar.AsyncState;
Socket client = state.Client;
SocketError socketError;

int dataRead = client.EndReceive(ar, out socketError);
int dataOffset = 0; //to simplify logic
int restOfData = 0;

if (socketError != SocketError.Success)
{
client.Close();
return;
}

if (dataRead <= 0)
{
client.Close();
return;
}

while (dataRead > 0)
{
//check to determine what the incoming data contains: size prefix or message
if (!state.DataSizeReceived)
{
//there is already some data in the buffer
if (state.Data.Length > 0)
{
//this read may deliver only part of the remaining prefix bytes
restOfData = Math.Min(PrefixSize - (int)state.Data.Length, dataRead);
state.Data.Write(state.Buffer, dataOffset, restOfData);
dataRead -= restOfData;
dataOffset += restOfData;
}
else if (dataRead >= PrefixSize)
{ //store whole data size prefix
state.Data.Write(state.Buffer, dataOffset, PrefixSize);
dataRead -= PrefixSize;
dataOffset += PrefixSize;
}
else
{ // store only part of the size prefix
state.Data.Write(state.Buffer, dataOffset, dataRead);
dataOffset += dataRead;
dataRead = 0;
}

if (state.Data.Length == PrefixSize )
{ //we received data size prefix
state.DataSize = BitConverter.ToInt32(state.Data.GetBuffer(), 0);
state.DataSizeReceived = true;
//reset internal data stream
state.Data.Position = 0;
state.Data.SetLength(0);
}
else
{ //we received just part of the prefix information
//issue another read
client.BeginReceive(state.Buffer, 0, state.Buffer.Length,
SocketFlags.None, new AsyncCallback(ServerReadCallback),
state);
return;
}
}

//at this point we know the size of the pending data
if ((state.Data.Length + dataRead) >= state.DataSize)
{ //we have all the data for this message

restOfData = state.DataSize - (int)state.Data.Length;

state.Data.Write(state.Buffer, dataOffset, restOfData);
Console.WriteLine("Data message received. Size: {0}",
state.DataSize);

//store received messages
//lock(messages)
// messages.Add(state.Data.ToArray());

dataOffset += restOfData;
dataRead -= restOfData;

//message received - cleanup internal memory stream
state.Data.SetLength(0);
state.Data.Position = 0;
state.DataSizeReceived = false;
state.DataSize = 0;

if (dataRead == 0)
{ //no more data remaining to process - issue another receive
client.BeginReceive(state.Buffer, 0, state.Buffer.Length,
SocketFlags.None, new AsyncCallback(ServerReadCallback),
state);
return;
}
else
continue; //there's still some data to process in the buffers
}
else
{ //there is still data pending, store what we've
//received and issue another BeginReceive
state.Data.Write(state.Buffer, dataOffset, dataRead);

client.BeginReceive(state.Buffer, 0, state.Buffer.Length,
SocketFlags.None, new AsyncCallback(ServerReadCallback), state);

dataRead = 0;
}
}
}
catch (Exception ex)
{
Console.WriteLine(ex.ToString());
}
}

Wednesday, February 20, 2008

How To Go Slow: Summing Arrays



In this entry I'll talk about thrashing when iterating over multidimensional arrays.

Consider a square array of size N = 10000.
Now, consider the code that sums the elements of the square array:

for (int row = 0; row < N; ++row)
    for (int col = 0; col < N; ++col)
        sum += A[row, col];

Or, on the other hand:
for (int col = 0; col < N; ++col)
    for (int row = 0; row < N; ++row)
        sum += A[row, col];


How do you think is there any difference between these two approaches?
The answer is yes - difference is quite noticeable... in terms of performance.

First approach takes about 1 second to complete, while the second - nearly 14 seconds! How can this be you may ask?

The answer is memory layout, caching and thrashing.

In .NET, much like in C++, arrays are stored row-wise in contiguous memory. So if you walk along each row in the inner loop, you access contiguous memory. That means the next data item you need will already be in the pipeline, cache, RAM, or the next hard drive sector before you need it.

But if you go through columns first, you repeatedly read just one item from each row before moving on to the next row. As a result, your system's caching and lookahead mechanisms fail to have the next items ready, and you waste a lot of time waiting for RAM, which is about three to five times slower than cache. It can get even worse: you may end up waiting for the hard drive. An HDD can be millions of times slower than RAM if accessed in large random jumps.

The moral: the less sequential your data access, the slower your program will run :)
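
To try this yourself, here is a minimal sketch of both traversals with a small timing helper. The class and method names are made up for illustration, and the timings you get will depend on your machine and on the array size.

```csharp
using System;
using System.Diagnostics;

// Hypothetical demo class; not part of the original measurement code.
public static class TraversalDemo
{
    // Inner loop walks along a row, so it touches contiguous memory.
    public static long SumRowMajor(int[,] a)
    {
        int rows = a.GetLength(0), cols = a.GetLength(1);
        long sum = 0;
        for (int row = 0; row < rows; ++row)
            for (int col = 0; col < cols; ++col)
                sum += a[row, col];
        return sum;
    }

    // Inner loop walks down a column, so each step jumps a whole row ahead.
    public static long SumColumnMajor(int[,] a)
    {
        int rows = a.GetLength(0), cols = a.GetLength(1);
        long sum = 0;
        for (int col = 0; col < cols; ++col)
            for (int row = 0; row < rows; ++row)
                sum += a[row, col];
        return sum;
    }

    // Runs one of the sums and returns the elapsed time in milliseconds.
    public static long TimeMs(Func<int[,], long> sum, int[,] a, out long result)
    {
        Stopwatch watch = Stopwatch.StartNew();
        result = sum(a);
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

Call TimeMs once with SumRowMajor and once with SumColumnMajor on the same array and compare the two numbers. Keep in mind that a 10000 x 10000 int array needs roughly 400 MB, so a smaller N may be more practical for a quick experiment.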

Friday, February 01, 2008

When string.ToLower() is Evil


Did you know how evil string.ToLower() can sometimes be?

Let me explain...

Very often I see code similar to this:

void DoBadAction (string val)
{
if (val.ToLower() == "somevalue")
{ //do something
}
}

The code above can make string comparisons up to 4 times slower.

The best way to do this kind of case-insensitive comparison is the string.Equals(...) method.

void DoGoodAction(string val)
{
if (val.Equals("someValue", StringComparison.OrdinalIgnoreCase))
{ //do something
}
}

Why is that? The reason lies in a peculiarity of the string type: it is immutable.

Since a string is immutable, string.ToLower() cannot change it in place and returns a new string instance instead. So ToLower() calls generate extra string instances just to perform a comparison.
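
Here is a small side-by-side sketch of the two styles. The class and method names are hypothetical. Note that the ToLower() version has to compare against an all-lowercase literal to ever match, and that the static string.Equals overload used here also tolerates a null argument.

```csharp
using System;

// Hypothetical demo class showing both comparison styles.
public static class CaseInsensitiveDemo
{
    // Builds a lowercase copy of val first, then compares it.
    public static bool MatchesWithToLower(string val)
    {
        return val.ToLower() == "somevalue";
    }

    // Compares directly, ignoring case, without building a lowercase copy.
    public static bool MatchesWithEquals(string val)
    {
        return string.Equals(val, "someValue", StringComparison.OrdinalIgnoreCase);
    }
}
```

Both methods return true for "SOMEVALUE", but only the second one does so without creating a temporary string.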

Detailed information about string.Equals with StringComparison enumeration can be found here.
Other performance related tips and tricks can be found here.

Wednesday, January 23, 2008

How to Transfer Fixed Sized Data With Async Sockets

This post was inspired by a discussion on the MSDN forums.

The discussion on that forum showed that there is some misunderstanding of the principles of data transfer across the network, especially in the asynchronous case.

I wrote a simple client/server application that demonstrates the use of asynchronous sockets to transfer fixed-size data from client to server.

Usually data exchange between server and client follows some agreed format and order. Such an agreement is called a communication protocol.
More information on network and communication protocols can be found here and here

The sample here also uses a communication protocol. It is very simple: the client prefixes the data with 4 bytes that hold the data size. The format of the data that goes on the wire looks like this: [4 bytes - data size][data - data size bytes].
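
As a rough illustration of that format, the sketch below builds one frame and reads the size prefix back. The Frame class and its members are hypothetical names used only for this example. It assumes both ends use the same byte order, because BitConverter follows the byte order of the machine it runs on.

```csharp
using System;

// Hypothetical helper; the layout follows the post: [4 bytes - data size][data].
public static class Frame
{
    public const int PrefixSize = 4;

    // Builds the bytes that go on the wire for one message.
    public static byte[] Encode(byte[] data)
    {
        byte[] frame = new byte[PrefixSize + data.Length];
        // BitConverter uses the machine's byte order (assumed equal on both ends).
        byte[] prefix = BitConverter.GetBytes(data.Length);
        Buffer.BlockCopy(prefix, 0, frame, 0, PrefixSize);
        Buffer.BlockCopy(data, 0, frame, PrefixSize, data.Length);
        return frame;
    }

    // Reads the size prefix; returns false when fewer than 4 bytes are available.
    public static bool TryReadSize(byte[] buffer, int available, out int size)
    {
        size = 0;
        if (available < PrefixSize)
            return false;
        size = BitConverter.ToInt32(buffer, 0);
        return true;
    }
}
```

A 3-byte payload therefore becomes a 7-byte frame, and the receiver cannot know the payload size until all 4 prefix bytes have arrived.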

Here comes the implementation with short comments. Asynchronous communication via sockets involves methods like BeginReceive/EndReceive for receiving data and BeginSend/EndSend for sending it. The sample demonstrates only how async sockets work and the fact that data is received in a stream-like way.
More information about network programming can be found here


Server

First we define the state that will be passed between async calls. The state is defined as follows:


/// <summary>
/// Server state holds current state of the client socket
/// </summary>
class ServerState
{
public byte[] Buffer = new byte[512]; //buffer for network i/o
public int DataSize = 0; //data size to be received by the server
public bool DataSizeReceived = false; //whether prefix was received
public MemoryStream Data = new MemoryStream(); //place where data is stored
public Socket Client; //client socket
}

The server listens for clients using the Socket.Accept method.
Here is the server listening loop:

//server listening loop
while (true)
{
Socket client = serverSocket.Accept();
ServerState state = new ServerState();

state.Client = client;
//issue first receive
client.BeginReceive(state.Buffer, 0, state.Buffer.Length, SocketFlags.None,
new AsyncCallback(ServerReadCallback), state);
}

When data arrives on the client socket, ServerReadCallback is called. There the server can read the received data and process it.

Here is the implementation of that method:

private void ServerReadCallback(IAsyncResult ar)
{
ServerState state = (ServerState)ar.AsyncState;
Socket client = state.Client;
SocketError socketError;

int dataRead = client.EndReceive(ar, out socketError);
int dataOffset = 0; //to simplify logic

if (socketError != SocketError.Success)
{
client.Close();
return;
}

if (dataRead <= 0)
{ //connection closed by the remote side
client.Close();
return;
}

if (!state.DataSizeReceived)
{
if (dataRead >= 4)
{ //we received data size prefix
state.DataSize = BitConverter.ToInt32(state.Buffer, 0);
state.DataSizeReceived = true;
dataRead -= 4;
dataOffset += 4;
}
}

if ((state.Data.Length + dataRead) == state.DataSize)
{ //we have all the data
state.Data.Write(state.Buffer, dataOffset, dataRead);

Console.WriteLine("Data received. Size: {0}", state.DataSize);

client.Close();
return;
}
else
{ //there is still data pending, store what we've
//received and issue another BeginReceive
state.Data.Write(state.Buffer, dataOffset, dataRead);

client.BeginReceive(state.Buffer, 0, state.Buffer.Length,
SocketFlags.None, new AsyncCallback(ServerReadCallback), state);
}
}

As you can see, the server implementation is very simple. It receives the data size prefix first and, once the prefix is complete (the first 4 bytes are received), proceeds to receive the data.

It is very important to always check the number of received bytes. This number can vary.
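
Because the number of received bytes can vary, even the 4-byte prefix may arrive split across several receives. The sketch below is one possible way to deal with that, shown without sockets so it can be tried in isolation. FrameAssembler is a hypothetical helper, not part of the sample above: it collects the prefix byte by byte if necessary, then the payload, and returns every message completed by a chunk.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: reassembles size-prefixed messages from chunks of any length.
public class FrameAssembler
{
    private readonly byte[] prefix = new byte[4];  // collects the size prefix
    private int prefixFilled;                      // prefix bytes received so far
    private int dataSize;                          // payload size read from the prefix
    private readonly MemoryStream data = new MemoryStream();

    // Feeds one received chunk and returns the messages it completed.
    public List<byte[]> Feed(byte[] buffer, int offset, int count)
    {
        List<byte[]> messages = new List<byte[]>();
        while (count > 0)
        {
            if (prefixFilled < 4)
            {
                // Take as many prefix bytes as this chunk can provide.
                int take = Math.Min(4 - prefixFilled, count);
                Buffer.BlockCopy(buffer, offset, prefix, prefixFilled, take);
                prefixFilled += take;
                offset += take;
                count -= take;
                if (prefixFilled < 4)
                    break; // prefix still incomplete, wait for the next chunk
                dataSize = BitConverter.ToInt32(prefix, 0);
                if (dataSize < 0)
                    throw new InvalidDataException("Negative data size in prefix.");
            }

            // Copy payload bytes, but never more than the current message needs.
            int chunk = Math.Min(dataSize - (int)data.Length, count);
            data.Write(buffer, offset, chunk);
            offset += chunk;
            count -= chunk;

            if (data.Length == dataSize)
            {
                // Message complete: hand it out and reset for the next one.
                messages.Add(data.ToArray());
                data.SetLength(0);
                data.Position = 0;
                prefixFilled = 0;
            }
        }
        return messages;
    }
}
```

In a read callback you would pass state.Buffer, 0 and the value returned by EndReceive to Feed, and then issue the next BeginReceive. The same instance has to be kept per client, which matches the idea of the state object.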


Client

The client implementation is pretty straightforward. It also has a state object that is passed between async operations.

public class ClientState
{
public Socket Client; //client socket
public byte[] DataToSend; //data to be transferred
public int DataSent = 0; //data already sent
}

The data being sent is prefixed with its size.

ClientState state = new ClientState();
state.Client = socket;

//add prefix to data
state.DataToSend = new byte[data.Length + 4];
byte[] prefix = BitConverter.GetBytes(data.Length);
//copy data size prefix
Buffer.BlockCopy(prefix, 0, state.DataToSend, 0, prefix.Length);
//copy the data
Buffer.BlockCopy(data, 0, state.DataToSend, prefix.Length, data.Length);

socket.BeginSend(state.DataToSend, 0, state.DataToSend.Length,
SocketFlags.None, new AsyncCallback(ClientSendCallback), state);

And finally, the implementation of ClientSendCallback:

private void ClientSendCallback(IAsyncResult ar)
{
ClientState state = (ClientState)ar.AsyncState;
SocketError socketError;
int sentData = state.Client.EndSend(ar, out socketError);

if ( socketError != SocketError.Success )
{
state.Client.Close();
return;
}

state.DataSent += sentData;

if (state.DataSent != state.DataToSend.Length)
{ //not all data was sent
state.Client.BeginSend(state.DataToSend, state.DataSent,
state.DataToSend.Length - state.DataSent, SocketFlags.None,
new AsyncCallback(ClientSendCallback), state);
}
else
{ //all data was sent
Console.WriteLine("All data was sent. Size: {0}",
state.DataToSend.Length);
state.Client.Close();
}
}

As you can see, the implementation details are very simple.
There are several important things to remember here:
  • Data is received in pieces; you can't predict how much data you will receive during one method call
  • When using async sockets, always pass the same state object in the context of the same client

Update
The code sample above illustrates the idea of transferring size-prefixed data between client and server asynchronously. Please note that only one message per connection is passed from client to server. To handle multiple messages, the server code has to be modified according to your custom protocol.

Saturday, January 19, 2008

Unmanaged Debugging Option Very Useful in Visual Studio .NET


When developing managed code that interacts with the unmanaged world, one has to be very careful, especially when working with unmanaged memory.

Here's an example of how a critical memory bug can be overlooked.

I was developing code that used the Marshal.AllocHGlobal method to allocate unmanaged memory. Naturally, I was freeing that memory after use.

The code looked like this:
IntPtr unmanagedMemory = IntPtr.Zero;
try
{
unmanagedMemory = Marshal.AllocHGlobal(nBytes);
//invoking unmanaged method here via P/Invoke
}
finally
{
Marshal.FreeHGlobal(unmanagedMemory);
}

Everything seemed okay; the application was working perfectly, except for occasional crashes with AccessViolationException. An exception of this type can happen when something writes to the wrong memory offset and the system reacts by throwing an exception.

The exception wasn't thrown every time the application ran, and it was hard to detect what was causing it. At the top of the call stack was the ntdll.dll module.

I decided to turn on unmanaged debugging and see what was happening under the hood in the unmanaged world.
(To turn on unmanaged debugging in Visual Studio, go to Project Properties -> Debug and check "enable unmanaged code debugging".)

After turning on unmanaged debugging, I got an exception immediately.

Marshal.FreeHGlobal(unmanagedMemory) was throwing an exception saying that there was heap corruption. This exception now occurred consistently.

Finally, the bug was found: the unmanaged function was not behaving well with the pointer passed to it.

Every time you're developing managed code that interacts with unmanaged memory, it is highly desirable to turn on unmanaged debugging. This will save a lot of time when tracking down memory-related issues and exceptions. Also, a lot of information about debugging .NET applications can be found in .NET Debugging
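
For reference, here is a self-contained sketch of the allocate/use/free pattern from the snippet above, with the unmanaged call replaced by plain Marshal.Copy calls that stay within the allocated size. The class and method names are hypothetical. Heap corruption of the kind described in this post appears when something writes past the number of bytes that was allocated.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical demo: copies bytes into unmanaged memory and back again.
public static class UnmanagedBufferDemo
{
    public static byte[] RoundTrip(byte[] input)
    {
        IntPtr unmanagedMemory = IntPtr.Zero;
        try
        {
            // Allocate exactly input.Length bytes of unmanaged memory.
            unmanagedMemory = Marshal.AllocHGlobal(input.Length);

            // Stay inside the allocated block in both directions.
            Marshal.Copy(input, 0, unmanagedMemory, input.Length);
            byte[] output = new byte[input.Length];
            Marshal.Copy(unmanagedMemory, output, 0, input.Length);
            return output;
        }
        finally
        {
            // Free only what was actually allocated.
            if (unmanagedMemory != IntPtr.Zero)
                Marshal.FreeHGlobal(unmanagedMemory);
        }
    }
}
```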