Monday, December 17, 2007

Effective SQL Mindset: Cursors VS Joins



In the past I've dealt a lot with cursors. My experience was mostly negative. SQL code written using them was terribly slow. Thus our team was transforming SQL code with cursors into code without them.

Sure, cursors have their advantages. But, after conversion SQL query performance can be much better then with cursors.

Now, I'll show you how to accomplish the task using SQL with and without cursors.
There are two tables: regions and values. For example, regions represents departments while values is used to store arbitrary int data about the specific region.

SQL tables:
create table regions (id int, name varchar(10))
create table [values] (id int identity(1, 1), regid int, value int)


Table regions contains records for 100 regions, while values contains data only for region with id = 1 (only for the first region). We need to populate values table using data from the first region.

For exaple if table regions contains data





idname
1Reg 1
2Reg 2

And table values contains data only for first region




idReg Idvalue
1130
2135


So we need to populate table values for every region from regions table.

There are three ways how to do this:
  • write a computer program that will insert values into values table
  • write SQL query with cursors
  • write SQL query without cursors
I will not touch how to write a program to do this, its too ineffective.

But last two options seem to be interesting. Cursor solution is quick and dirty - you can code it almost in no time. Cursor code will be similar to the computer program's one.

select id,regionid,value into #Valuestmp from [values]

DECLARE RegionsCursor CURSOR FOR SELECT id FROM regions
declare @id int

OPEN RegionsCursor;
FETCH NEXT FROM RegionsCursor into @id;
WHILE @@FETCH_STATUS = 0
BEGIN

UPDATE #Valuestmp SET RegionId=@id
INSERT INTO [values](regionId, value) SELECT regionId, value FROM #Valuestmp
FETCH NEXT FROM RegionsCursor into @id;
END;
CLOSE RegionsCursor;
DEALLOCATE RegionsCursor;

drop table #Valuestmp
Here we store initial values from values table, and then for each region we insert data into values table.

This elegant script will accomplish the above:
insert into val (regid, value) select r.id as regid, v.value as value from reg r, val v where r.id <> 1

Central point in this script is Cartesian product - we select from two tables (bold font), joining every row of the values table with the rows in regions table.

SQL code above shows how cursor can be avoided. Every time you're tempted to use cursor - stop and think that nearly everything can be done without them. All you have to do is to think little bit.

Thursday, December 06, 2007

Here Comes Another Bubble

Some people say that there won't be .com like bubble any more. However, there are signs out there that indicate the opposite :)



Thanks Alena C++ for the link

Sunday, November 11, 2007

Did you know that...


This post will start the series of "did you know that..." posts. Where I'll try to show things that may be useful for developers and other people connected with IT.

Did you know that "Internal .Net Framework Data Provider error 6" can happen when you use the SQL Native Client data provider to connect to an instance of SQL Server 2005 that is configured to use database mirroring. The solution to this situation is here.

or...

Did you know that "the CPU on a computer that is running the NET Framework 2.0 consumes excessive system resources when you marshal an object array across a domain boundary"?
The solution provided doesn't result in installing hotfixes. But it can force the developer to redesign the app or sacrifice its performance. Both variants are bad :(.

Wednesday, October 24, 2007

Microsoft Announces Mobile Device Manager 2008

Recently Microsoft announced Mobile Device Manager 2008

What they are saying about this new technology:
"Mobile Device Manager 2008 is a solution that gives organizations enhanced on-device security and over-the-air policy enforcement. It allows IT professionals to more easily manage phones within the organization, and gives mobile employees access to confidential information on corporate networks with firewalls."

That means mobile devices can be integrated into corporate IT infrastructure and become part of it. IT personal will be able to control mobile devices and enforce corporate security policies on them.

At least one software company announced support for the Mobile Device Manager 2008.

How this new technology can help software development?

Generally device in the field is connecting to the enterprise network through the Internet. In this case device is initiating connection and is playing the role of client.

With this new technology it is believed that devices will become part of the enterprise network and internal applications will be able to "talk" to devices via secure channel transparently.

Thursday, October 18, 2007

Strategy Pattern in C# 2.0

Strategy pattern is very convenient when it comes to algorithms selection. The selection is possible because algorithm is encapsulated in single class, so it is easier to handle multiple algorithms.

There is a C# sample of Strategy pattern on the Wikipedia. Strategies there are passed as objects into the context. With the emergence of generics in the .NET 2.0 it is possible to specify strategy as a type parameter.

In this post I'll show sample list class "SmartList" that can be tuned with different allocation strategies. Such list can be used when developer needs full control on how memory is consumed by the list. For simplicity, there is no error handling and list class has limited functionality.

So, at first we define our algorithm's interface or contract.


interface IAllocationSrategy<T>
{
T[] Allocate(T[] data)
;
}


Then we define two allocation strategies: DoubleAllocator and FixedAllocator.
DoubleAllocator when doing allocation doubles the initial memory count, while FixedAllocator makes constant increment.


class DoubleAllocator<T> : IAllocationSrategy<T>
{
#region IAllocationSrategy Members

public T[] Allocate(T[] data)
{
int length = data.Length * 2;
T[] newData = new T[length];
Buffer.BlockCopy(data, 0, newData, 0, data.Length);
return
newData;
}

#endregion
}

class FixedAllocator<T> : IAllocationSrategy<T>
{
int fixedIncrement = 20;
#region
IAllocationSrategy Members

public T[] Allocate(T[] data)
{
int length = data.Length + fixedIncrement;
T[] newData = new T[length];
Buffer.BlockCopy(data, 0, newData, 0, data.Length);
return
newData;
}

#endregion
}

SmartList provides list functionality. Items can be added to the list and obtained by index.


class SmartList<A, T> where A : IAllocationSrategy<T>, new()
{
A allocator
= new A();

T[] innerList;
int
size = 0;

public
SmartList() : this(1)
{ }

public SmartList(int? capacity)
{
innerList
= new T[capacity ?? 1];
}

public void Add(T item)
{
if (size == innerList.Length)
innerList
= allocator.Allocate(innerList);
innerList[size++] = item;
}

public T this[int index]
{
get { return innerList[index]; }
}

public int Size
{
get { return size; }
}

public int Capacity
{
get { return innerList.Length; }
}
}



And finally, sample code that uses SmartList class with FixedAllocator and int type as item type.


class Sample
{
public Sample()
{
SmartList<FixedAllocator<
int>, int> smartList =
new
SmartList<FixedAllocator<int>, int>();

smartList.Add(1);
smartList.Add(2);
smartList.Add(3);

Console.WriteLine(smartList.Capacity);
}
}

Thursday, October 04, 2007

String Splitting and Tokenization Techniques in .NET

At start I'll give the explanation what token and tokenization is.

According to wikipedia article: "A token is a categorized block of text, usually consisting of indivisible characters known as lexemes. A lexical analyzer processes lexemes to categorize them according to function, giving them meaning. This assignment of meaning is known as tokenization. A token can look like anything: English, gibberish symbols, anything; it just needs to be a useful part of the structured text."

In .NET string class has a Split method. It returns an array of strings that are substrings delimited by a specified separator. Split method is convenient, however somewhat inefficient. Let us consider an example.

Input string is "59;21;67;72;111". String.Split(';') will return an array of five strings. But what will happen if we need to find specific token, say "21". String.Split will return an array, and then we'll have to iterate through it and search for "21". Do you notice pitfall here? No?

String.Spilt inefficiency comes from its API contract - it returns an array of string, every item of which consumes memory. But we're searching only for "21", we do not need other tokens.

This example shows inappropriate scenario for String.Split(...) method.

Other approach for string tokenizing is using special tokenizers that are returning one token at a time. An example of such tokenizer can be StringTokenizer class from Java world.
Sample code in Java:

StringTokenizer st = new StringTokenizer("this is a test");
while (st.hasMoreTokens()) {
System.out.println(st.nextToken());
}

We can see that StringTokenizer is returning one token at a time. It is extremely efficient for token search scenarios as we may not need to iterate through all the tokens thus allocate less memory and perform faster.

In .NET base class library (BCL) there is no such class. But there are 3-d party analogs or ports of Java StringTokenizer.

.NET BCL actually, has very powerful set of classes that can cope with string tokenization scenario easily. These classes "live" in System.Text.RegularExpressions.
Good example of string tokenization using Regex class is provided by TrackerRealm design team in their blog.

So, there are at least three ways in .NET how strings can be split into tokens. There is a rule of thumb here, if algorithm doesn't need all tokens from the string - do not use String.Split.

Sunday, September 23, 2007

Non Recursive Binary Tree Traversal

In my previous post I was talking about drawbacks of recursive approach when traversing tree structures.

In comments on that post, Daemin and w-g mentioned then tail-recursion can be used to avoid function call overhead. Tail-recursion really helps if:
a) language of use supports it
b) your algorithm makes it possible to make a recursive call the last operation in the function

Since, tail-recursion approach is still a recursive one and there is no guarantee that the language of use will support it or the algorithm will allow it - best way not to go into trouble is to make no recursive calls at all.

Here, I'll present non-recursive binary tree traversal. Binary tree is not balanced. To make non-recursive approach possible - tree node was extended with special meta information. It includes reference to the node's parent and a special flag. Flag indicates if node was previously visited during traversal process.

Binary tree node definition in code



class TreeNode
    {
        public int Value;
        public TreeNode Left;
        public TreeNode Right;
        public TreeNode Parent;
        public byte Visited;
 
        public TreeNode(int value)
        {
            Value = value;
        }
    }


Next comes recursive iterator. This method demonstrates recursive approach. It is self-explanatory and very easy to implement. All job is done via recursive call to RecursiveIterator method.


private void RecursiveIterator(TreeNode node)
        {
            Console.Write(node.Value + " ");
            if (node.Left != null)
                RecursiveIterator(node.Left);
 
            if (node.Right != null)
                RecursiveIterator(node.Right);
        }


Non-recursive approach on the other hand uses iteration and additional variables to do tree traversal.


private void NonRecursive()
        {
            TreeNode currNode = root;
            while (true)
            {
                if (currNode == null)
                    break;
 
                if (currNode.Left != null && currNode.Left.Visited != 1)
                    currNode = currNode.Left;
                else if (currNode.Right != null && currNode.Right.Visited != 1)
                    currNode = currNode.Right;
                else if (currNode.Visited == 0)
                {
                    Console.Write(currNode.Value + " ");
                    currNode.Visited = 1;
                }
else
                    currNode = currNode.Parent;
            }
        }

Iterative approach is more complicated then recursive, it uses additional data (Parent reference and Visited flag).

So, generally, when you know that your tree structures won't be very deep recursive approach will do. And in the case, when tree depth is quite big, it is better to consider iterative approach.

Wednesday, September 19, 2007

Why Avoid Recursion For Tree Structures

The subject may seem little bit contradictory. Recursive approach is very convenient when doing search or traversals in a tree structure (e.g. binary tree).

Use of recursion in an algorithm has both advantages and disadvantages. The main advantage is usually simplicity. The main disadvantage is often that the algorithm may require large amounts of memory if the depth of the recursion is very large. High memory consumption is due to large function call number (recursion means that function calls itself multiple times).



Every function call results in thread's stack memory consumption. Default stack size is 1 Mb. So, with high recursion depth there is a risk to run out of memory.

Next time I'll present recursive and non-recursive approaches to binary tree traversals.

Sunday, September 16, 2007

Google Reader Increase Productivity

Nearly a month ago I've stopped using Internet Explorer 7 as my primart RSS reader. The reasons were:
- IE7 is not quite stable, after 4-5 days of uptime it hangs or crashes
- Feed items are bound to one computer
- It consumes CPU when feeds updating.

So, I've exported my IE7 feeds into an OPML file and imported them into Google Reader.
Now my feeds are all in one place, which is good, their update doesn't take my CPU cycles.

P.S. Lifehacker has some tips how to increase your productivity with Google Reader.

Tuesday, September 04, 2007

Most amazing tutorial I've ever read

Recently, I've stumbled upon this cool Ruby tutorial. It is called Poignant Guide to Ruby. It is so well written, I wish other programming languages had such tutorials. Even if you're not engaged in Ruby development this book can be a great reading.

Totally recommended! :)

Friday, January 26, 2007

Creating Critical System Process in .NET

Yestreday, my home computer was infected by a worm - "Win32/Brontok.A". While cleaning it up I detected that I have TWO lsass.exe processes in the task manager. lsass.exe is a system process of the Microsoft Windows security mechanisms. The worm created lsass.exe in the My Documents folder, launched it and was happily operating on my machine.



And here's most interesting fact, when you try to kill lsass.exe process via task manager, you'll receive warning, like in the picture below.

I used Process Exloperer tool to kill that process and desinfect my computer.
However, it was interesting to see that Task Manager checks process name and not some special things about system process ( digital signature? ).

I created simple console application in C#, named it lsass.exe and voila - I have criticall system process :8-)