Rajdeep Kwatra's Blog: May 2007

Thursday, May 24, 2007

Custom string formatting in .Net

Recently, while working on a project, I came across the need to format the phone number supplied as string in the standard U.S. phone format(i.e. 1234567890 should be displayed as (123) 456-7890. The phone number was being fetched as a string from the database. I personally feel, that a better option to operate on strings is to save them in a formatted manner i.e. the way you wish to display them, but this is not always possible.

So how do you custom format the strings in .Net? You got it - you use IFormatProvider. IFormatProvider is an interface that provides you an option to format the value as per your requirements. The sole member contained in this interface is GetFormat(). To provide the custom formatting, you need to make a class that implements IFormatProvider. Apart from this, your class must also implement ICustomFormatter. This is the interface that actually handles the custom logic of formatting the supplied value. ICustomFormatter contains a single method named Format() which accepts the format in which value should be formatted, the value that is to be formatted and an instance of IFormatProvider(i.e your custom class that implements IFormatProvider).

Lets get down to the code and see what it need to do a custom formatting:-

Our requirement: We have a string that contains the phone number. The phone number is a standard 10 digit number. However, it may or may not have a country code attached. If the country code is present, the format will be '+' followed by country code, a space and then the 10 digit phone number. So, example of valid values are:

Phone number without country code: 1234567890
Phone number with country code: +91 1234567890

Lets look at the implementation of our class that handles the formatting:

public class PhoneFormatter : IFormatProvider, ICustomFormatter
    {
        #region IFormatProvider Members 

        public object GetFormat(Type formatType)
        {
            // Check if the class implements ICustomFormatter 
            if (formatType == typeof(ICustomFormatter))
            {
                return this;
            }
            else
            {
                return null;
            }
        } 

        #endregion 

        #region ICustomFormatter Members 

        public string Format(string format, object arg, IFormatProvider formatProvider)
        {
            // if the passed in argument is null, return empty string
            if (arg == null)
            {
                return string.Empty;
            } 

            // Get the value of argument in string
            string phoneNumber = arg.ToString(); 

            // Check if phone number has country code
            if (phoneNumber.StartsWith("+") && phoneNumber.IndexOf(' ') > 1)
            {
                // If it contains country code, separate it from phone number
                string countryCode = phoneNumber.Substring(0, phoneNumber.IndexOf(' ') + 1);
                phoneNumber = phoneNumber.Remove(0, countryCode.Length);

                // Get the formatted value of phone number and prefix it with the country code
                phoneNumber = string.Format("{0}{1}", countryCode, this.GetFormattedPhoneNumber(phoneNumber, format));
            }

            // Check if the phone number is a valid 10 digit number
            if (phoneNumber.Length == 10)
            {
                // Get the formatted value of phone number
                phoneNumber = this.GetFormattedPhoneNumber(phoneNumber, format);
            }

            return phoneNumber;
        }        

        #endregion 

        #region Helper method 
       
        private string GetFormattedPhoneNumber(string phoneNumber, string format)
        {
            long number = 0;

            //Check if the phone number is a valid numeric value
            if (long.TryParse(phoneNumber, out number))
            {
                // If phone number is numeric, format it as per the passed in value
                phoneNumber = number.ToString(format);
            }
            return phoneNumber;
        } 

        #endregion
    }

As you can see, we have a class defined, that implements IFormatProvider and ICustomFormatter. I have commented code at each step so that it is easier to understand whats going on.

Now, lets have a look at its usage:

string phoneNumber = "+91 1234567890";
Console.WriteLine(string.Format(new PhoneFormatter(), "{0:(###) ###-####}", phoneNumber));

This outputs: +91 (123) 456-7890

Similarly, following code produces (123) 456-7890

string phoneNumber = "1234567890";
Console.WriteLine(string.Format(new PhoneFormatter(), "{0:(###) ###-####}", phoneNumber));

Another interesting implementation could be in case of displaying Bank account numbers where only last 4 or 5 digits are displayed and rest are shown as '*' (e.g. ****-****-1234). Its not that this can't be achieved by string operations like substring, but its just that this approach provides you a more structured, managable and reusable way of formatting values.

I hope that this article was easy to follow and enjoyable to you as much as it was to me writing it.

Monday, May 21, 2007

T-SQL New features in SQL Server 2005 - Part 4

Have you ever come across a situation where you need to delete data from one table and keep the deleted records in another table? or insert data in one table and also add the new rows at the same time to another? Weird situation...but this is what I had to do while writing a stored procedure that operates on some physical table and also keep track of affected records in another table(a table data type variable, to be precise) so that I can do some more manipulations on that. This approach, of keeping two tables(a physical and a table type variable)in sync with each other, is definitely not a clean approach but comes in handy when you know that you will be working on a very small subset of a table containing thousands of rows. Now remember, I had to do all manipulations in the stored procedure itself and SQL Server 2000 or earlier version, do not provide you any option so that I can affect two tables with one query.By using Output clause, not only you can output the data into another table, but also send it to the client as if it were a select command.

Output Clause

SQL Server 2005 provides a new clause known as "Output". Output operates on similar lines as that of a trigger i.e. it provides you the details about affected records in logical tables named deleted and inserted.

Lets take up a quick example to understand what I have been talking about:

Suppose we have a table called Employees and we wish to insert a new record into it and also get the ID of the inserted record, which we want to return to the client of our stored procedure.

Lets assume, we have a column named EmployeeID in the table. The query we will write to insert the record looks like following:

INSERT INTO Employees([Name],DepartmentID, ManagerID) 
VALUES('Alec', 1, 2)

To get the value of identity column, we can use well known @@Identity as follows:

INSERT INTO Employees([Name],DepartmentID, ManagerID) 
VALUES('Alec', 1, 2)
SELECT @@Identity

This will output the inserted ID. However, while working on tables having triggers defined on it, it is better to use Scope_Identity() since it operates in the current scope. The modified statement will look as follows:

INSERT INTO Employees([Name],DepartmentID, ManagerID) 
VALUES('Alec', 1, 2)
SELECT Scope_Identity()

In both the cases, however, you can see that we are required to write two statements: one for performing the operation(insert the record) and another to get the affected/inserted ID. Using the Output clause, makes it possible to insert the record as well as return the ID in same SQL statement:

INSERT INTO Employees([Name],DepartmentID, ManagerID) 
OUTPUT inserted.EmployeeID
VALUES('Alec', 1, 2)

Pretty neat! You can see that all the above three statements produces the same result, but using output saves you from writing an extra statement to fetch the ID of inserted record. It is also worth noting that the Output clause give you the value before any triggers have fired on the table i.e. if there is an "instead of" trigger defined, and the record is not actually inserted, you will still get the ID of the inserted record.
Lets look at an interesting problem now. Suppose that you wish to update data in certain table as per the condition specified in where clause and you also wish to return the affected records to the client for some reason. Output does just that.

Lets look the table first:

Now, we wish to update the departmentID to 2 for all those employees who are reporting to EmployeeID 2(i.e. employeeID 4, 5 and 8) and have department ID set as 3 and also return these records to client of our stored procedure. Once these records are updated, we have absolutely no way to get the details about which records were updated. To do this, we can possibly store all the IDs(based on our condition) into some table variable before firing the update command and then use the previously saved IDs to get which records were updated. Another, more efficient, way can be: Output

UPDATE Employees
    SET DepartmentID = 2
OUTPUT inserted.EmployeeID, inserted.[Name], inserted.DepartmentID, inserted.ManagerID
    WHERE ManagerID = 2 AND DepartmentID = 3

Following is the output of above mentioned query:

As mentioned earlier, Output provides you access to inserted and deleted logical tables. The data that has been inserted into the table OR the data after update statement has been executed, is available in inserted logical table. Similarly, data deleted using delete statement OR as a result of updation(i.e. data that was there prior to updation) is available in deleted logical table.

Lets take one simple example now(simple because I feel that now you have a fair idea of what output is all about). We wish to update the name of an employee and we want to get both the old and the new value of the employee name:

UPDATE Employees
    SET [Name] = 'Alex'
OUTPUT inserted.EmployeeID, deleted.[name] as OldName, inserted.[Name] as NewName
    WHERE [Name] = 'Alec'

Following is the output of the above mentioned query:

You can see that we have the name that was updated, obtained from deleted logical table in column named "OldName" and the new value for the name in "NewName" from inserted logical table. Cool!!! Isn't it?

One last thing, which I discussed in the beginning of this article: storing the affected records in a table data type variable. Lets take the example of updating the department IDs of employees:

-- Declare a temporary table to hold the updated records
DECLARE @TempTable Table
(
    EmployeeID        INT,
    [Name]            VARCHAR(50),
    DepartmentID    INT,
    ManagerID        INT
)
 
-- Update the table and store the affected records in @tempTable via output
UPDATE Employees
    SET DepartmentID = 2
OUTPUT inserted.EmployeeID, inserted.[Name], inserted.DepartmentID, inserted.ManagerID INTO @TempTable
    WHERE ManagerID = 2 AND DepartmentID = 3

-- Select all the records to see which were updated
SELECT * FROM @TempTable

Introduction of Output gives a lot of opportunities to developers like me to write a better and manageable code. I think you must have already thought of a couple of places where you can use this... So what are you waiting for..give it a shot!

Friday, May 18, 2007

T-SQL New features in SQL Server 2005 - Part 3

Often times, during my career, I faced a situation where I had to recursively get data out of a table starting from a root. The classic example of such a situation is Employee-Manager relationship table, which I am sure everyone, having even a basic idea of SQL, is familiar with. In this case, we have a table containing the details like Name, Department ID, Salary etc of Employees. There is also another column called Manager ID, which is nothing but the Employee ID from same table ie. case of a self-join. In this table, the employee having highest designation(lets assume CEO) has it Manager ID as null making him the root. Another such example is a table having the details/labels for a TreeView kind of structure (in other words, source for hierarchical data display). Now, prior to SQL Server 2005, the only way to get the data from such a table was by writing a stored procedure. However, introduction of CTEs has made it possible to get data recursively out of a table.

As mentioned in the previous article, CTEs are nothing but an inline representation of a table whose source is a query written in the CTE. By having the join with itself , the CTE can fetch the data recursively. To take it more clearer, lets look at an example.

Lets assume we have a table named Employees containing following data:

In the above table, Jim is the CEO since his manager ID is null. Now, we want to get all the people in the hierarchy starting from Jim. We can do that using the following query:

WITH AllEmployees AS -- (1)

(

SELECT Employees.EmployeeID, Employees.Name, Employees.ManagerID

FROM Employees WHERE ManagerID IS NULL -- (2)

UNION ALL -- (3)

SELECT Employees.EmployeeID, Employees.Name, Employees.ManagerID

FROM AllEmployees INNER JOIN Employees ON -- (4)

AllEmployees.EmployeeID = Employees.ManagerID -- (5)

)

SELECT * FROM AllEmployees -- (6)

This query will result in the following data:

Following is the explanation of the steps:

1) WITH defines a name for the CTE.

2) This query defines the anchor point. Notice we have the condition "ManagerID IS NULL" in WHERE clause. This defines/fetches the starting point. All the data will be fetched from this point onwards.

3) It is mandatory to use UNION ALL in case you are writing a recursive query. Failing this will give you an error mentioning that "it does not contain a top-level UNION ALL operator".

4) Makes a join with the data returned from the anchor query and in turn the CTE as a whole.

5) We make the join on condition where the manager ID from Employees table is in EmployeeID of AllEmployees CTE.

6) Selects all the records from the CTE as source. It is important that the select is the immediate next statement after the CTE closing parentheses.

We can also restrict the recursion level to a desired value. In the above example, we have nesting upto 3 levels starting from Jim. Lets assume that we wish to get the data only till the second level i.e. Jim->Thomas, Helena->Robert, Julian. We can do so by adding the following option to the above mentioned query:
OPTION (MaxRecursion 2)

This will result in the termination of the query as soon as the supplied nesting level is reached. If you look at the messages after executing the query, you will get:

Msg 530, Level 16, State 1, Line 1
The statement terminated. The maximum recursion 2 has been exhausted before statement completion.

But in the results tab, you will still get the expected data.

Also, if you wish to get the nesting or recursion level along with the data, you can modify the query in following manner:

WITH AllEmployees AS
( 
SELECT Employees.EmployeeID, Employees.Name, Employees.ManagerID, 0 as Level 
FROM Employees WHERE ManagerID IS NULL 
UNION ALL 
SELECT Employees.EmployeeID, Employees.Name, Employees.ManagerID, Level + 1 as Level 
FROM AllEmployees INNER JOIN Employees ON 
AllEmployees.EmployeeID = Employees.ManagerID 
) 
SELECT * FROM AllEmployees

Notice that we have added another column named Level to display the nesting level of the given record. The dissection is as follows: the anchor query assigns the value of "Level" as 0(zero). In the second query (after Union All), we set the "Level" as "Level + 1". What it does is that takes the value of Level from previous recursion and adds 1 to it. The value of "Level" is not incremented on each record, but on each recursion. Now the data is displayed as shown in the following figure:

I hope that this tutorial will come in handy when you are in a situation where you need to fetch the data out of a table recursively.

Sunday, May 13, 2007

Managing unmanaged memory

The title of this post might seem confusing. Garbage Collector (or GC) in .Net is designed to take care of managed memory only, so what do you do in a case where you have a managed object that itself takes very small amount of memory but allocates significant amount of unmanaged resources? In that case, GC will not be aware of the actual memory allocated by your application and therefore will not be able to optimize the collection of unused resources.

In .Net 2.0, a new feature has been introduced to take care of such an issue. Now it is possible to make the GC aware about the large amount of unmanaged resources that should be taken care of via GC.AddMemoryPressure. This method accepts a long bytesallocated and informs GC that there are x bytes to be taken care of while scheduling garbage collection. Ideal place to use this method can be the object constructor.

Since we are increasing the memory pressure on GC, so we have to release it too when the object is no longer required. This can be taken care of by using GC.RemoveMemoryPressure after the unmanaged resources have been released. Two points should be taken care of in this kind of a situation: first, you should always release the same amount of memory pressure as much as you have added, failing which might adversely affect the application performance and second, since we are dealing with unmanaged resources, your object must be implementing the IDisposable interface and use it to release the object deterministically.

Thursday, May 10, 2007

T-SQL New features in SQL Server 2005 - Part 2

In continuation to the previous post, today we will be discussing about two new enhancements, namely Top and Common Table Expressions (better known as CTEs).

Top:
It isn't so that the Top clause is introduced in this version. Its about the new possibilities where you can use Top clause in an enhanced fashion. Take for example, you are using SQL Server 2000 and have to write a stored procedure for implementing server side paging. In a typical scenario, you will be passing a value to a parameter for defining the page size (i.e. number of records to be returned) along with other parameters like page index etc. Now an optimum way to use the value of page size would be to use "SET ROWCOUNT" with the page size parameter of the procedure, so that only the desired number of rows are returned:

SET ROWCOUNT @pageSize 
SELECT col1, col2 FROM Table1

However, now you have the power of writing your query in following way:

SELECT TOP(@pageSize) col1, col2 FROM Table1

This will result in same number of rows as specified by @pageSize. Note the use of parenthesis enclosing the @pageSize variable. These are mandatory if you are using a variable with Top clause. You can, however, omit these if using a fixed numeral.

Top clause can now be combined with Insert, Update and Delete statements as well apart from Select. Reviewing following examples will provide a clearer view on this:

DECLARE @Count INT 
SET @Count = 50 
SELECT TOP (@Count) * FROM Categories

This statement will return top 50 record from table named categories. Another usage can be as follows:

DECLARE @Count INT 
SET @Count = 2 
UPDATE TOP (@Count) Categories 
SET CategoryName = CategoryName + 's' 

This statement will update the first two records in 'Categories' table and append an 's' to the CategoryName. In a real life scenario, the queries will be more complex than the above mentioned ones, but those will only be a variation (for eg. Update statement may have a 'Where' clause to filter out the necessary records which needs to be modified.)

It is also worth mentioning that Microsoft recommends using the Top clause in new development work since 'SET ROWCOUNT' is slated to be removed from future versions of SQL Server. More information about the same can be found here.

Common Table Expressions:
Common Table Expressions(or CTEs) can be defined as "An expression that is referred to by name within the context of a single query." What this means is that you define a query, give it a name(or alias) and use that alias as if it is a physical table. An example will make it clear what I am referring to:

WITH EvenNumbers AS -- (1) 
( 
SELECT Col1 FROM Numbers WHERE Number%2=0 -- (2) 
) 
SELECT Col1 FROM EvenNumbers -- (3) 

Here, we have a physical table named "Numbers" which contains random numbers in "Col1" column. Lets dissect the query to understand how it works:
(1) The syntax to define a CTE is by using keyword "WITH". In the above statement, "EvenNumbers" is the alias that we wish to assign to the resultset from the query defined in (2).
In line (2), we are selecting only those values of Col1 which are divisible by 2 i.e. even numbers. The statement in line (3) returns all the records found in (2).

Looking at it the first time, you might not be able to visualize the numerous options CTEs provide. A very good use of CTEs can be in set-based operations. Imagine writing a complex query having join on may be 5-6 tables...can be difficult and time consuming. Looking at it from CTEs perspective, however changes it all. Internally it is doing just the same thing, but provides a better and easier mechanism to understand and write query. Another very useful application of CTEs is in writing recursive queries. Yes! I mean recursive queries!!! We will be looking at that application of CTEs in coming days. Till then....happy querying!!!

Wednesday, May 09, 2007

T-SQL New features in SQL Server 2005 - Part 1

I have been working in SQL Server since past 3 years or so. I like solving complex problems and deriving the simplest, yet efficient, query. Though, I was always able to solve the problem at hand using the constructs provided in SQL Server 2000, but there were times when I felt myself restricted.. or wasn't particularly happy the way I solved a problem. With new constructs and clauses, now available, in SQL Server 2005 T-SQL, I feel that I am able to write the query in a neat, manageable way. This can be compared with the Generics introduced in .Net 2.0 in a way that it does not enable you to do something that wasn't at all possible earlier, but just that it provides a easy to maintain and efficient ways of doing the same thing. In coming days, we will be discussing the following features:

1. Error Handling
2. Top clause
3. Common Table Expression
4. Recursive Queries
5. Output clause
6. Intersect and Except
7. Apply
8. Ranking Function

Error Handling:
In prior version of SQL Server, the way to check if there was an error in a SQL statement was to use @@Error which should be used immediately after the SQL statement. It was mandatory to follow this practice since the value in @@Error was reset after every statement.
With the new structured error handing mechanism provided in T-SQL 2005, it provides a straighter forward, simpler and better way to handle exceptions in code. This can be understood more clearly by the following example:

Writing the following query will result in “Divide by zero” error:

SELECT 1/0

If the statement which is expected to throw an exception is enclosed in a try-catch block, the same situation can be handled in an elegant manner:

BEGIN TRY
SELECT 1/0
END TRY
BEGIN CATCH
PRINT 'ErrorNumber= ' + CONVERT(VARCHAR(10), Error_Number())
PRINT 'ErrorMessage= ' + Error_Message()
END CATCH

This will now handle the exception and display the Error number along with the message.

Apart from the above mentioned functions for obtaining the exception details, SQL Server also provides the following:

ERROR_NUMBER() - Number of error
ERROR_SEVERITY() - Severity of error
ERROR_STATE() - State of error
ERROR_MESSAGE() - Error Message
ERROR_LINE() - Line on which error occurred
ERROR_PROCEDURE() - Procedure in which error occurred
XACT_STATE() - Transaction state

The example shown here just gives an idea of what can be achieved using structured exception handing in SQL Server. I also feel, this approach should be a welcome feature by any .Net programmer since the construct is almost similar to that a .Net developer is familiar with.

Sunday, May 06, 2007

First Post

It was difficult to decide "the first" topic for my blog and after giving a lot of thought I was able to come to a decision. During this whole process of deciding a topic to write upon, I evaluated a lot many things which I like doing and on which I have been working lately. Here, in my blog, I will be covering many things including SQL Server programming, custom controls, new features in .Net 2.0 and similar topics. Since most of my experience is in .Net, so the topics will be more or less .Net centric but I also have interest in other languages, particularly Ruby, probably because of its nature..and dynamism...which again, is related to the latest version of .Net - .Net 3.0. I will try to post my experiences with Ruby as well. So, lets get rolling....

Rajdeep Kwatra's Blog