Monday, June 25, 2007

Validating XML in .Net

There might be a case where you are importing XML file from somewhere and need to validate it before performing some operations on the XML file. Though, there can be many ways to do so, probably the best is using XMLValidatingReader or XMLReaderSettings. It provides you mechanism to provide a XML file, add a schema and validate the XML file supplied against the schema. It also provides an event (ValidationEventHandler) which is fired as soon as an exception occurs in validating the XMLDocument. Please note that this class has been marked obsolete in .Net framework 2.0 which recommends using XMLReader.Create() with proper XMLReaderSettings for validating the document. We will be looking at both the cases i.e. using XMLValidatingReader as well as XMLReader.Create().

Lets assume that the XML file that we wish to validate has details about orders. Each of the Order contains the Order ID, details about the Customer, Discount percent given and the details about Products. Following graphic(click for larger image) shows the XML file that we are using:


Click to view larger image

Orders.xml


To validate the XML shown above, we have the schema. The schema for the XML can be found here.

Validating XML using XmlValidatingReader:

To validate the XML file against a schema, we can use XmlValidatingReader. The steps involved are:

  1. Create instance of XmlValidatingReader using the XML file to validate.
  2. Create an object of XmlSchema using the schema file to validate against.
  3. Hookup event for XmlValidatingReader.ValidationEventHandler.
  4. Read the Xml to the end using XmlValidatingReader.
Please note that in absence of event mentioned in point (3), an exception will be thrown on the first error encountered in XML file where as in case of having the ValidationEventHandler hooked up, all the errors in the XML file can be displayed. Following code shows the steps mentioned above:


using (XmlValidatingReader xmlValidatingReader = new XmlValidatingReader(new XmlTextReader("Orders.xml")))

{

// Create the schema object to validate XML files

XmlSchema xmlSchema = XmlSchema.Read(new XmlTextReader("Orders.xsd"), new ValidationEventHandler(Schema_ValidationError));

// Add to the collection of schemas for XmlValidatingReader

xmlValidatingReader.Schemas.Add(xmlSchema);

// Attach an event which will be filed on validating error

xmlValidatingReader.ValidationEventHandler += new ValidationEventHandler(xmlValidatingReader_ValidationEventHandler);

// Read the XML to the end

while (xmlValidatingReader.Read()) ;

Console.WriteLine("\nFinished validating XML file....");

}




Validating XML using XmlReaderSettings:

Validating XML using XmlReaderSettings follows almost the same path as that for XmlValidatingReader, with a difference that here we will be using XmlReaderSettings to pass in the schema details. Following are the steps to using XmlReaderSettings for validating an XML file:

  1. Create a XmlSchema object using the schema file.
  2. Create an object for XmlReaderSettings.
  3. Set the ValidationType of XmlReaderSettings as "Schema".
  4. Add the XmlSchema to the collection of schemas of XmlReaderSettings.
  5. Attach the event for XmlReaderSettings.ValidationEventHandler.
  6. Create an instance of XmlReader using XmlReader.Create() and passing in XML file to validate.
  7. Read the XmlFile to the end.
In this case also, absence of ValidationEventHandler will cause exception on the first error encountered in reading the XML file. Following code shows the above mentioned steps:

// Create the schema object

XmlSchema xmlSchema = XmlSchema.Read(new XmlTextReader("Orders.xsd"), new ValidationEventHandler(Schema_ValidationError));

// Create reader settings

XmlReaderSettings xmlReaderSettings = new XmlReaderSettings();

// Set validation type to schema

xmlReaderSettings.ValidationType = ValidationType.Schema;

// Add to the collection of schemas in readerSettings

xmlReaderSettings.Schemas.Add(xmlSchema);

// Attach event handler whic will be fired when validation error occurs

xmlReaderSettings.ValidationEventHandler += new ValidationEventHandler(xmlReaderSettings_ValidationEventHandler);

// Create object of XmlReader using XmlReaderSettings

using (XmlReader xmlReader = XmlReader.Create(new XmlTextReader("Orders.xml"), xmlReaderSettings))

{

// Read XML to the end

while (xmlReader.Read()) ;

Console.WriteLine("\nFinished validating XML file....");

}



The code behaves the same in both the cases. However, there is a small difference in how the exception message is displayed. Using theXmlValidatingReader tells about the element/attribute with incorrect value as per the data type, but using XmlReaderSetting provides a detailed error showing the incorrect value along with the expected datatype.

I hope I was able to provide most of you with a good insight on validating XML files using schemas. If you have been using some other method, please do share it here.

kick it on DotNetKicks.com

Monday, June 11, 2007

T-SQL New features in SQL Server 2005 - Part 5

In continuation to the series that I have on new features in SQL Server 2005, today I will be discussing about two new operators- Intersect and Except. I believe, most of you know that Oracle already has these operators, called Intersect and Minus. It was good to learn that now these are supported in SQL Server as well. Before going any further, lets discuss what these operators help achieve. To give you a brief idea - both these operators combine the result of two select statements having same number of columns and outputs result based on the operator used.

Intersect:
As mentioned above, it combines the result of two select statements and gets the distinct rows that exists in both the queries. To make it more clear, lets take an example. Suppose, we have a table that contains the details about Products and another table that contains the details about the Orders. Now, we want to get all the product IDs for which we have at least one order. Following graphics shows the structure and data of Products and Orders tables:


Products:


Orders:


Now, we wish to get all the Product IDs which have at least one order associated with it. Using the Intersect operator, we can get the desired output:

SELECT ProductID FROM Production.Product

INTERSECT

SELECT ProductID FROM Sales.SalesOrderDetail

This lists all the distinct products that are associated to any Order. Its also worth mentioning here, that whatever can be done using Intersect and Except, can be done achieved using Exists as well as shown below:

SELECT DISTINCT ProductID FROM Production.Product P

WHERE EXISTS

(SELECT ProductID FROM Sales.SalesOrderDetail S WHERE S.ProductID = P.ProductID)


Note that the output of both the queries are identical. However, we are required to place Distinct in the query using Exists, so as to remove the duplicate rows. Intersect, on the other hand, handles this automatically and provides a more straight forward way to get the desired result set.

Except:
Contrary to Intersect, Except combines output from both the supplied queries and give the result as the records which are present in result set of first query and not in that of the second query. Lets take an example. Suppose we wish to get the list of slow moving products from the Products table shown above. In order to get such a list, we need to check for such products which exists, but have no orders associated with them. In essence, we are reversing the example discussed above:

SELECT ProductID FROM Production.Product

EXCEPT

SELECT ProductID FROM Sales.SalesOrderDetail

Here, we will get a list of all those items which do not have any orders associated with them. As mentioned above, this can be achieved using Exists as well.

SELECT DISTINCT ProductID FROM Production.Product P

WHERE NOT EXISTS

(SELECT ProductID FROM Sales.SalesOrderDetail S WHERE S.ProductID = P.ProductID)

Looking at the queries above, it is clear that the Intersect and Except provides a neat and straight forward mechanism to get the records as discussed in our examples.

On a final note, we do have some limitations using the Intersect and Except operators. You can use Order By only on the overall result set whereas Group By and Having can only be used with the queries involved and not with the result set.

kick it on DotNetKicks.com