There is more to xsd than datasets

 

In this post I’m going to take a closer look at XSD, the XML schema language. Like most of you I came into touch with XSD through .NET datasets, and for quite some time I didn’t give it the attention it deserves. Before exploring XSD itself I’ll take a brief historical tour and kick in some open doors. In the end I hope to have given you enough glimpses to spark your interest.

From dbf-file to xsd document

Almost every application has to persist some data in a store. In the ancient days we built monolithic dBase and Clipper apps with the user literally editing the physical data files. (Canceling an insert meant deleting the appended record, which led to the first template generators, but that’s a different story.) Next came client-server applications, which separated the code handling the database from the part the end user touched. These were the heydays of Delphi (or VB <= 6 in case you were less lucky). There was a separation of layers, but the application was always connected to the database. The next big step, reaching the mainstream with the .NET framework, was dropping this connection. The application works with a disconnected set of data; when it needs new data or has some updates to persist, the connection to the store is made and messages containing the data are exchanged. It should be possible to exchange these messages over any channel. The entire world is reachable over the internet, and the HTTP protocol will take you past all the hurdles found along the way. There is only one drawback to the latter: you can only transport text-encoded messages over this channel. All of this can be handled by the .NET dataset, known in VS 2005 as DataSetx.xsd.

A (typed) .NET dataset has many faces. Inside your .NET code it’s a nice object. In code you can iterate through or find the data it contains; many of the components in the .NET framework can work with it, and (web) forms easily bind to it. To the outside world a dataset can serialize its content to an XML text document. A recipient of the document can deserialize it back into the binary dataset object again.

My first adventures with XML had been through TXMLDocument, a Delphi wrapper for the XML DOM API. The documents we worked with were free-form documents. They had to be well formed, that is, consist of neatly paired XML tags with a single root node. But their structure followed every tree and shrubbery we found to be handy for the application. Database data is stored in tables, and these have a far more limited structure. Not every well-formed XML document can map to a database table.

A database schema describes the layout of the tables in a database. Likewise, XSD is a schema language describing the structure of an XML document. For an XML document to house database table data, like a serialized .NET dataset, the XSD describes an XML structure that maps the table and its columns. My first XSD was created in the Visual Studio dataset designer by dropping a database table from the server explorer on the dataset designer’s surface. The resulting .NET dataset schema is a pretty complex document. You see them pass by when publishing typed datasets in a web service. (Which might not be a very good idea either, see this post for a deeper dive into that.)

Learning XSD

So a .NET dataset does not really look like a good starting point to learn the essentials of XSD. For some time I took XSD for granted, until I had to create and validate some XML documents with a custom structure. Reading through my bookshelf and googling around I found loads and loads of info on XSD. There is a very good tutorial on MSDN; I wish I had found that one earlier. Another one is on W3Schools, but this one is imho too much aimed at web designers. The Schema reference on W3Schools is a handy companion. As XSD is an internet standard there is also a lot of W3C material. But that is quite academic, and for a starter it’s just as frightening as a .NET dataset schema.

To get to my point, that XSD can handle more than just DB structures, I’m going to rush through some XSD basics; for more information there is the MSDN tutorial mentioned above. Let’s create a schema to do some exploring. Instead of using the visual representation of the dataset designer in Visual Studio, I switch to view code and type by hand. I’m not alone: there is code completion available. An XSD schema is an XML document which describes types, starting with simple types such as an integer or a string. The great thing is that you can put restrictions on these types. This example is a very simple schema. It publishes two elements: a string with a maximum length of 20 characters and an integer value in the range from 12 to 22.

<?xml version="1.0" encoding="utf-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:mys="http://Gekko-Software.nl/Demos/ThereIsMore.xsd"
           targetNamespace="http://Gekko-Software.nl/Demos/ThereIsMore.xsd"
           elementFormDefault="qualified">
  <xs:element name="MyString">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:maxLength value="20"></xs:maxLength>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <xs:element name="MyInteger">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="12"></xs:minInclusive>
        <xs:maxInclusive value="22"></xs:maxInclusive>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>

The element MyString has a simple type. It is based on the standard xsd string type and is restricted to a maximum length of 20. Likewise the element MyInteger is based on the xsd type integer and has a range from 12 to 22.

Now we have an xml document which can be used to validate the content of other xml documents. This can also be done in Visual Studio itself. When you add an xml document to a project you can set the schemas of the document. After adding the previous schema there is code completion and content validation as you type.

This document uses the schema we just created. The curly lines indicate that both document elements are invalid. The tooltip provides more information.
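As a rough sketch of such a document (the value is invented for illustration and is not taken from the original screenshot), an instance like the following would be flagged against the schema above. A MyString root element holding more than 20 characters would be rejected in the same way.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Illustrative value: 42 lies outside the 12..22 range declared for MyInteger -->
<MyInteger xmlns="http://Gekko-Software.nl/Demos/ThereIsMore.xsd">42</MyInteger>
```

Note the default namespace on the root element: it has to match the targetNamespace of the schema, otherwise the validator will not associate the element with its declaration.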

There are far more restrictions you can put on a simple type. They include enumerations and even regular expressions. Check the tutorials and the reference for more on that.
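A minimal sketch of both facets, assuming two hypothetical elements, MyColor and MyPostcode, which are not part of the original examples. An enumeration lists the allowed values one by one; a pattern holds a regular expression that the whole value has to match.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://Gekko-Software.nl/Demos/ThereIsMore.xsd"
           elementFormDefault="qualified">
  <!-- Hypothetical element: only the three listed values are allowed -->
  <xs:element name="MyColor">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:enumeration value="red"/>
        <xs:enumeration value="green"/>
        <xs:enumeration value="blue"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <!-- Hypothetical element: four digits followed by two capitals, such as 1234AB -->
  <xs:element name="MyPostcode">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:pattern value="[0-9]{4}[A-Z]{2}"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```

XSD patterns are implicitly anchored, so there is no need for ^ and $ around the expression.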

Complex types in XSD are a grouping of typed elements. The next example uses the simple types just built to create a complex type for the element mytable. Note that the myinteger and mystring types are no longer enclosed in xs:element tags; they are now named types, used internally to build the complex type and no longer directly published as elements by the schema. In the table the type names are prefixed with mys. In the header of the schema this prefix is bound to the target namespace of this schema.

<?xml version="1.0" encoding="utf-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:mys="http://Gekko-Software.nl/Demos/ThereIsMore.xsd"
           targetNamespace="http://Gekko-Software.nl/Demos/ThereIsMore.xsd"
           elementFormDefault="qualified">
  <xs:simpleType name="mystring">
    <xs:restriction base="xs:string">
      <xs:maxLength value="20"></xs:maxLength>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="myinteger">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="12"></xs:minInclusive>
      <xs:maxInclusive value="22"></xs:maxInclusive>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="mytable">
    <xs:complexType>
      <xs:sequence minOccurs="1" maxOccurs="unbounded">
        <xs:element name="Field1" type="mys:mystring"></xs:element>
        <xs:element name="Field2" type="mys:mystring"></xs:element>
        <xs:element name="Field3" type="mys:myinteger"></xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

There are several ways to group types in a complex type. Here I have used xs:sequence, which indicates a fixed sequence of all enclosed elements, just like a database table. Using the minOccurs attribute I force the document to contain at least one row of data. Using this schema to validate an XML document again shows validation results inside VS.

The curly lines indicate that only Field2 passes validation.
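A document with that outcome could look roughly like this. The values are made up for illustration; only the element names and the namespace come from the schema above.

```xml
<?xml version="1.0" encoding="utf-8"?>
<mytable xmlns="http://Gekko-Software.nl/Demos/ThereIsMore.xsd">
  <!-- Invalid: longer than the 20 characters mystring allows -->
  <Field1>This text is much too long!</Field1>
  <!-- Valid: within 20 characters -->
  <Field2>Short enough</Field2>
  <!-- Invalid: outside the 12..22 range of myinteger -->
  <Field3>42</Field3>
</mytable>
```

Because the sequence has maxOccurs="unbounded", further rows are added by repeating the Field1, Field2, Field3 group inside the same mytable element, always in that order.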

We have now seen the basics of describing a database table in xsd. Wrestling through a VS generated dataset xsd you will eventually see the pattern.

XSD beyond databases

There are also quite different things than db-like tables you can express in xsd. In the mytable complex type I had grouped types using a sequence. Another way to group types is using a choice.

  <xs:element name="mytable">
    <xs:complexType>
      <xs:choice minOccurs="1" maxOccurs="unbounded">
        <xs:element name="Field1" type="mys:mystring"></xs:element>
        <xs:element name="Field2" type="mys:mystring"></xs:element>
        <xs:element name="Field3" type="mys:myinteger"></xs:element>
      </xs:choice>
    </xs:complexType>
  </xs:element>

Now a valid mytable element contains any list of Fieldx elements, in any order and any number, as long as there is at least one.

No curly lines at all, this document is entirely valid. But just imagine having to store this into a database table. Hard to map.
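To make that concrete, here is a sketch of a document that should pass against the choice version of the schema. The values are invented for illustration; what matters is that the fields repeat and appear in no particular order.

```xml
<?xml version="1.0" encoding="utf-8"?>
<mytable xmlns="http://Gekko-Software.nl/Demos/ThereIsMore.xsd">
  <Field3>15</Field3>
  <Field1>first</Field1>
  <Field1>again</Field1>
  <Field3>22</Field3>
  <Field2>second</Field2>
</mytable>
```

There is no longer a notion of a row here: two Field1 values and two Field3 values sit next to a single Field2, which is exactly what makes the mapping to a table so awkward.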

And you can go much further. A complex type is a grouping of other types. These other types don’t have to be simple types; they can be complex types themselves. Best of all, a type definition can be recursive: one of the contained types is the type itself.

To demonstrate this I have to do some reshuffling on the example. The elements get more descriptive names. The complex type mynode is recursive: it has the previous elements plus a node element that is itself of type mynode. The schema publishes the element mytree, which consists of a name (nodename) and a content element of type mynode.

<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:mys="http://Gekko-Software.nl/Demos/ThereIsMore.xsd"
           targetNamespace="http://Gekko-Software.nl/Demos/ThereIsMore.xsd"
           elementFormDefault="qualified">
  <xs:simpleType name="mystring">
    <xs:restriction base="xs:string">
      <xs:maxLength value="20"></xs:maxLength>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="myinteger">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="12"></xs:minInclusive>
      <xs:maxInclusive value="22"></xs:maxInclusive>
    </xs:restriction>
  </xs:simpleType>
  <xs:complexType name="mynode">
    <xs:choice minOccurs="1" maxOccurs="unbounded">
      <xs:element name="label1" type="mys:mystring"></xs:element>
      <xs:element name="label2" type="mys:mystring"></xs:element>
      <xs:element name="counter" type="mys:myinteger"></xs:element>
      <xs:element name="node" type="mys:mynode"></xs:element>
    </xs:choice>
  </xs:complexType>
  <xs:element name="mytree">
    <xs:complexType>
      <xs:sequence minOccurs="1" maxOccurs="1">
        <xs:element name="nodename" type="xs:string"></xs:element>
        <xs:element name="content" type="mys:mynode"></xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Using this schema quite complex documents can be validated in great detail. The only thing wrong with the following document is that one of the strings is too long.
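A sketch of what such a document might look like, with invented values. Nodes are nested three levels deep, and the single offending value is the label in the innermost node.

```xml
<?xml version="1.0" encoding="utf-8"?>
<mytree xmlns="http://Gekko-Software.nl/Demos/ThereIsMore.xsd">
  <nodename>Root</nodename>
  <content>
    <label1>First branch</label1>
    <counter>12</counter>
    <node>
      <label2>Nested</label2>
      <node>
        <counter>22</counter>
        <!-- Invalid: longer than the 20 characters mystring allows -->
        <label1>This label is far too long to pass</label1>
      </node>
    </node>
  </content>
</mytree>
```

Every node has to contain at least one child element, because the choice in mynode has minOccurs="1"; an empty node element would be rejected as well.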

This is quite something else than the xsd describing datasets. In the textual representation the schema looks quite simple. Switching to the visual designer, that’s quite different.

This is a lot more than database tables dropped on the surface: we have created something which cannot even be fully visualized.

Winding down

These days xsd-datasets as a model for database data are beginning to disappear below the horizon; in apps that role is taken over by OR mappers and domain modeling. But it would be throwing the baby out with the bathwater to lose interest in xsd itself as well. I hope I have hinted at the many more things you can model with it and the roles it can have in developing software. Besides validating data messages, it’s a great tool to spec XML configuration or specification files. Using the xsd document, you and others can validate documents while typing, and you are even assisted with code completion. This works so intuitively that the only thing you miss is refactoring support.
