Source

Download Xml2Xsd.zip - 5.07 KB
Download Binary Xml2Xsd_bin.zip - 6.07 KB

Introduction

Line-of-business projects frequently need complex schemas. This article outlines a way to rapidly prototype a high-quality, maintainable schema from sample XML, so that object models derived from it generate cleanly on all platforms.

Background

There are many tools for creating and managing schemas, and most are "not fun". Starting from a blank screen in a complex tool is daunting, especially when you're writing in an abstract language like XML Schema (XSD).

It's more productive to create a sample and then use existing tools to generate the schema. The problem with these schema-generating tools is that they nest complex types. Nesting complex types causes three problems:

  • It's ugly/hard to maintain
  • Generators will build very ugly objects from this kind of schema
  • It does not follow general industry practices for XML Schema (for example, xsd.exe emits Microsoft-specific msdata attributes)

By Example:

I started my data modelling from a sample. In this case I want to model Cub Scout pinewood derby race data (yes, I have an 8-year-old boy).

<Derby>
	<Racers>
		<Group Name="Den7">
			<Cub Id="1" First="Johny" Last="Racer" Place="1"/>
			<Cub Id="2" First="Stan" Last="Lee" Place="3"/>
		</Group>
        ...
If I run xsd.exe (included in the .NET SDK) on that XML, it generates XSD like:
<xs:schema id="Derby" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xs:element name="Derby" msdata:IsDataSet="true" msdata:UseCurrentLocale="true">
    <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="Racers">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Group" minOccurs="0" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="Cub" minOccurs="0" maxOccurs="unbounded">
                      <xs:complexType>
                      ...
Notice all the nesting. When you then run xsd.exe on the generated derby.xsd, it generates classes with names like DerbyRacersGroupCub. BLECK!

The Better Schema:

<xs:schema xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  
  <xs:element name="Derby" type="DerbyInfo" />
  
  <xs:complexType name="DerbyInfo">
    <xs:sequence>
      <xs:element name="Racers" type="RacersInfo" />
      <xs:element name="Races" type="RacesInfo" />
    </xs:sequence>
  </xs:complexType>
  ...

Improved Xml2Xsd

So I set out to solve all these problems and built a better/simpler generator.

Algorithm overview:

  • Open an XDocument for the sample xml
  • Read all the elements and build a dictionary of xpaths. I used a dictionary, but a List with Distinct() would have worked too
  • Walk the xpaths in sorted order and build the attributes and elements, making sure each new element references a named type instead of nesting one
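
The first two steps can be sketched on their own. This is a simplified illustration, not the article's RecurseAllXPaths: the class XPathSketch and the method CollectPaths are hypothetical names, and the sketch records each element under its own path and uses a sorted set instead of a dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XPathSketch
{
    // Hypothetical helper (not part of Xml2Xsd): returns the distinct element and
    // attribute paths in a document. Ordinal sorting puts every parent path before
    // the paths of its children, because a parent path is a prefix of theirs.
    public static IList<string> CollectPaths(XDocument doc)
    {
        var paths = new SortedSet<string>(StringComparer.Ordinal);
        Visit(string.Empty, doc.Root, paths);
        return paths.ToList();
    }

    private static void Visit(string parentPath, XElement elem, ISet<string> paths)
    {
        var path = parentPath + "/" + elem.Name.LocalName;

        // A set ignores repeats, so sibling elements with the same name collapse.
        paths.Add(path);

        // Attributes are recorded as "<element path>/@<name>"; xmlns declarations are skipped.
        foreach (var attr in elem.Attributes().Where(a => !a.IsNamespaceDeclaration))
            paths.Add(path + "/@" + attr.Name.LocalName);

        foreach (var child in elem.Elements())
            Visit(path, child, paths);
    }
}
```

Calling XPathSketch.CollectPaths(XDocument.Load("Derby.xml")) on the sample would yield one entry per distinct element or attribute path, such as /Derby/Racers/Group/Cub/@Id, no matter how many Cub elements the sample contains.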

High Level Static Method

public static XDocument Generate(XDocument content, string targetNamespace)
{
    xpaths.Clear();
    elements.Clear();
    recurseElements.Clear();

    RecurseAllXPaths(string.Empty, content.Elements().First());

    target = XNamespace.Get(targetNamespace);

    var compTypes = xpaths.Select(k => k.Key)
        .OrderBy(o => o)
        .Select(k => ComplexTypeElementFromXPath(k))
        .Where(q => null != q).ToArray();

    // The first one is our root element... it needs to be extracted and massaged
    compTypes[0] = compTypes.First().Element(xs + "sequence").Element(xs + "element");

    // Warning: Namespaces are tricky/hinted here, be careful
    return new XDocument(new XElement(target + "schema",
        // Why 'qualified'?
        // All "qualified" elements and attributes are in the targetNamespace of the
        // schema and all "unqualified" elements and attributes are in no namespace.
        //  All global elements and attributes are qualified.
        new XAttribute("elementFormDefault", "qualified"),

        // Specify the target namespace, you will want this for schema validation
        new XAttribute("targetNamespace", targetNamespace),
                
        // hint to xDocument that we want the xml schema namespace to be called 'xs'
        new XAttribute(XNamespace.Xmlns + "xs", "http://www.w3.org/2001/XMLSchema"),
        compTypes));
}
 

Recurse All XPaths

For each element, check whether its xpath is new, and track element names that repeat at different levels (recursively defined elements).

static void RecurseAllXPaths(string xpath, XElement elem)
{
    var missingXpath = !xpaths.ContainsKey(xpath);
    var lclName = elem.Name.LocalName;

    var hasLcl = elements.ContainsKey(lclName);

    // Check for recursion in the element name (same name different level)
    if (hasLcl && missingXpath)
        recurseElements.Add(lclName);
    else if (!hasLcl)
        elements.Add(lclName, true);

    // if it's not in the xpath, then add it.
    if (missingXpath)
        xpaths.Add(xpath, null);

    // add xpaths for all attributes
    elem.Attributes().ToList().ForEach(attr =>
        {
            var xpath1 = string.Format("{0}/@{1}", xpath, attr.Name);
            if (!xpaths.ContainsKey(xpath1))
                xpaths.Add(xpath1, null);
        });

    elem.Elements().ToList().ForEach(fe => RecurseAllXPaths(
        string.Format("{0}/{1}", xpath, lclName), fe));
}

Generating Schema from xpaths

Now that we have a list of xpaths, we need to generate the appropriate schema for each of them:

private static XElement ComplexTypeElementFromXPath(string xp)
{
    var parts = xp.Split('/');
    var last = parts.Last();
    var isAttr = last.StartsWith("@");
    var parent = ParentElementByXPath(parts);

    return (isAttr) ? BuildAttributeSchema(xp, last, parent) : 
        BuildElementSchema(xp, last, parent);
}
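
The article does not show ParentElementByXPath. A minimal sketch of what such a lookup might do, assuming it rebuilds the parent path from all segments except the last and looks it up in the xpaths dictionary. The class name and the second parameter are hypothetical and only there to keep the sketch self-contained; the article's version takes one argument and presumably reads a static field.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ParentLookupSketch
{
    // Assumption: the dictionary maps each xpath to the schema node already built for it.
    public static XElement ParentElementByXPath(string[] parts, IDictionary<string, XElement> xpaths)
    {
        // A single segment (the empty root path) has no parent.
        if (parts.Length < 2)
            return null;

        // "/Derby/Racers" splits into ["", "Derby", "Racers"]; joining all but the
        // last segment gives the parent path "/Derby".
        var parentPath = string.Join("/", parts.Take(parts.Length - 1));

        XElement parent;
        return xpaths.TryGetValue(parentPath, out parent) ? parent : null;
    }
}
```

This only works if parents are processed before their children, which is why the xpaths are sorted before the schema nodes are built.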

BuildAttributeSchema

private static XElement BuildAttributeSchema(string k, string last, XElement parent)
{
    var elem0 = new XElement(xs + "attribute",
        new XAttribute("name", last.TrimStart('@')),
        // Built-in types live in the XML Schema namespace, so use the 'xs' prefix
        new XAttribute("type", "xs:string"));
            
    if (null != parent)
        parent.Add(elem0);

    xpaths[k] = elem0;

    return null;
}

BuildElementSchema

This one is not as straightforward as BuildAttributeSchema. We have to make sure the parent node gets an appropriate type reference to the new element. It's a little hairy, but it works nicely.

private static XElement BuildElementSchema(string k, string last, XElement parent)
{
    XElement seqElem = null;
    if (null != parent)
    {
        seqElem = parent.Element(xs + "sequence");

        // Add a new sequence if one doesn't already exist
        if (null == seqElem)
            // Note: add the sequence at the start, because in XSD syntax a sequence
            //  must come before any attributes
            parent.AddFirst(seqElem = new XElement(xs + "sequence"));
    }
    else
    {
        // In this case, there's no existing parent
        seqElem = new XElement(xs + "sequence");
    }

    var lastInfo = last + "Info";

    var elem0 = new XElement(xs + "element",
            new XAttribute("name", last),
            new XAttribute("type", lastInfo));
    seqElem.Add(elem0); // add the ref to the existing sequence

    return xpaths[k] = new XElement(xs + "complexType",
        new XAttribute("name", lastInfo));
}

Using the code

  • Download the sample project
  • Build in VS2010 or Express
  • Press F5 in the Debug configuration to run it
  • Open Derby.Xsd in bin/Debug to see the result

If you're still reading, I strongly recommend stepping through the project with F10/F11 to get into the details. Have fun!

Enhancements

  • Elements without children (aka value elements) 
  • Derive data types from the contents of the sample xml (integer, boolean, DateTime, etc) 
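
Deriving a data type from sample content could look roughly like the sketch below. It is an illustration only, not the code in the download: XsdTypeSketch and InferXsdType are hypothetical names, the set of recognised types is deliberately small, and anything unrecognised falls back to xs:string.

```csharp
using System;
using System.Globalization;

public static class XsdTypeSketch
{
    // Hypothetical helper: guesses an XSD built-in type from one sample value.
    public static string InferXsdType(string value)
    {
        if (string.IsNullOrWhiteSpace(value))
            return "xs:string";

        var text = value.Trim();
        var inv = CultureInfo.InvariantCulture;

        if (text == "true" || text == "false")
            return "xs:boolean";

        long wholeNumber;
        if (long.TryParse(text, NumberStyles.AllowLeadingSign, inv, out wholeNumber))
            return "xs:integer";

        decimal number;
        if (decimal.TryParse(text, NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint, inv, out number))
            return "xs:decimal";

        // Only the plain ISO 8601 forms without a time zone are checked here.
        DateTime moment;
        if (DateTime.TryParseExact(text, "yyyy-MM-dd'T'HH:mm:ss", inv, DateTimeStyles.None, out moment))
            return "xs:dateTime";
        if (DateTime.TryParseExact(text, "yyyy-MM-dd", inv, DateTimeStyles.None, out moment))
            return "xs:date";

        return "xs:string";
    }
}
```

A single value is weak evidence. For example, Place="1" looks like an integer until some racer gets Place="DNF", so a real generator would probably check every sample value for an xpath and fall back to xs:string when they disagree.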

Future Improvements

  • Make recursively defined elements work

History 

  • 12/04/2010 - Created