XmlToXsd - A better schema generator
Source
- Download Xml2Xsd.zip - 5.07 KB
- Download Binary Xml2Xsd_bin.zip - 6.07 KB
Introduction
Line-of-business projects frequently need complex schemas. This article outlines how to rapidly prototype a high-quality, maintainable schema from sample XML, so that derivative object models generate cleanly on all platforms.
Background
There are many tools for creating and managing schema; most are "not fun". Starting from a blank screen in a complex tool is daunting, especially when you're writing in an abstract language like XML Schema (XSD).
It's most beneficial to create a sample, then use existing tools to generate the schema. The problem with these schema-generating tools is that they nest complex types. Nesting complex types causes three problems:
- It's ugly/hard to maintain
- Generators will build very ugly objects from this kind of schema
- It does not follow general industry practices for Xml Schema (msdata namespace)
By Example:
I started my data modelling with a sample. In this case, I want to model Cub Scout pinewood derby race data (yes, I have an 8-year-old boy).
<Derby>
  <Racers>
    <Group Name="Den7">
      <Cub Id="1" First="Johny" Last="Racer" Place="1"/>
      <Cub Id="2" First="Stan" Last="Lee" Place="3"/>
    </Group>
    ...
If I run xsd.exe (included in the .NET SDK) on that XML, it generates XSD like:
<xs:schema id="Derby" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xs:element name="Derby" msdata:IsDataSet="true" msdata:UseCurrentLocale="true">
    <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="Racers">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Group" minOccurs="0" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="Cub" minOccurs="0" maxOccurs="unbounded">
                      <xs:complexType>
                        ...
Notice all the nesting. When you then run xsd.exe on the generated derby.xsd, it generates objects with names like DerbyRacersGroupCub. BLECK!
The Better Schema:
<xs:schema xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Derby" type="DerbyInfo" />
  <xs:complexType name="DerbyInfo">
    <xs:sequence>
      <xs:element name="Racers" type="RacersInfo" />
      <xs:element name="Races" type="RacesInfo" />
    </xs:sequence>
  </xs:complexType>
  ...
Improve Xml2Xsd
So I set out to solve all these problems and built a better/simpler generator.
Algorithm overview:
- Open an XDocument for the sample xml
- Read all the elements and build a dictionary of xpaths. I used a dictionary, but a List with Distinct() could have worked too
- From the list of xpaths, walk through all the xpaths and build the attributes and elements, making sure to reference all new elements instead of nesting
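The first two steps can be sketched roughly as follows. This is a minimal illustration of the List-with-Distinct() alternative, not the code from the download: the class `XPathSketch` and its methods `PathOf` and `DistinctPaths` are hypothetical names, and the sketch assumes elements and attributes are identified by local name only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, for illustration only.
public static class XPathSketch
{
    // Builds a simple "/A/B/C" path from the element's ancestors (local names only).
    static string PathOf(XElement e)
    {
        return "/" + string.Join("/",
            e.AncestorsAndSelf().Reverse().Select(a => a.Name.LocalName).ToArray());
    }

    // Collects one path per element and per attribute, then removes duplicates.
    public static List<string> DistinctPaths(XDocument doc)
    {
        var paths = new List<string>();
        foreach (var e in doc.Descendants())
        {
            var path = PathOf(e);
            paths.Add(path);
            paths.AddRange(e.Attributes()
                .Where(a => !a.IsNamespaceDeclaration)
                .Select(a => path + "/@" + a.Name.LocalName));
        }
        // Ordinal ordering keeps a parent path ahead of the paths below it.
        return paths.Distinct().OrderBy(x => x, StringComparer.Ordinal).ToList();
    }
}
```

Repeated elements, such as the two Cub entries in the sample, collapse into a single path, which is what lets each element be declared exactly once in the generated schema.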
High Level Static Method
public static XDocument Generate(XDocument content, string targetNamespace)
{
    xpaths.Clear();
    elements.Clear();
    recurseElements.Clear();
    RecurseAllXPaths(string.Empty, content.Elements().First());
    target = XNamespace.Get(targetNamespace);
    var compTypes = xpaths.Select(k => k.Key)
        .OrderBy(o => o)
        .Select(k => ComplexTypeElementFromXPath(k))
        .Where(q => null != q).ToArray();
    // The first one is our root element; it needs to be extracted and massaged
    compTypes[0] = compTypes.First().Element(xs + "sequence").Element(xs + "element");
    // Warning: namespaces are tricky/hinted here, be careful.
    // The root has to be xs:schema (XML Schema namespace), not an element in the target namespace.
    return new XDocument(new XElement(xs + "schema",
        // Make the target namespace the default, so unprefixed type references
        // such as "DerbyInfo" resolve to the types declared in this schema
        new XAttribute("xmlns", targetNamespace),
        // Why 'qualified'?
        // All "qualified" elements and attributes are in the targetNamespace of the
        // schema and all "unqualified" elements and attributes are in no namespace.
        // All global elements and attributes are qualified.
        new XAttribute("elementFormDefault", "qualified"),
        // Specify the target namespace, you will want this for schema validation
        new XAttribute("targetNamespace", targetNamespace),
        // Hint to XDocument that we want the XML Schema namespace to be called 'xs'
        new XAttribute(XNamespace.Xmlns + "xs", "http://www.w3.org/2001/XMLSchema"),
        compTypes));
}
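To see what the 'qualified' setting and the 'xs' prefix hint mean in practice, here is a small self-contained sketch. It does not call the generator; the class `QualifiedSketch`, its methods, the `tns` prefix and the `urn:derby` namespace used with it are all assumptions made for illustration. It builds a tiny schema with LINQ to XML and validates instance documents against it with `XmlSchemaSet` and the `Validate` extension method.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

// Hypothetical helper, for illustration only.
public static class QualifiedSketch
{
    static readonly XNamespace xs = "http://www.w3.org/2001/XMLSchema";

    // Builds a minimal flat schema: a global Derby element that refers to a named type.
    public static XDocument BuildSchema(string targetNamespace)
    {
        return new XDocument(new XElement(xs + "schema",
            new XAttribute(XNamespace.Xmlns + "xs", xs.NamespaceName),
            new XAttribute(XNamespace.Xmlns + "tns", targetNamespace),
            new XAttribute("targetNamespace", targetNamespace),
            new XAttribute("elementFormDefault", "qualified"),
            new XElement(xs + "element",
                new XAttribute("name", "Derby"),
                new XAttribute("type", "tns:DerbyInfo")),
            new XElement(xs + "complexType",
                new XAttribute("name", "DerbyInfo"),
                new XElement(xs + "sequence",
                    new XElement(xs + "element",
                        new XAttribute("name", "Racers"),
                        new XAttribute("type", "xs:string"))))));
    }

    // Returns the number of validation events raised for the instance document.
    public static int CountErrors(XDocument schema, XDocument instance)
    {
        var set = new XmlSchemaSet();
        using (var reader = XmlReader.Create(new StringReader(schema.ToString())))
            set.Add(null, reader); // null: use the targetNamespace declared in the schema
        set.Compile();
        var errors = 0;
        instance.Validate(set, (sender, e) => errors++);
        return errors;
    }
}
```

With elementFormDefault set to "qualified", a document whose child elements are in the target namespace validates cleanly, while one whose Racers child is in no namespace is reported as invalid.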
Recurse All XPaths
For each element, check whether its xpath is new, and track element names that repeat at different levels (recursively defined elements).
static void RecurseAllXPaths(string xpath, XElement elem)
{
    var lclName = elem.Name.LocalName;
    // 'xpath' is the parent's path; build this element's own path from it
    var path = string.Format("{0}/{1}", xpath, lclName);
    var missingXpath = !xpaths.ContainsKey(path);
    var hasLcl = elements.ContainsKey(lclName);
    // Check for recursion in the element name (same name, different level)
    if (hasLcl && missingXpath)
        recurseElements.Add(lclName);
    else if (!hasLcl)
        elements.Add(lclName, true);
    // Register the parent path (string.Empty stands for the document root)...
    if (!xpaths.ContainsKey(xpath))
        xpaths.Add(xpath, null);
    // ...and this element's own path, so elements without children are kept too
    if (missingXpath)
        xpaths.Add(path, null);
    // Add xpaths for all attributes of this element
    foreach (var attr in elem.Attributes())
    {
        var xpath1 = string.Format("{0}/@{1}", path, attr.Name);
        if (!xpaths.ContainsKey(xpath1))
            xpaths.Add(xpath1, null);
    }
    // Recurse into the children, passing this element's path as their parent path
    foreach (var fe in elem.Elements())
        RecurseAllXPaths(path, fe);
}
Generating Schema from xpaths
Now that we have a list of xpaths, we need to generate the appropriate schema for them.
private static XElement ComplexTypeElementFromXPath(string xp)
{
    var parts = xp.Split('/');
    var last = parts.Last();
    var isAttr = last.StartsWith("@");
    var parent = ParentElementByXPath(parts);
    return (isAttr) ? BuildAttributeSchema(xp, last, parent) :
        BuildElementSchema(xp, last, parent);
}
BuildAttributeSchema
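private static XElement BuildAttributeSchema(string k, string last, XElement parent)
{
    var elem0 = new XElement(xs + "attribute",
        new XAttribute("name", last.TrimStart('@')),
        // Built-in types live in the XML Schema namespace, so they need the 'xs' prefix
        new XAttribute("type", "xs:string"));
    if (null != parent)
        parent.Add(elem0);
    xpaths[k] = elem0;
    // Attributes are added to their parent type; they are not top-level schema items
    return null;
}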
BuildElementSchema
This one is not as straightforward as BuildAttributeSchema: we have to make sure the appropriate type references are made in the parent node. It's a little hairy, but it works nicely.
private static XElement BuildElementSchema(string k, string last, XElement parent)
{
    XElement seqElem;
    if (null != parent)
    {
        seqElem = parent.Element(xs + "sequence");
        // Add a new sequence if one doesn't already exist
        if (null == seqElem)
            // Note: add the sequence at the start, because a sequence needs to come
            // before any attributes in XSD syntax
            parent.AddFirst(seqElem = new XElement(xs + "sequence"));
    }
    else
    {
        // In this case, there's no existing parent
        seqElem = new XElement(xs + "sequence");
    }
    var lastInfo = last + "Info";
    var elem0 = new XElement(xs + "element",
        new XAttribute("name", last),
        new XAttribute("type", lastInfo));
    seqElem.Add(elem0); // add the reference to the existing sequence
    // The named complex type is returned as a top-level item instead of being nested
    return xpaths[k] = new XElement(xs + "complexType",
        new XAttribute("name", lastInfo));
}
Using the code
- Download the sample project
- Build in VS2010 or Express
- Press F5 in the solution to run it in debug mode
- Open Derby.Xsd in bin/Debug to see the result
Enhancements
- Elements without children (aka value elements)
- Derive data types from the contents of the sample xml (integer, boolean, DateTime, etc)
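As a rough idea of how the data type enhancement could work, here is a minimal sketch that picks an XSD type from the sample values seen for one attribute or element. It is not part of the download: the class `TypeInferenceSketch` and the method `InferXsdType` are hypothetical, only three types are tried, and anything else falls back to xs:string. The integer check uses `long`, so it assumes values fit in 64 bits.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class TypeInferenceSketch
{
    static bool IsInteger(string v)
    {
        long n;
        return long.TryParse(v, NumberStyles.AllowLeadingSign,
            CultureInfo.InvariantCulture, out n);
    }

    // Only the literal forms "true" and "false" are treated as booleans here.
    static bool IsBoolean(string v)
    {
        return v == "true" || v == "false";
    }

    static bool IsDate(string v)
    {
        DateTime d;
        return DateTime.TryParseExact(v, "yyyy-MM-dd",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out d);
    }

    // Returns the first candidate type that every sample value satisfies.
    public static string InferXsdType(IEnumerable<string> samples)
    {
        var values = samples.ToList();
        if (values.Count == 0) return "xs:string";
        if (values.All(IsInteger)) return "xs:integer";
        if (values.All(IsBoolean)) return "xs:boolean";
        if (values.All(IsDate)) return "xs:date";
        return "xs:string";
    }
}
```

Checking all observed values, rather than only the first one, avoids declaring an attribute as an integer when a later sample contains text.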
Future Improvements
- Make recursively defined elements work
History
- 12/04/2010 - Created