Characteristics of XML
XML has a number of important characteristics (reprinted from
Professional Excel Development with permission from Addison-Wesley):
- XML is a structured format,
which means that we can define exactly how the data is to be
arranged, organized and expressed within the file. When we are given
a file, we can validate that it conforms to a specific structure,
prior to importing the data. As we know the structure of the file in
advance, we know what it contains and how to process each item.
Prior to XML, the only structure in a text file was positional – we
knew the bit of text after the fourth comma should be a date of
birth – and we had no way to validate whether it was a date of
birth, or even a date, or whether it was in day/month/year or
month/day/year order.
- XML is a described format,
which means that within the text file, every item of data has a name
that is both human- and machine-readable as well as being uniquely
identifiable. We can open these files, read their contents and
understand the data they contain, without having to refer back to
another document to find out what the text after the fourth comma
represents (and was that comma a separator, or part of the text of the
second item?). Similarly, we can edit these documents with a fairly high
level of confidence that we’re making the correct changes.
- XML can easily describe hierarchical data and the
relationships between data.
If we want to import and export a list of authors, with their names,
addresses and the books they’ve written, deciding on a reasonable format
for a CSV file is by no means straightforward. Using XML, we can define
what an Author item is and that it has a name, address and multiple Book
items. We can also define what a Book item is and that it has a
title, a publisher and an ISBN. The hierarchy and relationships are a
natural consequence of the definition.
- XML can be validated,
which means we can provide a second XML file – an XML Schema Definition
file – that describes exactly how the XML data file should be
structured. Before processing an XML file, we can compare it with the
schema to ensure it conforms to the structure we expect to receive.
- XML is a discoverable format,
which means programs (including Excel 2003/2007/2010/2013) can parse an
XML data file and infer the structure and relationships between the
items. This means we can read an XML file, infer its structure and
generate new XML data files that conform to the same structure, with a
high degree of confidence the new XML data files will pass validation.
- XML is a strongly-typed format,
which means the schema definition file specifies the data type of each
element. When importing the data, the application can check the schema
definition to identify the data type to import it as. We no longer run
the risk of the product code 01-03 being imported as a date.
- XML is a global format.
Schema-typed XML expresses a number in only one way (with a period as the
decimal separator, as in US number formats) and a date in only one order
(year-month-day). We no longer have to check
whether a CSV file was created with US or French settings and adjust our
processing of it accordingly.
- XML is a standard format.
The way in which the content of an XML file is defined has been
specified by the World Wide Web Consortium (W3C). This allows
applications (including Excel 2003/2007/2010/2013) to read, understand
and validate the structure of an XML file and create files that conform
to the specified structure. It also allows different applications to
read, write, understand and validate the same XML files, allowing us to
share data between applications in an extremely robust manner.
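
To make the "described" and "hierarchical" points concrete, here is a minimal sketch of the Author and Book example from the list above, read with Python's standard xml.etree.ElementTree module. The element names (Authors, Author, Name, Address, Book, Title, Publisher, ISBN), the sample values and the read_authors helper are all assumptions made up for illustration; they do not come from a real schema or from the book.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the element names and values are invented
# for illustration and do not follow any published schema.
AUTHORS_XML = """\
<Authors>
  <Author>
    <Name>Jane Doe</Name>
    <Address>1 Example Street, Sampletown</Address>
    <Book>
      <Title>First Title</Title>
      <Publisher>Example Press</Publisher>
      <ISBN>000-0-00-000000-0</ISBN>
    </Book>
    <Book>
      <Title>Second Title</Title>
      <Publisher>Example Press</Publisher>
      <ISBN>000-0-00-000000-1</ISBN>
    </Book>
  </Author>
</Authors>
"""


def read_authors(xml_text):
    """Return one dict per Author element, each holding its list of books."""
    root = ET.fromstring(xml_text)
    authors = []
    for author in root.findall("Author"):
        # Each item is looked up by name, not by its position in the file.
        books = [
            {
                "title": book.findtext("Title"),
                "publisher": book.findtext("Publisher"),
                "isbn": book.findtext("ISBN"),
            }
            for book in author.findall("Book")
        ]
        authors.append(
            {
                "name": author.findtext("Name"),
                "address": author.findtext("Address"),
                "books": books,
            }
        )
    return authors


if __name__ == "__main__":
    for entry in read_authors(AUTHORS_XML):
        print(entry["name"], "-", len(entry["books"]), "book(s)")
```

Note that the comma inside the Address value needs no special handling, because the element tags, not the punctuation, mark where each item starts and ends. The one-author-to-many-books relationship also falls out of the nesting, with no extra convention to agree on. This sketch only reads the data; checking it against an XML Schema Definition file would need a validating parser, which ElementTree does not provide.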

