Characteristics of XML
XML has a number of important characteristics (reprinted from
Professional Excel Development with permission from Addison Wesley):
- XML is a structured format,
which means that we can define exactly how the data is to be arranged, organized and expressed within the file. When we are given a file, we can validate that it conforms to a specific structure, prior to importing the data. As we know the structure of the file in advance, we know what it contains and how to process each item. Prior to XML, the only structure in a text file was positional – we knew the bit of text after the fourth comma should be a date of birth – and we had no way to validate whether it was a date of birth, or even a date, or whether it was in day/month/year or month/day/year order.
- XML is a described format,
which means that within the text file, every item of data has a name that is both human- and machine-readable as well as being uniquely identifiable. We can open these files, read their contents and understand the data they contain, without having to refer back to another document to find out what the text after the fourth comma represents (and was that comma a separator, or part of the text of the second item?). Similarly, we can edit these documents with a fairly high level of confidence that we’re making the correct changes.
- XML can easily describe hierarchical data and the relationships between data.
If we want to import and export a list of authors, with their names, addresses and the books they’ve written, deciding on a reasonable format for a csv file is by no means straightforward. Using XML, we can define what an Author item is and that it has a name, address and multiple Book items. We can also define what a Book item it is and that it has a title, a publisher and an ISBN. The hierarchy and relationships are a natural consequence of the definition.
- XML can be validated,
which means we can provide a second XML file – an XML Schema Definition file – that describes exactly how the XML data file should be structured. Before processing an XML file, we can compare it with the schema to ensure it conforms to the structure we expect to receive.
- XML is a discoverable format,
which means programs (including Excel 2003/2007/2010/2013) can parse an XML data file and infer the structure and relationships between the items. This means we can read an XML file, infer its structure and generate new XML data files that conform to the same structure, with a high degree of confidence the new XML data files will pass validation.
- XML is a strongly-typed format,
which means the schema definition file specifies the data type of each element. When importing the data, the application can check the schema definition to identify the data type to import it as. We no longer run the risk of the product code 01-03 being imported as a date.
- XML is a global format.
There is only one way to express a number in an XML file (with US number formats) and only one way to express a date. We no longer have to check whether a csv file was created with US or French settings and adjust our processing of it accordingly.
- XML is a standard format.
The way in which the content of an XML file is defined has been specified by the World Wide Web Consortium (W3C). This allows applications (including Excel 2003/2007/2010/2013) to read, understand and validate the structure of an XML file and create files that conform to the specified structure. It also allows different applications to read, write, understand and validate the same XML files, allowing us to share data between applications in an extremely robust manner.