XML and Excel

Pages in this article

  1. What is XML
  2. Characteristics of XML
  3. Structure of XML
  4. XML Schemas
  5. XML in Excel
  6. XML Validation
  7. Conclusion

Characteristics of XML

XML has a number of important characteristics (reprinted from
Professional Excel Development with permission from Addison Wesley):

  • XML is a structured format,
    which means that we can define exactly how the data is to be arranged, organized and expressed within the file. When we are given a file, we can validate that it conforms to a specific structure, prior to importing the data. As we know the structure of the file in advance, we know what it contains and how to process each item. Prior to XML, the only structure in a text file was positional – we knew the bit of text after the fourth comma should be a date of birth – and we had no way to validate whether it was a date of birth, or even a date, or whether it was in day/month/year or month/day/year order.
  • XML is a described format,
    which means that within the text file, every item of data has a name that is both human- and machine-readable as well as being uniquely identifiable. We can open these files, read their contents and understand the data they contain, without having to refer back to another document to find out what the text after the fourth comma represents (and was that comma a separator, or part of the text of the second item?). Similarly, we can edit these documents with a fairly high level of confidence that we’re making the correct changes.
  • XML can easily describe hierarchical data and the relationships between data.
    If we want to import and export a list of authors, with their names, addresses and the books they’ve written, deciding on a reasonable format for a csv file is by no means straightforward. Using XML, we can define what an Author item is and that it has a name, address and multiple Book items. We can also define what a Book item it is and that it has a title, a publisher and an ISBN. The hierarchy and relationships are a natural consequence of the definition.
  • XML can be validated,
    which means we can provide a second XML file – an XML Schema Definition file – that describes exactly how the XML data file should be structured. Before processing an XML file, we can compare it with the schema to ensure it conforms to the structure we expect to receive.
  • XML is a discoverable format,
    which means programs (including Excel 2003/2007/2010/2013) can parse an XML data file and infer the structure and relationships between the items. This means we can read an XML file, infer its structure and generate new XML data files that conform to the same structure, with a high degree of confidence the new XML data files will pass validation.
  • XML is a strongly-typed format,
    which means the schema definition file specifies the data type of each element. When importing the data, the application can check the schema definition to identify the data type to import it as. We no longer run the risk of the product code 01-03 being imported as a date.
  • XML is a global format.
    There is only one way to express a number in an XML file (with US number formats) and only one way to express a date. We no longer have to check whether a csv file was created with US or French settings and adjust our processing of it accordingly.
  • XML is a standard format.
    The way in which the content of an XML file is defined has been specified by the World Wide Web Consortium (W3C). This allows applications (including Excel 2003/2007/2010/2013) to read, understand and validate the structure of an XML file and create files that conform to the specified structure. It also allows different applications to read, write, understand and validate the same XML files, allowing us to share data between applications in an extremely robust manner.

 


Comments

All comments about this page:


Comment by: rahmat (10-2-2011 10:23:55) deeplink to this comment

i dont use xml. i have no idea about it.
what is its function and what purpose it need?
how xml is used?


Comment by: Jan Karel Pieterse (11-2-2011 04:00:31) deeplink to this comment

Hi Rahmat,

You'll know when you need xml, it is when you get an xml file from some source and you're asked to work with the data in that file.
As long as you do not have any xml files to work with, you don't need this option of course.


Comment by: yasser latif from pakistan (25-3-2011 04:25:14) deeplink to this comment

why we can't save the xmlDocument utf-8 whole contents as a singel string in xml field of Table in SQL Server.


Comment by: Jan Karel Pieterse (25-3-2011 05:14:59) deeplink to this comment

Hi Yasser,

I'm sorry, SQL Server is not my core expertise, Excel is.
Try asking your question here:
http://www.sqlteam.com/forums/


Comment by: Sandeep (12-8-2012 11:27:26) deeplink to this comment

XML is introduced for data storing and transferring is it true of false


Comment by: Jan Karel Pieterse (13-8-2012 11:34:42) deeplink to this comment

Hi Sandeep,

Yes and no. XML was introduced to be able to create structured data files, with the capability to mark what the data is about. It eases exchanging data between systems, simply because you can open an XML file and by reading its content understand whet it contains.

Of course this only works if people use descriptive and clear tag names and adhere to the xml rules.


Have a question, comment or suggestion? Then please use this form.

If your question is not directly related to this web page, but rather a more general "How do I do this" Excel question, then I advise you to ask your question here: www.eileenslounge.com.




To post VBA code in your comment, use [VB] tags, like this: [VB]Code goes here[/VB].