The Excel File Format
Pages in this article
Working With Worksheet Data In An Excel xlsx/xlsm File
In this chapter I'll explain how you can read and write to a cell on a worksheet by editing the xml inside an Excel file's internals directly.
Finding the proper worksheet
First a tip about how to get at the content. An Excel xlsx/xlsm file is nothing more than a zip container which holds a couple of folders and files. You can view those simply by adding .zip behind the filename. Then you right-click the file and select Explore.
The XML package in an Excel 2007 file is structured like this:
Contents of an Excel xlsm file.
In this structure, we need to dive into the "xl" folder, which (in the sample file I created) contains:
Contents of "xl" folder
The file Workbook.xml contains this section which is relevant to this chapter:
Part of the XML in "Workbook.xml"
Let's assume we want to add something to the worksheet "Comments". We must find which xml file contains its data. Key element here is the r:id part in the xml shown above, the r:id of worksheet "Comments" equals "rId7".
In the folder xl\_rels there is a file called "Workbook.xml.rels", which contains this snippet of xml:
Section in "Workbook.xml.rels" which shows us which file belongs to our
worksheet
As can be seen, the worksheet "Comments" (rId7) is represented by a file named "sheet7.xml", which is one of the files listed in the folder "worksheets":
folder xl\worksheets showing its contents
Reading a cell's value
The worksheet "Comments" contains this:
Content of worksheet "Comments" (cell B2 contains a cell comment)
Which in turn is reflected in sheet7.xml as:
XML in sheet7.xml (irrelevant part collapsed) showing cells A1, A2 and
A3
Suppose we’re interested in the content of cell A1. How do we find that content?
Because it says: <c r="A1" t="s">
We know that cell A1 contains a string (t="s"). The next line: <v>28</v> tells us, we need to find the item with index 28 inside the file called "xl/SharedStrings.xml"
The top of that file looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
count="41" uniqueCount="46">
So there are 46 unique strings in this workbook. Since the index starts at zero and the sheet7.xml says we need index # 28, we need the 29th "<si>" entry in that XML file:
Not very hard!
Adding Text to a cell
Now suppose we want to add a new text entry to a row in sheet "Comments"?
Let’s also assume the text is not yet included in the SharedStrings.xml file.
First we modify the string count at the top of that file. Change:
<?xml version="1.0" encoding="UTF-8" standalone="yes"
?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
count="41" uniqueCount="46">
to:
<?xml version="1.0" encoding="UTF-8" standalone="yes"
?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
count="42" uniqueCount="47">
Now we can add our new string to this file. At position 29 of the file insert:
Addition of a new string in the sharedStrings.xml file
Done with sharedStrings.xml. Now to the changes in sheet7.xml.
Here is the relevant part of the sheet7.xml file again:
<row r="1" spans="1:2">
<c r="A1" s="33" t="s">
<v>28</v>
</c>
</row>
<row r="2" spans="1:2">
<c r="A2">
<v>12</v>
</c>
</row>
<row r="3" spans="1:2">
<c r="A3" s="34">
<v>39218</v>
</c>
</row>
To add a row with our new string, we add the bold part:
<row r="1" spans="1:2">
<c r="A1" s="33" t="s">
<v>28</v>
</c>
</row>
<row r="2" spans="1:2">
<c r="A2">
<v>12</v>
</c>
</row>
<row r="3" spans="1:2">
<c r="A3" s="34">
<v>39218</v>
</c>
</row>
<row r="4" spans="1:2">
<c r="A4" s="33" t="s">
<v>29</v>
</c>
</row>
Done!! Here is what the sheet looks like when opened in Excel:
The result of our xml editing displayed in Excel
Adding numbers to a cell
Adding a number to a cell works similar to adding text, but with two differences: The numbers are kept in the Sheet?.xml itself (no reference to sharedStrings.xml) and you can add a number format.
A cell with number format "general" and no special formatting applied is represented by this xml in sheet7.xml:
<row r="5" spans="1:2">
<c r="A5">
<v>12</v>
</c>
</row>
Cell formats (styles) are referred to by an index number, like this (showing the date in cell A3 now):
<row r="3" spans="1:2">
<c r="A3" s="34">
<v>39218</v>
</c>
</row>
This happens to be a date style.
The 34 index points to another file within the package: Styles.xml.
This file starts out with a list of (custom) number formats. Then
further down, a cell formats cross reference is shown, called cellXfs,
which -if applicable- refers back to the custom number format list. The
s="34" within Sheet?.xml refers to one of the elements within this node:
Another zero based list. So number 34 is the 35th item (and last) of that list:
You can find more information about implicit style formats in the Standard ECMA-376 Office Open XML Formats: 2nd Edition Part 2.
This cell format cross reference in turn references the numFormatId’s list, the FontId’s list, the fillId’s list and the borderId’s list, which are all contained within Styles.xml.
As you can see, adding formatted cells to a worksheet isn’t that straightforward. You will have to check whether your specific cell formatting combination is already in the cell formatting cross reference (in Styles.xml), which involves checking all individual formatting references for their respective elements.
If your format combination is new, you’ll have to extend any formatting elements’ list with the new format and update the style crossreference. Of course all "count" arguments of these lists need to be updated as well.
Download
I have made the file used in this article available for download:
Comments
All comments about this page:
Comment by: XL-Dennis (18-5-2007 02:38:15) deeplink to this comment
J K,
Check out the nifty tool XML Notepad from MSFT: http://www.microsoft.com/downloads/details.aspx?familyid=72d6aa49-787d-4118-ba5f-4f30fe913628&displaylang=en
Kind regards,
Dennis
Comment by: Ken Puls (22-5-2007 08:38:20) deeplink to this comment
Very nice disection, Jan Karel! Quite interesting. :)
Comment by: Jan Karel Pieterse (22-5-2007 10:08:21) deeplink to this comment
Hi Ken,
Thanks!
Comment by: Mangesh (3-8-2007 04:51:54) deeplink to this comment
Hi, The page is quite helpful, I would like to remove Home tab from Excel 2007, How to do that, if possible please reply.
regards
-Mangesh
Comment by: Jan Karel Pieterse (3-8-2007 05:10:02) deeplink to this comment
Hi Mangesh,
See this page for information about modifying the ribbon:
<a href="http://pschmid.net/blog/category/ribbonx/">
pschmid.net</a>
Comment by: Frank Diaz (19-9-2007 10:35:06) deeplink to this comment
I realize I am In the wrong location but I can not find the right one to list my problem. The is my problem. When I open excel I get a message that say's Compile error in hidden module: AutoExeNew. Also when I call I get a message that say's compile error in hidden module:DistMon. I am yusng Eaxcel 7000. My old excel worksheets using Excel 2000 have been affected when converted to Excel 20007. I have no protected files.
Can you help.
Comment by: Lebo Bucibo (4-9-2008 02:05:48) deeplink to this comment
Is it possible at all to insert string literals inside the cells without using
share strings. It seems not to work in my exprience
Comment by: Jan Karel Pieterse (4-9-2008 02:54:36) deeplink to this comment
Hi Lebo,
As far as I know, no: all strings (this includes formulas!) are kept in the
SharedStrings.xml file and are referred to from the cell using their index.
Comment by: JM (30-10-2008 08:01:04) deeplink to this comment
I use drop-down lists in a Word 2007 USERFORM and the data comes from an Excel
sheet. I would like to know if it is possible to get the data in XML format from
excel and include it in the Word Document 'docx'/zip file and map it to the list in
the word userform?
This way I don't have to open Excel each time I need the data.
Thanks,
Jose
Comment by: Jan Karel Pieterse (31-10-2008 02:20:27) deeplink to this comment
Hi JM,
I doubt if this is feasible. It certainly is a lot harder than getting the
information from the excel file directly. You can use ADO to fetch the data from the
excel file wiothout opening the file. The code below gets the data from a range name
named "Addresses" into a variant array:
Function GetAddresses() As Variant
Dim sQuery As String
sQuery = "SELECT Addresses.FirstName, Addresses.LastName,
Addresses.JobTitle, Addresses.Company, " & _
"Addresses.Address, Addresses.PlaceAndCode, Addresses.State,
Addresses.Country, Addresses.WorkPhone, " & _
"Addresses.PrivatePhone, Addresses.Fax, Addresses.Mobile, Addresses.Email,
Addresses.Website, " & _
"Addresses.BTW" & _
" FROM Addresses Addresses" & _
" ORDER BY Addresses.Company, Addresses.LastName, Addresses.FirstName"
GetAddresses = ADOImportFromExcelTable("c:\data\clients\YourFile.xls",
sQuery)
End Function
Private Function ADOImportFromExcelTable(DBFullName As String, sQuery As String) As
Variant
Dim vResult As Variant
Dim cn As ADODB.Connection
Dim rs As ADODB.Recordset
Dim intColIndex As Integer
' open the database
Set cn = New ADODB.Connection
cn.Open "DRIVER={Microsoft Excel Driver (*.xls)};" & "ReadOnly=1;DBQ=" &
DBFullName
Set rs = New ADODB.Recordset
With rs
.Open sQuery, cn, , , adCmdText
vResult = .GetRows
End With
rs.Close
Set rs = Nothing
cn.Close
Set cn = Nothing
ADOImportFromExcelTable = vResult
End Function
Comment by: Eku tap (22-7-2009 17:22:23) deeplink to this comment
Hi there
I need help with Data analysis:
for example if you are a District sales manager and you are overloaded with data that you need to compile into useable data and you are looking to use Excel to help you do this. How will you determine where to start and what tools can you use within Excel to help you organize everything and make decisions?
Comment by: Jan Karel Pieterse (23-7-2009 01:55:31) deeplink to this comment
Hi Eku,
I expect you will benefit by using pivot tables for your reporting.
Check out Debra Dalgleish' website for lots of information on pivot tables, start here:
http://www.contextures.com/Pubn03.html
Comment by: Eku tap (23-7-2009 18:40:32) deeplink to this comment
Thank you so much
i really appreciated!!!
Comment by: Maruti Prasad (8-9-2010 06:51:01) deeplink to this comment
sir
i want to know how i can copy same row data from various excel workbooks into one sheet
Comment by: Jan Karel Pieterse (8-9-2010 23:10:22) deeplink to this comment
Hi Maruti,
The sample code below demonstrates how to enable you to select some files, after which the content of row 1 of sheet1 of each file is copied to Sheet1 of the file with the macro:
Sub GetOpenFileNameExample3()
Dim lCount As Long
Dim vFilename As Variant
Dim sPath As String
Dim lFilecount As Long
sPath = "c:\windows\temp\"
ChDrive sPath
ChDir sPath
vFilename = Application.GetOpenFilename("Microsoft Excel files (*.xls),*.xls", , "Please select the file(s) to open", , True)
If TypeName(vFilename) = "Boolean" Then Exit Sub
For lCount = LBound(vFilename) To UBound(vFilename)
ImportThisOne CStr(vFilename(lCount))
Next
End Sub
Sub ImportThisOne(sFileName As String)
Dim oBook As Workbook
Workbooks.Open sFileName
Set oBook = ActiveWorkbook
'Now do your processing on the newly imported sheet
'Copy Row1, Sheet1 into this workbook's Sheet1, below last filled cell on column A
oBook.Worksheets(1).Range("1:1").Copy _
Destination:=ThisWorkbook.Worksheets("Sheet1").Range("A" & _
ThisWorkbook.Worksheets("Sheet1").Rows.Count).End(xlUp).Offset(1)
'close file, do not save changes
oBook.Close False
Set oBook = Nothing
End Sub
Comment by: Tom (14-4-2011 12:49:42) deeplink to this comment
Thanks so much for a very useful article. I tried to follow along though and am struggling a bit on one piece which also leaves me with questions.
1. In your example you added a 29th item in the sharedString.xml file which you then referenced in row 4 of the sheet7.xml part. What I don't understand is how adding this here does not throw off the references of everything that refers to a string after 29. The previous string 29 is now string 30 and so on, correct? So to simply add a new string at 29 should throw all the following strings from 30 to 70 off by one unless all references to them were updated as well, is this not right? What am I misunderstanding her?
2. Given files like sharedString.xml, is there an editor somewhere that would allow me to easily see what the index of something is? For my manual editing the counting of strings doesn't seem very reliable.
3. I've also hit an issue where the editing of some xml files makes them apparently unusable. If I take the original of a file and a copy that has been opened and resaved, the original will function well in my .xlsm file, the resaved version will cause errors. I can't tell any differences between the files though, is there an equivalent of unix/windows carriage return problems that occurs with xml? Is this perchance a common problem with a known fix?
Thanks again for a great article, for someone jumping into this from scratch with only vba experience and no xml this article was exactly what I was looking for, thank you so much!
Comment by: Jan Karel Pieterse (15-4-2011 03:01:47) deeplink to this comment
Hi Tom,
I'm sorry if I confused you. My example assumes that the string to be added will be the last one in the sharedstrings xml; so if there are 30, I add a 31st. That prevents the difficulty you describe.
Comment by: Paul Farrell (10-1-2012 13:15:50) deeplink to this comment
This approach works fine when the Workbook that is doing the Processing and the XML source file are different files.
There is an issue when the files are the same as the unzip function will not work on the currently open Workbook.
Do you have any workarounds for this scenario?
Comment by: Jan Karel Pieterse (11-1-2012 04:42:54) deeplink to this comment
Hi Paul,
Unfortunately that can only be done on files which are not open.
Comment by: John (2-5-2012 01:34:25) deeplink to this comment
JKP Said (9/4/2008 2:54:36 AM)
"As far as I know, no: all strings (this includes formulas!) are kept in the SharedStrings.xml file and are referred to from the cell using their index."
In the 2010 version of Excel, are the formulae still (always) stored in the SharedStrings.xml file?
Thanks
JR
Comment by: Jan Karel Pieterse (2-5-2012 03:32:22) deeplink to this comment
Hi John,
I just checked a recent Excel 2010 file and it looks like the formulas are stored in the respective Sheet??.xml files, not in the sharedstrings.xml file.
So I am not sure whether my previous remark was correct!
Comment by: John (2-5-2012 18:38:08) deeplink to this comment
Thank you for your reply. Your previous post/comment gave me hope that there might be a simple way to make a single change to (one or more) widely used formulae in a huge workbook, that would then propagate across all instances of its use. Back to the drawing board, I suppose... Any hints?
Comment by: Jan Karel Pieterse (2-5-2012 22:00:11) deeplink to this comment
Hi John,
No, I'm afraid not through that route. But you can always use Search and replace. Or if the formula is really the same (only works on a single worksheet), you can define a range name containing the formula and use the range name. This is tricky to do with relative references however.
Comment by: ashok (11-4-2013 12:43:32) deeplink to this comment
some of my excel/ word sheet. i am format my pc and now they could not open. it message..file format or extension is not valid or curpted. please tell me how to open my files.
Comment by: Jan Karel Pieterse (12-4-2013 12:59:18) deeplink to this comment
Hi Ashok,
Have a look at this page:
www.jkp-ads.com/articles/corruptfiles.asp
Have a question, comment or suggestion? Then please use this form.
If your question is not directly related to this web page, but rather a more general "How do I do this" Excel question, then I advise you to ask your question here: www.eileenslounge.com.