Commit e7a243e

github-actions committed
Merge branch 'main' into live
2 parents 29c3832 + 4986e62 commit e7a243e

6 files changed

+470
-0
lines changed

Lines changed: 158 additions & 0 deletions
@@ -0,0 +1,158 @@
1+
---
2+
api_name:
3+
- Microsoft.Office.DocumentFormat.OpenXML.Packaging
4+
api_type:
5+
- schema
6+
ms.assetid: 2ad4855c-1c83-4dab-b93f-2bae13fac644
7+
title: 'How to: Copy a Worksheet Using SAX (Simple API for XML)'
8+
ms.suite: office
9+
10+
ms.author: o365devx
11+
author: o365devx
12+
ms.topic: conceptual
13+
ms.date: 04/01/2025
14+
ms.localizationpriority: high
15+
---
16+
# Copy a Worksheet Using SAX (Simple API for XML)
17+
18+
This topic shows how to use the Open XML SDK for Office to programmatically copy a large worksheet
19+
using SAX (Simple API for XML). For more information about the basic structure of a `SpreadsheetML`
20+
document, see [Structure of a SpreadsheetML document](structure-of-a-spreadsheetml-document.md).
21+
22+
------------------------------------
23+
## Why Use the SAX Approach?
24+
25+
The Open XML SDK provides two ways to parse Office Open XML files: the Document Object Model (DOM) and
26+
the Simple API for XML (SAX). The DOM approach is designed to make it easy to query and parse Open XML
27+
files by using strongly-typed classes. However, the DOM approach requires loading entire Open XML parts into
28+
memory, which can lead to slower processing and `Out of Memory` exceptions when working with very large parts.
29+
The SAX approach reads the XML in an Open XML part one element at a time, without loading the entire part
30+
into memory. This gives noncached, forward-only access to the XML data, which makes it a better choice when reading
31+
very large parts, such as a <xref:DocumentFormat.OpenXml.Packaging.WorksheetPart> with hundreds of thousands of rows.
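
As a rough illustration of forward-only access, the following sketch counts the rows in the first worksheet without building a DOM tree for the part. The helper name `CountRowsForwardOnly` is hypothetical and not part of the sample; the sketch assumes the `DocumentFormat.OpenXml` package is referenced.

```csharp
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

// Hypothetical helper: counts the rows in the first worksheet part by
// streaming through it instead of loading it into a DOM tree.
static int CountRowsForwardOnly(string path)
{
    // Open read-only; nothing is modified.
    using SpreadsheetDocument document = SpreadsheetDocument.Open(path, false);
    WorksheetPart? worksheetPart = document.WorkbookPart?.WorksheetParts.FirstOrDefault();

    if (worksheetPart is null)
    {
        return 0;
    }

    int rowCount = 0;

    using OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);

    while (reader.Read())
    {
        // Each element is visited once, in document order. Counting only
        // start elements avoids counting a row twice.
        if (reader.ElementType == typeof(Row) && reader.IsStartElement)
        {
            rowCount++;
        }
    }

    return rowCount;
}
```

Calling `CountRowsForwardOnly(args[0])` from a console app would return the row count while keeping only the current element in memory.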
32+
33+
## Using the DOM Approach
34+
35+
Using the DOM approach, we can take advantage of the Open XML SDK's strongly typed classes. The first step
36+
is to access the package's `WorksheetPart` and make sure that it is not null.
37+
38+
### [C#](#tab/cs-1)
39+
[!code-csharp[](../../samples/spreadsheet/copy_worksheet_with_sax/cs/Program.cs#snippet1)]
40+
41+
### [Visual Basic](#tab/vb-1)
42+
[!code-vb[](../../samples/spreadsheet/copy_worksheet_with_sax/vb/Program.vb#snippet1)]
43+
***
44+
45+
Once it is determined that the `WorksheetPart` to be copied is not null, add a new `WorksheetPart` to copy it to.
46+
Then clone the `WorksheetPart`'s <xref:DocumentFormat.OpenXml.Spreadsheet.Worksheet> and assign the cloned
47+
`Worksheet` to the new `WorksheetPart`'s Worksheet property.
48+
49+
### [C#](#tab/cs-2)
50+
[!code-csharp[](../../samples/spreadsheet/copy_worksheet_with_sax/cs/Program.cs#snippet2)]
51+
52+
### [Visual Basic](#tab/vb-2)
53+
[!code-vb[](../../samples/spreadsheet/copy_worksheet_with_sax/vb/Program.vb#snippet2)]
54+
***
55+
56+
At this point, the new `WorksheetPart` has been added, but a new <xref:DocumentFormat.OpenXml.Spreadsheet.Sheet>
57+
element must be added to the `WorkbookPart`'s <xref:DocumentFormat.OpenXml.Spreadsheet.Sheets>
58+
element for the new sheet to display. To do this, first find the new `WorksheetPart`'s Id and
59+
create a new sheet Id by adding one to the number of `Sheets` child elements. Then append a new `Sheet`
60+
child to the `Sheets` element. With this, the copied `Worksheet` is added to the file.
61+
62+
### [C#](#tab/cs-3)
63+
[!code-csharp[](../../samples/spreadsheet/copy_worksheet_with_sax/cs/Program.cs#snippet3)]
64+
65+
### [Visual Basic](#tab/vb-3)
66+
[!code-vb[](../../samples/spreadsheet/copy_worksheet_with_sax/vb/Program.vb#snippet3)]
67+
***
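
Sheet ids must be unique within a workbook. If sheets have been deleted from the file, the child count plus one can match an id that is already in use. One possible alternative, shown here only as a sketch, is to take the largest existing id and add one. The helper name `NextSheetId` is hypothetical and not part of the sample.

```csharp
using DocumentFormat.OpenXml.Spreadsheet;

// Hypothetical helper: returns an id that is larger than every SheetId
// currently present in the Sheets element.
static uint NextSheetId(Sheets sheets)
{
    uint maxId = 0U;

    foreach (Sheet sheet in sheets.Elements<Sheet>())
    {
        // Treat a missing SheetId attribute as 0.
        uint current = sheet.SheetId?.Value ?? 0U;

        if (current > maxId)
        {
            maxId = current;
        }
    }

    return maxId + 1U;
}
```

For a workbook whose sheets have the ids 1 and 3, the child count plus one is 3, which is already taken, while this helper returns 4.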
68+
69+
## Using the SAX Approach
70+
71+
The SAX approach also works on parts, so the first step is the same as in the DOM approach.
72+
Access the package's <xref:DocumentFormat.OpenXml.Packaging.WorksheetPart> and make sure
73+
that it is not null.
74+
75+
### [C#](#tab/cs-4)
76+
[!code-csharp[](../../samples/spreadsheet/copy_worksheet_with_sax/cs/Program.cs#snippet4)]
77+
78+
### [Visual Basic](#tab/vb-4)
79+
[!code-vb[](../../samples/spreadsheet/copy_worksheet_with_sax/vb/Program.vb#snippet4)]
80+
***
81+
82+
With SAX, we don't have access to the <xref:DocumentFormat.OpenXml.OpenXmlElement.Clone*>
83+
method. So instead, start by adding a new `WorksheetPart` to the `WorkbookPart`.
84+
85+
### [C#](#tab/cs-5)
86+
[!code-csharp[](../../samples/spreadsheet/copy_worksheet_with_sax/cs/Program.cs#snippet5)]
87+
88+
### [Visual Basic](#tab/vb-5)
89+
[!code-vb[](../../samples/spreadsheet/copy_worksheet_with_sax/vb/Program.vb#snippet5)]
90+
***
91+
92+
Then create an instance of the <xref:DocumentFormat.OpenXml.OpenXmlPartReader> with the
93+
original worksheet part and an instance of the <xref:DocumentFormat.OpenXml.OpenXmlPartWriter>
94+
with the newly created worksheet part.
95+
96+
### [C#](#tab/cs-6)
97+
[!code-csharp[](../../samples/spreadsheet/copy_worksheet_with_sax/cs/Program.cs#snippet6)]
98+
99+
### [Visual Basic](#tab/vb-6)
100+
[!code-vb[](../../samples/spreadsheet/copy_worksheet_with_sax/vb/Program.vb#snippet6)]
101+
***
102+
103+
Then read the elements one by one with the <xref:DocumentFormat.OpenXml.OpenXmlPartReader.Read*>
104+
method. If the element is a <xref:DocumentFormat.OpenXml.Spreadsheet.CellValue>, its inner text
105+
must be read with the <xref:DocumentFormat.OpenXml.OpenXmlPartReader.GetText*> method and written
106+
explicitly with `WriteString`, because <xref:DocumentFormat.OpenXml.OpenXmlPartWriter.WriteStartElement*>
107+
does not write the inner text of an element. For all other elements, this sample writes only the start
108+
and end elements, so any inner text they contain, such as the text of a formula, is not copied.
109+
110+
### [C#](#tab/cs-7)
111+
[!code-csharp[](../../samples/spreadsheet/copy_worksheet_with_sax/cs/Program.cs#snippet7)]
112+
113+
### [Visual Basic](#tab/vb-7)
114+
[!code-vb[](../../samples/spreadsheet/copy_worksheet_with_sax/vb/Program.vb#snippet7)]
115+
***
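
The loop above treats only `CellValue` as a text-bearing element. Worksheets can also contain other leaf text elements, such as formulas or inline string text. The following sketch is one way the loop might be generalized. It assumes that `GetText` returns an empty string for any element that is not a leaf text element, and the helper name `CopyPartStreaming` is hypothetical.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;

// Hypothetical helper: streams every element from source to target and
// keeps the inner text of any leaf text element, not only CellValue.
static void CopyPartStreaming(OpenXmlPart source, OpenXmlPart target)
{
    using OpenXmlReader reader = OpenXmlReader.Create(source);
    using OpenXmlWriter writer = OpenXmlWriter.Create(target);

    writer.WriteStartDocument();

    while (reader.Read())
    {
        if (reader.IsStartElement)
        {
            // Writes the element name and its attributes, but no inner text.
            writer.WriteStartElement(reader);

            // Assumption: GetText() yields an empty string unless the
            // current element is a leaf text element.
            string text = reader.GetText();

            if (text.Length > 0)
            {
                writer.WriteString(text);
            }
        }
        else if (reader.IsEndElement)
        {
            writer.WriteEndElement();
        }
    }
}
```

With this variant the `CellValue` check is no longer needed, at the cost of one `GetText` call per start element.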
116+
117+
At this point, the worksheet part has been copied to the newly added part, but as with the DOM
118+
approach, we still need to add a `Sheet` to the `Workbook`'s `Sheets` element. Because
119+
the SAX approach gives noncached, **forward-only** access to XML data, it is only possible to
120+
prepend element children, which in this case would add the new worksheet to the beginning instead
121+
of the end, changing the order of the worksheets. So the DOM approach is
122+
used here: we want to append, not prepend, the new `Sheet`, and because the `WorkbookPart` is
123+
not usually a large part, the performance gain from SAX would be minimal.
124+
125+
### [C#](#tab/cs-8)
126+
[!code-csharp[](../../samples/spreadsheet/copy_worksheet_with_sax/cs/Program.cs#snippet8)]
127+
128+
### [Visual Basic](#tab/vb-8)
129+
[!code-vb[](../../samples/spreadsheet/copy_worksheet_with_sax/vb/Program.vb#snippet8)]
130+
***
131+
132+
## Sample Code
133+
134+
Below is the sample code for both the DOM and SAX approaches to copying the data from one sheet
135+
to a new one and adding it to the Spreadsheet document. While the DOM approach is simpler
136+
and in many cases the preferred choice, with very large documents the SAX approach is better
137+
given that it is faster and can prevent `Out of Memory` exceptions. To see the difference,
138+
create a spreadsheet document with many (10,000+) rows and compare the
139+
<xref:System.Diagnostics.Stopwatch> results for the two methods. Increase the
140+
number of rows to 100,000+ to see even more significant performance gains.
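
A test file of that size can be generated rather than created by hand. The following is a minimal sketch, not part of the sample: the helper name `CreateLargeWorkbook`, the sheet name `Data`, and the single numeric column are all assumptions made for illustration.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

// Hypothetical helper: creates a workbook whose single sheet has rowCount
// rows with one numeric cell each.
static void CreateLargeWorkbook(string path, int rowCount)
{
    using SpreadsheetDocument document = SpreadsheetDocument.Create(path, SpreadsheetDocumentType.Workbook);

    WorkbookPart workbookPart = document.AddWorkbookPart();
    WorksheetPart worksheetPart = workbookPart.AddNewPart<WorksheetPart>();

    // Stream the rows so that generating the data does not need a large DOM tree.
    using (OpenXmlWriter writer = OpenXmlWriter.Create(worksheetPart))
    {
        writer.WriteStartDocument();
        writer.WriteStartElement(new Worksheet());
        writer.WriteStartElement(new SheetData());

        for (int i = 1; i <= rowCount; i++)
        {
            writer.WriteStartElement(new Row());
            writer.WriteElement(new Cell(new CellValue(i.ToString())) { DataType = CellValues.Number });
            writer.WriteEndElement(); // Row
        }

        writer.WriteEndElement(); // SheetData
        writer.WriteEndElement(); // Worksheet
    }

    // The workbook part is small, so the DOM is used to register the sheet.
    workbookPart.Workbook = new Workbook(
        new Sheets(
            new Sheet { Name = "Data", SheetId = 1U, Id = workbookPart.GetIdOfPart(worksheetPart) }));
}
```

Because both sample methods modify the file they are given, it may be simplest to generate two files, for example `CreateLargeWorkbook("dom.xlsx", 100_000)` and `CreateLargeWorkbook("sax.xlsx", 100_000)`, and pass them as the two command-line arguments.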
141+
142+
### DOM Approach
143+
144+
### [C#](#tab/cs-0)
145+
[!code-csharp[](../../samples/spreadsheet/copy_worksheet_with_sax/cs/Program.cs#snippet0)]
146+
147+
### [Visual Basic](#tab/vb-0)
148+
[!code-vb[](../../samples/spreadsheet/copy_worksheet_with_sax/vb/Program.vb#snippet0)]
149+
***
150+
151+
### SAX Approach
152+
153+
### [C#](#tab/cs-99)
154+
[!code-csharp[](../../samples/spreadsheet/copy_worksheet_with_sax/cs/Program.cs#snippet99)]
155+
156+
### [Visual Basic](#tab/vb-99)
157+
[!code-vb[](../../samples/spreadsheet/copy_worksheet_with_sax/vb/Program.vb#snippet99)]
158+
***

samples/samples.sln

Lines changed: 13 additions & 0 deletions
@@ -320,6 +320,9 @@ Project("{F184B08F-C81C-45F6-A57F-5ABD9991F28F}") = "working_with_tables_vb", "w
320320
EndProject
321321
Project("{F184B08F-C81C-45F6-A57F-5ABD9991F28F}") = "insert_a_picture_vb", "word\insert_a_picture\vb\insert_a_picture_vb.vbproj", "{6170C4E1-A109-435A-BF59-026C85B3BD9C}"
322322
EndProject
323+
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "copy_worksheet_with_sax_cs", "spreadsheet\copy_worksheet_with_sax\cs\copy_worksheet_with_sax_cs.csproj", "{0AA6B9DD-2A2C-0E96-1052-6F4AC44B3F5D}"
324+
EndProject
325+
Project("{F184B08F-C81C-45F6-A57F-5ABD9991F28F}") = "copy_worksheet_with_sax_vb", "spreadsheet\copy_worksheet_with_sax\vb\copy_worksheet_with_sax_vb.vbproj", "{2DD90EFB-7F2A-497B-A0F4-EE5F62A49BA4}"
323326
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "replace_text_with_sax_cs", "word\replace_text_with_sax\cs\replace_text_with_sax_cs.csproj", "{4C514047-64B5-1383-4564-B827B846A6A7}"
324327
EndProject
325328
Project("{F184B08F-C81C-45F6-A57F-5ABD9991F28F}") = "replace_text_with_sax_vb", "word\replace_text_with_sax\vb\replace_text_with_sax_vb.vbproj", "{6EB91F44-EC13-5354-0450-9A2687C3B169}"
@@ -942,6 +945,14 @@ Global
942945
{6170C4E1-A109-435A-BF59-026C85B3BD9C}.Debug|Any CPU.Build.0 = Debug|Any CPU
943946
{6170C4E1-A109-435A-BF59-026C85B3BD9C}.Release|Any CPU.ActiveCfg = Release|Any CPU
944947
{6170C4E1-A109-435A-BF59-026C85B3BD9C}.Release|Any CPU.Build.0 = Release|Any CPU
948+
{0AA6B9DD-2A2C-0E96-1052-6F4AC44B3F5D}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
949+
{0AA6B9DD-2A2C-0E96-1052-6F4AC44B3F5D}.Debug|Any CPU.Build.0 = Debug|Any CPU
950+
{0AA6B9DD-2A2C-0E96-1052-6F4AC44B3F5D}.Release|Any CPU.ActiveCfg = Release|Any CPU
951+
{0AA6B9DD-2A2C-0E96-1052-6F4AC44B3F5D}.Release|Any CPU.Build.0 = Release|Any CPU
952+
{2DD90EFB-7F2A-497B-A0F4-EE5F62A49BA4}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
953+
{2DD90EFB-7F2A-497B-A0F4-EE5F62A49BA4}.Debug|Any CPU.Build.0 = Debug|Any CPU
954+
{2DD90EFB-7F2A-497B-A0F4-EE5F62A49BA4}.Release|Any CPU.ActiveCfg = Release|Any CPU
955+
{2DD90EFB-7F2A-497B-A0F4-EE5F62A49BA4}.Release|Any CPU.Build.0 = Release|Any CPU
945956
{4C514047-64B5-1383-4564-B827B846A6A7}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
946957
{4C514047-64B5-1383-4564-B827B846A6A7}.Debug|Any CPU.Build.0 = Debug|Any CPU
947958
{4C514047-64B5-1383-4564-B827B846A6A7}.Release|Any CPU.ActiveCfg = Release|Any CPU
@@ -1107,6 +1118,8 @@ Global
11071118
{A43A75AB-D6B6-4D31-99F7-6951AFEF502D} = {D207D3D7-FD4D-4FD4-A7D0-79A82086FB6F}
11081119
{4EB1FCC9-E1E2-4D2A-ACF9-A3A31AA947A5} = {D207D3D7-FD4D-4FD4-A7D0-79A82086FB6F}
11091120
{6170C4E1-A109-435A-BF59-026C85B3BD9C} = {D207D3D7-FD4D-4FD4-A7D0-79A82086FB6F}
1121+
{0AA6B9DD-2A2C-0E96-1052-6F4AC44B3F5D} = {7ACDC26B-C774-4004-8553-87E862D1E71F}
1122+
{2DD90EFB-7F2A-497B-A0F4-EE5F62A49BA4} = {7ACDC26B-C774-4004-8553-87E862D1E71F}
11101123
{4C514047-64B5-1383-4564-B827B846A6A7} = {D207D3D7-FD4D-4FD4-A7D0-79A82086FB6F}
11111124
{6EB91F44-EC13-5354-0450-9A2687C3B169} = {D207D3D7-FD4D-4FD4-A7D0-79A82086FB6F}
11121125
EndGlobalSection
Lines changed: 144 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,144 @@
1+

2+
3+
using DocumentFormat.OpenXml;
4+
using DocumentFormat.OpenXml.Packaging;
5+
using DocumentFormat.OpenXml.Spreadsheet;
6+
using System.Diagnostics;
7+
8+
CopySheetDOM(args[0]);
9+
CopySheetSAX(args[1]);
10+
11+
// <Snippet0>
12+
void CopySheetDOM(string path)
13+
{
14+
Console.WriteLine("Starting DOM method");
15+
16+
Stopwatch sw = new();
17+
sw.Start();
18+
// <Snippet1>
19+
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(path, true))
20+
{
21+
// Get the first sheet
22+
WorksheetPart? worksheetPart = spreadsheetDocument.WorkbookPart?.WorksheetParts?.FirstOrDefault();
23+
24+
if (worksheetPart is not null)
25+
// </Snippet1>
26+
{
27+
// <Snippet2>
28+
// Add a new WorksheetPart
29+
WorksheetPart newWorksheetPart = spreadsheetDocument.WorkbookPart!.AddNewPart<WorksheetPart>();
30+
31+
// Make a copy of the original worksheet
32+
Worksheet newWorksheet = (Worksheet)worksheetPart.Worksheet.Clone();
33+
34+
// Add the new worksheet to the new worksheet part
35+
newWorksheetPart.Worksheet = newWorksheet;
36+
// </Snippet2>
37+
38+
Sheets? sheets = spreadsheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>();
39+
40+
if (sheets is null)
41+
{
42+
sheets = spreadsheetDocument.WorkbookPart.Workbook.AppendChild(new Sheets());
43+
}
44+
45+
// <Snippet3>
46+
// Find the new WorksheetPart's Id and create a new sheet id
47+
string id = spreadsheetDocument.WorkbookPart.GetIdOfPart(newWorksheetPart);
48+
uint newSheetId = (uint)(sheets!.ChildElements.Count + 1);
49+
50+
// Append a new Sheet with the WorksheetPart's Id and sheet id to the Sheets element
51+
sheets.AppendChild(new Sheet() { Name = "My New Sheet", SheetId = newSheetId, Id = id });
52+
// </Snippet3>
53+
}
54+
}
55+
56+
sw.Stop();
57+
58+
Console.WriteLine($"DOM method took {sw.Elapsed.TotalSeconds} seconds");
59+
}
60+
// </Snippet0>
61+
62+
// <Snippet99>
63+
void CopySheetSAX(string path)
64+
{
65+
Console.WriteLine("Starting SAX method");
66+
67+
Stopwatch sw = new();
68+
sw.Start();
69+
// <Snippet4>
70+
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(path, true))
71+
{
72+
// Get the first sheet
73+
WorksheetPart? worksheetPart = spreadsheetDocument.WorkbookPart?.WorksheetParts?.FirstOrDefault();
74+
75+
if (worksheetPart is not null)
76+
// </Snippet4>
77+
{
78+
// <Snippet5>
79+
WorksheetPart newWorksheetPart = spreadsheetDocument.WorkbookPart!.AddNewPart<WorksheetPart>();
80+
// </Snippet5>
81+
82+
// <Snippet6>
83+
using (OpenXmlReader reader = OpenXmlPartReader.Create(worksheetPart))
84+
using (OpenXmlWriter writer = OpenXmlPartWriter.Create(newWorksheetPart))
85+
// </Snippet6>
86+
{
87+
// <Snippet7>
88+
// Write the XML declaration with the version "1.0".
89+
writer.WriteStartDocument();
90+
91+
// Read the elements from the original worksheet part
92+
while (reader.Read())
93+
{
94+
// If the ElementType is CellValue it's necessary to explicitly add the inner text of the element
95+
// or the CellValue element will be empty
96+
if (reader.ElementType == typeof(CellValue))
97+
{
98+
if (reader.IsStartElement)
99+
{
100+
writer.WriteStartElement(reader);
101+
writer.WriteString(reader.GetText());
102+
}
103+
else if (reader.IsEndElement)
104+
{
105+
writer.WriteEndElement();
106+
}
107+
}
108+
// For other elements write the start and end elements
109+
else
110+
{
111+
if (reader.IsStartElement)
112+
{
113+
writer.WriteStartElement(reader);
114+
}
115+
else if (reader.IsEndElement)
116+
{
117+
writer.WriteEndElement();
118+
}
119+
}
120+
}
121+
// </Snippet7>
122+
}
123+
124+
// <Snippet8>
125+
Sheets? sheets = spreadsheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>();
126+
127+
if (sheets is null)
128+
{
129+
sheets = spreadsheetDocument.WorkbookPart.Workbook.AppendChild(new Sheets());
130+
}
131+
132+
string id = spreadsheetDocument.WorkbookPart.GetIdOfPart(newWorksheetPart);
133+
uint newSheetId = (uint)(sheets!.ChildElements.Count + 1);
134+
135+
sheets.AppendChild(new Sheet() { Name = "My New Sheet", SheetId = newSheetId, Id = id });
136+
// </Snippet8>
137+
138+
}
139+
}
140+
141+
sw.Stop();
142+
Console.WriteLine($"SAX method took {sw.Elapsed.TotalSeconds} seconds");
143+
}
144+
// </Snippet99>
Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
1+
<Project Sdk="Microsoft.NET.Sdk">
2+
3+
<PropertyGroup>
4+
<OutputType>Exe</OutputType>
5+
<TargetFramework>net8.0</TargetFramework>
6+
<ImplicitUsings>enable</ImplicitUsings>
7+
<Nullable>enable</Nullable>
8+
</PropertyGroup>
9+
10+
</Project>
