Commit 29c3832

Author: github-actions
Merge branch 'main' into live
2 parents: 55ee819 + 8b28a33

6 files changed, +272 -0 lines changed

Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
---

api_name:
- Microsoft.Office.DocumentFormat.OpenXML.Packaging
api_type:
- schema
ms.assetid: 2f6f0f89-0ac0-4d40-9f1a-222caf074cf1
title: 'How to: Replace Text in a Word Document Using SAX (Simple API for XML)'
description: 'Learn how to replace text in a Word document using SAX (Simple API for XML)'
ms.suite: office

ms.author: o365devx
author: o365devx
ms.topic: conceptual
ms.date: 04/03/2025
ms.localizationpriority: high
---
# Replace Text in a Word Document Using SAX (Simple API for XML)

This topic shows how to search and replace text in a Word document with the
Open XML SDK using the Simple API for XML (SAX) approach. For more information about the basic structure
of a `WordprocessingML` document, see [Structure of a WordprocessingML document](./structure-of-a-wordprocessingml-document.md).

## Why Use the SAX Approach?

The Open XML SDK provides two ways to parse Office Open XML files: the Document Object Model (DOM) and the Simple API for XML (SAX). The DOM approach is designed to make it easy to query and parse Open XML files by using strongly typed classes. However, the DOM approach requires loading entire Open XML parts into memory, which can lead to slower processing and out-of-memory exceptions when working with very large parts. The SAX approach reads the XML in an Open XML part one element at a time without loading the entire part into memory, giving noncached, forward-only access to the XML data. This makes it a better choice when reading very large parts.
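
To make the forward-only model concrete, the following minimal sketch walks the main document part one element at a time and counts its paragraphs without building the DOM tree. The helper name `CountParagraphsWithSax` and the command-line handling are illustrative assumptions and are not part of the sample for this topic. The sketch assumes the `DocumentFormat.OpenXml` package is referenced.

```csharp
using System;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical entry point: pass the path of a .docx file as the first argument.
if (args.Length > 0)
{
    Console.WriteLine($"Paragraphs: {CountParagraphsWithSax(args[0])}");
}

// Hypothetical helper (not part of this topic's sample).
int CountParagraphsWithSax(string path)
{
    int count = 0;

    // Open the document read-only; nothing is modified here.
    using (WordprocessingDocument document = WordprocessingDocument.Open(path, false))
    {
        MainDocumentPart? mainDocumentPart = document.MainDocumentPart;

        if (mainDocumentPart is null)
        {
            return 0;
        }

        // The reader moves forward through the part one element at a time.
        using (OpenXmlReader reader = OpenXmlReader.Create(mainDocumentPart))
        {
            while (reader.Read())
            {
                // Count each paragraph once, when its start tag is reached.
                if (reader.ElementType == typeof(Paragraph) && reader.IsStartElement)
                {
                    count++;
                }
            }
        }
    }

    return count;
}
```

Because the reader never holds more than the current element, memory use should stay roughly flat as the part grows, which is the trade-off described above.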

## Accessing the MainDocumentPart

The text of a Word document is stored in the <xref:DocumentFormat.OpenXml.Packaging.MainDocumentPart>, so the first step to
finding and replacing text is to access the Word document's `MainDocumentPart`. To do that, we first call the `WordprocessingDocument.Open`
method, passing in the path to the document as the first parameter and `true` as the second parameter to indicate that we
are opening the file for editing. Then we make sure that the `MainDocumentPart` is not null.

### [C#](#tab/cs-1)
[!code-csharp[](../../samples/word/replace_text_with_sax/cs/Program.cs#snippet1)]

### [Visual Basic](#tab/vb-1)
[!code-vb[](../../samples/word/replace_text_with_sax/vb/Program.vb#snippet1)]
***

## Create Memory Stream, OpenXmlReader, and OpenXmlWriter

With the DOM approach to editing documents, the entire part is read into memory, so we can use the Open XML SDK's
strongly typed classes, such as <xref:DocumentFormat.OpenXml.Wordprocessing.Text>, to access the
document's text and edit it. The SAX approach, however, uses the <xref:DocumentFormat.OpenXml.OpenXmlPartReader>
and <xref:DocumentFormat.OpenXml.OpenXmlPartWriter> classes, which access a part's stream with forward-only
access. The advantage is that the entire part does not need to be loaded into memory, which is faster
and uses less memory. However, because the same part cannot be opened in multiple streams at the same time, we cannot create an
<xref:DocumentFormat.OpenXml.OpenXmlReader> to read a part and an <xref:DocumentFormat.OpenXml.OpenXmlWriter> to edit
the same part at the same time. The solution is to create an additional memory stream, write the
updated part to that stream, and then use the stream to update the part after the `OpenXmlReader` and `OpenXmlWriter`
have been disposed. In the code below, we create the `MemoryStream` to store the updated part, an
`OpenXmlReader` for the `MainDocumentPart`, and an `OpenXmlWriter` to write to the `MemoryStream`.

### [C#](#tab/cs-2)
[!code-csharp[](../../samples/word/replace_text_with_sax/cs/Program.cs#snippet2)]

### [Visual Basic](#tab/vb-2)
[!code-vb[](../../samples/word/replace_text_with_sax/vb/Program.vb#snippet2)]
***

## Reading the Part and Writing to the New Stream

Now that we have an `OpenXmlReader` to read the part and an `OpenXmlWriter` to write to the new `MemoryStream`,
we use the <xref:DocumentFormat.OpenXml.OpenXmlReader.Read*> method to read each element in the part. As
each element is read, we check whether it is of type `Text`. If it is, we use the <xref:DocumentFormat.OpenXml.OpenXmlReader.GetText*>
method to access the text and <xref:System.String.Replace*> to update it. If it is not a
`Text` element, we write it to the stream unchanged.

> [!Note]
> In a Word document, text can be split across multiple `Text` elements, so if you are replacing a
> phrase rather than a single word, it's best to replace one word at a time.

### [C#](#tab/cs-3)
[!code-csharp[](../../samples/word/replace_text_with_sax/cs/Program.cs#snippet3)]

### [Visual Basic](#tab/vb-3)
[!code-vb[](../../samples/word/replace_text_with_sax/vb/Program.vb#snippet3)]
***
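
The effect described in the note can be illustrated without the SDK. In this sketch, the array stands in for the contents of two consecutive `Text` elements as the reader would encounter them. The sample strings and the helper name `ReplaceInEach` are made up for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical contents of two consecutive Text elements: the phrase
// "Hello World" has been split across them.
string[] textElements = { "Hello ", "World" };

// Replacing the whole phrase finds no match, because no single element contains it.
string phraseResult = ReplaceInEach(textElements, ("Hello World", "Goodbye Moon"));
Console.WriteLine(phraseResult); // Hello World

// Replacing one word at a time matches inside each element.
string wordResult = ReplaceInEach(textElements, ("Hello", "Goodbye"), ("World", "Moon"));
Console.WriteLine(wordResult); // Goodbye Moon

// Applies every (old, new) pair to each element separately, as the SAX loop
// does, then joins the elements so the visible text can be compared.
string ReplaceInEach(string[] elements, params (string Old, string New)[] replacements)
{
    return string.Concat(elements.Select(element =>
        replacements.Aggregate(element, (current, pair) => current.Replace(pair.Old, pair.New))));
}
```

Replacing word by word also changes any other occurrence of those words, so it is only a good fit when that is acceptable.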

## Writing the New Stream to the MainDocumentPart

With the updated part written to the memory stream, the last step is to set the `MemoryStream`'s
position to 0 and use the <xref:DocumentFormat.OpenXml.Packaging.OpenXmlPart.FeedData*> method
to replace the contents of the `MainDocumentPart` with the updated stream.

### [C#](#tab/cs-4)
[!code-csharp[](../../samples/word/replace_text_with_sax/cs/Program.cs#snippet4)]

### [Visual Basic](#tab/vb-4)
[!code-vb[](../../samples/word/replace_text_with_sax/vb/Program.vb#snippet4)]
***
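
The reason for resetting the position can be seen with plain streams. After writing, a `MemoryStream`'s position sits at the end of the data, so anything that copies from the current position gets nothing until the stream is rewound. The sketch below uses `Stream.CopyTo` as a stand-in for a consumer of the stream. That `FeedData` reads from the current position in the same way is an assumption here, consistent with the step described above. The payload string and the helper name `CopyFromCurrentPosition` are illustrative.

```csharp
using System;
using System.IO;
using System.Text;

using MemoryStream memoryStream = new MemoryStream();

// Stand-in for the updated part that the OpenXmlWriter would have written.
byte[] payload = Encoding.UTF8.GetBytes("<w:document/>");
memoryStream.Write(payload, 0, payload.Length);

// The position is now at the end of the data, so a copy sees nothing.
long copiedWithoutRewind = CopyFromCurrentPosition(memoryStream);
Console.WriteLine(copiedWithoutRewind); // 0

// Rewind, then copy again: the whole payload is available.
memoryStream.Position = 0;
long copiedAfterRewind = CopyFromCurrentPosition(memoryStream);
Console.WriteLine(copiedAfterRewind == payload.Length); // True

// Hypothetical helper: copies from the source's current position and
// returns the number of bytes that were copied.
long CopyFromCurrentPosition(Stream source)
{
    using MemoryStream target = new MemoryStream();
    source.CopyTo(target);
    return target.Length;
}
```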

## Sample Code

Below is the complete sample code to replace text in a Word document using the SAX (Simple API for XML)
approach.

### [C#](#tab/cs-0)
[!code-csharp[](../../samples/word/replace_text_with_sax/cs/Program.cs#snippet0)]

### [Visual Basic](#tab/vb-0)
[!code-vb[](../../samples/word/replace_text_with_sax/vb/Program.vb#snippet0)]
***

samples/samples.sln

Lines changed: 14 additions & 0 deletions
@@ -320,6 +320,10 @@ Project("{F184B08F-C81C-45F6-A57F-5ABD9991F28F}") = "working_with_tables_vb", "w
 EndProject
 Project("{F184B08F-C81C-45F6-A57F-5ABD9991F28F}") = "insert_a_picture_vb", "word\insert_a_picture\vb\insert_a_picture_vb.vbproj", "{6170C4E1-A109-435A-BF59-026C85B3BD9C}"
 EndProject
+Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "replace_text_with_sax_cs", "word\replace_text_with_sax\cs\replace_text_with_sax_cs.csproj", "{4C514047-64B5-1383-4564-B827B846A6A7}"
+EndProject
+Project("{F184B08F-C81C-45F6-A57F-5ABD9991F28F}") = "replace_text_with_sax_vb", "word\replace_text_with_sax\vb\replace_text_with_sax_vb.vbproj", "{6EB91F44-EC13-5354-0450-9A2687C3B169}"
+EndProject
 Global
 GlobalSection(SolutionConfigurationPlatforms) = preSolution
 Debug|Any CPU = Debug|Any CPU
@@ -938,6 +942,14 @@ Global
 {6170C4E1-A109-435A-BF59-026C85B3BD9C}.Debug|Any CPU.Build.0 = Debug|Any CPU
 {6170C4E1-A109-435A-BF59-026C85B3BD9C}.Release|Any CPU.ActiveCfg = Release|Any CPU
 {6170C4E1-A109-435A-BF59-026C85B3BD9C}.Release|Any CPU.Build.0 = Release|Any CPU
+{4C514047-64B5-1383-4564-B827B846A6A7}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
+{4C514047-64B5-1383-4564-B827B846A6A7}.Debug|Any CPU.Build.0 = Debug|Any CPU
+{4C514047-64B5-1383-4564-B827B846A6A7}.Release|Any CPU.ActiveCfg = Release|Any CPU
+{4C514047-64B5-1383-4564-B827B846A6A7}.Release|Any CPU.Build.0 = Release|Any CPU
+{6EB91F44-EC13-5354-0450-9A2687C3B169}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
+{6EB91F44-EC13-5354-0450-9A2687C3B169}.Debug|Any CPU.Build.0 = Debug|Any CPU
+{6EB91F44-EC13-5354-0450-9A2687C3B169}.Release|Any CPU.ActiveCfg = Release|Any CPU
+{6EB91F44-EC13-5354-0450-9A2687C3B169}.Release|Any CPU.Build.0 = Release|Any CPU
 EndGlobalSection
 GlobalSection(SolutionProperties) = preSolution
 HideSolutionNode = FALSE
@@ -1095,6 +1107,8 @@ Global
 {A43A75AB-D6B6-4D31-99F7-6951AFEF502D} = {D207D3D7-FD4D-4FD4-A7D0-79A82086FB6F}
 {4EB1FCC9-E1E2-4D2A-ACF9-A3A31AA947A5} = {D207D3D7-FD4D-4FD4-A7D0-79A82086FB6F}
 {6170C4E1-A109-435A-BF59-026C85B3BD9C} = {D207D3D7-FD4D-4FD4-A7D0-79A82086FB6F}
+{4C514047-64B5-1383-4564-B827B846A6A7} = {D207D3D7-FD4D-4FD4-A7D0-79A82086FB6F}
+{6EB91F44-EC13-5354-0450-9A2687C3B169} = {D207D3D7-FD4D-4FD4-A7D0-79A82086FB6F}
 EndGlobalSection
 GlobalSection(ExtensibilityGlobals) = postSolution
 SolutionGuid = {721B3030-08D7-4412-9087-D1CFBB3F5046}
Lines changed: 81 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,81 @@
1+
using DocumentFormat.OpenXml;
2+
using DocumentFormat.OpenXml.Packaging;
3+
using System.IO;
4+
using DocumentFormat.OpenXml.Wordprocessing;
5+
6+
ReplaceTextWithSAX(args[0], args[1], args[2]);
7+
8+
// <Snippet0>
9+
void ReplaceTextWithSAX(string path, string textToReplace, string replacementText)
10+
{
11+
// <Snippet1>
12+
// Open the WordprocessingDocument for editing
13+
using (WordprocessingDocument wordprocessingDocument = WordprocessingDocument.Open(path, true))
14+
{
15+
// Access the MainDocumentPart and make sure it is not null
16+
MainDocumentPart? mainDocumentPart = wordprocessingDocument.MainDocumentPart;
17+
18+
if (mainDocumentPart is not null)
19+
// </Snippet1>
20+
{
21+
// <Snippet2>
22+
// Create a MemoryStream to store the updated MainDocumentPart
23+
using (MemoryStream memoryStream = new MemoryStream())
24+
{
25+
// Create an OpenXmlReader to read the main document part
26+
// and an OpenXmlWriter to write to the MemoryStream
27+
using (OpenXmlReader reader = OpenXmlPartReader.Create(mainDocumentPart))
28+
using (OpenXmlWriter writer = OpenXmlPartWriter.Create(memoryStream))
29+
// </Snippet2>
30+
{
31+
// <Snippet3>
32+
// Write the XML declaration with the version "1.0".
33+
writer.WriteStartDocument();
34+
35+
// Read the elements from the MainDocumentPart
36+
while (reader.Read())
37+
{
38+
// Check if the element is of type Text
39+
if (reader.ElementType == typeof(Text))
40+
{
41+
// If it is the start of an element write the start element and the updated text
42+
if (reader.IsStartElement)
43+
{
44+
writer.WriteStartElement(reader);
45+
46+
string text = reader.GetText().Replace(textToReplace, replacementText);
47+
48+
writer.WriteString(text);
49+
50+
}
51+
else
52+
{
53+
// Close the element
54+
writer.WriteEndElement();
55+
}
56+
}
57+
else
58+
// Write the other XML elements without editing
59+
{
60+
if (reader.IsStartElement)
61+
{
62+
writer.WriteStartElement(reader);
63+
}
64+
else if (reader.IsEndElement)
65+
{
66+
writer.WriteEndElement();
67+
}
68+
}
69+
}
70+
// </Snippet3>
71+
}
72+
// <Snippet4>
73+
// Set the MemoryStream's position to 0 and replace the MainDocumentPart
74+
memoryStream.Position = 0;
75+
mainDocumentPart.FeedData(memoryStream);
76+
// </Snippet4>
77+
}
78+
}
79+
}
80+
}
81+
// </Snippet0>
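
One thing that may be worth checking in the loop above: only `Text` elements have their text written back, so any other element that carries text would be copied without it. The variation below is a sketch under an assumption that has not been verified against the SDK, namely that `OpenXmlReader.GetText` returns the content of any text-bearing leaf element (a field code, for example) and an empty string for everything else. The name `ReplaceTextPreservingOtherText` is hypothetical.

```csharp
using System.IO;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

if (args.Length == 3)
{
    ReplaceTextPreservingOtherText(args[0], args[1], args[2]);
}

// Hypothetical variation of ReplaceTextWithSAX; not part of the commit.
void ReplaceTextPreservingOtherText(string path, string textToReplace, string replacementText)
{
    using (WordprocessingDocument wordprocessingDocument = WordprocessingDocument.Open(path, true))
    {
        MainDocumentPart? mainDocumentPart = wordprocessingDocument.MainDocumentPart;

        if (mainDocumentPart is null)
        {
            return;
        }

        using (MemoryStream memoryStream = new MemoryStream())
        {
            using (OpenXmlReader reader = OpenXmlReader.Create(mainDocumentPart))
            using (OpenXmlWriter writer = OpenXmlWriter.Create(memoryStream))
            {
                writer.WriteStartDocument();

                while (reader.Read())
                {
                    if (reader.IsStartElement)
                    {
                        writer.WriteStartElement(reader);

                        // Assumption: empty unless the element is a text-bearing leaf.
                        string text = reader.GetText();

                        // Only Text elements get the replacement applied.
                        if (reader.ElementType == typeof(Text))
                        {
                            text = text.Replace(textToReplace, replacementText);
                        }

                        if (text.Length > 0)
                        {
                            writer.WriteString(text);
                        }
                    }
                    else if (reader.IsEndElement)
                    {
                        writer.WriteEndElement();
                    }
                }
            }

            // Rewind the buffer and replace the part's contents, as in the original.
            memoryStream.Position = 0;
            mainDocumentPart.FeedData(memoryStream);
        }
    }
}
```

If the assumption does not hold for a given SDK version, the original loop remains the reference behavior for this commit.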
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
<Project Sdk="Microsoft.NET.Sdk"/>
Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
Imports DocumentFormat.OpenXml
Imports DocumentFormat.OpenXml.Packaging
Imports System.IO
Imports DocumentFormat.OpenXml.Wordprocessing

Module Program
    Sub Main(args As String())
        ReplaceTextWithSAX(args(0), args(1), args(2))
    End Sub

    ' <Snippet0>
    Sub ReplaceTextWithSAX(path As String, textToReplace As String, replacementText As String)
        ' <Snippet1>
        ' Open the WordprocessingDocument for editing
        Using wordprocessingDocument As WordprocessingDocument = WordprocessingDocument.Open(path, True)
            ' Access the MainDocumentPart and make sure it is not null
            Dim mainDocumentPart As MainDocumentPart = wordprocessingDocument.MainDocumentPart

            If mainDocumentPart IsNot Nothing Then
                ' </Snippet1>
                ' <Snippet2>
                ' Create a MemoryStream to store the updated MainDocumentPart
                Using memoryStream As New MemoryStream()
                    ' Create an OpenXmlReader to read the main document part
                    ' and an OpenXmlWriter to write to the MemoryStream
                    Using reader As OpenXmlReader = OpenXmlPartReader.Create(mainDocumentPart)
                        Using writer As OpenXmlWriter = OpenXmlPartWriter.Create(memoryStream)
                            ' </Snippet2>
                            ' <Snippet3>
                            ' Write the XML declaration with the version "1.0".
                            writer.WriteStartDocument()

                            ' Read the elements from the MainDocumentPart
                            While reader.Read()
                                ' Check if the element is of type Text
                                If reader.ElementType Is GetType(Text) Then
                                    ' If it is the start of an element write the start element and the updated text
                                    If reader.IsStartElement Then
                                        writer.WriteStartElement(reader)

                                        Dim text As String = reader.GetText().Replace(textToReplace, replacementText)

                                        writer.WriteString(text)
                                    Else
                                        ' Close the element
                                        writer.WriteEndElement()
                                    End If
                                Else
                                    ' Write the other XML elements without editing
                                    If reader.IsStartElement Then
                                        writer.WriteStartElement(reader)
                                    ElseIf reader.IsEndElement Then
                                        writer.WriteEndElement()
                                    End If
                                End If
                            End While
                            ' </Snippet3>
                        End Using
                    End Using
                    ' <Snippet4>
                    ' Set the MemoryStream's position to 0 and replace the MainDocumentPart
                    memoryStream.Position = 0
                    mainDocumentPart.FeedData(memoryStream)
                    ' </Snippet4>
                End Using
            End If
        End Using
    End Sub
    ' </Snippet0>
End Module
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
<Project Sdk="Microsoft.NET.Sdk"/>
