FEAT Add WordDocConverter for Word document generation #1365
base: main
Changes from all commits
9027fdd
8daf888
9ffe8c9
9983804
ab9a5c0
2e6b7e7
| Original file line number | Diff line number | Diff line change | ||
|---|---|---|---|---|
|
|
@@ -12,13 +12,14 @@ | |||
| # %% [markdown] | ||||
| # # 5. File Converters | ||||
| # | ||||
| # File converters transform text into file outputs such as PDFs. These converters are useful for packaging prompts into distributable formats. | ||||
| # File converters transform text into file outputs such as PDFs and Word documents. These converters are useful for packaging prompts into distributable formats. | ||||
| # | ||||
| # ## Overview | ||||
| # | ||||
| # This notebook covers: | ||||
| # | ||||
| # - **PDFConverter**: Convert text to PDF documents with templates or direct generation | ||||
| # - **WordDocConverter**: Convert text to Word documents (.docx) with direct generation or template-based injection | ||||
|
|
||||
| # %% [markdown] | ||||
| # ## PDFConverter | ||||
|
|
@@ -203,3 +204,88 @@ | |||
|
|
||||
| result = await attack.execute_async(objective="Modify existing PDF") # type: ignore | ||||
| await ConsoleAttackResultPrinter().print_conversation_async(result=result) # type: ignore | ||||
|
|
||||
| # %% [markdown] | ||||
| # ## WordDocConverter | ||||
| # | ||||
| # The `WordDocConverter` generates Word documents (.docx) from text using `python-docx`. It supports two modes: | ||||
| # | ||||
| # 1. **Direct generation**: Convert plain text strings into Word documents. The prompt becomes the document content. | ||||
| # 2. **Template-based generation**: Supply an existing `.docx` file containing jinja2 placeholders (e.g., `{{ prompt }}`). The converter replaces placeholders with the prompt text while preserving the original document's formatting, tables, headers, and footers. The original file is never modified — a new file is always generated. | ||||
|
|
||||
| # %% [markdown] | ||||
| # ### Direct Word Document Generation | ||||
| # | ||||
| # This mode converts plain text strings directly into Word documents. Each newline in the prompt creates a new paragraph. | ||||
|
|
||||
| # %% | ||||
| from pyrit.prompt_converter import WordDocConverter | ||||
|
|
||||
| # Define a simple string prompt (no templates) | ||||
| prompt = "This is a simple test string for Word document generation. No templates here!" | ||||
|
|
||||
| # Initialize the WordDocConverter without a template | ||||
| word_doc_converter = PromptConverterConfiguration.from_converters( | ||||
| converters=[ | ||||
| WordDocConverter( | ||||
| font_name="Calibri", | ||||
| font_size=12, | ||||
| ) | ||||
| ] | ||||
| ) | ||||
|
|
||||
| converter_config = AttackConverterConfig( | ||||
| request_converters=word_doc_converter, | ||||
| ) | ||||
|
|
||||
| # Initialize the attack | ||||
| attack = PromptSendingAttack( | ||||
| objective_target=prompt_target, | ||||
| attack_converter_config=converter_config, | ||||
| ) | ||||
|
|
||||
| result = await attack.execute_async(objective=prompt) # type: ignore | ||||
| await ConsoleAttackResultPrinter().print_conversation_async(result=result) # type: ignore | ||||
|
|
||||
| # %% [markdown] | ||||
| # ### Template-Based Word Document Generation | ||||
| # | ||||
| # This mode takes an existing `.docx` file that contains jinja2 `{{ prompt }}` placeholders and replaces them with the provided prompt text. This is useful for embedding adversarial content into realistic document templates (e.g., resumes, reports, invoices) while preserving all original formatting. | ||||
|
|
||||
| # %% | ||||
| import tempfile | ||||
| Original file line number | Diff line number | Diff line change | ||
|---|---|---|---|---|
|
|
@@ -49,6 +49,8 @@ dependencies = [ | |||
| "pydantic>=2.11.5", | ||||
| "pyodbc>=5.1.0", | ||||
| "python-dotenv>=1.0.1", | ||||
| "python-docx>=1.2.0", | ||||
| "pypdf>=5.1.0", | ||||
The docs state that template-based generation preserves the original document’s formatting. Given the current implementation can collapse run-level formatting when placeholders span multiple runs, please either update the documentation to mention this limitation or improve the implementation to truly preserve mixed formatting.
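To make the concern concrete, here is a toy model of the run-splitting problem. It is only a sketch and not the PR's implementation: it does not use `python-docx`, and `Run`, `replace_collapsing`, and `replace_preserving` are hypothetical names invented for illustration. It assumes a paragraph is a list of runs, each with text and a single `bold` flag, and that Word has split `{{ prompt }}` across two runs.

```python
from __future__ import annotations

from dataclasses import dataclass

PLACEHOLDER = "{{ prompt }}"


@dataclass
class Run:
    """Hypothetical stand-in for a Word run: text plus one formatting flag."""

    text: str
    bold: bool = False


def replace_collapsing(runs: list[Run], value: str) -> list[Run]:
    """Naive approach: join all run text, substitute, keep only the first run's formatting."""
    if not runs:
        return []
    joined = "".join(r.text for r in runs).replace(PLACEHOLDER, value)
    return [Run(joined, runs[0].bold)]


def replace_preserving(runs: list[Run], value: str) -> list[Run]:
    """Replace the first placeholder, modifying only the runs that overlap it."""
    joined = "".join(r.text for r in runs)
    start = joined.find(PLACEHOLDER)
    if start == -1:
        return [Run(r.text, r.bold) for r in runs]
    end = start + len(PLACEHOLDER)

    out: list[Run] = []
    pos = 0
    inserted = False
    for r in runs:
        r_start, r_end = pos, pos + len(r.text)
        pos = r_end
        if r_end <= start or r_start >= end:
            # Run lies entirely outside the placeholder: copy it unchanged.
            out.append(Run(r.text, r.bold))
            continue
        # Keep the text before and after the placeholder within this run.
        before = r.text[: max(start - r_start, 0)]
        after = r.text[end - r_start:] if r_end > end else ""
        text = before
        if not inserted:
            # The value inherits the formatting of the first overlapping run.
            text += value
            inserted = True
        text += after
        if text:
            out.append(Run(text, r.bold))
    return out


# A paragraph whose placeholder is split across the two middle runs.
paragraph = [Run("Dear ", True), Run("{{ pro"), Run("mpt }}"), Run("!", True)]
print(replace_collapsing(paragraph, "Alice"))
print(replace_preserving(paragraph, "Alice"))
```

In this model the naive version returns a single bold run, so the non-bold span is lost, while the second version keeps three runs with their original flags. Whether the same strategy maps cleanly onto the real converter (multiple placeholders, tables, headers, footers) is an open question for the implementation.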