
feat: add EmlConverter with attachment support #1663

Open
VANDRANKI wants to merge 1 commit into microsoft:main from VANDRANKI:feat/eml-attachment-support

Conversation

@VANDRANKI

Summary

Adds EmlConverter for converting RFC 822 .eml files to markdown.

This addresses the gap identified in issue #1662. The converter handles:

  • Email headers (From, To, Cc, Subject, Date)
  • Body content (prefers text/plain, falls back to text/html with tag stripping)
  • Attachments: each attachment is passed through the MarkItDown converter pipeline and rendered under its own heading in the output
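As a rough illustration of the header and body handling listed above, here is a minimal sketch built only on the Python standard library. It is not the code in converters/_eml_converter.py: the function names render_headers_and_body and strip_tags and the helper class _TextOnly are hypothetical, and the tag stripping shown is one simple way to do it, assumed for the example.

```python
from email import message_from_bytes, policy
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Hypothetical helper: collects text nodes and drops the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(html: str) -> str:
    """Return the text content of an HTML fragment."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


def render_headers_and_body(raw: bytes) -> str:
    """Render the headers and body of an .eml payload as markdown."""
    msg = message_from_bytes(raw, policy=policy.default)
    lines = ["# Email Message", ""]
    # Emit only the headers that are actually present.
    for name in ("From", "To", "Cc", "Subject", "Date"):
        value = msg[name]
        if value:
            lines.append(f"**{name}:** {value}")
    # Prefer text/plain; fall back to text/html.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = ""
    if body_part is not None:
        body = body_part.get_content()
        if body_part.get_content_type() == "text/html":
            body = strip_tags(body)
    lines += ["", "## Content", "", body.strip()]
    return "\n".join(lines)
```

Passing preferencelist=("plain", "html") to get_body() expresses the text/plain-first preference in one call, so the HTML path only runs when no plain-text part exists.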

What this PR covers

  • converters/_eml_converter.py - new EmlConverter class
  • converters/__init__.py - registers the new converter in exports
  • _markitdown.py - registers EmlConverter(markitdown=self) in enable_builtins(), matching the ZipConverter pattern for recursive conversion
  • tests/test_eml_converter.py - 11 tests covering headers, body, attachment conversion, unsupported format fallback, and body/attachment isolation

Attachment design

Attachment conversion follows the same pattern as ZipConverter: a MarkItDown instance is injected via the constructor and used to call convert_stream() on each attachment. This means any format markitdown supports (XLSX, PDF, DOCX, etc.) is automatically available for email attachments with no additional code.

If no MarkItDown instance is available (e.g., the converter is instantiated directly), attachments are listed by filename with a note rather than silently skipped.
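The injection pattern described above can be sketched as follows. This is an assumption-laden illustration, not the PR's implementation: the convert callable stands in for the injected MarkItDown instance's convert_stream(), whose real signature and return type are not shown in this description, and render_attachments, build_sample, and the note wording are all hypothetical.

```python
import io
from email import message_from_bytes, policy
from email.message import EmailMessage
from typing import BinaryIO, Callable, Optional

# Hypothetical stand-in for the injected converter: takes a stream and a
# filename, returns markdown text.
Convert = Callable[[BinaryIO, str], str]


def render_attachments(raw: bytes, convert: Optional[Convert] = None) -> str:
    """Render each attachment under its own heading."""
    msg = message_from_bytes(raw, policy=policy.default)
    sections = []
    for part in msg.iter_attachments():
        name = part.get_filename() or "unnamed attachment"
        sections.append(f"### {name}")
        if convert is None:
            # No converter injected: list the file instead of skipping it.
            sections.append("*Attachment not converted: no converter available.*")
            continue
        data = part.get_payload(decode=True) or b""
        try:
            sections.append(convert(io.BytesIO(data), name))
        except Exception as exc:
            # Unsupported or failing attachments get a note, not a crash.
            sections.append(f"*Could not convert attachment: {exc}*")
    if not sections:
        return ""
    return "\n\n".join(["## Attachments", *sections])


def build_sample() -> bytes:
    """Build a small message with one attachment for experimentation."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["Subject"] = "Q1 Report"
    msg.set_content("Please find the Q1 report attached.")
    msg.add_attachment(
        b"Quarter,Revenue\nQ1 2026,1.2M\n",
        maintype="application",
        subtype="octet-stream",
        filename="Q1_Report.csv",
    )
    return msg.as_bytes()
```

Calling render_attachments(build_sample()) with no converter exercises the filename-only fallback, while passing a callable exercises the recursive path. Keeping the converter as an optional argument is what lets the same code serve both cases.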

Example output

# Email Message

**From:** sender@example.com
**To:** recipient@example.com
**Subject:** Q1 Report
**Date:** Sat, 04 Apr 2026 12:00:00 +0000

## Content

Please find the Q1 report attached.

## Attachments

### Q1_Report.xlsx

## Sheet1
| Quarter | Revenue |
|---------|---------|
| Q1 2026 | 1.2M    |

Testing

All 11 tests pass. ruff check and ruff format pass with zero issues.

Closes #1662

Adds EmlConverter for converting RFC 822 .eml files to markdown.
Extracts email headers (From, To, Cc, Subject, Date) and body content,
preferring text/plain over text/html (with tag stripping as fallback).

Attachments are converted by passing them back through the MarkItDown
converter pipeline when a MarkItDown instance is available. Each
attachment is rendered under its own heading in the output. Unsupported
or failing attachments show a descriptive note rather than silently
failing.

Closes microsoft#1662
@VANDRANKI
Author

@microsoft-github-policy-service agree

Development

Successfully merging this pull request may close these issues.

EmlConverter: add support for converting email attachments

1 participant