
Alternative: render from_markdown via markdown-it-py #61

Open
MattFisher wants to merge 1 commit into ma2za:main from MattFisher:markdown-it-from-markdown

Conversation


MattFisher commented on Jun 26, 2026


What this is

An alternative implementation of Post.from_markdown() that delegates parsing to markdown-it-py (a CommonMark parser) plus the standard footnote plugin, with a small renderer (mdrender.py) that maps the syntax tree onto Substack's node schema. Node construction is centralised in a new nodes.py module so the (undocumented) schema lives in one place.
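
To make the renderer idea concrete, here is a minimal sketch of walking a markdown-it-py syntax tree and emitting node dictionaries. It is not the code in mdrender.py or nodes.py: the `render` and `text_node` helpers and the dictionary shapes are hypothetical stand-ins for the undocumented Substack schema, and inline marks (bold, links, and so on) are flattened to plain text for brevity. It assumes markdown-it-py is installed.

```python
from markdown_it import MarkdownIt
from markdown_it.tree import SyntaxTreeNode

md = MarkdownIt("commonmark")


def inline_text(node):
    """Concatenate the literal text under a node, ignoring marks."""
    if node.type in ("text", "code_inline"):
        return node.content
    return "".join(inline_text(child) for child in node.children)


def text_node(node):
    # Hypothetical node shape; the real schema may differ.
    return {"type": "text", "text": inline_text(node)}


def render(node):
    """Map a SyntaxTreeNode onto nested dicts (illustrative subset only)."""
    if node.type == "root":
        return {"type": "doc", "content": [render(c) for c in node.children]}
    if node.type == "heading":
        level = int(node.tag[1:])  # tag is "h1" .. "h6"
        return {"type": "heading", "attrs": {"level": level},
                "content": [text_node(node)]}
    if node.type == "paragraph":
        return {"type": "paragraph", "content": [text_node(node)]}
    raise NotImplementedError(node.type)


doc = render(SyntaxTreeNode(md.parse("# Title\n\nHello *world*.\n")))
```

Keeping every dictionary literal behind small builder functions is the same motivation the description gives for centralising node construction in one module: if the schema turns out to differ, only the builders change.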

Why I'm opening it

First off — this approach wasn't discussed beforehand, and I realise it's a bigger change than a typical PR, so I'm putting it up purely for your consideration; entirely your call on whether it's a direction you want to take.

I started out adding footnote support to the existing hand-rolled parser (#56). As that grew — footnotes, fenced/inline-code edge cases, multi-paragraph definitions — it started to feel like we were re-implementing a Markdown parser. So I prototyped this alternative to see how it compared, and it turned out significantly simpler:

  • from_markdown() drops from a ~270-line hand-rolled parser to a few lines delegating to the renderer (net post.py ≈ −215 lines).
  • Footnotes come essentially for free from the footnote plugin (including multi-paragraph definitions), rather than via bespoke pre-parse text extraction.
  • A real CommonMark parser handles nested structures and edge cases according to the spec, and removes the fragile overlapping-regex inline parsing.

Trade-offs (flagging honestly)

  • Adds two runtime dependencies: markdown-it-py and mdit-py-plugins (both widely used and well maintained). This is the main thing to weigh.
  • Two intentional, CommonMark-correct behaviour changes vs the old parser (tests updated to match, with comments):
    • Consecutive > lines are one paragraph; blank > lines split paragraphs (standard CommonMark). The old parser made one paragraph per line.
    • Footnote definitions that are never referenced are dropped (not appended to the end).
  • parse_inline() / tokens_to_text_nodes() remain as public helpers (still used by the manual footnote() builder), but from_markdown() no longer relies on them.

Tests

  • All existing from_markdown/parse_inline tests pass (the two semantic-difference tests above were updated).
  • Added test_from_markdown_features.py with end-to-end coverage of every feature listed in the from_markdown() docstring (headings 1–6, bold/italic/bold+italic/inline-code/strikethrough, links, images, linked images, code blocks with/without language, blockquotes, bullet/ordered lists, horizontal rules, paragraphs).
  • Footnote tests include references/definitions, named labels, numbering, multi-paragraph definitions, and code-span safety. 86 passing (excludes the pre-existing live-API tests that require credentials).

Relationship to #56

This is an alternative to #56 — if you'd prefer this approach, #56 can be closed in its favour; if not, no harm done and #56 stands on its own.

MattFisher changed the title from "Alternative: render from_markdown via markdown-it-py (simpler, footnotes for free)" to "Alternative: render from_markdown via markdown-it-py" on Jun 26, 2026
Replace the hand-rolled Markdown parser in from_markdown() with markdown-it-py
plus the standard footnote plugin, and a small renderer (mdrender.py) that maps
the syntax tree to Substack's node schema. Node construction is centralised in
a new nodes.py module so the schema lives in one place.

Footnotes (including multi-paragraph definitions) come from the footnote plugin.
Adds end-to-end from_markdown feature tests covering every documented feature.

Two intentional, CommonMark-correct behaviour changes vs the old parser:
consecutive '>' lines are one paragraph (blank '>' lines split them), and
unreferenced footnote definitions are dropped rather than appended.
MattFisher force-pushed the markdown-it-from-markdown branch from f257c26 to 97ccc9e on June 26, 2026 02:52