Add Markdown footnote support to from_markdown #56
Parse standard Markdown footnotes (`text[^label]` references and `[^label]: definition` lines) into Substack's footnoteAnchor inline nodes and footnote blocks. Footnotes are numbered by order of first reference and labels may be numeric or named. Also adds Post.footnote_anchor() and Post.footnote() helpers for building footnotes manually, plus tests.
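As a rough illustration of "numbered by order of first reference", here is a minimal sketch. It is not the code in this PR: the `FOOTNOTE_REF` pattern and the `number_footnotes` helper are hypothetical names, and the sketch assumes labels contain no whitespace or `]`.

```python
import re

# Hypothetical pattern: matches [^label] references but not "[^label]:" definitions.
FOOTNOTE_REF = re.compile(r"\[\^([^\]\s]+)\](?!:)")


def number_footnotes(text: str) -> dict[str, int]:
    """Map each footnote label to a 1-based number by order of first reference."""
    numbers: dict[str, int] = {}
    for match in FOOTNOTE_REF.finditer(text):
        label = match.group(1)
        # Repeated references to the same label reuse its first number.
        if label not in numbers:
            numbers[label] = len(numbers) + 1
    return numbers
```

With this approach a named label such as `[^note]` and a numeric label such as `[^1]` are treated the same way: the rendered number depends only on where the label is first referenced, not on the label text.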
|
I found a regression in the footnote pass: it runs before/after Markdown parsing at document scope, so footnote-like text inside code can be removed or rewritten. Could you add regression coverage for these cases before merge?

    def test_footnote_definition_inside_fenced_code_stays_code():
        post = make_post()
        post.from_markdown("```\n[^1]: not a footnote\n```")
        content = body_content(post)
        assert len(content) == 1
        assert content[0]["type"] == "codeBlock"
        assert content[0]["content"][0]["text"] == "[^1]: not a footnote"

    def test_footnote_reference_inside_fenced_code_stays_text():
        post = make_post()
        post.from_markdown("```\ncode [^1]\n```\n\n[^1]: note")
        content = body_content(post)
        assert content[0]["type"] == "codeBlock"
        assert content[0]["content"][0]["text"] == "code [^1]"

    def test_footnote_reference_inside_inline_code_stays_text():
        post = make_post()
        post.from_markdown("`code [^1]`\n\n[^1]: note")
        content = body_content(post)
        assert content[0]["type"] == "paragraph"
        assert content[0]["content"][0]["text"] == "code [^1]"
        assert content[0]["content"][0]["marks"] == [{"type": "code"}]

The first test currently fails on this branch with |
Footnote definition extraction now skips fenced code blocks, and anchor injection skips codeBlock nodes and text marked as inline code. This fixes footnote-like text inside code being removed or rewritten. Adds regression tests for fenced and inline code cases.
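One way to picture the "skip fenced code blocks" part of this fix is a line scanner that tracks whether it is inside a fence. This is only a sketch under assumptions, not the change in 698e098: `extract_definitions`, `FENCE`, and `DEFINITION` are hypothetical names, and only single-line definitions and three-character fences are handled.

```python
import re

# Hypothetical patterns: a fence opener/closer (three backticks or tildes)
# and a single-line footnote definition such as "[^label]: text".
FENCE = re.compile(r"^\s*(`{3}|~{3})")
DEFINITION = re.compile(r"^\[\^([^\]\s]+)\]:\s*(.*)$")


def extract_definitions(markdown: str) -> tuple[str, dict[str, str]]:
    """Return (body without definitions, {label: definition text})."""
    body_lines: list[str] = []
    definitions: dict[str, str] = {}
    open_fence = None  # fence marker we are currently inside, if any
    for line in markdown.split("\n"):
        fence = FENCE.match(line)
        if fence:
            marker = fence.group(1)
            if open_fence is None:
                open_fence = marker      # entering a fenced block
            elif marker == open_fence:
                open_fence = None        # leaving the fenced block
            body_lines.append(line)
            continue
        # Inside a fence, footnote-like lines are code and stay in the body.
        definition = None if open_fence else DEFINITION.match(line)
        if definition:
            definitions[definition.group(1)] = definition.group(2)
        else:
            body_lines.append(line)
    return "\n".join(body_lines), definitions
```

The same idea applies on the node side: anchor injection would walk the parsed nodes and leave untouched any `codeBlock` node or text node carrying a `code` mark.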
|
Good catch, thanks. Fixed in 698e098:
All three regression tests you provided are included and pass (full post suite green: 66 passed). |
Footnote definitions can now contain multiple paragraphs (a blank line followed by an indented block); previously a second paragraph leaked into the post body and only the first was kept. Extraction preserves paragraph breaks and Post.footnote() splits blank-line-separated content into multiple paragraph nodes (verified accepted/rendered by Substack). Adds regression tests.
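The blank-line splitting described here could look roughly like the following. This is a sketch, not the actual `Post.footnote()` implementation: `footnote_paragraphs` is a hypothetical helper, and the node shape is assumed from the `paragraph`/`text` structure used in the tests earlier in this thread.

```python
import re


def footnote_paragraphs(definition: str) -> list[dict]:
    """Split blank-line-separated footnote text into paragraph nodes."""
    # A blank line (optionally containing whitespace) separates paragraphs.
    chunks = re.split(r"\n\s*\n", definition.strip())
    return [
        {
            "type": "paragraph",
            # Collapse continuation-line indentation into single spaces.
            "content": [{"type": "text", "text": " ".join(chunk.split())}],
        }
        for chunk in chunks
        if chunk.strip()
    ]
```

For a definition whose second paragraph is an indented block after a blank line, this yields two paragraph nodes instead of letting the second paragraph fall through into the post body.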
|
Heads-up: as footnote handling has grown here (fenced/inline-code edge cases, multi-paragraph definitions), this is starting to feel a bit complex, like we're edging toward re-implementing a Markdown parser. So I prototyped an alternative approach in #61 that uses Wasn't planned up front, so purely for your consideration, but it might be worth merging #61 instead of this one and getting the footnote handling for free. |