DOC-3470: Improve index.md quality with DOM preprocessing pipeline #4125
kemister85 wants to merge 1 commit into main
Add preprocessing transforms to generate-markdown.mjs that clean the Antora HTML before dom-to-semantic-markdown conversion:
- Strip <style>, <script>, and signup promo elements
- Convert admonition blocks to blockquote format
- Extract live demo JS code, drop scaffold noise
- Remove Antora heading anchor wrappers
- Convert card-layout tables to bulleted lists
- Fix about:blank# anchor references
Refactored to const arrow functions with a composable transform pipeline, following tinymce/tinymce-premium conventions.
Verified: byte-identical output across all 1,442 pages; zero about:blank, <style>, signup-promo, or kapa-widget leaks.
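To illustrate the admonition-to-blockquote conversion listed above, here is a minimal sketch of the formatting step only. The names formatAdmonition and capitalize are hypothetical and not taken from the PR. The real transform works on DOM nodes, while this version assumes the admonition kind and text have already been extracted as strings.

```javascript
// Hypothetical helper: "note" or "NOTE" -> "Note".
const capitalize = (word) => word.charAt(0).toUpperCase() + word.slice(1).toLowerCase();

// Hypothetical formatter: builds a Markdown blockquote whose first line
// carries a bold label, with every following line also prefixed by "> ".
const formatAdmonition = (kind, text) => {
  const lines = text.trim().split('\n').map((line) => line.trim());
  const [first, ...rest] = lines;
  return [`> **${capitalize(kind)}:** ${first}`, ...rest.map((line) => `> ${line}`)].join('\n');
};

console.log(formatAdmonition('note', 'Save before closing.'));
```

Keeping the string formatting separate from the DOM traversal would make a step like this easy to unit test without loading a page.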
I've been looking at the quality of our generated
The fix adds a DOM preprocessing pipeline that runs before d2m (dom-to-semantic-markdown) conversion. Each transform is a small, named function in a composable array, so adding or reordering transforms is straightforward. The script was also refactored to follow our established conventions (const arrows, single-purpose functions, verb-first naming). Tested against all 1,442 pages with zero regressions. Happy to walk through the changes if anyone has questions.
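The composable-array idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the PR: the transform names and runPipeline are hypothetical, and the transforms use regular expressions on an HTML string purely to keep the example dependency-free, whereas the actual script operates on a parsed DOM.

```javascript
// Hypothetical stand-in transforms. Each takes the document and returns it,
// so they can be chained. Regex on strings is a simplification of DOM edits.
const stripStyleBlocks = (html) => html.replace(/<style[\s\S]*?<\/style>/gi, '');
const stripScriptBlocks = (html) => html.replace(/<script[\s\S]*?<\/script>/gi, '');
const fixBlankAnchors = (html) => html.replace(/about:blank#/g, '#');

// Adding or reordering a transform means editing only this array.
const transforms = [stripStyleBlocks, stripScriptBlocks, fixBlankAnchors];

// Apply each step in array order, threading the result through.
const runPipeline = (input, steps) => steps.reduce((acc, step) => step(acc), input);

const sample = '<style>p{}</style><a href="about:blank#intro">Intro</a><script>x()</script>';
const cleaned = runPipeline(sample, transforms);
console.log(cleaned); // <a href="#intro">Intro</a>
```

Because every step shares one signature, a single transform can be tested on its own by passing a one-element array to runPipeline.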
Summary
- Add a DOM preprocessing pipeline to generate-markdown.mjs that cleans Antora HTML before dom-to-semantic-markdown conversion, fixing broken tables, about:blank anchors, admonition formatting, live demo noise, and leaked <style>/<script> elements.
- Refactor the script to const arrow functions with a composable transform array, following tinymce/tinymce-premium coding conventions.
Changes
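- stripNonContent: removes <style>, <script>, and signup promos that leaked into markdown
- rewriteAdmonitions: | | text | becomes > **Note:** text
- rewriteLiveDemos: extracts the live demo JS code and drops scaffold noise
- stripHeadingAnchors: about:blank#section-name in all headings becomes ## Section Name
- rewriteCardTables: card-layout tables become **[Link](url)** — description list items
Test plan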
- Zero about:blank, <style>, signup-promo, kapa-widget, or Edit on CodePen leaks across all generated .md files
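A leak check of this kind could be automated along these lines. This is a sketch, not the verification actually used for the PR: findLeaks and leakPatterns are hypothetical names, and the pages are passed in as an in-memory object keyed by path, where a real run would presumably read the generated .md files from disk.

```javascript
// Patterns taken from the test plan. None use the global flag,
// so RegExp.prototype.test stays stateless across calls.
const leakPatterns = [/about:blank/, /<style/i, /signup-promo/, /kapa-widget/, /Edit on CodePen/];

// Hypothetical checker: returns one entry per (page, pattern) match.
const findLeaks = (pages) =>
  Object.entries(pages).flatMap(([path, markdown]) =>
    leakPatterns
      .filter((pattern) => pattern.test(markdown))
      .map((pattern) => ({ path, pattern: pattern.source }))
  );

// Illustrative input only; these are not real pages from the docs site.
const leaks = findLeaks({
  'clean/index.md': '# Getting started\n\nPlain content.',
  'dirty/index.md': 'See [intro](about:blank#intro).',
});
console.log(leaks); // [ { path: 'dirty/index.md', pattern: 'about:blank' } ]
```

An empty result corresponds to the "zero leaks" condition in the test plan, so a CI step could fail whenever the returned array is non-empty.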