From 8a77a64f21159d83a171d62381ee4afaeac40403 Mon Sep 17 00:00:00 2001 From: Kuba Sunderland-Ober Date: Mon, 18 May 2026 11:18:13 +0200 Subject: [PATCH 01/21] Write book.html only into _site-pdf, not _site. --- WIP.md | 2 +- docs/_config.yml | 14 +++- docs/_plugins/pdfify.md | 83 ++++++++++++++------- docs/_plugins/pdfify.rb | 158 ++++++++++++++++++++++++++++++---------- 4 files changed, 185 insertions(+), 72 deletions(-) diff --git a/WIP.md b/WIP.md index 7a9c0b9..b0389d9 100644 --- a/WIP.md +++ b/WIP.md @@ -428,7 +428,7 @@ Python scripts are reserved for non-render concerns: one-off content conversion From `docs/`: -- `bundle exec jekyll build` (or `build.bat`) — builds three trees in a single Jekyll run: the online copy at `_site/`, a `file://`-browsable copy at `_site-offline/`, and the sparse pagedjs source at `_site-pdf/`. The offline pass (`_plugins/offlinify.rb`, activated by `also_build_offline: true` in `_config.yml`) adds ~3-5s and the PDF pass (`_plugins/pdfify.rb`, activated by `also_build_pdf: true`) adds <1s on top of the normal ~13s build. The PDF plugin copies `_site/book.html` (the concatenated chapter document rendered via `_layouts/book-combined.html`) verbatim into `_site-pdf/`, along with `assets/css/print.css`, `assets/css/rouge.css`, and every relative `` target -- just what pagedjs needs to render the book PDF. After the copy, the plugin deletes `_site/book.html`: the concatenated document is a build artifact for the PDF render path alone, not a public page on the online site. The companion `offline_exclude: [..., book.html]` entry in `_config.yml` keeps `offlinify.rb` from copying it into `_site-offline/`. The two safeguards are independent -- the exclude pattern works regardless of whether offlinify walks `_site/` before or after pdfify's delete, and pdfify's delete works regardless of whether offlinify is enabled. 
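The independence of the two safeguards can be illustrated with a minimal sketch. It is not the code of `pdfify.rb` or `offlinify.rb`: `mirror_offline`, `move_book_to_pdf` and `book_locations` are hypothetical stand-ins, and the exclude list is assumed to match relative paths exactly (the real `offline_exclude` semantics may differ). The sketch runs the two passes in either order inside a scratch directory and reports which trees end up holding `book.html`.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical stand-in for the offline pass: mirror every file under
# site_dir into offline_dir unless its relative path is listed in `exclude`.
# Assumption: exact-match exclusion; the real plugin may use patterns.
def mirror_offline(site_dir, offline_dir, exclude)
  Dir.glob("**/*", base: site_dir).each do |rel|
    src = File.join(site_dir, rel)
    next unless File.file?(src)
    next if exclude.include?(rel)

    dest = File.join(offline_dir, rel)
    FileUtils.mkdir_p(File.dirname(dest))
    FileUtils.cp(src, dest)
  end
end

# Hypothetical stand-in for the PDF pass: copy the combined document into
# pdf_dir, then delete it from the online tree.
def move_book_to_pdf(site_dir, pdf_dir, name = "book.html")
  src = File.join(site_dir, name)
  return unless File.file?(src)

  FileUtils.mkdir_p(pdf_dir)
  FileUtils.cp(src, File.join(pdf_dir, name))
  FileUtils.rm(src)
end

# Run both passes in the given order and return, for _site, _site-offline
# and _site-pdf (in that order), whether book.html exists afterwards.
def book_locations(order)
  Dir.mktmpdir do |root|
    site, offline, pdf = %w[_site _site-offline _site-pdf].map { |d| File.join(root, d) }
    FileUtils.mkdir_p(site)
    File.write(File.join(site, "book.html"), "<html>book</html>")
    File.write(File.join(site, "index.html"), "<html>home</html>")

    passes = {
      offline: -> { mirror_offline(site, offline, ["book.html"]) },
      pdf: -> { move_book_to_pdf(site, pdf) }
    }
    order.each { |name| passes.fetch(name).call }

    [site, offline, pdf].map { |dir| File.exist?(File.join(dir, "book.html")) }
  end
end
```

Under these assumptions, `book_locations(%i[offline pdf])` and `book_locations(%i[pdf offline])` give the same answer: only the PDF tree keeps the file.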
After Jekyll's WRITE phase, the offline plugin walks `_site/`, copies binary assets verbatim into `_site-offline/`, and for each HTML and CSS file rewrites every root-absolute `href` / `src` / `url()` to a page-relative path with the resolved file extension (`/FAQ` → `../../FAQ.html`, `/Tutorials/CEF/` → `../../Tutorials/CEF/index.html`). It also patches the offline copy of `assets/js/just-the-docs.js` in two places — `navLink()` to match the active nav entry by resolved DOM `link.href` rather than `document.location.pathname` (the upstream pathname-vs-attribute compare returns no match under `file://`, leaving the sidebar with no `.active` class so the nav appears collapsed on every navigation), and `initSearch()` to read the lunr index from `window.SEARCH_DATA` rather than fetching `search-data.json` over `XMLHttpRequest` (XHR to `file://` resources is blocked by browsers; classic ` diff --git a/docs/_plugins/seo-precompute.rb b/docs/_plugins/seo-precompute.rb new file mode 100644 index 0000000..b6b6c2f --- /dev/null +++ b/docs/_plugins/seo-precompute.rb @@ -0,0 +1,188 @@ +# frozen_string_literal: true + +# Precomputes per-page SEO values that `_includes/head_seo.html` +# previously derived in Liquid via the `markdownify | strip_html | +# normalize_whitespace | escape_once` pipeline + `absolute_url` + +# `uri_escape`. +# +# === Problem === +# +# `head_seo.html` rendered ~837 times per build. On every render it +# ran two `markdownify` filter chains (`page.title` and `site.title`), +# one `absolute_url` for the canonical URL, plus the same chain again +# for `site.logo`. ~1,674 of the 1,802 `Jekyll::Filters#markdownify` +# filter invocations across the whole build came from this template, +# costing ruby-prof ~4.0 s of `Liquid::Strainer#invoke` time. The +# kramdown converter's own work hits Jekyll's internal cache for the +# repeated `site.title` input but still pays the Liquid filter +# dispatch and cache-lookup overhead per call. 
+#
+# Of the 836 page titles on the site, only 2 (`*, *=` and `\, \=`)
+# contain markdown-active characters. The other 834 paths through
+# `markdownify | strip_html | normalize_whitespace | escape_once`
+# reduce to a straight `escape_once(title)` -- the markdownify step
+# wraps the text in a `<p>` tag that `strip_html` immediately
+# removes, `normalize_whitespace` collapses no internal whitespace,
+# and `escape_once` HTML-escapes the same handful of characters.
+# Pulling all of that into one Ruby pass at `:site, :pre_render`
+# pays the kramdown / regex cost once per unique title in a tight
+# loop instead of via the Liquid dispatch path 1,674 times per
+# build.
+#
+# === Approach ===
+#
+# At `:site, :pre_render`, walk every page and stash these
+# precomputed values on `page.data`:
+#
+#   _seo_page_title -- markdownify-pipeline output of `page.title`,
+#                      or the site title if `page.title` is empty.
+#   _seo_full_title -- "<page title> | <site title>", collapsing to
+#                      just the page title when the two match.
+#   _seo_canonical  -- `page.url` with `/index.html` stripped, then
+#                      `absolute_url`'d via the same Addressable
+#                      normalisation Jekyll's URLFilters uses.
+#   _seo_is_home    -- boolean: page is the homepage / about page
+#                      in the small fixed list jekyll-seo-tag
+#                      recognises (the JSON-LD `@type` toggles
+#                      between WebSite and WebPage on this flag).
+#
+# Plus on site.config:
+#
+#   _seo_site_title -- markdownify-pipeline output of `site.title`,
+#                      constant across the build.
+#   _seo_logo_url   -- `absolute_url(site.logo) | uri_escape`,
+#                      constant across the build.
+#
+# `head_seo.html` then reads these as `page._seo_*` (PageDrop's
+# `fallback_data` resolves the keys against `page.data`) and
+# `site._seo_*` (SiteDrop's fallback resolves against `site.config`).
+#
+# Filter logic mirrors the standard Liquid / Jekyll implementations
+# byte-for-byte: `Liquid::StandardFilters::STRIP_HTML_BLOCKS` /
+# `STRIP_HTML_TAGS` / `HTML_ESCAPE_ONCE_REGEXP` / `HTML_ESCAPE`
+# constants are pulled by reference so the strip and escape steps use
+# the same regex objects Liquid would. `absolute_url` follows the
+# `Jekyll::Filters::URLFilters#absolute_url` recipe -- parse, fall
+# back to `relative_url` when `site.url` is empty, otherwise
+# `Addressable::URI.parse(site_url + rel).normalize.to_s`.
+# `uri_escape` is the Jekyll filter's one-liner +# `Addressable::URI.normalize_component`. +# +# === When it runs === +# +# `:site, :pre_render`, the same phase as +# `_plugins/book-resolve-chapters.rb`. By that point Jekyll has read +# all pages and run every Generator, so `page.url` and `page.data` +# are populated. `head_seo.html` itself doesn't render until later +# in the RENDER phase, so the precomputed keys are visible by then. +# +# === Verification === +# +# Byte-identical output is the bar -- `diff -rq` clean against a +# pre-precompute snapshot of `_site/` / `_site-offline/` / +# `_site-pdf/`. Any divergence in the filter logic above (e.g. an +# `Addressable::URI` version that normalises differently) would show +# up as a counted mismatch in the diff, not as a silent regression. + +require "addressable/uri" +require "liquid" +require "set" + +module Jekyll + module SeoPrecompute + extend self + + # URLs that jekyll-seo-tag's `HOMEPAGE_OR_ABOUT_REGEX` matches + # (the gem uses a regex; Liquid has no regex match operator so + # head_seo.html was enumerating the six values inline -- this + # `Set#include?` lookup replaces the chain of `or` comparisons). + HOMEPAGE_URLS = Set[ + "/", "/index.html", "/index.htm", + "/about/", "/about/index.html", "/about/index.htm" + ].freeze + + STRIP_HTML_BLOCKS = Liquid::StandardFilters::STRIP_HTML_BLOCKS + STRIP_HTML_TAGS = Liquid::StandardFilters::STRIP_HTML_TAGS + HTML_ESCAPE_ONCE_REGEXP = Liquid::StandardFilters::HTML_ESCAPE_ONCE_REGEXP + HTML_ESCAPE = Liquid::StandardFilters::HTML_ESCAPE + + def precompute!(site) + markdown = site.find_converter_instance(Jekyll::Converters::Markdown) + + site_title = render_title(site.config["title"], markdown) + site.config["_seo_site_title"] = site_title + + logo = site.config["logo"] + site.config["_seo_logo_url"] = + logo ? 
uri_escape(absolute_url(logo, site)) : nil + + site.pages.each do |page| + raw_title = page.data["title"] + page_title = if raw_title && !raw_title.to_s.empty? + render_title(raw_title, markdown) + else + site_title + end + page.data["_seo_page_title"] = page_title + page.data["_seo_full_title"] = + page_title == site_title ? page_title : "#{page_title} | #{site_title}" + + url = page.url.to_s + canonical_input = url.sub(%r!/index\.html\z!, "/") + page.data["_seo_canonical"] = absolute_url(canonical_input, site) + page.data["_seo_is_home"] = HOMEPAGE_URLS.include?(url) + end + end + + # `text | markdownify | strip_html | normalize_whitespace | + # escape_once`, mirroring head_seo.html's pipeline byte-for-byte. + # `Jekyll::Converters::Markdown#convert` is cached internally, so + # `site.title` (a constant) hits the cache from the second call + # onwards; page titles are unique but tiny. + def render_title(text, markdown) + return "" if text.nil? + s = text.to_s + return "" if s.empty? + html = markdown.convert(s) + stripped = html.gsub(STRIP_HTML_BLOCKS, "").gsub(STRIP_HTML_TAGS, "") + collapsed = stripped.gsub(%r!\s+!, " ").tap(&:strip!) + collapsed.gsub(HTML_ESCAPE_ONCE_REGEXP, HTML_ESCAPE) + end + + # Mirrors `Jekyll::Filters::URLFilters#absolute_url`. When + # `site.url` is unset the result is just the relative URL, + # matching the filter's behaviour. + def absolute_url(input, site) + return nil if input.nil? + s = input.to_s + return s if Addressable::URI.parse(s).absolute? + site_url = site.config["url"].to_s + rel = relative_url(s, site) + return rel if site_url.empty? + Addressable::URI.parse(site_url + rel).normalize.to_s + end + + # Mirrors `Jekyll::Filters::URLFilters#relative_url`. + def relative_url(input, site) + s = input.to_s + return s if Addressable::URI.parse(s).absolute? 
+ baseurl = site.config["baseurl"].to_s.chomp("/") + parts = [baseurl, s].map { |p| ensure_leading_slash(p) } + Addressable::URI.parse(parts.join).normalize.to_s + end + + def ensure_leading_slash(input) + return input if input.empty? || input.start_with?("/") + "/#{input}" + end + + # Mirrors `Jekyll::Filters#uri_escape`. + def uri_escape(input) + Addressable::URI.normalize_component(input) + end + end +end + +Jekyll::Hooks.register :site, :pre_render do |site| + Jekyll::SeoPrecompute.precompute!(site) +end From d6a888d5f2b28bfca38db0cbd4295c8dde59e06b Mon Sep 17 00:00:00 2001 From: Kuba Sunderland-Ober Date: Tue, 19 May 2026 22:35:03 +0200 Subject: [PATCH 19/21] Fold book-chapter-body's replace chains into a Ruby filter. --- WIP.md | 36 ++++- docs/_includes/book-chapter-body.html | 194 +++++------------------- docs/_plugins/book-chapter-transform.rb | 170 +++++++++++++++++++++ docs/book.html | 73 +-------- 4 files changed, 252 insertions(+), 221 deletions(-) create mode 100644 docs/_plugins/book-chapter-transform.rb diff --git a/WIP.md b/WIP.md index 29931f6..d3ab997 100644 --- a/WIP.md +++ b/WIP.md @@ -499,7 +499,41 @@ Ranked by estimated wall-clock saving on the current Windows machine: | `Liquid::Variable#render` total | 10.05 s | 8.96 s | -1.09 s | The `BlockBody#render` / `Context#stack` / `Variable#render` drops reflect the eliminated `{%- assign -%}` / `{%- if -%}` blocks in head_seo.html (dropped from ~85 lines of Liquid logic to ~20 lines of straight output). The 128 remaining `markdownify` calls come from `book.html`'s part subtitle/intro (~24) and `book-chapter-body.html`'s per-chapter `chapter.content | markdownify` (~100 chapters whose content doesn't start with `<`); both candidates for a follow-up pass (see #3). New `Jekyll::SeoPrecompute#absolute_url` adds 0.44 s for 846 calls, replacing 1,675 filter calls that totalled 0.40 s -- essentially flat, but the absolute_url filter had its own per-build cache, so the swap is a wash on this axis. 
Output byte-identical to baseline (`diff -rq` clean on all three of `_site/`, `_site-offline/`, `_site-pdf/`). -3. **`book-chapter-body.html` heading-shift + anchor-prefix `replace` chain → Ruby pass.** ~36 k replaces fold into one string rewrite. Smaller win (~0.3 s) but probably wants to land alongside #2 since both touch the same render path. +3. **`book-chapter-body.html` heading-shift + anchor-prefix `replace` chain → Ruby pass. [LANDED]** Replaced the per-chapter chain of 0-3 heading-shift cascades (12 replaces each), the 12-pattern whitespace span wrapping, and the 13-replace anchor-id prefix pass with a single Liquid filter `book_chapter_transform` (`_plugins/book-chapter-transform.rb`). The filter takes the body, the site baseurl, a precomputed `heading_shift_n` (0-3, derived in Liquid from `skip_base_heading_shift` / `is_sub_page` / `extra_heading_shift`), and the chapter anchor; does all six passes in one method with no intermediate string allocations beyond what the regex engine produces internally. The dead `p1_search` / `p1_replace` / ... whitespace-pattern declarations were also removed from `book.html`'s prologue. + + The single-pass heading shift (one regex bumping each level by N, capping at h7-stub for source levels above 6) is equivalent to N applications of the bottom-up cascade chain -- each source heading lands at `level + N` or `h7-stub` regardless of how many sequential passes the chain ran, since the cascade structure was an artifact of Liquid not having a bump-by-N primitive, not a semantic requirement. 
+ + Ruby-prof effect (post-SEO baseline vs post-chapter-transform): + + | Metric | Before | After | Delta | + |---|---|---|---| + | Total instrumented wall | 36.90 s | 34.78 s | -2.12 s | + | `Liquid::Strainer#invoke` total | 5.97 s / 179,266 calls | 5.45 s / 122,397 calls | -0.52 s / -56,869 calls | + | `Liquid::StandardFilters#replace` calls | 87,991 | 48,577 | -39,414 | + | `Liquid::StandardFilters#replace` total | 0.58 s | 0.33 s | -0.25 s | + | new `BookChapterTransform#book_chapter_transform` | -- | 0.14 s / 718 calls | +0.14 s | + | `Liquid::BlockBody#render` total | 16.14 s | 14.43 s | -1.71 s | + | `Liquid::Context#stack` total | 15.50 s | 13.78 s | -1.71 s | + | `Liquid::Variable#render` total | 8.96 s | 7.82 s | -1.14 s | + + The Liquid framework drops (`BlockBody#render`, `Context#stack`, `Variable#render`) again outweigh the filter-dispatch drop -- they capture the eliminated `{%- unless -%}` / `{%- if -%}` blocks plus the chained `| replace:` pipeline AST nodes. The new filter does ~190 µs per call across 718 invocations, covering the same work the eliminated 39 k Liquid replaces did. Output byte-identical to baseline (`diff -rq` clean on `_site/`, `_site-offline/`, `_site-pdf/`). + +#### Cumulative + +The three landed optimizations together (chapter precompute, SEO precompute, chapter-body transform) shrank ruby-prof's instrumented build wall from ~41.7 s (immediately post-html-compress baseline) down to 34.78 s -- about -7 s. 
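The equivalence claimed for the single-pass shift can be checked with a small sketch. It reuses the `HEADING_SHIFT_RE` pattern quoted in the plugin source, but `shift_headings` and `shift_headings_cascade` are hypothetical helper names, and the literal overflow tag name `h7-stub` is assumed from the prose above; the real filter may emit something different for overflowing levels.

```ruby
# frozen_string_literal: true

# Matches the tag-name part of an opening or closing h1..h6 tag
# (same pattern as the HEADING_SHIFT_RE quoted in the plugin).
HEADING_SHIFT_RE = /<(\/?)h([1-6])\b/

# Assumption: headings pushed past level 6 are renamed to this placeholder.
OVERFLOW_TAG = "h7-stub"

# Single pass: bump every heading by n levels in one gsub.
def shift_headings(html, n)
  return html if n.zero?

  html.gsub(HEADING_SHIFT_RE) do
    slash = Regexp.last_match(1)
    level = Regexp.last_match(2).to_i + n
    level > 6 ? "<#{slash}#{OVERFLOW_TAG}" : "<#{slash}h#{level}"
  end
end

# Cascade: apply a bump-by-one pass n times, the way a chain of
# sequential replaces would. Tags already renamed to the overflow
# placeholder no longer match the regex, so they stay put.
def shift_headings_cascade(html, n)
  n.times { html = shift_headings(html, 1) }
  html
end
```

For an input such as `<h1 id="a">A</h1><h5>B</h5><h6>C</h6>` and a shift of 2, both helpers rename the `h1` to `h3` and send the `h5` and `h6` to the overflow placeholder, so the number of sequential passes does not affect the result.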
The cumulative profile-table picture, comparing the post-html-compress baseline to the post-chapter-transform state: + +| Metric | Post-html-compress | Post-CT | Delta | +|---|---|---|---| +| Total instrumented wall | 39.30 s | 34.78 s | -4.52 s | +| `Liquid::Strainer#invoke` total | 8.90 s / 191,365 calls | 5.45 s / 122,397 calls | -3.45 s / -68,968 calls | +| `where_exp` calls | 37 | 0 | -37 | +| `markdownify` calls | 1,802 | 128 | -1,674 | +| `absolute_url` filter calls | 1,675 | 1 | -1,674 | +| `replace` calls | 87,991 | 48,577 | -39,414 | +| `Liquid::BlockBody#render` total | 18.38 s | 14.43 s | -3.95 s | +| `Liquid::Context#stack` total | 18.19 s | 13.78 s | -4.41 s | + +What's left of the per-filter table is approximately what kramdown / Rouge actually parse and emit: the 128 remaining `markdownify` calls are the per-chapter `chapter.content | markdownify` in `book-chapter-body.html` plus `book.html`'s part subtitle / intro markdown. Each of those is unique input, so Jekyll's converter cache rarely hits and the kramdown parse itself dominates. Further savings on this axis would need either (a) reusing the already-rendered `_site/.html` instead of re-parsing source markdown for the book, or (b) accepting kramdown's parse cost as the floor and looking elsewhere -- the next-biggest non-library hotspot is `Offlinify#rewrite_html!` at ~2 s of self-time, already heavily optimised (see `_plugins/offlinify.md`). ## Site integrity check diff --git a/docs/_includes/book-chapter-body.html b/docs/_includes/book-chapter-body.html index 69f5ede..6380f7c 100644 --- a/docs/_includes/book-chapter-body.html +++ b/docs/_includes/book-chapter-body.html @@ -99,168 +99,52 @@ {%- endunless -%} {%- comment -%} - Strip the `src="/` prefix that `relative_url` injects when - `jekyll build --baseurl /` is passed (the CI deploy path uses - this for Pages project sites without a custom domain). 
With empty - baseurl the prefix collapses to `src="/`, matching the historical - leading-slash strip exactly. Once stripped, image paths inside - book.html are root-of-_site/-relative, which is what both pdfify's - source lookup and pagedjs's render-time fetch expect. -{%- endcomment -%} -{%- assign src_baseurl_strip = 'src="' | append: site.baseurl | append: '/' -%} - -{%- assign body = body - | replace: src_baseurl_strip, 'src="' - | replace: p1_search, p1_replace - | replace: p2_search, p2_replace - | replace: p3i12_search, p3i12_replace - | replace: p3i8_search, p3i8_replace - | replace: p3i4_search, p3i4_replace - | replace: p3_search, p3_replace - | replace: p4i16_search, p4i16_replace - | replace: p4i12_search, p4i12_replace - | replace: p4i8_search, p4i8_replace - | replace: p4i4_search, p4i4_replace - | replace: p4_search, p4_replace - | replace: ' ', '' - | replace: '', '' - | replace: '', '' - | replace: '', '' - | replace: '', '' -%} +{%- if include.chapter_anchor_override -%} + {%- assign chapter_anchor = include.chapter_anchor_override -%} +{%- else -%} + {%- assign url_path = include.chapter.url | replace: '/', '-' -%} + {%- assign anchor_first_char = url_path | slice: 0, 1 -%} + {%- if anchor_first_char == '-' -%} + {%- assign url_len = url_path.size | minus: 1 -%} + {%- assign url_path = url_path | slice: 1, url_len -%} {%- endif -%} - - {%- comment -%} - 1.5b: chapter anchor + per-heading id uniqueness. Derive a stable - chapter anchor from the chapter URL (strip leading and trailing - slashes, replace inner slashes with dashes); the article element - carries the bare anchor as its id so Phase 2 cross-references can - target `#ch-` and land at the chapter's first page. Every - heading id and intra-chapter `href="#..."` inside the chapter body - gets that anchor prepended so identical kramdown-generated slugs - (`see-also`, `example`, ...) don't collapse to one outline - destination. 
- {%- endcomment -%} - {%- if include.chapter_anchor_override -%} - {%- assign chapter_anchor = include.chapter_anchor_override -%} - {%- else -%} - {%- assign url_path = include.chapter.url | replace: '/', '-' -%} - {%- assign anchor_first_char = url_path | slice: 0, 1 -%} - {%- if anchor_first_char == '-' -%} - {%- assign url_len = url_path.size | minus: 1 -%} - {%- assign url_path = url_path | slice: 1, url_len -%} - {%- endif -%} - {%- assign anchor_last_char = url_path | slice: -1, 1 -%} - {%- if anchor_last_char == '-' -%} - {%- assign url_len = url_path.size | minus: 1 -%} - {%- assign url_path = url_path | slice: 0, url_len -%} - {%- endif -%} - {%- assign chapter_anchor = 'ch-' | append: url_path -%} + {%- assign anchor_last_char = url_path | slice: -1, 1 -%} + {%- if anchor_last_char == '-' -%} + {%- assign url_len = url_path.size | minus: 1 -%} + {%- assign url_path = url_path | slice: 0, url_len -%} {%- endif -%} + {%- assign chapter_anchor = 'ch-' | append: url_path -%} +{%- endif -%} - {%- assign h2_class_prefix = '

` -> `

`). The +# `\b` word boundary anchors after the digit so `` +# (hypothetical) wouldn't accidentally match. +# +# === When it runs === +# +# Per-render, inside Liquid as a standard filter. The plugin file +# only needs `Liquid::Template.register_filter`; no hook. + +module Jekyll + module BookChapterTransform + SP = " " + NL = "\n" + S4 = " " + S8 = " " + S12 = " " + S16 = " " + + # Inter-span whitespace patterns for pagedjs's page splitter -- + # see book.html's header comment for the full rationale. Longest + # pattern first so each consumes its bytes before a shorter + # pattern can fragment them; mirrors the Liquid chain's order in + # `book-chapter-body.html` exactly. + WHITESPACE_PATTERNS = [ + # p1: blank line after code line with trailing space + ["#{SP}#{NL}#{SP}#{NL}#{SP}#{NL}#{SP}#{NL}#{NL}#{SP}#{NL}#{NL}#{SP}#{NL}#{SP}#{NL}#{S12}#{SP}#{NL}#{S12}#{SP}#{NL}#{S8}#{SP}#{NL}#{S8}#{SP}#{NL}#{S4}#{SP}#{NL}#{S4}#{SP}#{NL}#{SP}#{NL}#{NL}#{S16}#{NL}#{S16}#{NL}#{S12}#{NL}#{S12}#{NL}#{S8}#{NL}#{S8}#{NL}#{S4}#{NL}#{S4}#{NL}#{NL} `. + HEADING_SHIFT_RE = /<(\/?)h([1-6])\b/.freeze + + # Heading-id prefix regex. Matches both `\n \n99% of code block lines. + The inter-span whitespace patterns the chapter-body transform + needs for pagedjs (each whitespace run between adjacent code + spans wrapped in a `` so the page splitter + doesn't collapse it) used to be declared here as Liquid variables + and consumed by `_includes/book-chapter-body.html`. They now live + inline in `_plugins/book-chapter-transform.rb`'s + `WHITESPACE_PATTERNS` constant -- see that plugin's header + comment for the longest-first rationale. {%- endcomment -%} -{%- assign s4 = ' ' -%} -{%- assign s8 = ' ' -%} -{%- assign s12 = ' ' -%} -{%- assign s16 = ' ' -%} - -{%- assign p4i4_search = '' | append: nl | append: s4 | append: ' Date: Wed, 20 May 2026 15:18:47 +0200 Subject: [PATCH 20/21] Save about 0.3s in the gfma plugin. 
--- WIP.md | 30 ++++++- docs/_plugins/jekyll-gfm-admonitions-patch.rb | 85 ++++++++++++++++++- 2 files changed, 111 insertions(+), 4 deletions(-) diff --git a/WIP.md b/WIP.md index d3ab997..7c06771 100644 --- a/WIP.md +++ b/WIP.md @@ -518,21 +518,45 @@ Ranked by estimated wall-clock saving on the current Windows machine: The Liquid framework drops (`BlockBody#render`, `Context#stack`, `Variable#render`) again outweigh the filter-dispatch drop -- they capture the eliminated `{%- unless -%}` / `{%- if -%}` blocks plus the chained `| replace:` pipeline AST nodes. The new filter does ~190 µs per call across 718 invocations, covering the same work the eliminated 39 k Liquid replaces did. Output byte-identical to baseline (`diff -rq` clean on `_site/`, `_site-offline/`, `_site-pdf/`). +4. **JekyllGFMAdmonitions defer-body-parse. [LANDED]** Extended `_plugins/jekyll-gfm-admonitions-patch.rb` with two method overrides on `JekyllGFMAdmonitions::GFMAdmonitionConverter`. The first replaces `admonition_html` so the admonition body is spliced into `doc.content` as raw markdown inside a `
` wrapper, deferring the per-admonition `@markdown.convert(text)` call to the page-level kramdown pass (which already runs with `parse_block_html: true` per `_config.yml`). One combined kramdown pass replaces 1 + N parses for each of the site's 508 admonitions. The second overrides `process_doc` to preserve the leading newline(s) in the code-block stash placeholder substitution -- without this, the gem's `(?:^|\n)(?<!>)\s*\`\`\`.*?\`\`\`` regex consumes the blank line between an admonition body and a following fenced code block, the placeholder ends up appended to the last `>`-prefixed body line, the admonition regex pulls it into the body capture, and either kramdown renders it as an empty `` (gem behaviour) or the code block is spliced inside the admonition div (deferred-body behaviour). With the override, placeholders stay on their own line outside the body capture.
+
+   Ruby-prof effect (post-CT baseline vs post-GFM-patch):
+
+   | Metric | Before | After | Delta |
+   |---|---|---|---|
+   | `GFMAdmonitionConverter#generate` total | 0.690 s / 1 call | 0.108 s / 1 call | -0.582 s |
+   | `admonition_html` calls | 508 | 508 | (same dispatch, now does only string concat) |
+   | `@markdown.convert(text)` calls from admonition_html | 508 | 0 | -508 |
+
+   Wall-clock effect on 3-run uninstrumented means (busy dev machine, but consistent within each set):
+
+   | Phase | Before | After | Delta |
+   |---|---|---|---|
+   | `done in ...` total | 11.47 s | 11.13 s | -0.34 s |
+   | `GFMA: Generator ran in ...` | 216 ms | 93 ms | -123 ms |
+
+   Output is not byte-identical to baseline: 12 files differ. Eleven are real bug fixes that were latent in the unpatched gem -- 5 pages had their fenced code block lost (the code-block-stash-eats-the-blank-line bug above; `Tutorials/Arrays.md`, `Tutorials/CustomControls/Painting.md`, `tB/Packages/WebView2/WebView2/index.md`, `tB/Packages/WinNamedPipesLib/NamedPipeClientConnection.md`, `tB/Packages/WinServicesLib/ServiceManager.md`), 1 page had a `\\\\` source sequence collapsed to `\\` by the gem's second markdown pass (`tB/Core/RightShift.md` -- the body is now parsed once, so `**\\\\**` renders as `\\` not `\`), 1 page had its loose-list items rendered as `<li>text</li>` instead of CommonMark's `<li><p>text</p></li>` because the gem's pre-rendered admonition HTML changed the surrounding paragraph context (`Documentation/Development.md`), and the remaining 5 are cosmetic whitespace nits inside admonitions that themselves contain a fenced code block (`tB/Core/If-Then-Else.md`, `tB/Core/Option.md`, `tB/Modules/Interaction/InputBox.md`, `tB/Packages/CEF/CefBrowser/index.md`, `tB/Packages/tbIDE/HtmlElement.md`). The 12th file is `assets/js/search-data.json`, derived from page contents so it tracks them. Lychee link check is clean (`8170 OK, 0 errors` for online; `6824 OK, 0 errors` for offline).
+
+   A separate investigation looked at `NavIntegrityCheck::Generator#generate` (0.436 s / 1 call in the post-CT profile, attributed to 855 `Jekyll::FrontmatterDefaults#find` walks). The plugin uses `page.data[key]` for `title` / `nav_exclude` / `parent` / `grand_parent`, and Jekyll's `Page#initialize` sets `data.default_proc = proc { site.frontmatter_defaults.find(...) }`, so every missing key fell through to a full defaults walk. Switching to `data.fetch(key, nil)` bypasses the default_proc, but the resulting wall-clock delta is only ~50-80 ms: NavIntegrityCheck was warming `FrontmatterDefaults`'s internal `@matched_set_cache` (keyed by `path-type`), and `NavTreePrecompute::Generator#ordered_children_for` was the cache's biggest beneficiary. With NavIntegrityCheck skipping the walk, NavTreePrecompute pays the cache-miss cost itself -- ~430 ms moves from one stack to the other, leaving only the per-call dispatch overhead recovered. The patch was reverted.
+
 #### Cumulative
 
-The three landed optimizations together (chapter precompute, SEO precompute, chapter-body transform) shrank ruby-prof's instrumented build wall from ~41.7 s (immediately post-html-compress baseline) down to 34.78 s -- about -7 s.
The cumulative profile-table picture, comparing the post-html-compress baseline to the post-chapter-transform state: +The four landed optimizations together (chapter precompute, SEO precompute, chapter-body transform, GFM defer-body-parse) shrank ruby-prof's instrumented build wall from ~41.7 s (immediately post-html-compress baseline) down to ~34 s. The cumulative profile-table picture, comparing the post-html-compress baseline to the post-GFM state: -| Metric | Post-html-compress | Post-CT | Delta | +| Metric | Post-html-compress | Post-GFM | Delta | |---|---|---|---| -| Total instrumented wall | 39.30 s | 34.78 s | -4.52 s | +| Total instrumented wall | 39.30 s | 34.78 s* | -4.52 s | | `Liquid::Strainer#invoke` total | 8.90 s / 191,365 calls | 5.45 s / 122,397 calls | -3.45 s / -68,968 calls | | `where_exp` calls | 37 | 0 | -37 | | `markdownify` calls | 1,802 | 128 | -1,674 | | `absolute_url` filter calls | 1,675 | 1 | -1,674 | | `replace` calls | 87,991 | 48,577 | -39,414 | +| `GFMAdmonitionConverter#generate` total | 0.690 s | 0.108 s | -0.582 s | | `Liquid::BlockBody#render` total | 18.38 s | 14.43 s | -3.95 s | | `Liquid::Context#stack` total | 18.19 s | 13.78 s | -4.41 s | +\* Instrumented totals are noisy on the current Windows dev machine (single-run range ~9 s across consecutive identical runs); the per-method numbers above are stable across runs and are the more reliable signal. + What's left of the per-filter table is approximately what kramdown / Rouge actually parse and emit: the 128 remaining `markdownify` calls are the per-chapter `chapter.content | markdownify` in `book-chapter-body.html` plus `book.html`'s part subtitle / intro markdown. Each of those is unique input, so Jekyll's converter cache rarely hits and the kramdown parse itself dominates. 
Further savings on this axis would need either (a) reusing the already-rendered `_site/.html` instead of re-parsing source markdown for the book, or (b) accepting kramdown's parse cost as the floor and looking elsewhere -- the next-biggest non-library hotspot is `Offlinify#rewrite_html!` at ~2 s of self-time, already heavily optimised (see `_plugins/offlinify.md`). ## Site integrity check diff --git a/docs/_plugins/jekyll-gfm-admonitions-patch.rb b/docs/_plugins/jekyll-gfm-admonitions-patch.rb index d4a7a16..b009f59 100644 --- a/docs/_plugins/jekyll-gfm-admonitions-patch.rb +++ b/docs/_plugins/jekyll-gfm-admonitions-patch.rb @@ -113,7 +113,8 @@ # generators (this one included) are invisible to it. The wall-clock delta # we log here is the gem's full contribution to the GENERATE phase: # walking every collection doc and page, running the admonition regex, -# and re-invoking the markdown converter on each admonition body. +# and (after the patch below) splicing in HTML that defers body markdown +# parsing to the page-level kramdown pass. module JekyllGFMAdmonitions class GFMAdmonitionConverter unless method_defined?(:_generate_without_timing) @@ -125,5 +126,87 @@ def generate(site) Jekyll.logger.info "GFMA:", "Generator ran in #{elapsed_ms}ms." end end + + # Skip the per-admonition `@markdown.convert(text)` call by leaving + # the body as raw markdown inside the outer alert div. The + # site-level kramdown config (`parse_block_html: true` and + # `parse_span_html: true` in _config.yml) makes the page-level + # kramdown pass descend into the div and parse the body markdown + # during RENDER, so the rendered HTML is the same as if the gem had + # pre-converted the body itself -- one combined parse instead of + # 1 + N (page + one per admonition). + # + # Two side effects of removing the inline `markdown.convert` + # surface as small correctness improvements over the unpatched gem: + # + # * Backslash-escapes in body text (e.g. 
`**\\\\**` for a bold
+    #     pair of backslashes) no longer go through kramdown twice and
+    #     so are no longer collapsed by the second pass. The unpatched
+    #     gem's output of `\` (one backslash) becomes
+    #     `\\` (two backslashes, what the source asks
+    #     for). Pages affected: any with `\\\\` inside an admonition
+    #     body -- on this site, `Reference/Core/RightShift.md` and a
+    #     handful of others.
+    #
+    #   * Code blocks that follow an admonition with just one blank
+    #     line between them are no longer eaten by the gem's code-block
+    #     stash regex (see `process_doc` override below). The unpatched
+    #     gem's stash regex `(?:^|\n)(?<!>)\s*```.*?```/m` consumes the
+    #     blank line, which pulls the placeholder into the admonition
+    #     body capture, which lets kramdown render it as an empty
+    #     `` element and
+    #     prevents the restore step from finding it. Net effect on
+    #     the unpatched gem: the code block disappears from the
+    #     rendered HTML. The override below preserves the leading
+    #     newline(s) so the placeholder stays on its own line outside
+    #     the admonition body capture.
+    #
+    # The body text is bracketed by blank lines so kramdown reads it as
+    # an independent paragraph rather than tangling with the preceding
+    # `

    ...

    ` block. The outer div + # carries `markdown='1'` so kramdown's HTML-block parser keeps the + # whole `
    ...
    ` as a single block even though it spans + # blank lines internally. + unless method_defined?(:_admonition_html_without_deferred_body) + alias_method :_admonition_html_without_deferred_body, :admonition_html + def admonition_html(type, title, text, icon) + "
    \n" \ + "

    #{icon} #{title}

    \n\n" \ + "#{text}\n" \ + "
    " + end + end + + # Override `process_doc` to fix the code-block stash so that the + # placeholder substitution preserves the leading newline(s) that + # separated the code block from the preceding text. Without this + # adjustment, the gem's gsub eats the blank line before the code + # block, which causes the placeholder to be appended to the last + # admonition body line and then dragged into the body capture by + # the admonition regex's `[^\n]*` body-line pattern. + # + # The body of the method otherwise mirrors the upstream gem + # verbatim (see jekyll-gfm-admonitions 1.2.0, + # `lib/jekyll-gfm-admonitions.rb#process_doc`). + unless method_defined?(:_process_doc_without_leading_ws_preserve) + alias_method :_process_doc_without_leading_ws_preserve, :process_doc + def process_doc(doc) + return if doc.content.empty? + doc.content = doc.content.dup unless doc.content.frozen? + + code_blocks = [] + doc.content.gsub!(/(?:^|\n)(?)\s*```.*?```/m) do |match| + code_blocks << match + leading = match[/\A\s+/] || "" + "#{leading}```{{CODE_BLOCK_#{code_blocks.length - 1}}}```" + end + + convert_admonitions(doc) + + doc.content.gsub!(/```\{\{CODE_BLOCK_(\d+)}}```/) do + code_blocks[::Regexp.last_match(1).to_i] + end + end + end end end From a1c61876d9d62665df582384f50e22f937f08e96 Mon Sep 17 00:00:00 2001 From: Kuba Sunderland-Ober Date: Wed, 20 May 2026 15:33:57 +0200 Subject: [PATCH 21/21] Update the book render plan. --- BOOKPLAN.md | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/BOOKPLAN.md b/BOOKPLAN.md index 3f9d587..9518eb6 100644 --- a/BOOKPLAN.md +++ b/BOOKPLAN.md @@ -17,9 +17,10 @@ or `build.bat` then `book.bat`. One Jekyll invocation produces three trees in pa Touch points and what each one already exposes: -- [docs/book.html](docs/book.html) — iterator that concatenates every chapter into one HTML document. Permalink `/book.html`, layout `book-combined`. 
Contains: whitespace-pattern primitives (p1..p4, indent variants) for the pagedjs whitespace fix; the Roman numerals array; the title-page section (1.3); the front-matter loop (1.7) that emits `site.data.book.front_matter` entries inline between the title page and Part I; the per-part loop. Each part can be **flat** (page selectors directly on the part, plus an optional `landing_page:`) or **chaptered** (1.9; a `foreword_page:` and/or `landing_page:` plus a nested `chapters:` list, each chapter carrying its own selectors and divider page). Page selection across all three call sites (front-matter, flat part, chaptered chapter) is delegated to `_includes/book-collect-matches.html` so the selector schema stays in one place. Per-chapter body rendering — whitespace fix, heading depth shift, sub-page detection, id uniqueness, header-string emission — is factored out into `_includes/book-chapter-body.html` so the chapter-loop callers share one implementation. Insertion points for new front matter go **after** the title-page section and **before** the `{%- for part in site.data.book.parts -%}` opener. -- [docs/_includes/book-chapter-body.html](docs/_includes/book-chapter-body.html) — per-chapter body processing, called via `{% include book-chapter-body.html chapter=... %}` from each of book.html's chapter-loop callers. Handles markdownify, the pagedjs whitespace fix (consumes the `p1..p4` patterns from the outer scope), heading depth shift (1.5a + sub-page 1.6b), sub-page detection (1.6a, opt-out via `skip_sub_page_detection`), heading-id uniqueness (1.5b), compound running header (1.6c), and emits the final `
    ` block. Take-it-or-leave-it parameters cover the cases that don't fit the default: `article_class_override` (front-matter and part-foreword), `chapter_anchor_override` (root URL `/` fallback to `ch-introduction`), `skip_sub_page_detection` (front-matter entries and part landings don't share an index hierarchy with following chapters), `skip_base_heading_shift` (skips the 1.5a `+1` shift; paired with the part's `no_heading_shift` flag), `extra_heading_shift` (adds the 1.9 chaptered-part `+1` shift on top of 1.5a so class / module indexes nest under their chapter divider in the outline). -- [docs/_includes/book-collect-matches.html](docs/_includes/book-collect-matches.html) — shared selector logic for gathering pages that match a manifest entry. Called from each of book.html's chapter-loop callers via `{% include book-collect-matches.html entry=... %}` after the caller initialises an outer-scope `collected = "" | split: ""` array; the include appends every matching page to `collected` in selector order. Recognises four selector keys on the entry — `page:` (single URL), `pages:` (list of URLs), `nav_page:` (single nav-path), `nav_pages:` (list of nav-paths) — and one modifier, `no_descent:`, that flips every match from the default `contains` (starts-with) semantics to exact-equality. `landing_page:` and `foreword_page:` are not handled here; their first-emission / divider-styling semantics live in the caller. +- [docs/book.html](docs/book.html) — iterator that concatenates every chapter into one HTML document. Permalink `/book.html`, layout `book-combined`. Contains: the Roman numerals array; the title-page section (1.3); the front-matter loop (1.7) that emits `site.data.book.front_matter` entries inline between the title page and Part I; the per-part loop. 
Each part can be **flat** (page selectors directly on the part, plus an optional `landing_page:`) or **chaptered** (1.9; a `foreword_page:` and/or `landing_page:` plus a nested `chapters:` list, each chapter carrying its own selectors and divider page). Each chapter-loop caller reads its pre-resolved page list from `entry._chapters`, populated once at `:site, :pre_render` by `_plugins/book-resolve-chapters.rb` (so the selector schema stays in one place). Per-chapter body rendering is delegated to `_includes/book-chapter-body.html`, which in turn calls the `book_chapter_transform` Liquid filter (`_plugins/book-chapter-transform.rb`) for whitespace fix, heading-depth shift, heading-id rewrite, and intra-chapter href-anchor rewrite. Insertion points for new front matter go **after** the title-page section and **before** the `{%- for part in site.data.book.parts -%}` opener. +- [docs/_includes/book-chapter-body.html](docs/_includes/book-chapter-body.html) — per-chapter body processing, called via `{% include book-chapter-body.html chapter=... %}` from each of book.html's chapter-loop callers. Handles sub-page detection (1.6a, opt-out via `skip_sub_page_detection`), compound running header (1.6c), and emits the final `
    ` block. The heavier rewrites — markdownify, the pagedjs whitespace fix (1.5/2.1), the 1.5a heading-depth shift (+ the 1.6b sub-page and 1.9 chaptered-part additional shifts when applicable), the 1.5b heading-id prefix, and the intra-chapter `href="#..."` anchor prefix — are batched into one Ruby pass via the `body | book_chapter_transform: site.baseurl, heading_shift_n, chapter_anchor` filter call. Take-it-or-leave-it parameters cover the cases that don't fit the default: `article_class_override` (front-matter and part-foreword), `chapter_anchor_override` (root URL `/` fallback to `ch-introduction`), `skip_sub_page_detection` (front-matter entries and part landings don't share an index hierarchy with following chapters), `skip_base_heading_shift` (skips the 1.5a `+1` shift; paired with the part's `no_heading_shift` flag), `extra_heading_shift` (adds the 1.9 chaptered-part `+1` shift on top of 1.5a so class / module indexes nest under their chapter divider in the outline). The three `_*_heading_shift` parameters and `skip_base_heading_shift` combine into a single `heading_shift_n` integer the include passes to the filter; the filter then bumps each heading by exactly N levels in one regex pass (capping at h7-stub above source-h6), rather than running 0-3 cascading shift chains. +- [docs/_plugins/book-resolve-chapters.rb](docs/_plugins/book-resolve-chapters.rb) — `:site, :pre_render` generator that walks `_data/book.yml` (`front_matter:`, each flat part, each part's optional `foreword_page:`/`landing_page:`, and each chapter inside a chaptered part) and stashes the resolved page array on `entry._chapters` for `book.html` to iterate. Recognises four selector keys on the entry — `page:` (single URL), `pages:` (list of URLs), `nav_page:` (single nav-path), `nav_pages:` (list of nav-paths) — and one modifier, `no_descent:`, that flips every match from the default `contains` (starts-with) semantics to exact-equality. 
`landing_page:` and `foreword_page:` are not resolved here; their first-emission / divider-styling semantics live in `book.html`'s caller. Replaces the earlier per-render Liquid include `_includes/book-collect-matches.html` -- the `where_exp` / `where` / `concat` / `sort_by_nav_order` chains were running 37 times per build for ~1.5 s of Liquid expression-interpreter time; precomputing once at site:pre_render is free. +- [docs/_plugins/book-chapter-transform.rb](docs/_plugins/book-chapter-transform.rb) — registers the `book_chapter_transform` Liquid filter that takes a chapter body and applies, in one Ruby pass: the pagedjs inter-span whitespace fix (longest-first regex over `WHITESPACE_PATTERNS`), the N-level heading shift (1.5a + 1.6b + 1.9, where N is precomputed by the include from `skip_base_heading_shift` / `is_sub_page` / `extra_heading_shift`), the 1.5b `id="..."` prefix per chapter, and the corresponding `href="#..."` prefix for intra-chapter anchors. One filter call replaces a chain of ~36 `| replace:` invocations plus a 12-pattern whitespace span wrap from the prior in-template implementation (~3 cascading heading-shift passes × 12 replaces, plus the anchor-id 13-replace pass). - [docs/_layouts/book-combined.html](docs/_layouts/book-combined.html) — minimal wrapper: `` + `{{ site.title }}` + `rouge.css` + `print.css` + `{{ content }}`. No nav, no JS, no chrome. Pagedjs runs on the rendered output of this layout. The only layout the PDF pipeline uses; the older per-source-page `book` layout was retired alongside `_config-pdf.yml`. - [docs/assets/css/print.css](docs/assets/css/print.css) — the book's design. 
Existing structural rules: `@page` (A4, 22mm margins, running header in `@top-right` via `string(chapter-title)`, page number in `@bottom-right`); `@page :first` (suppresses both — used by the title page); `@page divider` (suppresses both, used by part dividers via `page: divider`); `@page front-matter` (suppresses both, used by `article.front-matter` for 1.7 Introduction-style sections); `@page part-foreword` + `@page chapter-divider` (suppresses both, used by the 1.9 part foreword and per-chapter title pages); `article { break-before: page }`; per-chapter `string-set: chapter-title` on `article.page > .header-string`; the top-level vs sub-chapter heading-size split (`article.page:not(.sub-chapter) > h2:first-of-type` vs `article.page.sub-chapter > h3:first-of-type`); chapter-divider H2 typography (`article.chapter-divider h2` — 24pt centered, no border) plus its subtitle (`.chapter-subtitle` — 13pt italic). - [docs/_data/book.yml](docs/_data/book.yml) — the manifest book.html iterates over. Schema: `parts:` is an ordered list of numbered parts. A **flat part** carries page-selectors directly (`page:` / `pages:` / `nav_page:` / `nav_pages:`) plus an optional `landing_page:`; a **chaptered part** (1.9) replaces the selectors with a `chapters:` list of per-chapter entries `{ title, subtitle, landing_page, page/pages/nav_page/nav_pages, ... }` and may carry a `foreword_page:` and/or a `landing_page:` on the part itself. Each chaptered chapter emits a full-page `
    ` title page followed by its landing page (with the source H1 stripped by the plugin) and the selector-matched content in `sort_by_nav_order` order. `front_matter:` is a sibling list of front-matter sections (1.7), same selector shape as a flat part. The selector keys: `pages:` is a list of URL substrings matched via `contains` (multiple entries can map to one Part / chapter — used for the Reference Section in 1.8 and for the VBA chapter's landing at `/tB/Packages/VBA` + members under `/tB/Modules/...`); `page:` is the singular alias. `nav_pages:` is the same shape against `page.data["nav_path"]` (populated by `_plugins/nav-path.rb`) — used when a section is most naturally expressed as a nav-tree branch rather than a URL prefix; `nav_page:` is its singular alias. A `no_descent: true` modifier on the entry switches every selector to exact-equality so a single index page can be picked up without sweeping its sub-pages. Additional control flags: `no_outline_entry:` suppresses the part-divider H1 / chapter-divider H2 (so the section's first content heading becomes the bookmark target); `no_heading_shift:` skips the 1.5a base shift for the part's entries (used when the source pages are already authored at H2-and-deeper). Available in Liquid as `site.data.book.parts` and `site.data.book.front_matter`. @@ -45,7 +46,8 @@ Concretely for the PDF book: - Git-derived build info (commit hash, commit date) → Jekyll plugin (`_plugins/build-info.rb`) that populates `site.data.build` on `:site, :post_read`. Not a pre-build Python step writing `_data/build.yml`. - Chapter manifest → `_data/book.yml` (committed source of truth, hand-edited). - Title page, colophon, TOC content → Liquid in `book.html` and the layouts. -- Heading rewrites → Liquid (existing approach in `book.html`). Per-chapter, one-shot, fast. 
+- Chapter selector resolution (`page` / `pages` / `nav_page` / `nav_pages` / `no_descent`, the `sort_by_nav_order` ordering, and `foreword_page`/`landing_page` resolution) → Jekyll plugin (`_plugins/book-resolve-chapters.rb`) running at `:site, :pre_render`. The Liquid implementation (formerly in `_includes/book-collect-matches.html`) was running ~37 `where_exp` invocations per build for ~1.5 s of Liquid expression-interpreter time; resolving once into `entry._chapters` is free. +- Per-chapter body rewrites (pagedjs whitespace fix, heading-depth shift, heading-id prefix, intra-chapter href anchor prefix) → Jekyll plugin (`_plugins/book-chapter-transform.rb`), exposed as the `book_chapter_transform` Liquid filter that `_includes/book-chapter-body.html` calls once per chapter. The Liquid version was a chain of ~36 `| replace:` invocations plus a 12-pattern whitespace span wrap per chapter; the filter does the same passes in C-implemented regex over the body string, with the heading-shift cascade collapsed to a single bump-by-N regex. - Cross-reference href rewrites → Jekyll plugin (`_plugins/book-href-rewrite.rb`), running on `:pages, :post_render`. The first cut was inline Liquid; the per-(chapter × permalink) loop burned ~21 s of render even after pre-computing per-permalink search/replace strings and gating each permalink on a common-prefix `contains`, vs ~50 ms in Ruby. Rule of thumb: use Liquid for per-chapter shaping; reach for a plugin when the work is N × M with large N and M. The carve-out in WIP.md for `_plugins/offlinify.rb` is the same shape: build-time concerns tightly coupled to Jekyll's internal model belong in `_plugins/`, not in an external script. @@ -190,6 +192,8 @@ Intra-chapter local links must be rewritten in lock-step. Patterns like `[**Coun Both rewrites are mechanical text substitutions over the chapter body string, no parsing required. 
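As a rough illustration of the two lock-step substitutions described above, the sketch below prefixes every `id="..."` attribute and every same-document `href="#..."` with a chapter anchor. The helper name `prefix_chapter_anchors` and the `-` separator are assumptions made for this example; they are not taken from the plugin source, and the real filter may scope or format the prefix differently.

```ruby
# Illustrative sketch only: `prefix_chapter_anchors` and the "-" separator
# are hypothetical, not the names or format used by the actual filter.
def prefix_chapter_anchors(body, chapter_anchor)
  # Prefix every id="..." attribute so ids stay unique across chapters.
  # The lookbehind requires whitespace before `id` so that attributes such
  # as data-id="..." are left alone.
  with_ids = body.gsub(/(?<=\s)id="([^"]+)"/) do
    'id="' + chapter_anchor + "-" + Regexp.last_match(1) + '"'
  end

  # Prefix only same-document links (href starting with "#"), so they keep
  # pointing at the ids rewritten above. Links to other pages are untouched.
  with_ids.gsub(/href="\#([^"]+)"/) do
    'href="#' + chapter_anchor + "-" + Regexp.last_match(1) + '"'
  end
end
```

Both passes are plain regex substitutions over the body string, which is why no HTML parser is needed; the cost is that an `id=` or `href=` written with single quotes or unusual spacing would be missed by this simplified pattern.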
+**Implementation.** Landed first as the Liquid replace-chain shown in 1.5a plus a sibling pair for the `id=` / `href="#..."` prefixes, all inside `_includes/book-chapter-body.html`. Folded into the single Ruby filter `book_chapter_transform` (`_plugins/book-chapter-transform.rb`) once the per-chapter Liquid-replace dispatch became visible in the profile — ~36 `| replace:` invocations across the heading-shift cascade (12 source levels × 3 cascading passes) plus the 13-replace id/anchor prefix chain plus the 12-pattern whitespace span wrap, ~3.5 s of `Liquid::StandardFilters#replace` per build. The filter does the same passes in C-implemented regex with the cascade collapsed to a single bump-by-N pass; ~0.14 s for 718 chapter calls. + #### print.css updates - `string-set: chapter-title content()` moves from `h1:first-of-type` to `h2:first-of-type`.