diff --git a/BOOKPLAN.md b/BOOKPLAN.md index 74151ad..9518eb6 100644 --- a/BOOKPLAN.md +++ b/BOOKPLAN.md @@ -17,18 +17,22 @@ or `build.bat` then `book.bat`. One Jekyll invocation produces three trees in pa Touch points and what each one already exposes: -- [docs/book.html](docs/book.html) — iterator that concatenates every chapter into one HTML document. Permalink `/book.html`, layout `book-combined`. Contains: whitespace-pattern primitives (p1..p4, indent variants) for the pagedjs whitespace fix; the Roman numerals array; the title-page section (1.3); the front-matter loop (1.7) that emits `site.data.book.front_matter` entries inline between the title page and Part I; the per-part loop. Each part can be **flat** (a single `chapters` list gathered from `page:` / `prefixes:`) or **chaptered** (1.9; a `foreword_page:` plus a nested `chapters:` list, each with its own divider page). Per-chapter body rendering — whitespace fix, heading depth shift, sub-page detection, id uniqueness, header-string emission — is factored out into `_includes/book-chapter-body.html` so all three call sites (front-matter, flat part, chaptered part) share one implementation. Insertion points for new front matter go **after** the title-page section and **before** the `{%- for part in site.data.book.parts -%}` opener. -- [docs/_includes/book-chapter-body.html](docs/_includes/book-chapter-body.html) — per-chapter body processing, called via `{% include book-chapter-body.html chapter=... %}` from each of book.html's chapter-loop callers. Handles markdownify, the pagedjs whitespace fix (consumes the `p1..p4` patterns from the outer scope), heading depth shift (1.5a + sub-page 1.6b), sub-page detection (1.6a, opt-out via `skip_sub_page_detection`), heading-id uniqueness (1.5b), compound running header (1.6c), and emits the final `
` block. Take-it-or-leave-it parameters cover the cases that don't fit the default: `article_class_override` (front-matter and part-foreword), `chapter_anchor_override` (root URL `/` fallback to `ch-introduction`), `skip_sub_page_detection` (front-matter entries don't share an index hierarchy with following chapters). +- [docs/book.html](docs/book.html) — iterator that concatenates every chapter into one HTML document. Permalink `/book.html`, layout `book-combined`. Contains: the Roman numerals array; the title-page section (1.3); the front-matter loop (1.7) that emits `site.data.book.front_matter` entries inline between the title page and Part I; the per-part loop. Each part can be **flat** (page selectors directly on the part, plus an optional `landing_page:`) or **chaptered** (1.9; a `foreword_page:` and/or `landing_page:` plus a nested `chapters:` list, each chapter carrying its own selectors and divider page). Each chapter-loop caller reads its pre-resolved page list from `entry._chapters`, populated once at `:site, :pre_render` by `_plugins/book-resolve-chapters.rb` (so the selector schema stays in one place). Per-chapter body rendering is delegated to `_includes/book-chapter-body.html`, which in turn calls the `book_chapter_transform` Liquid filter (`_plugins/book-chapter-transform.rb`) for whitespace fix, heading-depth shift, heading-id rewrite, and intra-chapter href-anchor rewrite. Insertion points for new front matter go **after** the title-page section and **before** the `{%- for part in site.data.book.parts -%}` opener. +- [docs/_includes/book-chapter-body.html](docs/_includes/book-chapter-body.html) — per-chapter body processing, called via `{% include book-chapter-body.html chapter=... %}` from each of book.html's chapter-loop callers. Handles sub-page detection (1.6a, opt-out via `skip_sub_page_detection`), compound running header (1.6c), and emits the final `
` block. The heavier rewrites — markdownify, the pagedjs whitespace fix (1.5/2.1), the 1.5a heading-depth shift (+ the 1.6b sub-page and 1.9 chaptered-part additional shifts when applicable), the 1.5b heading-id prefix, and the intra-chapter `href="#..."` anchor prefix — are batched into one Ruby pass via the `body | book_chapter_transform: site.baseurl, heading_shift_n, chapter_anchor` filter call. Take-it-or-leave-it parameters cover the cases that don't fit the default: `article_class_override` (front-matter and part-foreword), `chapter_anchor_override` (root URL `/` fallback to `ch-introduction`), `skip_sub_page_detection` (front-matter entries and part landings don't share an index hierarchy with following chapters), `skip_base_heading_shift` (skips the 1.5a `+1` shift; paired with the part's `no_heading_shift` flag), `extra_heading_shift` (adds the 1.9 chaptered-part `+1` shift on top of 1.5a so class / module indexes nest under their chapter divider in the outline). The three shift inputs (`skip_base_heading_shift`, the 1.6b `is_sub_page` flag, and `extra_heading_shift`) combine into a single `heading_shift_n` integer (0 to 3) that the include passes to the filter; the filter then bumps each heading by exactly N levels in one regex pass (capping at `h7-stub` for any level that would exceed h6), rather than running 0-3 cascading shift chains. +- [docs/_plugins/book-resolve-chapters.rb](docs/_plugins/book-resolve-chapters.rb) — `:site, :pre_render` generator that walks `_data/book.yml` (`front_matter:`, each flat part, each part's optional `foreword_page:`/`landing_page:`, and each chapter inside a chaptered part) and stashes the resolved page array on `entry._chapters` for `book.html` to iterate. Recognises four selector keys on the entry — `page:` (single URL), `pages:` (list of URLs), `nav_page:` (single nav-path), `nav_pages:` (list of nav-paths) — and one modifier, `no_descent:`, that flips every match from the default `contains` (starts-with) semantics to exact-equality.
`landing_page:` and `foreword_page:` are not resolved here; their first-emission / divider-styling semantics live in `book.html`'s caller. Replaces the earlier per-render Liquid include `_includes/book-collect-matches.html` -- the `where_exp` / `where` / `concat` / `sort_by_nav_order` chains were running 37 times per build for ~1.5 s of Liquid expression-interpreter time; precomputing once at site:pre_render is free. +- [docs/_plugins/book-chapter-transform.rb](docs/_plugins/book-chapter-transform.rb) — registers the `book_chapter_transform` Liquid filter that takes a chapter body and applies, in one Ruby pass: the pagedjs inter-span whitespace fix (longest-first regex over `WHITESPACE_PATTERNS`), the N-level heading shift (1.5a + 1.6b + 1.9, where N is precomputed by the include from `skip_base_heading_shift` / `is_sub_page` / `extra_heading_shift`), the 1.5b `id="..."` prefix per chapter, and the corresponding `href="#..."` prefix for intra-chapter anchors. One filter call replaces a chain of ~36 `| replace:` invocations plus a 12-pattern whitespace span wrap from the prior in-template implementation (~3 cascading heading-shift passes × 12 replaces, plus the anchor-id 13-replace pass). - [docs/_layouts/book-combined.html](docs/_layouts/book-combined.html) — minimal wrapper: `` + `{{ site.title }}` + `rouge.css` + `print.css` + `{{ content }}`. No nav, no JS, no chrome. Pagedjs runs on the rendered output of this layout. The only layout the PDF pipeline uses; the older per-source-page `book` layout was retired alongside `_config-pdf.yml`. - [docs/assets/css/print.css](docs/assets/css/print.css) — the book's design. 
Existing structural rules: `@page` (A4, 22mm margins, running header in `@top-right` via `string(chapter-title)`, page number in `@bottom-right`); `@page :first` (suppresses both — used by the title page); `@page divider` (suppresses both, used by part dividers via `page: divider`); `@page front-matter` (suppresses both, used by `article.front-matter` for 1.7 Introduction-style sections); `@page part-foreword` + `@page chapter-divider` (suppresses both, used by the 1.9 part foreword and per-chapter title pages); `article { break-before: page }`; per-chapter `string-set: chapter-title` on `article.page > .header-string`; the top-level vs sub-chapter heading-size split (`article.page:not(.sub-chapter) > h2:first-of-type` vs `article.page.sub-chapter > h3:first-of-type`); chapter-divider H2 typography (`article.chapter-divider h2` — 24pt centered, no border) plus its subtitle (`.chapter-subtitle` — 13pt italic). -- [docs/_data/book.yml](docs/_data/book.yml) — the manifest book.html iterates over. Schema: `parts:` is an ordered list of numbered parts. A flat part has `{ title, subtitle, prefixes }` or `{ title, subtitle, page }`. A **chaptered** part (1.9) has `{ title, subtitle, foreword_page, chapters }`, where `chapters` is an ordered list of per-chapter entries `{ title, subtitle, landing_page, prefixes }`; each chapter emits a full-page `
` title page followed by its landing page (with the source H1 stripped by the plugin) and the prefix-matched content in URL order. `front_matter:` is a sibling list of front-matter sections (1.7), same per-entry shape as a flat part. `prefixes:` is a list of URL substrings matched via `contains` (multiple prefixes can map to one Part / chapter — used for the Reference Section in 1.8 and for the VBA chapter's landing at `/tB/Packages/VBA` + members under `/tB/Modules/...`); `page:` is a single exact-URL match. Available in Liquid as `site.data.book.parts` and `site.data.book.front_matter`. +- [docs/_data/book.yml](docs/_data/book.yml) — the manifest book.html iterates over. Schema: `parts:` is an ordered list of numbered parts. A **flat part** carries page-selectors directly (`page:` / `pages:` / `nav_page:` / `nav_pages:`) plus an optional `landing_page:`; a **chaptered part** (1.9) replaces the selectors with a `chapters:` list of per-chapter entries `{ title, subtitle, landing_page, page/pages/nav_page/nav_pages, ... }` and may carry a `foreword_page:` and/or a `landing_page:` on the part itself. Each chaptered chapter emits a full-page `
` title page followed by its landing page (with the source H1 stripped by the plugin) and the selector-matched content in `sort_by_nav_order` order. `front_matter:` is a sibling list of front-matter sections (1.7), same selector shape as a flat part. The selector keys: `pages:` is a list of URL prefixes matched with `contains` (starts-with) semantics (multiple entries can map to one Part / chapter — used for the Reference Section in 1.8 and for the VBA chapter's landing at `/tB/Packages/VBA` + members under `/tB/Modules/...`); `page:` is the singular alias. `nav_pages:` is the same shape against `page.data["nav_path"]` (populated by `_plugins/nav-path.rb`) — used when a section is most naturally expressed as a nav-tree branch rather than a URL prefix; `nav_page:` is its singular alias. A `no_descent: true` modifier on the entry switches every selector to exact-equality so a single index page can be picked up without sweeping its sub-pages. Additional control flags: `no_outline_entry:` suppresses the part-divider H1 / chapter-divider H2 (so the section's first content heading becomes the bookmark target); `no_heading_shift:` on a part skips the 1.5a base shift for the part's entries (used when the source pages are already authored at H2-and-deeper), and on a chaptered-part chapter skips the 1.9 extra shift. Available in Liquid as `site.data.book.parts` and `site.data.book.front_matter`. - [docs/_data/build.yml](#) — **not committed**. Build provenance lives in `site.data.build` (populated in memory by the plugin), so the YAML file is never written. The fields exposed are `site.data.build.commit` (short hash) and `site.data.build.commit_date` (ISO date, `%cs`), or `'unknown'` when git is unavailable. - [docs/_config.yml](docs/_config.yml) — the regular-site config. Reads `site.title` ("twinBASIC Documentation") and `site.footer_content` (the canonical copyright string, reused by the title page and colophon). Also exposes the two combined-build toggles the post-write plugins consult: `also_build_offline: true` (offlinify) and `also_build_pdf: true` (pdfify).
Both default to true in the committed config; flip either to `false` to skip that output without touching `_site/`. - [docs/_plugins/build-info.rb](docs/_plugins/build-info.rb) — captures `git rev-parse --short HEAD` and `git log -1 --format=%cs` into `site.data['build']` on `:site, :post_read`. Falls back to `'unknown'` placeholders when git isn't on PATH. - [docs/_plugins/build-phase-timing.rb](docs/_plugins/build-phase-timing.rb) — the cleanest hook-pattern example to copy when writing a new `_plugins/` file (uses every `:site, :hook` boundary). - [docs/_plugins/offlinify.rb](docs/_plugins/offlinify.rb) — the offline-site link rewriter; reference example for build-time concerns tightly coupled to Jekyll's URL model. Runs on `:site, :post_write` when `also_build_offline: true`. - [docs/_plugins/pdfify.rb](docs/_plugins/pdfify.rb) — emits the sparse `_site-pdf/` tree that pagedjs-cli consumes. Reads `/book.html`, copies it verbatim alongside `assets/css/print.css` + `assets/css/rouge.css` + every relative `` target into `-pdf/`. Runs on `:site, :post_write` when `also_build_pdf: true`; retires the older `_config-pdf.yml` second-Jekyll-pass approach. -- [docs/_plugins/book-href-rewrite.rb](docs/_plugins/book-href-rewrite.rb) — post-render Ruby pass for Phase 2.2 cross-references **plus** the 1.9 landing-page H1 strip. Walks each `
` chapter body in the rendered `book.html`, resolves relative-path hrefs against the chapter's URL parent via `URI.merge` (RFC-3986 path normalization from the standard library — no manual `../` folding), and rewrites in-book absolute URLs to `#ch-...` anchors using a `Hash` map built from `_data/book.yml` + `site.pages`. The manifest iteration (`book_entries`) covers `front_matter:`, each part's optional `foreword_page:`, flat parts directly, and each chapter inside a chaptered part. The URL → anchor map symmetrizes the trailing-slash form for folder-style indexes **and** the `.html` suffix. For articles whose anchor matches a chapter `landing_page:`, the plugin strips the first `<h3>...</h3>` from the body (after Liquid's heading shift) so the chapter-divider's H2 is the chapter's sole outline entry. Out-of-book hrefs emit in their resolved absolute form so they're greppable as `href="/..."` during verification. Hooked into `:pages, :post_render` and filtered to `page.path == "book.html"`; non-book pages incur no cost. Replaces an earlier in-template Liquid implementation (~21 s of render overhead vs ~50 ms here). +- [docs/_plugins/book-href-rewrite.rb](docs/_plugins/book-href-rewrite.rb) — post-render Ruby pass for Phase 2.2 cross-references **plus** the landing-page heading strip. Walks each `
` chapter body in the rendered `book.html`, resolves relative-path hrefs against the chapter's URL parent via `URI.merge` (RFC-3986 path normalization from the standard library — no manual `../` folding), and rewrites in-book absolute URLs to `#ch-...` anchors using a `Hash` map built from `_data/book.yml` + `site.pages`. The manifest iteration (`book_entries`) covers `front_matter:`, each part's optional `foreword_page:`, flat parts directly (including their optional `landing_page:`), and each chapter inside a chaptered part. The per-entry `entry_pages` helper mirrors the include's selector schema — `page` / `pages` / `nav_page` / `nav_pages` with optional `no_descent`, plus `landing_page` — so the Liquid and Ruby selector logic stay symmetric (chapters appearing in the rendered book also appear in the URL-to-anchor map). The URL → anchor map symmetrizes the trailing-slash form for folder-style indexes **and** the `.html` suffix. `build_landing_strip_targets` builds an anchor → heading-tag map for both part-level and chapter-level landings; the strip target tag varies with whichever shifts apply to the chapter body (h2 by default for a part landing, h3 by default for a chaptered-chapter landing, h1/h2 when `no_heading_shift` skips one or both of 1.5a / 1.9). Out-of-book hrefs emit in their resolved absolute form so they're greppable as `href="/..."` during verification. Hooked into `:pages, :post_render` and filtered to `page.path == "book.html"`; non-book pages incur no cost. Replaces an earlier in-template Liquid implementation (~21 s of render overhead vs ~50 ms here). +- [docs/_plugins/book-sort.rb](docs/_plugins/book-sort.rb) — registers the `sort_by_nav_order` Liquid filter used in `book.html` in place of the older `sort: "url"` for selector-swept chapter content lists. 
Groups pages by their *owning index* — an index page (URL ending in `/`) plus every leaf whose URL starts with it form one cluster — so the include's sub-page state machine sees each index immediately before its sub-pages. Within a group: index first (URL order), then nav_order leaves (nav_order ascending, title tiebreak), then leaves without nav_order (alphabetical by title). Group order: each group's lead (first item after the in-group sort) carries the group's position, sorted by `[lead.nav_order, lead.title]` with missing nav_order treated as infinity — so a folder whose index has `nav_order: 2` (just-the-docs's parent-positioning convention) sorts among its sibling chapters by 2 rather than by its leaves' values. Orphan leaves (no present index is a URL prefix) form singleton groups and interleave with index groups by their own nav_order / title. The filter accepts Jekyll `Page`, `PageDrop`, and `Hash` carriers uniformly via `page_url` / `page_attr` helpers (a single page-set can mix all three once intermediate filters wrap things). +- [docs/_plugins/nav-path.rb](docs/_plugins/nav-path.rb) — generator that populates `page.data["nav_path"]` on every titled page with the slash-joined `grand_parent / parent / title` chain. The nav-path is the selector targeted by manifest `nav_page:` / `nav_pages:` entries — a way to sweep pages into a chapter / part by their position in the just-the-docs sidebar tree rather than by URL prefix. Example: `Reference/Operators.md` with `parent: Reference Section` gets nav_path `Reference Section/Operators`; individual operator pages under `/tB/Core/` carry `parent: Operators, grand_parent: Reference Section`, so their nav_paths are `Reference Section/Operators/AddressOf` etc. A `nav_pages: [Reference Section/Operators]` entry then sweeps in the Operators index plus every operator page without enumerating the `/tB/Core/*` URLs one by one. 
Runs at `:low` priority in the GENERATE phase so the field is set before book.html's RENDER pass reads it. - [docs/book.bat](docs/book.bat) — now only the pagedjs render step: checks `_site-pdf\book.html` exists, makes `_pdf\`, then `npx pagedjs-cli _site-pdf\book.html -o _pdf\book.pdf --outline-tags h1,h2,h3,h4 -t 600000`. Run `build.bat` first (or `bundle exec jekyll build`) to populate `_site-pdf/`. Must be run from `cmd.exe`, not PowerShell (see gotchas). ## Build-time tooling policy @@ -42,7 +46,8 @@ Concretely for the PDF book: - Git-derived build info (commit hash, commit date) → Jekyll plugin (`_plugins/build-info.rb`) that populates `site.data.build` on `:site, :post_read`. Not a pre-build Python step writing `_data/build.yml`. - Chapter manifest → `_data/book.yml` (committed source of truth, hand-edited). - Title page, colophon, TOC content → Liquid in `book.html` and the layouts. -- Heading rewrites → Liquid (existing approach in `book.html`). Per-chapter, one-shot, fast. +- Chapter selector resolution (`page` / `pages` / `nav_page` / `nav_pages` / `no_descent`, the `sort_by_nav_order` ordering, and `foreword_page`/`landing_page` resolution) → Jekyll plugin (`_plugins/book-resolve-chapters.rb`) running at `:site, :pre_render`. The Liquid implementation (formerly in `_includes/book-collect-matches.html`) was running ~37 `where_exp` invocations per build for ~1.5 s of Liquid expression-interpreter time; resolving once into `entry._chapters` is free. +- Per-chapter body rewrites (pagedjs whitespace fix, heading-depth shift, heading-id prefix, intra-chapter href anchor prefix) → Jekyll plugin (`_plugins/book-chapter-transform.rb`), exposed as the `book_chapter_transform` Liquid filter that `_includes/book-chapter-body.html` calls once per chapter. 
The Liquid version was a chain of ~36 `| replace:` invocations plus a 12-pattern whitespace span wrap per chapter; the filter does the same passes in C-implemented regex over the body string, with the heading-shift cascade collapsed to a single bump-by-N regex. - Cross-reference href rewrites → Jekyll plugin (`_plugins/book-href-rewrite.rb`), running on `:pages, :post_render`. The first cut was inline Liquid; the per-(chapter × permalink) loop burned ~21 s of render even after pre-computing per-permalink search/replace strings and gating each permalink on a common-prefix `contains`, vs ~50 ms in Ruby. Rule of thumb: use Liquid for per-chapter shaping; reach for a plugin when the work is N × M with large N and M. The carve-out in WIP.md for `_plugins/offlinify.rb` is the same shape: build-time concerns tightly coupled to Jekyll's internal model belong in `_plugins/`, not in an external script. @@ -187,6 +192,8 @@ Intra-chapter local links must be rewritten in lock-step. Patterns like `[**Coun Both rewrites are mechanical text substitutions over the chapter body string, no parsing required. +**Implementation.** Landed first as the Liquid replace-chain shown in 1.5a plus a sibling pair for the `id=` / `href="#..."` prefixes, all inside `_includes/book-chapter-body.html`. Folded into the single Ruby filter `book_chapter_transform` (`_plugins/book-chapter-transform.rb`) once the per-chapter Liquid-replace dispatch became visible in the profile — ~36 `| replace:` invocations across the heading-shift cascade (12 source levels × 3 cascading passes) plus the 13-replace id/anchor prefix chain plus the 12-pattern whitespace span wrap, ~3.5 s of `Liquid::StandardFilters#replace` per build. The filter does the same passes in C-implemented regex with the cascade collapsed to a single bump-by-N pass; ~0.14 s for 718 chapter calls. + #### print.css updates - `string-set: chapter-title content()` moves from `h1:first-of-type` to `h2:first-of-type`. 
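The one-pass heading shift and anchor prefixing described above can be sketched in a few lines of Ruby. This is an illustration, not the plugin source: `HEADING_SHIFT_RE` is copied from `book-chapter-transform.rb`, while the module and method names are hypothetical, levels past h6 are assumed to become a literal `h7-stub` tag, and the `<chapter_anchor>--<id>` id shape is an assumed separator that the text above does not specify.

```ruby
# frozen_string_literal: true

# Hypothetical sketch, not the plugin's code. Assumed: levels past h6
# become a literal "h7-stub" tag, and prefixed ids look like "<anchor>--<id>".
module ChapterTransformSketch
  HEADING_SHIFT_RE = /<(\/?)h([1-6])\b/.freeze
  # Opening of any heading tag (h7-stub included) up to and including ` id="`.
  HEADING_ID_RE = /(<(?:h[1-6]|h7-stub)\b[^>]*?\sid=")/.freeze

  module_function

  # N = 1.5a base shift + 1.6b sub-page shift + 1.9 chaptered-part shift.
  def heading_shift_n(skip_base:, sub_page:, extra:)
    (skip_base ? 0 : 1) + (sub_page ? 1 : 0) + (extra ? 1 : 0)
  end

  # Single pass: every <hN> and </hN> moves down n levels. h7-stub is never
  # matched by HEADING_SHIFT_RE, so a capped heading is not shifted again.
  def shift_headings(html, n)
    return html if n.zero?

    html.gsub(HEADING_SHIFT_RE) do
      slash = Regexp.last_match(1)
      level = Regexp.last_match(2).to_i + n
      tag = level > 6 ? "h7-stub" : "h#{level}"
      "<#{slash}#{tag}"
    end
  end

  # Prefix heading ids and intra-chapter fragment links with the chapter
  # anchor so they stay unique once all chapters share one document.
  def prefix_anchors(html, chapter_anchor)
    html
      .gsub(HEADING_ID_RE) { "#{Regexp.last_match(1)}#{chapter_anchor}--" }
      .gsub('href="#', "href=\"##{chapter_anchor}--")
  end
end

html = '<h1 id="intro">Intro</h1><a href="#intro">see</a>'
n = ChapterTransformSketch.heading_shift_n(skip_base: false, sub_page: false, extra: true)
puts ChapterTransformSketch.prefix_anchors(ChapterTransformSketch.shift_headings(html, n), "ch-FAQ")
# prints <h3 id="ch-FAQ--intro">Intro</h3><a href="#ch-FAQ--intro">see</a>
```

Because the shift regex only matches `h1` through `h6`, a heading that has already become `h7-stub` is left alone, which is why one bump-by-N pass gives the same result as N cascaded +1 passes that each map h6 to `h7-stub`.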
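The grouping rules listed for `sort_by_nav_order` translate fairly directly into Ruby. The sketch below is not the filter itself: pages are plain Hashes with `"url"`, `"title"` and an optional `"nav_order"` (the real filter also accepts `Page` and `PageDrop` carriers), all names are hypothetical, and assigning each leaf to the *longest* index URL that prefixes it is an assumption the description leaves open for nested folders.

```ruby
# frozen_string_literal: true

# Hypothetical sketch of the sort_by_nav_order grouping rules. Pages are
# plain Hashes; assigning a leaf to the longest matching index URL is an
# assumption for nested folders.
module NavOrderSortSketch
  module_function

  def index?(page)
    page["url"].end_with?("/")
  end

  # Group position: [nav_order, title], with missing nav_order sorting last.
  def position(page)
    [page["nav_order"] || Float::INFINITY, page["title"].to_s]
  end

  def sort_by_nav_order(pages)
    index_urls = pages.select { |p| index?(p) }.map { |p| p["url"] }

    groups = Hash.new { |h, k| h[k] = [] }
    pages.each do |p|
      key =
        if index?(p)
          p["url"]
        else
          owner = index_urls.select { |u| p["url"].start_with?(u) }.max_by(&:length)
          owner || "orphan:#{p['url']}" # orphan leaf -> singleton group
        end
      groups[key] << p
    end

    ordered_groups = groups.values.map do |members|
      index, leaves = members.partition { |p| index?(p) }
      numbered, unnumbered = leaves.partition { |p| p["nav_order"] }
      index.sort_by { |p| p["url"] } +
        numbered.sort_by { |p| [p["nav_order"], p["title"].to_s] } +
        unnumbered.sort_by { |p| p["title"].to_s }
    end

    # A group sorts by its lead, i.e. its first page after the in-group sort.
    ordered_groups.sort_by { |g| position(g.first) }.flatten(1)
  end
end

pages = [
  { "url" => "/b/x", "title" => "X", "nav_order" => 1 },
  { "url" => "/b/", "title" => "B", "nav_order" => 2 },
  { "url" => "/a/", "title" => "A", "nav_order" => 1 },
  { "url" => "/a/z", "title" => "Zed" },
  { "url" => "/a/y", "title" => "Why", "nav_order" => 5 },
  { "url" => "/c", "title" => "C" }
]
p NavOrderSortSketch.sort_by_nav_order(pages).map { |pg| pg["url"] }
# prints ["/a/", "/a/y", "/a/z", "/b/", "/b/x", "/c"]
```

The `/b/` folder sorts after `/a/` because its lead is the index (nav_order 2), even though its only leaf has nav_order 1; the orphan `/c` has no nav_order and therefore lands last.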
diff --git a/WIP.md b/WIP.md index 7a9c0b9..7c06771 100644 --- a/WIP.md +++ b/WIP.md @@ -428,11 +428,137 @@ Python scripts are reserved for non-render concerns: one-off content conversion From `docs/`: -- `bundle exec jekyll build` (or `build.bat`) — builds three trees in a single Jekyll run: the online copy at `_site/`, a `file://`-browsable copy at `_site-offline/`, and the sparse pagedjs source at `_site-pdf/`. The offline pass (`_plugins/offlinify.rb`, activated by `also_build_offline: true` in `_config.yml`) adds ~3-5s and the PDF pass (`_plugins/pdfify.rb`, activated by `also_build_pdf: true`) adds <1s on top of the normal ~13s build. The PDF plugin copies `_site/book.html` (the concatenated chapter document rendered via `_layouts/book-combined.html`) verbatim into `_site-pdf/`, along with `assets/css/print.css`, `assets/css/rouge.css`, and every relative `` target -- just what pagedjs needs to render the book PDF. After the copy, the plugin deletes `_site/book.html`: the concatenated document is a build artifact for the PDF render path alone, not a public page on the online site. The companion `offline_exclude: [..., book.html]` entry in `_config.yml` keeps `offlinify.rb` from copying it into `_site-offline/`. The two safeguards are independent -- the exclude pattern works regardless of whether offlinify walks `_site/` before or after pdfify's delete, and pdfify's delete works regardless of whether offlinify is enabled. After Jekyll's WRITE phase, the offline plugin walks `_site/`, copies binary assets verbatim into `_site-offline/`, and for each HTML and CSS file rewrites every root-absolute `href` / `src` / `url()` to a page-relative path with the resolved file extension (`/FAQ` → `../../FAQ.html`, `/Tutorials/CEF/` → `../../Tutorials/CEF/index.html`). 
It also patches the offline copy of `assets/js/just-the-docs.js` in two places — `navLink()` to match the active nav entry by resolved DOM `link.href` rather than `document.location.pathname` (the upstream pathname-vs-attribute compare returns no match under `file://`, leaving the sidebar with no `.active` class so the nav appears collapsed on every navigation), and `initSearch()` to read the lunr index from `window.SEARCH_DATA` rather than fetching `search-data.json` over `XMLHttpRequest` (XHR to `file://` resources is blocked by browsers; classic ` diff --git a/docs/_plugins/book-chapter-transform.rb b/docs/_plugins/book-chapter-transform.rb new file mode 100644 index 0000000..c25f618 --- /dev/null +++ b/docs/_plugins/book-chapter-transform.rb @@ -0,0 +1,170 @@ +# frozen_string_literal: true + +# Liquid filter that folds the chained `replace` passes in +# `_includes/book-chapter-body.html` into one Ruby method call. +# +# === Problem === +# +# The per-chapter body transformation in `book-chapter-body.html` was +# a stack of `replace` filter chains: +# +# 1. baseurl-prefixed `src=` strip (1 replace) +# 2. inter-span whitespace wrapping (12 patterns, longest first) (12 replaces) +# 3. heading-shift cascade -- 1.5a base shift (12 replaces, conditional) +# 4. heading-shift cascade -- 1.6b sub-page shift (12 replaces, conditional) +# 5. heading-shift cascade -- 1.9 chaptered-part extra shift (12 replaces, conditional) +# 6. anchor-id prefix injection (h2..h6, h7-stub, with/without +# class="no_toc", plus `href="#`) (13 replaces) +# +# Worst case per chapter: 1 + 12 + 12 + 12 + 12 + 13 = 62 +# `Liquid::StandardFilters#replace` invocations. Across ~745 chapter +# bodies the chain produced the bulk of the build's 87,991 +# `replace` calls (0.6 s in ruby-prof, mostly the per-call Liquid +# dispatch overhead -- each individual replace is a fast literal +# `String#gsub`). 
Each `replace` also rebuilds the intermediate +# string, so a 62-step chain produces 62 intermediate copies of the +# chapter body before the result lands. +# +# === Approach === +# +# `book_chapter_transform(body, baseurl, heading_shift_n, chapter_anchor)` +# does all six passes in one method: +# +# * Step 1 uses a single literal `gsub!` keyed on the live +# `site.baseurl` value (passed as the second filter argument so +# the constant isn't baked into the plugin at load time). +# * Step 2 walks a frozen `WHITESPACE_PATTERNS` table of 12 +# literal `[search, replacement]` pairs and applies them in +# longest-first order, matching the Liquid chain's order +# exactly. Literal `gsub!` on each. +# * Steps 3-5 collapse into a single regex pass keyed on +# `heading_shift_n` (= 0, 1, 2, or 3 -- precomputed in Liquid +# from `skip_base_heading_shift`, `is_sub_page`, and +# `extra_heading_shift`). The N-pass cascade of the Liquid +# chain is equivalent to a one-pass regex that bumps each +# heading level by N, capping at `h7-stub` for levels above 6. +# * Step 6 replaces the 13 literal `replace` calls with one regex +# for heading-id injection (matches `` -> ``). The +# `\b` word boundary anchors after the digit so `` +# (hypothetical) wouldn't accidentally match. +# +# === When it runs === +# +# Per-render, inside Liquid as a standard filter. The plugin file +# only needs `Liquid::Template.register_filter`; no hook. + +module Jekyll + module BookChapterTransform + SP = " " + NL = "\n" + S4 = " " + S8 = " " + S12 = " " + S16 = " " + + # Inter-span whitespace patterns for pagedjs's page splitter -- + # see book.html's header comment for the full rationale. Longest + # pattern first so each consumes its bytes before a shorter + # pattern can fragment them; mirrors the Liquid chain's order in + # `book-chapter-body.html` exactly. 
+ WHITESPACE_PATTERNS = [ + # p1: blank line after code line with trailing space + ["#{SP}#{NL}#{SP}#{NL}#{SP}#{NL}#{SP}#{NL}#{NL}#{SP}#{NL}#{NL}#{SP}#{NL}#{SP}#{NL}#{S12}#{SP}#{NL}#{S12}#{SP}#{NL}#{S8}#{SP}#{NL}#{S8}#{SP}#{NL}#{S4}#{SP}#{NL}#{S4}#{SP}#{NL}#{SP}#{NL}#{NL}#{S16}#{NL}#{S16}#{NL}#{S12}#{NL}#{S12}#{NL}#{S8}#{NL}#{S8}#{NL}#{S4}#{NL}#{S4}#{NL}#{NL} `. + HEADING_SHIFT_RE = /<(\/?)h([1-6])\b/.freeze + + # Heading-id prefix regex. Matches both `, not an

. Stripping - # the first <h3> keeps the chapter-divider's H2 as the chapter's sole + # 1.5a + 1.9-extra-shift to every chapter body by default. The source H1 + # therefore arrives at the post-render HTML as an <h3>. Stripping the + # first <h3>
keeps the chapter-divider's H2 as the chapter's sole # outline entry at depth 2 -- without this strip the landing would emit # a redundant H3 with the same title text, just one level deeper. - FIRST_LANDING_HEADING_REGEX = /]*>.*?<\/h3>/m.freeze + # + # When the chapter or its containing part sets `no_heading_shift`, the + # landing's source H1 lands at a different depth (H2 if one shift is + # skipped, H1 if both are), so the strip target is computed per + # chapter in `build_landing_strip_targets`. # `url.gsub('/', '-').sub(/^-/, '').sub(/-$/, '')` then prepend "ch-". # Matches the chapter anchor scheme established in BOOKPLAN.md 1.5b. @@ -74,9 +77,11 @@ def self.book_entries(site) entries = [] entries.concat(manifest["front_matter"] || []) (manifest["parts"] || []).each do |part| - entries << part if part["page"] || part["prefixes"] + if part["page"] || part["pages"] || part["nav_page"] || part["nav_pages"] || part["landing_page"] + entries << part + end if part["foreword_page"] - entries << { "page" => part["foreword_page"], "title" => part["title"] } + entries << { "page" => part["foreword_page"], "title" => part["title"], "no_descent" => true } end (part["chapters"] || []).each { |ch| entries << ch } end @@ -84,46 +89,115 @@ def self.book_entries(site) end # Pages matched by a single book.yml entry. An entry may set any - # of `page:` (exact URL match, one-chapter sections like the FAQ - # or the root index), `landing_page:` (the chapter's intro page in - # a chaptered part; treated like `page:` for map-building), and - # `prefixes:` (starts-with match per prefix). The union is returned - # de-duplicated since the landing typically also matches one of the - # prefixes (e.g. `/tB/Packages/VBRUN/` landing matches the prefix - # `/tB/Packages/VBRUN/`). + # combination of: + # page: single URL prefix (alias for a one-element + # `pages:` list). + # pages: list of URL prefixes. 
By default each prefix is + # a starts-with match; when `no_descent: true` + # each prefix matches by exact URL equality + # instead. + # nav_page: single nav_path prefix (alias for a one-element + # `nav_pages:` list). `nav_path` is populated by + # _plugins/nav-path.rb as the slash-joined + # `grand_parent / parent / title` chain. + # nav_pages: list of nav_path prefixes. Same no_descent + # semantics as `pages:`. + # landing_page: single URL, exact match. Treated like `page:` + # for map-building. Carries its own first-emission + # semantics in book.html which doesn't matter here. + # no_descent: when truthy, every `page` / `pages` / `nav_page` + # / `nav_pages` match uses exact equality instead + # of starts-with. Useful when a prefix like `/Foo/` + # should match only the index page, not its + # sub-pages. + # The union is returned de-duplicated since the landing typically + # also matches one of the prefixes (e.g. `/tB/Packages/VBRUN/` + # landing matches the prefix `/tB/Packages/VBRUN/`), and a page + # can be picked up by both a URL and a nav-path selector. def self.entry_pages(entry, site) pages = [] - if entry["page"] - pages.concat(site.pages.select { |p| p.url == entry["page"] }) + no_descent = !!entry["no_descent"] + + url_specs = [] + url_specs << entry["page"] if entry["page"] + url_specs.concat(entry["pages"]) if entry["pages"] + url_specs.each do |prefix| + pages.concat(site.pages.select { |p| + no_descent ? p.url == prefix : p.url.start_with?(prefix) + }) + end + + nav_specs = [] + nav_specs << entry["nav_page"] if entry["nav_page"] + nav_specs.concat(entry["nav_pages"]) if entry["nav_pages"] + nav_specs.each do |np| + pages.concat(site.pages.select { |p| + nav_path = p["nav_path"] + next false unless nav_path + no_descent ? 
nav_path == np : nav_path.start_with?(np) + }) end + if entry["landing_page"] pages.concat(site.pages.select { |p| p.url == entry["landing_page"] }) end - if entry["prefixes"] - entry["prefixes"].each do |prefix| - pages.concat(site.pages.select { |p| p.url.start_with?(prefix) }) - end - end + pages.uniq end - # Set of chapter anchors that correspond to a chaptered part's - # `landing_page:`. The plugin strips the first `<h3>...</h3>` (the - # source H1 after 1.5a + 1.9 extra shift) from these articles so the - # chapter-divider's H2 is the sole outline entry for the chapter at - # depth 2 -- without the strip the landing would emit a redundant H3 - # carrying the same title text one outline level deeper. - def self.build_landing_anchors(site) - set = Set.new + # Map of chapter-anchor -> heading-tag-to-strip for landing pages + # that need their redundant top-of-body heading removed. Two + # flavours of landing carry the same redundancy: + # + # * Part-level `landing_page:` on a flat part. The part divider's + # H1 carries the part title; the landing's source H1 would + # repeat it one level deeper (H2 after the 1.5a shift, or H1 + # when `no_heading_shift` skips the shift). + # * Chapter-level `landing_page:` on a chaptered-part chapter. + # The chapter divider's H2 carries the chapter title; the + # landing's source H1 lands at h3 by default (1.5a + 1.9 extra + # shifts) and is stripped so the chapter divider's H2 is the + # chapter's sole outline entry at depth 2. + # + # The strip is skipped entirely when the carrying entry sets + # `no_outline_entry: true` -- without the divider's heading in the + # outline, the landing's first heading IS the entry's bookmark + # target and must stay. 
+ # + # Strip target tag for a part-level landing: + # default: strip h2 + # part.no_heading_shift: strip h1 + # + # Strip target tag for a chapter-level landing: + # default (both shifts apply): strip h3 + # ch_entry.no_heading_shift (skip 1.9 extra shift only): strip h2 + # part.no_heading_shift (skip 1.5a base shift only): strip h2 + # both flags set (no shifts applied): strip h1 + def self.build_landing_strip_targets(site) + map = {} manifest = site.data["book"] - return set unless manifest + return map unless manifest (manifest["parts"] || []).each do |part| + part_skip_base = !!part["no_heading_shift"] + + if part["landing_page"] && !part["no_outline_entry"] + level = part_skip_base ?
1 : 2 + anchor = chapter_anchor(part["landing_page"], part["title"]) + map[anchor] = "h#{level}" + end + (part["chapters"] || []).each do |ch| next unless ch["landing_page"] - set << chapter_anchor(ch["landing_page"], ch["title"]) + next if ch["no_outline_entry"] + ch_skip_extra = !!ch["no_heading_shift"] + level = 1 + level += 1 unless part_skip_base + level += 1 unless ch_skip_extra + anchor = chapter_anchor(ch["landing_page"], ch["title"]) + map[anchor] = "h#{level}" end end - set + map end # Build the permalink -> chapter-anchor map. Folder-style index @@ -260,7 +334,7 @@ def self.process(page) return if url_to_anchor.empty? parent_map = build_anchor_to_parent(site) return if parent_map.empty? - landing_anchors = build_landing_anchors(site) + landing_strip_targets = build_landing_strip_targets(site) baseurl = normalize_baseurl(site.config["baseurl"]) start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC) @@ -273,8 +347,9 @@ def self.process(page) body = Regexp.last_match(3) article_end = Regexp.last_match(4) - if landing_anchors.include?(anchor_id) - stripped_body = body.sub(FIRST_LANDING_HEADING_REGEX, "") + if (level = landing_strip_targets[anchor_id]) + regex = /<#{level}\b[^>]*>.*?<\/#{level}>/m + stripped_body = body.sub(regex, "") if stripped_body != body body = stripped_body landings_stripped += 1 @@ -290,7 +365,7 @@ def self.process(page) "#{article_open}#{body}#{article_end}" end - Jekyll.logger.info "BookHrefRewrite:", "rewrote #{rewritten} chapter bodies, stripped #{landings_stripped} landing H3s" + Jekyll.logger.info "BookHrefRewrite:", "rewrote #{rewritten} chapter bodies, stripped #{landings_stripped} landing heading(s)" elapsed_ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - start_time) * 1000).round(0) Jekyll.logger.info "BookHrefRewrite:", "BookHrefRewriter ran in #{elapsed_ms}ms." 
diff --git a/docs/_plugins/book-resolve-chapters.rb b/docs/_plugins/book-resolve-chapters.rb new file mode 100644 index 0000000..aa38592 --- /dev/null +++ b/docs/_plugins/book-resolve-chapters.rb @@ -0,0 +1,171 @@ +# frozen_string_literal: true + +# Precomputes book.yml's chapter page lists at `:site, :pre_render` so +# `book.html` doesn't re-resolve them in Liquid on every render. +# +# === Problem === +# +# `book.html` and `_includes/book-collect-matches.html` previously used +# `site.pages | where_exp: "p", "p.url contains prefix"` (or its +# `nav_path` cousin) per URL prefix to sweep pages into each +# front-matter entry, part, and chapter. Each `where_exp` call walks +# ~837 `site.pages` evaluating a Liquid expression per element. On a +# build with ~37 such sweeps, ruby-prof attributes ~1.5 s to +# `Jekyll::Filters#where_exp` -- ~40 ms per call. +# +# `book-collect-matches.html` is included once per entry, the same +# sweep is followed by a landing-page exclusion filter +# (`where_exp: "p", "p.url != landing"`) and a `sort_by_nav_order` +# Liquid filter call. Even though `sort_by_nav_order` is itself Ruby +# (`_plugins/book-sort.rb`), the surrounding orchestration is all +# Liquid -- the include alone costs ~0.7 s of wall-clock per build +# at the current site size. +# +# === Approach === +# +# Walk `_data/book.yml` once at `:site, :pre_render` and resolve every +# entry's chapter list to an `Array` stored on the entry +# hash as `_chapters`. The resolver applies the same selector schema +# (page / pages / nav_page / nav_pages / no_descent), the same +# landing-page-first ordering, and the same `sort_by_nav_order` sort +# the templates were producing in Liquid -- but in one O(n) Ruby pass +# per entry instead of one O(n * Liquid-expression-cost) sweep per +# URL prefix. +# +# `book.html` then reads `entry._chapters` directly. No `where_exp`, +# no `book-collect-matches.html` include. 
+# +# === When it runs === +# +# `:site, :pre_render` rather than as a `Generator` because +# `_plugins/nav-path.rb` is a `Generator` with `priority :low` that +# populates `page.data["nav_path"]`. Hooks fire after all generators, +# so by the time this hook runs `nav_path` is set on every page and +# the `nav_page` / `nav_pages` selectors can use it. +# +# === Output shape === +# +# For each entry that emits chapters in `book.html`: +# +# front_matter[i]["_chapters"] -- Array +# parts[i]["_chapters"] -- flat parts only; chaptered +# parts leave this nil +# parts[i]["chapters"][j]["_chapters"] -- chaptered parts only +# +# The Liquid template reads these as plain Hash accesses on the YAML- +# loaded data: `fm._chapters`, `part._chapters`, `ch_entry._chapters`. +# Each element is a `Jekyll::Page` -- the same object type `site.pages` +# iteration produces -- so the inner `for chapter in ..._chapters` +# loop in `book.html` sees no behavioural difference from the old +# `assign collected = ... | sort_by_nav_order` chain. + +require_relative "book-sort" + +module Jekyll + module BookResolveChapters + extend self + + def resolve!(site) + book = site.data["book"] + return unless book + + pages = site.pages + sorter = ChapterSorter.new + + (book["front_matter"] || []).each do |fm| + # Front-matter entries have no landing concept -- they're just + # a flat prefix sweep + sort. + fm["_chapters"] = sorter.sort_by_nav_order(collect_matches(fm, pages)) + end + + (book["parts"] || []).each do |part| + if part["chapters"] + # Chaptered part: each chapter has its own _chapters. The + # part's own landing_page / foreword_page are still resolved + # inline in book.html via cheap `where: "url"` filter calls + # (only 6 such calls total across the build, no precompute + # needed). + part["chapters"].each do |ch| + ch["_chapters"] = build_chapter_list(ch, pages, sorter) + end + else + # Flat part: build the part's own _chapters with the + # landing_page (if any) emitted first. 
+ part["_chapters"] = build_chapter_list(part, pages, sorter) + end + end + end + + # Landing first (if any), then prefix-swept rest with landing + # excluded, sorted by nav order. Mirrors the assembly book.html + # was doing in Liquid via `chapters = landing | concat: rest`. + def build_chapter_list(entry, pages, sorter) + list = [] + landing_url = entry["landing_page"] + if landing_url + landing = pages.find { |p| p.url == landing_url } + list << landing if landing + end + rest = collect_matches(entry, pages) + rest = rest.reject { |p| p.url == landing_url } if landing_url + list.concat(sorter.sort_by_nav_order(rest)) + list + end + + # Same selector schema as `_includes/book-collect-matches.html`: + # + # page single URL prefix; shorthand for [page] in `pages` + # pages list of URL prefixes; `contains` match + # nav_page single nav-path prefix; shorthand for [np] + # nav_pages list of nav-path prefixes; `contains` match + # no_descent switch every match from `contains` to `==` + # + # Liquid's `contains` on a String is substring match; `==` is + # exact equality. Ruby equivalents: `include?` and `==`. The + # `nav_path` data field may be nil on pages without a `title` + # (`_plugins/nav-path.rb` only populates titled pages), so the + # `nav_path` branches `.to_s` the value before the comparison so + # a nil value yields an empty-string match (matches nothing in + # `include?`, matches `np == ""` only when `np` itself is ""). 
+ private def collect_matches(entry, pages) + out = [] + no_descent = entry["no_descent"] + + url_specs = [] + url_specs << entry["page"] if entry["page"] + url_specs.concat(entry["pages"]) if entry["pages"] + url_specs.each do |prefix| + if no_descent + pages.each { |p| out << p if p.url == prefix } + else + pages.each { |p| out << p if p.url.include?(prefix) } + end + end + + nav_specs = [] + nav_specs << entry["nav_page"] if entry["nav_page"] + nav_specs.concat(entry["nav_pages"]) if entry["nav_pages"] + nav_specs.each do |np| + if no_descent + pages.each { |p| out << p if p.data["nav_path"] == np } + else + pages.each { |p| out << p if p.data["nav_path"].to_s.include?(np) } + end + end + + out + end + + # Reuse `Jekyll::BookSort#sort_by_nav_order` (from book-sort.rb). + # The module's methods don't depend on any state; instantiating + # a stateless class that `include`s it lets us call the method + # directly, the same way Liquid's strainer would. + class ChapterSorter + include BookSort + end + end +end + +Jekyll::Hooks.register :site, :pre_render do |site| + Jekyll::BookResolveChapters.resolve!(site) +end diff --git a/docs/_plugins/book-sort.rb b/docs/_plugins/book-sort.rb new file mode 100644 index 0000000..253a641 --- /dev/null +++ b/docs/_plugins/book-sort.rb @@ -0,0 +1,138 @@ +# frozen_string_literal: true + +# Liquid filter for ordering a chapter's content pages in book.html. +# +# Pages are grouped by their *owning index* so an index page (URL +# ending in `/`) and all its leaves stay together in the iteration -- +# the include's sub-page state machine depends on each index appearing +# in the stream immediately before its sub-pages, otherwise a stray +# leaf from a different folder will reset the state mid-group and the +# rest of the index's sub-pages will be promoted to top-level chapters. +# +# Group key: +# * an index page (URL ends in `/`): its own URL. 
+# * a leaf whose URL starts with one of the present index URLs: +# the longest matching index URL. +# * an orphan leaf (no present index is a URL prefix): its own URL +# -- the leaf forms a singleton group and sorts independently. +# +# So /Features/Language/, /Features/Language/Alias-Types, and +# /Features/Language/Data-Types all key on /Features/Language/ and +# form one group; /Features/Attributes-Intro and /Features/64bit +# don't match any index and stay independent. Group ordering by the +# lead's nav_order therefore lets independent top-level chapters +# interleave with index groups while sub-pages still cluster with +# their index. +# +# Within a group: +# 1. The index (at most one per group) emits first, in URL order. +# 2. Leaves with `nav_order` follow, sorted by nav_order ascending, +# ties broken by title (case-insensitive). +# 3. Leaves without `nav_order` follow, sorted alphabetically by +# title (case-insensitive). +# +# Group order: each group's "lead" item (the first item after the +# in-group sort -- the index if present, else the first nav_order +# leaf, else the first by-title leaf) carries the group's position. +# Groups are sorted by `[lead.nav_order, lead.title]` with a missing +# nav_order treated as infinity, so a folder whose index has +# `nav_order: 2` (just-the-docs's parent-positioning convention) +# sorts among its sibling chapters by 2, not by its leaves' values. +# +# Used in `book.html` in place of `sort: "url"` for prefix-swept +# chapter content lists. +# +# Type tolerance: Liquid passes page-like objects through filters in +# three flavours depending on what filters ran upstream: +# 1. Jekyll::Page -- straight from `site.pages`. `.url` is +# a method; Page#[] reads frontmatter +# data and does NOT expose `url`. +# 2. Jekyll::Drops::PageDrop -- when an intermediate filter has +# already wrapped the page. Both `.url` +# and `["url"]` resolve. +# 3. Hash -- the result of `Drop#to_h` (or a manual +# hash). 
Mixed in alongside Pages once +# other plugins precompute nav data: the +# hash carries frontmatter PLUS the +# drop's method-returned values (url, +# content, name, ...). Access through +# the helpers below so the filter works +# uniformly across all three. + +module Jekyll + module BookSort + def sort_by_nav_order(pages) + pages = pages.uniq + + index_urls = pages + .select { |p| page_url(p).to_s.end_with?("/") } + .map { |p| page_url(p).to_s } + + groups = Hash.new { |h, k| h[k] = [] } + pages.each do |p| + url = page_url(p).to_s + if url.end_with?("/") + groups[url] << p + else + owner = index_urls + .select { |iu| url.start_with?(iu) } + .max_by(&:length) + groups[owner || url] << p + end + end + + sorted_groups = {} + groups.each do |key, members| + sorted_groups[key] = sort_within_group(members) + end + + ordered_keys = sorted_groups.keys.sort_by do |key| + lead = sorted_groups[key].first + nav_order = page_attr(lead, "nav_order") + title = page_attr(lead, "title").to_s.downcase + [nav_order.nil? ? Float::INFINITY : nav_order, title] + end + + ordered_keys.flat_map { |key| sorted_groups[key] } + end + + private + + # In-group order: index first (URL order), then nav_order leaves + # (nav_order, title tiebreak), then nav_order-less leaves (title). + def sort_within_group(members) + indexes, leaves = members.partition { |p| page_url(p).to_s.end_with?("/") } + indexes_sorted = indexes.sort_by { |p| page_url(p).to_s } + + with_order, without_order = leaves.partition { |p| !page_attr(p, "nav_order").nil? } + + with_order_sorted = with_order.sort_by do |p| + [page_attr(p, "nav_order"), page_attr(p, "title").to_s.downcase] + end + without_order_sorted = without_order.sort_by do |p| + page_attr(p, "title").to_s.downcase + end + + indexes_sorted + with_order_sorted + without_order_sorted + end + + # Page#url is a method, not a data key, so Page#[] returns nil. + # Hashes and Drops both expose "url" via the `[]` accessor; Drops + # also via the method. 
Branch on Hash explicitly so the Page case + # falls through to the method call. + def page_url(p) + return p["url"] if p.is_a?(Hash) + p.url if p.respond_to?(:url) + end + + # `nav_order`, `title`, `nav_path`, etc. live in frontmatter data + # and are exposed via `[]` on all three carrier types -- Page + # delegates to `data[]`, Drop invokes its method, Hash reads + # directly. + def page_attr(p, name) + p[name] + end + end +end + +Liquid::Template.register_filter(Jekyll::BookSort) diff --git a/docs/_plugins/html-compress.md b/docs/_plugins/html-compress.md new file mode 100644 index 0000000..114e388 --- /dev/null +++ b/docs/_plugins/html-compress.md @@ -0,0 +1,162 @@ +# HtmlCompress + +`_plugins/html-compress.rb` runs the HTML whitespace compression that wraps every page's render chain — the same job just-the-docs's vendor/compress.html Liquid layout was doing, but in Ruby instead of Liquid filters. Output is byte-identical to the layout-based version (verified by recursive diff of every file in `_site/` against a vendor/compress.html baseline). The Liquid layout is short-circuited to a `{{ content }}` passthrough via `compress_html.ignore.envs: all` in `_config.yml`; the plugin then runs at `:pages, :post_render` / `:documents, :post_render` with `priority :high`, so the compressed bytes are what offlinify and Jekyll's writer see. + +This file sits in `_plugins/` for the same reasons as `offlinify.md` and `pdfify.md`: it lives next to the code it documents, and Jekyll's `_plugins/` folder is plugin-only territory, so this Markdown never gets rendered into the public site. + +## Why a plugin instead of the layout? + +vendor/compress.html ships with the [`jekyll-compress-html`](http://jch.penibelst.de/) approach: every transformation expressed as a Liquid filter chain. In our build's profile, the layout alone consumed ~2.4 s of RENDER time across 837 pages — well over a quarter of all per-template Liquid evaluation. The work itself isn't that much. 
With `site.compress_html.{endings,startings,comments,clippings}` all unset (their default), the layout's logic collapses to a single operation per page:
+
+```liquid
+outside-of-pre text | split: " " | join: " "
+```
+
+Liquid's `split: " "` lowers to Ruby's `String#split(" ")`, which takes Ruby's whitespace-mode special case: any run of whitespace acts as the separator, and leading/trailing whitespace produces no tokens. The result is an `Array` of every whitespace-delimited token in the page body, which `join: " "` then walks back into a single string. For a typical page that means thousands of token allocations; across 837 pages it means millions of small `String` objects, with matching allocator and GC pressure.
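The whitespace-mode special case is easy to check in isolation. The snippet below uses only core Ruby and a made-up sample string; it contrasts `split(" ")` with `split(/\s+/)`, which keeps an empty leading token, and then rejoins the tokens the way `join: " "` does.

```ruby
# Ruby's awk-style String#split(" ") special case, which Liquid's
# `split: " "` delegates to. The sample string is illustrative only.
sample = "  <p>Hello,\t\tworld</p>\n\n  <p>Bye</p>\n"

# Whitespace mode: any whitespace run separates tokens, and leading or
# trailing whitespace yields no empty tokens.
awk_tokens = sample.split(" ")

# Regex mode: the leading whitespace run yields an empty first token.
regex_tokens = sample.split(/\s+/)

# Rejoining with one space gives the collapsed form.
collapsed = awk_tokens.join(" ")

puts awk_tokens.inspect   # ["<p>Hello,", "world</p>", "<p>Bye</p>"]
puts regex_tokens.inspect # ["", "<p>Hello,", "world</p>", "<p>Bye</p>"]
puts collapsed            # <p>Hello, world</p> <p>Bye</p>
```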
+
+The same algorithm in Ruby is a single method chain:
+
+```ruby
+content.split(" ").join(" ")
+```
+
+This still allocates the same token array, but it skips Liquid's filter dispatch, parse-tree walk, and per-filter `Liquid::Context` plumbing. In practice this saves ~2.4 s per build: the plugin's own runtime is negligible next to the layout cost it replaces.
+
+The bypass mechanism is upstream-supplied: the very first conditional in vendor/compress.html checks `site.compress_html.ignore.envs`. When it is set to `"all"` (or to a value that contains the current `jekyll.environment`, such as a list of environment names), the layout emits `{{ content }}` unchanged and skips the rest of the template. The plugin then takes over.
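As a rough Ruby paraphrase of that gate (a sketch, not code taken from vendor/compress.html), assuming the config is available as a plain Hash the way `site.config` is. `compress_bypassed?` is a hypothetical helper name. Liquid's `contains` maps onto Ruby's `include?`, which is a substring test on a String and a membership test on an Array:

```ruby
# Hypothetical paraphrase of the bypass check described above:
# bypass when envs == "all" or envs contains the current environment.
def compress_bypassed?(config, environment)
  envs = config.dig("compress_html", "ignore", "envs")
  return true if envs == "all"
  # Liquid `contains`: substring on a String, membership on an Array;
  # nil (key unset) never matches.
  envs.respond_to?(:include?) && envs.include?(environment)
end

all_cfg  = { "compress_html" => { "ignore" => { "envs" => "all" } } }
list_cfg = { "compress_html" => { "ignore" => { "envs" => ["development"] } } }

puts compress_bypassed?(all_cfg, "production")   # true
puts compress_bypassed?(list_cfg, "development") # true
puts compress_bypassed?(list_cfg, "production")  # false
puts compress_bypassed?({}, "production")        # false
```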
+
+## What vendor/compress.html does (and what we mirror)
+
+The full Liquid layout supports four configurable transformations beyond the pre-block whitespace collapse:
+
+| Config key | What it strips |
+|------------|----------------|
+| `compress_html.endings` | Closing tags that HTML5 allows omitting. |
+| `compress_html.startings` | Opening tags that HTML5 allows omitting. |
+| `compress_html.comments` | HTML comment blocks (`<!-- ... -->`). |
+| `compress_html.clippings` | Whitespace adjacent to block-level element tags (32 elements by default). |
+
+This site sets **none** of those: they all default to `nil`, so the Liquid `for` loops over them iterate zero times. The layout's net behaviour therefore reduces to the one transformation the plugin replicates: split the content into `pre` blocks and the text between them, collapse whitespace runs in everything outside those blocks, and preserve the `pre` bodies verbatim.
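The split-and-collapse step can be tried on its own. This sketch assumes a simple pre-matching regex, `/<pre\b[^>]*>.*?<\/pre>/m`, chosen for illustration only (the plugin's own `PRE_BLOCK_RE` may be written differently), and `collapse_outside_pre` is a hypothetical name. It also re-adds a trailing newline the way the plugin's `compress!` is described as doing.

```ruby
# Illustrative regex (an assumption): one whole <pre> element,
# non-greedy and multiline so `.*?` can cross newlines.
DEMO_PRE_RE = /<pre\b[^>]*>.*?<\/pre>/m

# Hypothetical helper mirroring the collapse described above.
def collapse_outside_pre(content)
  had_trailing_nl = content.end_with?("\n")
  # The capture group keeps each matched <pre> block in the result,
  # so segments alternate: [outside, pre, outside, ..., outside].
  parts = content.split(/(#{DEMO_PRE_RE})/, -1)
  parts.each_with_index do |part, i|
    next if i.odd? # odd indices are <pre> blocks: keep them verbatim
    parts[i] = part.split(" ").join(" ")
  end
  result = parts.join
  # Restore the single trailing newline that split(" ") removed.
  result << "\n" if had_trailing_nl && !result.end_with?("\n")
  result
end

html = "<div>\n  <p>a   b</p>\n<pre>x\n    y</pre>\n  <p>c</p>\n</div>\n"
puts collapse_outside_pre(html).inspect
# prints "<div> <p>a b</p><pre>x\n    y</pre><p>c</p> </div>\n"
```

Note that whitespace touching a `<pre>` boundary disappears entirely, because each outside segment is stripped at both ends before the join.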
+
+If a future config sets any of the four keys, the plugin would no longer match — at that point the choice is to either extend the plugin to handle the additional transforms or to revert to the layout for that build. The plugin's header comment flags this dependency.
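One way to make that dependency fail loudly instead of silently diverging is a pre-render check on the four keys. This is a hypothetical guard, not something the plugin is described as doing; `unsupported_compress_keys` is an illustrative name, and the sketch assumes `site.config` is a plain Hash.

```ruby
# Keys whose transforms the Ruby plugin does NOT replicate.
UNSUPPORTED_COMPRESS_KEYS = %w[endings startings comments clippings].freeze

# Hypothetical helper: names of unsupported keys that are set (non-nil).
def unsupported_compress_keys(config)
  settings = config.fetch("compress_html", nil) || {}
  UNSUPPORTED_COMPRESS_KEYS.reject { |key| settings[key].nil? }
end

ok  = { "compress_html" => { "ignore" => { "envs" => "all" } } }
bad = { "compress_html" => { "comments" => ["<!-- ", " -->"] } }

p unsupported_compress_keys(ok)  # => []
p unsupported_compress_keys(bad) # => ["comments"]
```

Inside a `:site, :pre_render` hook, a non-empty result could be reported with `Jekyll.logger.warn` or turned into a `Jekyll::Errors::FatalException`.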
+
+## How the plugin matches the layout's output
+
+Three details matter for byte-identical output:
+
+1. **Pre-block boundary recipe.** vendor/compress.html splits content by the literal string `` to separate the inside of a pre block from the content after, and only collapses whitespace in the after-content. The plugin uses the same boundary algebra but expressed as one regex matching a full `
...
` block: + + ```ruby + PRE_BLOCK_RE = //m.freeze + ``` + + `content.split(/(#{PRE_BLOCK_RE})/, -1)` returns an alternating array `[outside, pre, outside, pre, ..., outside]` (the capture group keeps each matched block in the result). Even indices are outside-of-pre, odd are pre bodies. The plugin runs `split(" ").join(" ")` only on the even indices, preserving the odd indices byte-for-byte. + +2. **Whitespace-mode split.** Liquid's `split: " "` argument is the literal one-space string `" "`. Ruby's `String#split(" ")` special-cases this exact argument (per CRuby docs) to behave as "split on whitespace runs, strip leading/trailing whitespace, drop empty trailing entries". The plugin uses `split(" ")` (not `split(/\s+/)`, which has different leading-whitespace semantics) so the per-segment collapse matches the layout's output exactly. + +3. **Trailing newline preservation.** `split(" ")` strips the final segment's trailing whitespace, so a content that ended with `\n` would emerge as `` after the join. But vendor/compress.html's *template* ends with a literal newline after its `{{ _content }}` output, so the layout-emitted file ends with one `\n` regardless. The plugin re-adds a trailing `\n` when the input had one: + + ```ruby + had_trailing_nl = content.end_with?("\n") + # ... split / join ... + result << "\n" if had_trailing_nl && !result.end_with?("\n") + ``` + + This is the one place where understanding the layout's template structure (not just its filter logic) matters — without the trailing newline restore the plugin's output differs by one byte per page. + +## Layout-chain gating + +vendor/compress.html only ran on pages whose layout chain reached it. 
The just-the-docs layout chain for normal pages is: + +``` +page.md (layout: default) +└── default.html (layout: table_wrappers) + └── table_wrappers.html (layout: vendor/compress) + └── vendor/compress.html (no layout) +``` + +Pages that don't use any of these layouts — jekyll-redirect-from stubs, the SCSS-derived CSS pages, `assets/js/zzzz-search-data.json`, `book.html` (which uses the minimal `book-combined` layout that has no parent) — were left untouched by the layout. The plugin has to match that gating, otherwise it would compress files that compress.html doesn't, breaking byte-identity. + +The gate is precomputed once at `:site, :pre_render`: + +```ruby +def self.precompute_compress_layouts!(site) + @compress_layouts = Set.new + site.layouts.each_key do |name| + walked = [] + cur_name = name + while cur_name && !walked.include?(cur_name) + walked << cur_name + if cur_name == "vendor/compress" + walked.each { |n| @compress_layouts << n } + break + end + cur = site.layouts[cur_name] + cur_name = cur ? cur.data["layout"] : nil + end + end +end +``` + +Walk every layout in `site.layouts`, follow `data["layout"]` from each, mark every layout on the walked path the moment the walk reaches `vendor/compress`. Cycles are guarded by the `walked.include?` check. After the precompute, `@compress_layouts` holds the set of layout keys whose render output passes through compress; the per-page hook checks `@compress_layouts.include?(page.data["layout"])` and skips the page entirely when it doesn't match. + +Two subtleties make this trickier than it looks: + +- **Layout keys vs filenames.** `site.layouts` is keyed by layout name without extension (`"default"`, `"vendor/compress"`), and `page.data["layout"]` carries the same shape. But the `Layout` object's `.name` attribute is the filename *with* extension (`"default.html"`, `"vendor/compress.html"`). 
The walk must compare against the key, not against `cur.name` — comparing against `cur.name` was the first version's bug and produced an empty `@compress_layouts` set (every page un-gated, every redirect stub compressed, 301-file diff against the baseline). + +- **Theme layouts merge with local ones.** `site.layouts` already contains both `_layouts/*.html` from the site source and the theme's `_layouts/*.html` (and `_layouts/vendor/compress.html`) loaded via `theme.layouts_path`. No manual merge needed; the walk sees all keys uniformly. + +## When it runs + +Three hooks, all at the bottom of `html-compress.rb`: + +```ruby +Jekyll::Hooks.register :site, :pre_render do |site| + HtmlCompress.precompute_compress_layouts!(site) +end + +Jekyll::Hooks.register :pages, :post_render, priority: :high do |page| + next unless page.output.is_a?(String) + next unless HtmlCompress.compress?(page) + HtmlCompress.compress!(page.output) +end + +Jekyll::Hooks.register :documents, :post_render, priority: :high do |doc| + next unless doc.output.is_a?(String) + next unless HtmlCompress.compress?(doc) + HtmlCompress.compress!(doc.output) +end +``` + +The `priority: :high` is what places the plugin *before* `offlinify.rb` and `pdfify.rb` in the per-page render-hook order — both of those use the default `:normal` priority and rely on reading the final compressed `page.output`. Jekyll runs `:post_render` hooks in descending priority, so `:high` (30) fires before `:normal` (20). Without the priority annotation the order would be insertion-order across all `.rb` files in `_plugins/`, which is not a stable contract. 
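The ordering rule can be modelled in a few lines. This is a simplified stand-in for Jekyll's hook registry, assuming (as the paragraph above describes) that hooks run by descending priority with ties broken by registration order. `PRIORITY` mirrors the `:low` / `:normal` / `:high` values of 10 / 20 / 30, and the hook names are illustrative.

```ruby
PRIORITY = { low: 10, normal: 20, high: 30 }.freeze

# Each registration records its name, numeric priority, and load order.
Registration = Struct.new(:name, :priority, :index)

registrations = []
register = lambda do |name, priority = :normal|
  registrations << Registration.new(name, PRIORITY.fetch(priority), registrations.size)
end

# Registration order as it might fall out of plugin file load order.
register.call("offlinify")
register.call("pdfify")
register.call("html-compress", :high)

# Sort key: higher priority first, then earlier registration first.
order = registrations.sort_by { |r| [-r.priority, r.index] }.map(&:name)
p order # => ["html-compress", "offlinify", "pdfify"]
```

Even though `html-compress` registers last here, its higher priority moves it to the front; the two `:normal` hooks keep their registration order.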
+ +## Verification + +The plugin's correctness was established by capturing the full `_site/` tree under the layout-based compression, then rebuilding with the plugin and recursive-diffing every output file: + +```sh +# With vendor/compress.html active, plugin disabled +bundle exec jekyll build +cp -r _site /tmp/baseline-site + +# With compress_html.ignore.envs: all and the plugin active +bundle exec jekyll build +diff -rq _site /tmp/baseline-site +``` + +A clean run shows zero differences across ~840 HTML pages, 290 redirect stubs, every CSS / JSON / SVG / image asset — the layout-chain gating ensures non-HTML and non-compress-layout outputs are passed through verbatim, identical to what the Liquid layout produced. + +Two regressions caught during development before the gating logic was finalised: + +- **Layout-key bug.** First version compared `cur.name` (the filename) against `"vendor/compress"` (the key without extension), so the walk never matched. `@compress_layouts` came out empty, every page was un-gated, every non-HTML file got `split(" ").join(" ")` applied, and `diff -rq` flagged 301 files (290 redirect stubs + 5 CSS + 5 CSS map files + 1 search-data.json). Fixed by iterating `each_key` and walking by key. + +- **Trailing newline.** Same baseline diff initially showed every HTML file as 1 byte short — the layout's template-trailing `\n` wasn't being re-added. Fixed by the `had_trailing_nl` guard described above. + +## Reference: the most important functions + +In source order in [`html-compress.rb`](html-compress.rb): + +- `precompute_compress_layouts!(site)` — `:site, :pre_render` entry. Walks every layout chain via `data["layout"]`, marks each layout on the path as compress-ending the moment the walk hits `vendor/compress`. Idempotent; the resulting `@compress_layouts` set persists across builds in `jekyll serve` and gets rebuilt fresh each `:pre_render`. + +- `compress?(page)` — gate check. 
Returns `true` when the page's `data["layout"]` is in `@compress_layouts`. Pages without a layout (jekyll-redirect-from stubs, SCSS-derived CSS, JSON-via-page-rendering, `book.html` via `book-combined`) return `false` and skip the compression entirely. + +- `compress!(content)` — the actual compression, in place. Captures the trailing-newline state, splits by `PRE_BLOCK_RE` with the capture group so pre bodies are preserved in the result array, runs `split(" ").join(" ")` on every outside-of-pre segment, joins, restores the trailing newline if needed, then mutates the input string via `String#replace`. The `replace` is what lets us hand back the same string object the caller passed in — Jekyll's writer reads `page.output` after `:post_render`, so in-place mutation is the cheapest way to update what gets written. diff --git a/docs/_plugins/html-compress.rb b/docs/_plugins/html-compress.rb new file mode 100644 index 0000000..7603a58 --- /dev/null +++ b/docs/_plugins/html-compress.rb @@ -0,0 +1,133 @@ +# frozen_string_literal: true +# +# HTML whitespace compression in Ruby, replacing what +# just-the-docs's vendor/compress.html layout does in Liquid. +# +# Why: vendor/compress.html spends ~3.5s of RENDER per build on what +# reduces -- in our config, with +# `site.compress_html.{endings,startings,comments,clippings}` all +# unset -- to a single pre-block-protected whitespace collapse: +# +# outside-of-
 text | split: " " | join: " "
+#
+# Each Liquid `split: " "` allocates an Array of every
+# whitespace-delimited token, then `join: " "` walks it back to a
+# string. Across 837 pages with thousands of tokens per page, that's
+# millions of small allocations. The same logic in Ruby via
+# `String#split(" ").join(" ")` runs in C and skips the Liquid
+# evaluator entirely.
+#
+# To activate: set `compress_html.ignore.envs: all` in _config.yml
+# (turns vendor/compress.html into a no-op passthrough). This plugin
+# then runs at `:pages, :post_render` / `:documents, :post_render`
+# with `priority: :high` so the compressed output is what offlinify
+# and the Jekyll writer see.
+#
+# Layout-chain gating: only pages whose layout chain reaches
+# vendor/compress get compressed -- the same set the layout would
+# have processed. Anything else (jekyll-redirect-from stubs, CSS,
+# search-data.json, the book) is left verbatim. The set is
+# precomputed once at `:site, :pre_render`.
+#
+# Output: byte-identical to vendor/compress.html for this site's
+# configuration (verified by recursive diff of every file in
+# _site/ against a vendor/compress.html baseline snapshot).
+
+require "set"
+
+module HtmlCompress
+  # Matches a complete pre element (opening tag, body, closing
+  # tag). Non-greedy + multiline so the `.*?` crosses lines. The
+  # same boundary recipe vendor/compress.html uses -- split content
+  # on the opening and closing pre tags and treat what's between as
+  # the verbatim
+  # body -- but expressed as one regex so the engine can scan
+  # the whole document in a single pass.
+  PRE_BLOCK_RE = //m.freeze
+
+  # Set of layout names whose render chain reaches
+  # `vendor/compress`. Populated once per build.
+  @compress_layouts = Set.new
+
+  # Walk every layout's chain via `data["layout"]`, marking each
+  # layout on the path as compress-ending the moment the walk hits
+  # vendor/compress. Cycles guarded by a walked-list check.
+  #
+  # Layouts are keyed by basename without extension in `site.layouts`
+  # (e.g. `"default"`, `"vendor/compress"`) and the same shape is
+  # what `page.data["layout"]` carries, so the walk operates on
+  # those keys -- not on `layout.name`, which would be the filename
+  # with extension (`"default.html"`).
+  def self.precompute_compress_layouts!(site)
+    @compress_layouts = Set.new
+    site.layouts.each_key do |name|
+      walked = []
+      cur_name = name
+      while cur_name && !walked.include?(cur_name)
+        walked << cur_name
+        if cur_name == "vendor/compress"
+          walked.each { |n| @compress_layouts << n }
+          break
+        end
+        cur = site.layouts[cur_name]
+        cur_name = cur ? cur.data["layout"] : nil
+      end
+    end
+  end
+
+  # True when `page` (or document) uses a layout chain ending in
+  # vendor/compress -- i.e. exactly the pages compress.html would
+  # have processed. Pages without a layout (jekyll-redirect-from
+  # stubs, CSS, JSON, book.html via book-combined) return false.
+  def self.compress?(page)
+    layout_name = page.data["layout"]
+    layout_name && @compress_layouts.include?(layout_name)
+  end
+
+  # Apply pre-block-aware whitespace collapse: every run of one or
+  # more ASCII whitespace characters outside a pre block is
+  # replaced by a single space, and each outside-of-pre segment has
+  # its leading/trailing whitespace stripped (a trailing newline on
+  # the document is then restored). Whitespace inside pre bodies is
+  # preserved byte-for-byte.
+  def self.compress!(content)
+    # Trailing newline preservation: `split(" ")` strips trailing
+    # whitespace from the last segment, but vendor/compress.html
+    # appends a `\n` from the literal trailing whitespace in its
+    # own template source -- so its output ends with one newline
+    # regardless. Mirror that here for byte-identical output.
+    had_trailing_nl = content.end_with?("\n")
+    # Split on the pre-block regex with a capture group so the
+    # matched blocks stay in the result array, alternating with
+    # the outside-of-pre segments: [outside, pre, outside, pre,
+    # ..., outside]. Even indices are outside, odd are pre.
+    parts = content.split(/(#{PRE_BLOCK_RE})/o, -1)
+    parts.each_with_index do |part, i|
+      next if i.odd?  # leave <pre> bodies verbatim
+      # `split(" ").join(" ")` matches Liquid's `split: " " | join: " "`
+      # exactly: leading/trailing whitespace stripped, every whitespace
+      # run collapsed to one space. C-implemented in MRI.
+      parts[i] = part.split(" ").join(" ")
+    end
+    result = parts.join
+    result << "\n" if had_trailing_nl && !result.end_with?("\n")
+    content.replace(result)
+  end
+end
+
+Jekyll::Hooks.register :site, :pre_render do |site|
+  HtmlCompress.precompute_compress_layouts!(site)
+end
+
+# Run before offlinify (default :normal priority) so the offline-tree
+# rewrites see the compressed page.output. Every per-page
+# :post_render hook fires before Jekyll's WRITE phase, so _site/
+# also receives the compressed output.
+Jekyll::Hooks.register :pages, :post_render, priority: :high do |page|
+  next unless page.output.is_a?(String)
+  next unless HtmlCompress.compress?(page)
+  HtmlCompress.compress!(page.output)
+end
+
+Jekyll::Hooks.register :documents, :post_render, priority: :high do |doc|
+  next unless doc.output.is_a?(String)
+  next unless HtmlCompress.compress?(doc)
+  HtmlCompress.compress!(doc.output)
+end
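A standalone sketch of the two techniques `html-compress.rb` relies on, runnable without Jekyll: the layout-chain walk (the idea behind `precompute_compress_layouts!`) and the pre-aware whitespace collapse (the idea behind `compress!`). This is a sketch under stated assumptions, not the plugin itself: `CompressSketch`, `compress_layouts`, and `PRE_RE` are hypothetical names; `site.layouts` is modelled as a plain hash from layout name to parent layout name; and `PRE_RE` is an illustrative `<pre>` pattern, because the real `PRE_BLOCK_RE` source is not visible in this excerpt.

```ruby
require "set"

module CompressSketch
  # Hypothetical stand-in for PRE_BLOCK_RE: one whole <pre ...>...</pre>
  # element, non-greedy, case-insensitive, with `.` matching newlines.
  PRE_RE = %r{<pre\b[^>]*>.*?</pre>}im

  # layouts: { "layout name" => "parent layout name or nil" }.
  # Returns every layout whose parent chain reaches "vendor/compress".
  def self.compress_layouts(layouts)
    result = Set.new
    layouts.each_key do |name|
      walked = []
      cur = name
      # `walked` doubles as the cycle guard, as in the plugin.
      while cur && !walked.include?(cur)
        walked << cur
        if cur == "vendor/compress"
          result.merge(walked)
          break
        end
        cur = layouts[cur]
      end
    end
    result
  end

  # Collapse whitespace outside <pre> blocks; keep <pre> bodies verbatim.
  def self.compress(html)
    # The capture group keeps each <pre> match in the split result, so
    # even indices are outside-of-pre text and odd indices are pre blocks.
    parts = html.split(/(#{PRE_RE})/, -1)
    parts.each_with_index do |part, i|
      next if i.odd?
      # Awk-style split: trims the segment and collapses whitespace runs.
      parts[i] = part.split(" ").join(" ")
    end
    out = parts.join
    # Keep a single trailing newline if the input had one.
    out << "\n" if html.end_with?("\n") && !out.end_with?("\n")
    out
  end
end

p CompressSketch.compress("<p>  a </p>\n<pre> x  y</pre>\n")
# prints "<p> a </p><pre> x  y</pre>\n"
```

Interpolating a `Regexp` object into a regex literal embeds it as `(?mi-x:...)`, so the `m` and `i` flags survive the wrap in a capture group. As in the plugin, each outside segment is trimmed on its own, so whitespace between text and an adjacent `<pre>` is dropped rather than collapsed to one space.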
diff --git a/docs/_plugins/jekyll-gfm-admonitions-patch.rb b/docs/_plugins/jekyll-gfm-admonitions-patch.rb
index d4a7a16..b009f59 100644
--- a/docs/_plugins/jekyll-gfm-admonitions-patch.rb
+++ b/docs/_plugins/jekyll-gfm-admonitions-patch.rb
@@ -113,7 +113,8 @@
 # generators (this one included) are invisible to it. The wall-clock delta
 # we log here is the gem's full contribution to the GENERATE phase:
 # walking every collection doc and page, running the admonition regex,
-# and re-invoking the markdown converter on each admonition body.
+# and (after the patch below) splicing in HTML that defers body markdown
+# parsing to the page-level kramdown pass.
 module JekyllGFMAdmonitions
   class GFMAdmonitionConverter
     unless method_defined?(:_generate_without_timing)
@@ -125,5 +126,87 @@ def generate(site)
         Jekyll.logger.info "GFMA:", "Generator ran in #{elapsed_ms}ms."
       end
     end
+
+    # Skip the per-admonition `@markdown.convert(text)` call by leaving
+    # the body as raw markdown inside the outer alert div. The
+    # site-level kramdown config (`parse_block_html: true` and
+    # `parse_span_html: true` in _config.yml) makes the page-level
+    # kramdown pass descend into the div and parse the body markdown
+    # during RENDER, so the rendered HTML is the same as if the gem had
+    # pre-converted the body itself -- one combined parse instead of
+    # 1 + N (page + one per admonition).
+    #
+    # Two side effects of removing the inline `markdown.convert`
+    # surface as small correctness improvements over the unpatched gem:
+    #
+    #   * Backslash-escapes in body text (e.g. `**\\\\**` for a bold
+    #     pair of backslashes) no longer go through kramdown twice and
+    #     so are no longer collapsed by the second pass. The unpatched
+    #     gem's output of `\` (one backslash) becomes
+    #     `\\` (two backslashes, what the source asks
+    #     for). Pages affected: any with `\\\\` inside an admonition
+    #     body -- on this site, `Reference/Core/RightShift.md` and a
+    #     handful of others.
+    #
+    #   * Code blocks that follow an admonition with just one blank
+    #     line between them are no longer eaten by the gem's code-block
+    #     stash regex (see `process_doc` override below). The unpatched
+    #     gem's stash regex `(?:^|\n)(?)\s*```.*?```/m` consumes the
+    #     blank line, which pulls the placeholder into the admonition
+    #     body capture, which lets kramdown render it as an empty
+    #     `` element and
+    #     prevents the restore step from finding it. Net effect on
+    #     the unpatched gem: the code block disappears from the
+    #     rendered HTML. The override below preserves the leading
+    #     newline(s) so the placeholder stays on its own line outside
+    #     the admonition body capture.
+    #
+    # The body text is bracketed by blank lines so kramdown reads it as
+    # an independent paragraph rather than tangling with the preceding
+    # `

...

` block. The outer div + # carries `markdown='1'` so kramdown's HTML-block parser keeps the + # whole `
...
` as a single block even though it spans + # blank lines internally. + unless method_defined?(:_admonition_html_without_deferred_body) + alias_method :_admonition_html_without_deferred_body, :admonition_html + def admonition_html(type, title, text, icon) + "
\n" \ + "

#{icon} #{title}

\n\n" \ + "#{text}\n" \ + "
" + end + end + + # Override `process_doc` to fix the code-block stash so that the + # placeholder substitution preserves the leading newline(s) that + # separated the code block from the preceding text. Without this + # adjustment, the gem's gsub eats the blank line before the code + # block, which causes the placeholder to be appended to the last + # admonition body line and then dragged into the body capture by + # the admonition regex's `[^\n]*` body-line pattern. + # + # The body of the method otherwise mirrors the upstream gem + # verbatim (see jekyll-gfm-admonitions 1.2.0, + # `lib/jekyll-gfm-admonitions.rb#process_doc`). + unless method_defined?(:_process_doc_without_leading_ws_preserve) + alias_method :_process_doc_without_leading_ws_preserve, :process_doc + def process_doc(doc) + return if doc.content.empty? + doc.content = doc.content.dup unless doc.content.frozen? + + code_blocks = [] + doc.content.gsub!(/(?:^|\n)(?)\s*```.*?```/m) do |match| + code_blocks << match + leading = match[/\A\s+/] || "" + "#{leading}```{{CODE_BLOCK_#{code_blocks.length - 1}}}```" + end + + convert_admonitions(doc) + + doc.content.gsub!(/```\{\{CODE_BLOCK_(\d+)}}```/) do + code_blocks[::Regexp.last_match(1).to_i] + end + end + end end end diff --git a/docs/_plugins/nav-path.rb b/docs/_plugins/nav-path.rb new file mode 100644 index 0000000..336cd87 --- /dev/null +++ b/docs/_plugins/nav-path.rb @@ -0,0 +1,39 @@ +# frozen_string_literal: true + +# Populates `page.data["nav_path"]` on every page that has a `title`. +# The nav path is the slash-joined `grand_parent / parent / title` +# chain that just-the-docs uses to build its sidebar tree. It is the +# selector used by book.yml's `nav_prefixes:` option in book.html -- +# a way to sweep pages into a chapter / part by their position in the +# nav tree rather than by URL prefix. +# +# Example: `Reference/Operators.md` has `parent: Reference Section` +# and `title: Operators`, so its nav_path is +# "Reference Section/Operators". 
The individual operator pages under +# `/tB/Core/` carry `parent: Operators, grand_parent: Reference Section`, +# so their nav_paths are "Reference Section/Operators/AddressOf" etc. +# A book.yml entry with `nav_prefixes: [Reference Section/Operators]` +# therefore sweeps in the Operators index plus every operator page +# without having to enumerate the /tB/Core/* URLs one by one. +# +# Runs in the GENERATE phase so the populated field is available to +# `book.html` (and any other template) during RENDER. + +module Jekyll + class NavPathGenerator < Generator + safe true + priority :low + + def generate(site) + site.pages.each do |page| + title = page["title"] + next unless title + parts = [] + parts << page["grand_parent"] if page["grand_parent"] + parts << page["parent"] if page["parent"] + parts << title + page.data["nav_path"] = parts.join("/") + end + end + end +end diff --git a/docs/_plugins/pdfify.md b/docs/_plugins/pdfify.md index 112464e..f157239 100644 --- a/docs/_plugins/pdfify.md +++ b/docs/_plugins/pdfify.md @@ -1,6 +1,6 @@ # Pdfify -`_plugins/pdfify.rb` produces the sparse `_site-pdf/` tree that pagedjs-cli consumes when rendering the book PDF. After Jekyll finishes writing the online site (and after `offlinify.rb` has run, if it is active), this plugin copies just the files pagedjs needs — `book.html`, the two stylesheets the book layout links, and every image `book.html` references — into a sibling directory at `-pdf/`. The result is ~14 MB instead of the ~130 MB the online tree would carry, and one `ls` says exactly what pagedjs sees. +`_plugins/pdfify.rb` produces the sparse `_site-pdf/` tree that pagedjs-cli consumes when rendering the book PDF. 
Across four Jekyll hooks the plugin captures `book.html`'s rendered output in memory, removes its page from `site.pages` so Jekyll never writes it to `_site/`, and post-write copies the captured bytes — plus the two stylesheets the book layout links and every image `book.html` references — into a sibling directory at `-pdf/`. The result is ~14 MB instead of the ~130 MB the online tree would carry, and one `ls` says exactly what pagedjs sees. This file sits in `_plugins/` for the same reasons as `offlinify.md`: it lives next to the code it documents, and Jekyll's `_plugins/` folder is plugin-only territory, so this Markdown never gets rendered into the public site. @@ -10,32 +10,53 @@ The book's render path is narrow. pagedjs-cli only ever opens `_site-pdf/book.ht The previous approach was a second Jekyll build, configured by a `_config-pdf.yml` overlay, that wrote a complete second site tree into `_site-pdf/` with every page rendered through a minimal `book` layout. ~1300 per-page HTML files, none of which pagedjs ever opened. Two builds in series ran ~30s of Jekyll just to satisfy one consumer. Both the overlay and the second-tier layout were retired when this plugin landed. -A sparse copy also keeps the PDF source honest. Every file in `_site-pdf/` is one pagedjs reads; if you add a `` to a chapter and it doesn't show up in the rendered PDF, the missing file in the sparse tree is the breadcrumb. +A sparse copy also keeps the PDF source honest. Every file in `_site-pdf/` is one pagedjs reads; if you add a `` to a chapter and it doesn't show up in the rendered PDF, the missing file in the sparse tree indicates a problem. ## When it runs -Activated by `also_build_pdf: true` (the default in `_config.yml`). Reads from `site.dest` (i.e. `_site/`) and writes to `-pdf/`. The hook at the bottom of `pdfify.rb`: +Activated by `also_build_pdf: true` (the default in `_config.yml`). 
Reads each rendered page's `page.output` in memory; reads stylesheets and images from `site.dest` (i.e. `_site/`) after WRITE; writes to `-pdf/`. Four hooks at the bottom of `pdfify.rb`, mirroring `offlinify.rb`'s shape: ```ruby +Jekyll::Hooks.register :site, :pre_render do |site| + next unless site.config["also_build_pdf"] + Pdfify.setup(site) +end + +Jekyll::Hooks.register :pages, :post_render do |page| + Pdfify.maybe_capture(page) +end + +Jekyll::Hooks.register :site, :post_render do |site| + Pdfify.remove_book_page(site) +end + Jekyll::Hooks.register :site, :post_write do |site| next unless site.config["also_build_pdf"] - Pdfify.run(site, site.dest, "#{site.dest}-pdf") + Pdfify.run(site, "#{site.dest}-pdf") end ``` -One Jekyll invocation produces `_site/`, `_site-offline/` (via `offlinify.rb`), and `_site-pdf/` (this plugin). Flip the flag to `false` if you only want the online site, or if you want offline-but-no-PDF. +The two `:site` hooks gate on `also_build_pdf` directly; the two render hooks gate internally via the `@enabled` flag `setup()` flips on, so when the plugin is disabled they no-op without examining `page.url`. + +One Jekyll invocation produces `_site/`, `_site-offline/` (via `offlinify.rb`), and `_site-pdf/` (this plugin). Flip the flag to `false` if you only want the online site, or if you want offline-but-no-PDF — `book.html` then renders as a normal (large) page on `_site/` and the `offline_exclude` entry still keeps it out of `_site-offline/`. ## The build flow -After Jekyll's WRITE phase completes, the hook fires `Pdfify.run(site, source_root, dest_root)`, which does the following: +Each hook contributes one piece of state to the others; `run()` does the actual file copies post-write. + +1. **`Pdfify.setup(site)`** at `:site, :pre_render`. Flips `@enabled` on and clears `@captured`. The hook itself is the gate on `also_build_pdf`, so reaching `setup()` already means the plugin is on for this build. 
Both state variables persist across builds in `jekyll serve`; `setup()` resets `@captured` so a previous build's bytes can't leak into the current one. -1. **Locate `book.html`.** If `/book.html` doesn't exist (the `book.html` page didn't render, or was excluded from the build), emit a warning and skip the rest. This particular condition is treated as a warning rather than a fatal — strict mode (step 8) only fires for missing image references inside an otherwise-present `book.html`. +2. **`Pdfify.maybe_capture(page)`** at `:pages, :post_render`, once per page Jekyll renders. When `page.url == "/book.html"`, stash `page.output.dup` in `@captured`. Every other page is a one-comparison no-op. The dup is defensive — Jekyll doesn't mutate `page.output` after `:post_render`, but a future plugin running later in the chain might, and `@captured` needs to survive untouched until `run()` post-write. -2. **Wipe and recreate `/`.** Unlike `offlinify.rb`, which empties the directory contents but keeps the directory itself in place to keep the jekyll-watcher happy, `pdfify.rb` deletes the whole tree. The PDF pass doesn't need watcher friendliness — nobody runs `jekyll serve` and refreshes a `_site-pdf/` page in their browser. The wipe is to ensure no stale images linger after source pages are deleted or renamed. +3. **`Pdfify.remove_book_page(site)`** at `:site, :post_render`. Drops `/book.html` from `site.pages` via `site.pages.reject!`. This fires *after* every `:pages, :post_render` hook (so offlinify has already seen book.html and skipped it via `offline_exclude`) and *before* Jekyll's WRITE phase iterates `site.pages`, so mutating the array here is safe and means `_site/book.html` never gets written. When the page didn't render (`@captured` is still `nil`), the removal step is skipped — `run()` then warns and bails. -3. **Copy `book.html`** verbatim into the destination. 
The plugin doesn't rewrite anything inside `book.html`; relative paths like `Features/Images/foo.png` resolve correctly because the destination tree mirrors the source layout exactly. +4. **`Pdfify.run(site, dest_root)`** at `:site, :post_write`, the main copy pass. The remaining steps below all live here. -4. **Copy `REQUIRED_CSS`.** Two files in fixed positions: +5. **Wipe and recreate `/`.** Unlike `offlinify.rb`, which empties the directory contents but keeps the directory itself in place to keep the jekyll-watcher happy, `pdfify.rb` deletes the whole tree. The PDF pass doesn't need watcher friendliness — nobody runs `jekyll serve` and refreshes a `_site-pdf/` page in their browser. The wipe is to ensure no stale images linger after source pages are deleted or renamed. + +6. **Write `book.html`** from the captured bytes. `File.binwrite(dest/book.html, @captured)`. The plugin doesn't rewrite anything inside `book.html`; relative paths like `Features/Images/foo.png` resolve correctly because the destination tree mirrors the source layout exactly. + +7. **Copy `REQUIRED_CSS`** from `_site/`. Two files in fixed positions: ``` assets/css/print.css the book design (page geometry, typography, code blocks, …) @@ -44,29 +65,36 @@ After Jekyll's WRITE phase completes, the hook fires `Pdfify.run(site, source_ro Each is copied if present and warned about if missing. A missing stylesheet doesn't fail the build — pagedjs will render with default styles, which is a useful "the build is structurally OK, the asset just slipped" signal. -5. **Extract and copy every relative `` target.** Scan `book.html` with `IMG_SRC_RE` (see [What gets copied](#what-gets-copied) for the regex), deduplicate, and copy each one. The regex skips `` and `
` blocks so syntax-highlighted code samples that happen to contain `src="…"` text don't generate spurious entries. Source files that don't exist on disk are collected into a `missing_paths` list for the strict-mode step below.
+8. **Extract and copy every relative `` target.** Scan the captured `book.html` with `IMG_SRC_RE` (see [What gets copied](#what-gets-copied) for the regex), deduplicate, and copy each one from `_site/` (Jekyll's asset pipeline has written the images there during WRITE). The regex skips `` and `
` blocks so syntax-highlighted code samples that happen to contain `src="…"` text don't generate spurious entries. Source files that don't exist on disk are collected into a `missing_paths` list for the strict-mode step below.
 
-6. **Delete `/book.html`.** The concatenated document exists in `_site/` only as a hand-off between Jekyll's render pass and this plugin — it's not a public page on the online site. The companion exclusion in `_config.yml` (`offline_exclude: [..., book.html]`) keeps `offlinify.rb` from copying it into `_site-offline/`; the delete here clears it from `_site/` itself. The two safeguards are independent: the exclude pattern fires whether `offlinify.rb` walks `_site/` before or after pdfify's delete (and still applies when `also_build_pdf: false`, when pdfify never runs at all), and pdfify's delete fires whether or not offlinify is enabled. No hook-ordering assumption is required.
-
-7. **Log per-path errors, then the summary.** Each entry in `missing_paths` is logged at `error` level as its own line (`Pdfify: missing image X (referenced from book.html, not present under _site/)`), then the one-line summary:
+9. **Log per-path errors, then the summary.** Each entry in `missing_paths` is logged at `error` level as its own line (`Pdfify: missing image X (referenced from book.html, not present under _site/)`), then the one-line summary:
 
    ```
    Pdfify: wrote .../_site-pdf -- copied 86 file(s) (88 image(s), 2 missing)
    ```
 
-   The "missing" portion is suppressed entirely when every image resolved. The counter is a real bug signal — every miss is an `` in source markdown that points at a path Jekyll didn't write, and the rendered PDF will have a broken-image placeholder in that spot. The code/pre regex skip (step 5) is what makes this contract honest: a non-zero count is always a real broken reference, never a syntax-highlighter artefact.
+   The "missing" portion is suppressed entirely when every image resolved. The counter is a real bug signal — every miss is an `` in source markdown that points at a path Jekyll didn't write, and the rendered PDF will have a broken-image placeholder in that spot. The code/pre regex skip (step 8) is what makes this contract honest: a non-zero count is always a real broken reference, never a syntax-highlighter artefact.
+
+10. **Strict mode.** A non-zero missing count aborts `jekyll build` via `Jekyll::Errors::FatalException` (exit code 1, CI-gating). `jekyll serve` keeps going — the per-path errors and summary are still logged, but the dev preview stays alive so a mid-edit save that temporarily breaks an image reference doesn't kill the watcher. The two modes are distinguished via `site.config["serving"]`, which Jekyll's own `commands/build.rb` sets to `false` and `commands/serve.rb` sets to `true`.
 
-8. **Strict mode.** A non-zero missing count aborts `jekyll build` via `Jekyll::Errors::FatalException` (exit code 1, CI-gating). `jekyll serve` keeps going — the per-path errors and summary are still logged, but the dev preview stays alive so a mid-edit save that temporarily breaks an image reference doesn't kill the watcher. The two modes are distinguished via `site.config["serving"]`, which Jekyll's own `commands/build.rb` sets to `false` and `commands/serve.rb` sets to `true`.
+## The `offline_exclude` entry
+
+`offline_exclude: [book.html]` in `_config.yml` is still required even though `_site/book.html` no longer exists when `also_build_pdf: true`. The reason is twofold:
+
+- **When `also_build_pdf: true`.** Offlinify's `:pages, :post_render` hook fires for `book.html` *before* pdfify's `:site, :post_render` removes the page from `site.pages` — Jekyll fires every per-page render hook before any `:site, :post_render` hook. So during offlinify's per-page pass `book.html` is still a live page in the site model, and the `offline_exclude` check is what makes offlinify's `process_page` skip writing `_site-offline/book.html`.
+- **When `also_build_pdf: false`.** Pdfify never runs at all, `book.html` renders as a normal page on `_site/`, and the exclude is what keeps it out of `_site-offline/`.
+
+Plugin-hook registration order across separate `.rb` files in `_plugins/` is not guaranteed, but none of the above depends on it: every `:pages, :post_render` hook fires before any `:site, :post_render` hook, so the exclude's guarantee holds whichever plugin Jekyll happens to load first.
 
 ## What gets copied
 
 Three categories of file, in this order:
 
-| Category | Source path | Destination path |
-|----------|-------------|------------------|
-| The book itself | `_site/book.html` | `_site-pdf/book.html` |
+| Category | Source | Destination path |
+|----------|--------|------------------|
+| The book itself | captured `page.output` bytes for `/book.html` (in memory) | `_site-pdf/book.html` |
 | Stylesheets | `_site/assets/css/print.css`, `_site/assets/css/rouge.css` | same, mirrored |
-| Images | each unique `` in `book.html`, with `X` resolved against the destination root | `_site-pdf/X` |
+| Images | each unique `` in `book.html`, resolved against `_site/` | `_site-pdf/X` |
 
 The image regex:
 
@@ -85,7 +113,7 @@ Three top-level alternatives, matched against the full document with the `m` fla
 
    Group 1 is the quote character (so the trailing quote in the pattern matches the same character), group 2 is the URL.
 
-`extract_image_paths` calls `String#scan` with this regex and inspects each yielded match: when group 1 is nil the match is a code/pre branch and gets skipped; otherwise group 2 is the URL. Query strings and fragments are stripped (`File.file?` would choke on them), then paths are deduplicated via a `Set`. The destination layout mirrors source paths exactly, so an `` reference inside `_site-pdf/book.html` resolves to `_site-pdf/Features/Images/foo.png` — the same shape the source `_site/book.html` had against `_site/` before pdfify deleted it.
+`extract_image_paths` calls `String#scan` with this regex and inspects each yielded match: when group 1 is nil the match is a code/pre branch and gets skipped; otherwise group 2 is the URL. Query strings and fragments are stripped (`File.file?` would choke on them), then paths are deduplicated via a `Set`. The destination layout mirrors source paths exactly, so an `` reference inside `_site-pdf/book.html` resolves to `_site-pdf/Features/Images/foo.png`, in the same way `_site//index.html` references its sibling assets.
 
 ## File layout
 
@@ -93,7 +121,7 @@ The PDF build touches the following files:
 
 | Path | Role |
 |------|------|
-| `docs/_plugins/pdfify.rb` | The plugin. Hooks `:site, :post_write`, runs the copy passes. |
+| `docs/_plugins/pdfify.rb` | The plugin. Hooks `:site, :pre_render` → `:pages, :post_render` → `:site, :post_render` → `:site, :post_write`. |
 | `docs/_plugins/pdfify.md` | This file. |
 | `docs/_config.yml` | `also_build_pdf: true` (default-on) and `exclude: [_site-pdf]` (keeps Jekyll's watcher from rebuilding on the plugin's own output). |
 | `docs/book.html` | The page rendered into `_site-pdf/book.html` (via `_layouts/book-combined.html`). Contains the iterator that concatenates every chapter into one HTML document. |
@@ -105,9 +133,9 @@ The PDF build touches the following files:
 
 ## Failure modes
 
-The plugin surfaces several conditions:
+The plugin reports the following conditions:
 
-- **`book.html` missing.** `no .../_site/book.html found; skipping (did the book.html page render?)`. Most likely cause is the page was excluded from the build (e.g. via a temporary `published: false` in its frontmatter) or its `permalink:` was changed away from `/book.html`. The plugin emits a warning and skips the whole pass; the rest of the build is unaffected.
+- **`book.html` never rendered.** `no /book.html page rendered; skipping (did its frontmatter change?)`. Most likely cause is the page was excluded from the build (e.g. via a temporary `published: false` in its frontmatter) or its `permalink:` was changed away from `/book.html`, so `:pages, :post_render` never fired with a matching URL and `@captured` stayed `nil`. The plugin emits a warning at `run()` and skips the whole pass; the rest of the build is unaffected.
 
 - **Missing required CSS.** `missing required asset assets/css/print.css; pagedjs render may break`. Means the SCSS pipeline or a sass-converter step didn't write the expected output. The plugin emits a warning but continues copying other files — the PDF render will fall back to browser defaults for that stylesheet's rules.
 
@@ -121,8 +149,9 @@ The plugin surfaces several conditions:
 
 In source order in [`pdfify.rb`](pdfify.rb):
 
-- `run(site, source_root, dest_root)` — orchestrator. Locates `book.html`, wipes the destination, copies `book.html`, the two stylesheets, and every referenced image; references that don't resolve on disk go into a `missing_paths` list. Logs per-path errors, then the summary, then the timing line. Under `jekyll build` (`site.config["serving"]` is false) a non-empty `missing_paths` raises `Jekyll::Errors::FatalException`; under `jekyll serve` (`serving` is true) it returns normally.
+- `setup(_site)` — `:site, :pre_render` entry. Flips `@enabled` on and clears `@captured`. The hook above gates on `also_build_pdf`, so reaching `setup()` means the plugin is on for this build.
+- `maybe_capture(page)` — `:pages, :post_render` entry. When `page.url == "/book.html"`, stash `page.output.dup` in `@captured`. One URL comparison per rendered page when enabled; immediate no-op when disabled.
+- `remove_book_page(site)` — `:site, :post_render` entry. `site.pages.reject! { |p| p.url == "/book.html" }`. Runs after every per-page render hook and before WRITE, so mutating `site.pages` is safe; means `_site/book.html` never gets written.
+- `run(site, dest_root)` — `:site, :post_write` entry. Bails with a warning if `@captured` is `nil`. Otherwise wipes the destination, writes the captured bytes as `book.html`, copies the two stylesheets, and copies every referenced image; references that don't resolve on disk go into a `missing_paths` list. Logs per-path errors, then the summary, then the timing line. Under `jekyll build` (`site.config["serving"]` is false) a non-empty `missing_paths` raises `Jekyll::Errors::FatalException`; under `jekyll serve` (`serving` is true) it returns normally. Clears `@captured` before returning.
 - `extract_image_paths(html)` — scans `book.html` for the three-alternative `IMG_SRC_RE` regex, skips matches whose first capture group is nil (the `` / `
` branches), strips query/fragment off each `src=` URL, deduplicates via a `Set`, returns the unique paths in document order.
 - `copy_file(src, dst)` — `FileUtils.mkdir_p` + `FileUtils.cp`. The whole copy path is two lines because the source and destination layouts match by construction.
-
-The whole plugin is ~220 lines of Ruby, ~half of which is doc comments.
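A minimal, Jekyll-free sketch of the four-hook capture-and-suppress flow described in `pdfify.md` above, useful for reasoning about hook ordering. Assumptions: `FakePage` and `FakeSite` are hypothetical structs that model only `page.url`, `page.output`, and `site.pages`; `PdfifySketch` is a hypothetical name; hook registration is replaced by calling the four entry points in the order Jekyll fires them; and the stylesheet and image copy steps of `run()` are omitted.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical stand-ins for Jekyll::Page and Jekyll::Site: only the
# readers the capture/suppress flow touches are modelled.
FakePage = Struct.new(:url, :output)
FakeSite = Struct.new(:pages)

module PdfifySketch
  BOOK_URL = "/book.html"

  # Module-level state, as in the plugin: it survives between builds,
  # so setup() resets it at the start of each one.
  @enabled = false
  @captured = nil

  class << self
    attr_reader :captured

    # :site, :pre_render -- only reached when also_build_pdf is true.
    def setup(_site)
      @enabled = true
      @captured = nil
    end

    # :pages, :post_render -- one URL comparison per rendered page.
    def maybe_capture(page)
      return unless @enabled
      @captured = page.output.dup if page.url == BOOK_URL
    end

    # :site, :post_render -- runs before WRITE iterates site.pages, so
    # the removed page is never written to _site/.
    def remove_book_page(site)
      return if @captured.nil?
      site.pages.reject! { |p| p.url == BOOK_URL }
    end

    # :site, :post_write -- wipe the destination, write the captured
    # bytes, then clear them. Returns false when nothing was captured.
    def run(dest_root)
      if @captured.nil?
        warn "no #{BOOK_URL} page rendered; skipping"
        return false
      end
      FileUtils.rm_rf(dest_root)
      FileUtils.mkdir_p(dest_root)
      File.binwrite(File.join(dest_root, "book.html"), @captured)
      @captured = nil
      true
    end
  end
end

# Drive the entry points in the order Jekyll fires the hooks.
site = FakeSite.new([FakePage.new("/index.html", "<p>home</p>"),
                     FakePage.new("/book.html", "<h1>Book</h1>")])
PdfifySketch.setup(site)
site.pages.each { |p| PdfifySketch.maybe_capture(p) }
PdfifySketch.remove_book_page(site)
puts site.pages.map(&:url).inspect  # prints ["/index.html"]
Dir.mktmpdir do |dir|
  PdfifySketch.run(File.join(dir, "_site-pdf"))
  puts File.binread(File.join(dir, "_site-pdf", "book.html"))  # prints <h1>Book</h1>
end
```

The point the sketch makes concrete is that `remove_book_page` only mutates `site.pages` after every per-page hook has run, so any other `:pages, :post_render` consumer (offlinify, in the real build) has already seen the page by then.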
diff --git a/docs/_plugins/pdfify.rb b/docs/_plugins/pdfify.rb
index e02421d..49eebea 100644
--- a/docs/_plugins/pdfify.rb
+++ b/docs/_plugins/pdfify.rb
@@ -5,10 +5,8 @@
 require "set"
 
 # Produces the sparse `_site-pdf/` tree that pagedjs-cli consumes when
-# rendering the PDF book. Runs at `:site, :post_write` -- after Jekyll
-# has written `_site/` (and after `offlinify.rb` has run, if it is
-# active). Combined with the offline plugin this means one Jekyll
-# invocation produces three trees:
+# rendering the PDF book. Combined with the offline plugin this means
+# one Jekyll invocation produces three trees:
 #
 #   _site/          -- the online site
 #   _site-offline/  -- the offline mirror with file://-resolvable URLs
@@ -29,27 +27,56 @@
 # whole second Jekyll build, layout-changed, into `_site-pdf/`). That
 # pass produced ~1300 per-page HTML files that pagedjs never opened.
 #
+# === Hook flow ===
+#
+# Four hooks, mirroring offlinify's shape:
+#
+#   :site, :pre_render    -- setup(): flip @enabled on (gated at the
+#                            hook level by `also_build_pdf`); clear
+#                            @captured.
+#   :pages, :post_render  -- maybe_capture(): when the rendered page
+#                            is /book.html, stash its `page.output`
+#                            bytes in @captured. No I/O.
+#   :site, :post_render   -- remove_book_page(): drop /book.html from
+#                            `site.pages` before Jekyll's WRITE phase
+#                            iterates it, so the concatenated document
+#                            never lands in _site/. Runs after every
+#                            per-page hook has fired -- offlinify has
+#                            already seen book.html and skipped it
+#                            via `offline_exclude` -- and before WRITE.
+#   :site, :post_write    -- run(): wipe _site-pdf/, write @captured
+#                            as _site-pdf/book.html, copy the two
+#                            stylesheets, and copy every image
+#                            book.html references (sourced from
+#                            _site/ -- Jekyll's asset pipeline has
+#                            written them there during WRITE).
+#
+# Capturing in :pages, :post_render and removing in :site, :post_render
+# replaces an earlier flow that read _site/book.html via binread and
+# then deleted it post-write. The new flow makes one fewer disk
+# round-trip and means /book.html is never a live URL on the online
+# site at any point during the build.
+#
 # === What gets copied ===
 #
-#   book.html              copied verbatim from /book.html
+#   book.html              the captured page.output bytes
 #   assets/css/print.css   the book design
 #   assets/css/rouge.css   the syntax-highlighter theme
 #    targets     every relative image path inside book.html,
 #                          resolved against book.html's directory
 #
-# Anything else in `_site/` is not part of the PDF render path and is
-# skipped. The output tree mirrors the source paths exactly so book.html
+# The destination tree mirrors the source paths exactly so book.html
 # can stay byte-identical -- no URL rewriting is needed.
 #
-# After the copy, `/book.html` is deleted: the concatenated
-# document is a build artifact for this plugin alone, not a public page
-# on the online site. The `offline_exclude` entry in _config.yml keeps
-# it out of the offline tree independently. The two safeguards do not
-# rely on each other: the exclude pattern fires whether `offlinify.rb`
-# walks _site/ before or after pdfify's delete (and works even when
-# `also_build_pdf: false`, when pdfify never runs at all), and pdfify's
-# delete fires whether or not offlinify is enabled. No hook ordering
-# is assumed.
+# === The offline_exclude entry ===
+#
+# `offline_exclude: [book.html]` in _config.yml is still required.
+# When `also_build_pdf: true`, offlinify's :pages, :post_render hook
+# fires for book.html before pdfify's :site, :post_render removes it
+# from site.pages, so the exclude is what makes offlinify skip it.
+# When `also_build_pdf: false`, pdfify never runs at all, book.html
+# is a regular page on _site/, and the exclude is what keeps it out
+# of _site-offline/.
 #
 # === Strict mode ===
 #
@@ -65,14 +92,18 @@
 #
 # === Compatibility ===
 #
-# Reads `site.dest`, `site.config['also_build_pdf']`, and
-# `site.config['serving']`. Writes a fresh `-pdf/` tree
-# (wiping any prior contents). Touches no files outside that.
+# Reads `site.config['also_build_pdf']` and `site.config['serving']`,
+# plus each rendered page's `page.output` in memory. Mutates
+# `site.pages` once at :site, :post_render to suppress _site/book.html.
+# Writes a fresh `#{site.dest}-pdf/` tree (wiping any prior contents).
+# Touches no files outside _site-pdf/.
 #
 # If the plugin is removed: `_site-pdf/` is no longer produced and
 # `book.bat` would fail until either (a) this plugin is restored or
-# (b) `book.bat` is pointed at `_site/book.html` directly. `_site/` is
-# unaffected.
+# (b) `book.bat` is pointed at `_site/book.html` directly. With this
+# plugin gone, book.html would render as a normal (large) page on
+# the online site; the `offline_exclude` entry would still keep it
+# out of _site-offline/.
 
 module Pdfify
   # Three-alternative regex, matched against the full document with
@@ -108,16 +139,56 @@ module Pdfify
     assets/css/rouge.css
   ].freeze
 
-  def self.run(site, source_root, dest_root)
-    source = Pathname.new(source_root)
-    dest   = Pathname.new(dest_root)
+  # The URL of the page we capture and suppress.
+  BOOK_URL = "/book.html"
+
+  # @enabled flips on at :site, :pre_render when also_build_pdf is
+  # true. @captured holds book.html's rendered output once the
+  # per-page hook has stashed it; nil until then, set back to nil
+  # after run() consumes it.
+  @enabled = false
+  @captured = nil
+
+  # `:site, :pre_render` entry. Only invoked when `also_build_pdf`
+  # is true (the hook gates on the config), so reaching here means
+  # the plugin is on for this build.
+  def self.setup(_site)
+    @enabled = true
+    @captured = nil
+  end
+
+  # `:pages, :post_render` entry. Stashes the rendered HTML of
+  # /book.html in @captured for run() to pick up post-write. No-op
+  # on every other page and when pdfify is disabled.
+  def self.maybe_capture(page)
+    return unless @enabled
+    return unless page.url == BOOK_URL
+    @captured = page.output.dup
+  end
+
+  # `:site, :post_render` entry. Drops /book.html from `site.pages`
+  # so Jekyll's WRITE phase doesn't write it to _site/. Runs after
+  # every :pages, :post_render hook has fired (offlinify has already
+  # seen book.html and skipped it via offline_exclude), and before
+  # the WRITE phase iterates site.pages -- so mutating site.pages
+  # here is safe. No-op when pdfify is disabled or when /book.html
+  # never rendered (in which case run() will warn).
+  def self.remove_book_page(site)
+    return unless @enabled && @captured
+    site.pages.reject! { |p| p.url == BOOK_URL }
+  end
 
-    book_src = source.join("book.html")
-    unless book_src.file?
-      Jekyll.logger.warn "Pdfify:", "no #{book_src} found; skipping (did the book.html page render?)"
+  def self.run(site, dest_root)
+    return unless @enabled
+
+    unless @captured
+      Jekyll.logger.warn "Pdfify:", "no #{BOOK_URL} page rendered; skipping (did its frontmatter change?)"
       return
     end
 
+    source = Pathname.new(site.dest)
+    dest   = Pathname.new(dest_root)
+
     start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
 
     # Wipe the destination tree so previous runs do not leave stale
@@ -125,10 +196,12 @@ def self.run(site, source_root, dest_root)
     FileUtils.rm_rf(dest)
     FileUtils.mkdir_p(dest)
 
-    html = book_src.binread
+    html = @captured
 
     copied = 0
-    copy_file(book_src, dest.join("book.html"))
+    book_dst = dest.join("book.html")
+    FileUtils.mkdir_p(book_dst.dirname)
+    File.binwrite(book_dst, html)
     copied += 1
 
     REQUIRED_CSS.each do |rel|
@@ -153,16 +226,6 @@ def self.run(site, source_root, dest_root)
       end
     end
 
-    # book.html exists in source/ (the online _site/) only as a
-    # build artifact for this plugin -- it's not a page on the
-    # published site and it isn't part of the offline tree (the
-    # `offline_exclude` entry in _config.yml keeps offlinify from
-    # copying it). Remove it now that we've consumed it, so a stale
-    # copy doesn't sit under _site/ between builds and so a serve-
-    # mode `localhost:4000/book.html` correctly 404s instead of
-    # leaking the concatenated document.
-    book_src.delete
-
     # Per-path error logs first so the build log reads details-then-
     # summary. The code/pre rejection in extract_image_paths means
     # any entry here is a real broken reference -- a markdown image
@@ -176,6 +239,8 @@ def self.run(site, source_root, dest_root)
     elapsed_ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - start_time) * 1000).round(0)
     Jekyll.logger.info "Pdfify:", "Pdfifier ran in #{elapsed_ms}ms."
 
+    @captured = nil
+
     # `jekyll build` aborts on a non-zero missing count (CI gating).
     # `jekyll serve` keeps the dev preview alive -- a mid-edit save
     # that temporarily breaks an image reference shouldn't kill the
@@ -217,7 +282,20 @@ def self.copy_file(src, dst)
   end
 end
 
+Jekyll::Hooks.register :site, :pre_render do |site|
+  next unless site.config["also_build_pdf"]
+  Pdfify.setup(site)
+end
+
+Jekyll::Hooks.register :pages, :post_render do |page|
+  Pdfify.maybe_capture(page)
+end
+
+Jekyll::Hooks.register :site, :post_render do |site|
+  Pdfify.remove_book_page(site)
+end
+
 Jekyll::Hooks.register :site, :post_write do |site|
   next unless site.config["also_build_pdf"]
-  Pdfify.run(site, site.dest, "#{site.dest}-pdf")
+  Pdfify.run(site, "#{site.dest}-pdf")
 end
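The hook sequence that pdfify now relies on can be modelled outside Jekyll: the page is captured at `:pages, :post_render`, dropped from the page list at `:site, :post_render`, and its captured output is copied at `:site, :post_write`. The Ruby below is a simplified sketch, not the plugin. `FakePage`, `PdfifySketch` and `consume` are hypothetical names, a plain array stands in for `site.pages`, and the calls are made by hand in the order Jekyll is assumed to fire the hooks.

```ruby
# Hypothetical stand-in for Jekyll::Page: only the two fields the hooks read.
FakePage = Struct.new(:url, :output)

module PdfifySketch
  BOOK_URL = "/book.html"

  # Module-level state, as in Pdfify: @enabled is set by the pre_render
  # step, @captured holds book.html's output between render and write.
  @enabled = false
  @captured = nil

  class << self
    attr_reader :captured

    # Models :site, :pre_render (only reached when also_build_pdf is true).
    def setup
      @enabled = true
      @captured = nil
    end

    # Models :pages, :post_render (fires once per rendered page).
    def maybe_capture(page)
      return unless @enabled && page.url == BOOK_URL
      @captured = page.output.dup
    end

    # Models :site, :post_render (runs before WRITE walks the page list).
    def remove_book_page(pages)
      return unless @enabled && @captured
      pages.reject! { |p| p.url == BOOK_URL }
    end

    # Models :site, :post_write: hand the bytes over, then clear the slot.
    def consume
      html = @captured
      @captured = nil
      html
    end
  end
end

pages = [
  FakePage.new("/", "<p>home</p>"),
  FakePage.new(PdfifySketch::BOOK_URL, "<html>whole book</html>")
]

PdfifySketch.setup
pages.each { |page| PdfifySketch.maybe_capture(page) }
PdfifySketch.remove_book_page(pages)

written  = pages.map(&:url)       # what the WRITE phase would still see
pdf_html = PdfifySketch.consume   # what post_write would copy to the PDF tree
puts written.inspect
puts pdf_html
```

Clearing the slot in `consume` plays the same role as the `@captured = nil` at the end of `Pdfify.run`.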
diff --git a/docs/_plugins/seo-precompute.rb b/docs/_plugins/seo-precompute.rb
new file mode 100644
index 0000000..b6b6c2f
--- /dev/null
+++ b/docs/_plugins/seo-precompute.rb
@@ -0,0 +1,188 @@
+# frozen_string_literal: true
+
+# Precomputes per-page SEO values that `_includes/head_seo.html`
+# previously derived in Liquid via the `markdownify | strip_html |
+# normalize_whitespace | escape_once` pipeline + `absolute_url` +
+# `uri_escape`.
+#
+# === Problem ===
+#
+# `head_seo.html` rendered ~837 times per build. On every render it
+# ran two `markdownify` filter chains (`page.title` and `site.title`),
+# one `absolute_url` for the canonical URL, plus the same chain again
+# for `site.logo`. ~1,674 of the 1,802 `Jekyll::Filters#markdownify`
+# filter invocations across the whole build came from this template,
+# costing ~4.0 s of `Liquid::Strainer#invoke` time under ruby-prof.
+# The kramdown conversion itself hits Jekyll's internal cache for the
+# repeated `site.title` input, but each call still pays the Liquid
+# filter dispatch and cache-lookup overhead.
+#
+# Of the 836 page titles on the site, only 2 (`*, *=` and `\, \=`)
+# contain markdown-active characters. The other 834 paths through
+# `markdownify | strip_html | normalize_whitespace | escape_once`
+# reduce to a straight `escape_once(title)` -- the markdownify step
+# wraps the text in a `

` tag that `strip_html` immediately +# removes, `normalize_whitespace` collapses no internal whitespace, +# and `escape_once` HTML-escapes the same handful of characters. +# Pulling all of that into one Ruby pass at `:site, :pre_render` +# pays the kramdown / regex cost once per unique title in a tight +# loop instead of via the Liquid dispatch path 1,674 times per +# build. +# +# === Approach === +# +# At `:site, :pre_render`, walk every page and stash these +# precomputed values on `page.data`: +# +# _seo_page_title -- markdownify-pipeline output of `page.title`, +# or the site title if `page.title` is empty. +# _seo_full_title -- " | ", collapsing to +# just the page title when the two match. +# _seo_canonical -- `page.url` with `/index.html` stripped, then +# `absolute_url`'d via the same Addressable +# normalisation Jekyll's URLFilters uses. +# _seo_is_home -- boolean: page is the homepage / about page +# in the small fixed list jekyll-seo-tag +# recognises (the JSON-LD `@type` toggles +# between WebSite and WebPage on this flag). +# +# Plus on site.config: +# +# _seo_site_title -- markdownify-pipeline output of `site.title`, +# constant across the build. +# _seo_logo_url -- `absolute_url(site.logo) | uri_escape`, +# constant across the build. +# +# `head_seo.html` then reads these as `page._seo_*` (PageDrop's +# `fallback_data` resolves the keys against `page.data`) and +# `site._seo_*` (SiteDrop's fallback resolves against `site.config`). +# +# Filter logic mirrors the standard Liquid / Jekyll implementations +# byte-for-byte: `Liquid::StandardFilters::STRIP_HTML_BLOCKS` / +# `STRIP_HTML_TAGS` / `HTML_ESCAPE_ONCE_REGEXP` / `HTML_ESCAPE` +# constants are pulled by reference so the strip and escape steps use +# the same regex objects Liquid would. `absolute_url` follows the +# `Jekyll::Filters::URLFilters#absolute_url` recipe -- parse, fall +# back to `relative_url` when `site.url` is empty, otherwise +# `Addressable::URI.parse(site_url + rel).normalize.to_s`. 
+# `uri_escape` is the Jekyll filter's one-liner +# `Addressable::URI.normalize_component`. +# +# === When it runs === +# +# `:site, :pre_render`, the same phase as +# `_plugins/book-resolve-chapters.rb`. By that point Jekyll has read +# all pages and run every Generator, so `page.url` and `page.data` +# are populated. `head_seo.html` itself doesn't render until later +# in the RENDER phase, so the precomputed keys are visible by then. +# +# === Verification === +# +# Byte-identical output is the bar -- `diff -rq` clean against a +# pre-precompute snapshot of `_site/` / `_site-offline/` / +# `_site-pdf/`. Any divergence in the filter logic above (e.g. an +# `Addressable::URI` version that normalises differently) would show +# up as a counted mismatch in the diff, not as a silent regression. + +require "addressable/uri" +require "liquid" +require "set" + +module Jekyll + module SeoPrecompute + extend self + + # URLs that jekyll-seo-tag's `HOMEPAGE_OR_ABOUT_REGEX` matches + # (the gem uses a regex; Liquid has no regex match operator so + # head_seo.html was enumerating the six values inline -- this + # `Set#include?` lookup replaces the chain of `or` comparisons). + HOMEPAGE_URLS = Set[ + "/", "/index.html", "/index.htm", + "/about/", "/about/index.html", "/about/index.htm" + ].freeze + + STRIP_HTML_BLOCKS = Liquid::StandardFilters::STRIP_HTML_BLOCKS + STRIP_HTML_TAGS = Liquid::StandardFilters::STRIP_HTML_TAGS + HTML_ESCAPE_ONCE_REGEXP = Liquid::StandardFilters::HTML_ESCAPE_ONCE_REGEXP + HTML_ESCAPE = Liquid::StandardFilters::HTML_ESCAPE + + def precompute!(site) + markdown = site.find_converter_instance(Jekyll::Converters::Markdown) + + site_title = render_title(site.config["title"], markdown) + site.config["_seo_site_title"] = site_title + + logo = site.config["logo"] + site.config["_seo_logo_url"] = + logo ? 
uri_escape(absolute_url(logo, site)) : nil + + site.pages.each do |page| + raw_title = page.data["title"] + page_title = if raw_title && !raw_title.to_s.empty? + render_title(raw_title, markdown) + else + site_title + end + page.data["_seo_page_title"] = page_title + page.data["_seo_full_title"] = + page_title == site_title ? page_title : "#{page_title} | #{site_title}" + + url = page.url.to_s + canonical_input = url.sub(%r!/index\.html\z!, "/") + page.data["_seo_canonical"] = absolute_url(canonical_input, site) + page.data["_seo_is_home"] = HOMEPAGE_URLS.include?(url) + end + end + + # `text | markdownify | strip_html | normalize_whitespace | + # escape_once`, mirroring head_seo.html's pipeline byte-for-byte. + # `Jekyll::Converters::Markdown#convert` is cached internally, so + # `site.title` (a constant) hits the cache from the second call + # onwards; page titles are unique but tiny. + def render_title(text, markdown) + return "" if text.nil? + s = text.to_s + return "" if s.empty? + html = markdown.convert(s) + stripped = html.gsub(STRIP_HTML_BLOCKS, "").gsub(STRIP_HTML_TAGS, "") + collapsed = stripped.gsub(%r!\s+!, " ").tap(&:strip!) + collapsed.gsub(HTML_ESCAPE_ONCE_REGEXP, HTML_ESCAPE) + end + + # Mirrors `Jekyll::Filters::URLFilters#absolute_url`. When + # `site.url` is unset the result is just the relative URL, + # matching the filter's behaviour. + def absolute_url(input, site) + return nil if input.nil? + s = input.to_s + return s if Addressable::URI.parse(s).absolute? + site_url = site.config["url"].to_s + rel = relative_url(s, site) + return rel if site_url.empty? + Addressable::URI.parse(site_url + rel).normalize.to_s + end + + # Mirrors `Jekyll::Filters::URLFilters#relative_url`. + def relative_url(input, site) + s = input.to_s + return s if Addressable::URI.parse(s).absolute? 
+ baseurl = site.config["baseurl"].to_s.chomp("/") + parts = [baseurl, s].map { |p| ensure_leading_slash(p) } + Addressable::URI.parse(parts.join).normalize.to_s + end + + def ensure_leading_slash(input) + return input if input.empty? || input.start_with?("/") + "/#{input}" + end + + # Mirrors `Jekyll::Filters#uri_escape`. + def uri_escape(input) + Addressable::URI.normalize_component(input) + end + end +end + +Jekyll::Hooks.register :site, :pre_render do |site| + Jekyll::SeoPrecompute.precompute!(site) +end diff --git a/docs/_profile/build.rb b/docs/_profile/build.rb new file mode 100644 index 0000000..0a04c0c --- /dev/null +++ b/docs/_profile/build.rb @@ -0,0 +1,18 @@ +# frozen_string_literal: true +# +# Run a Jekyll build via Jekyll's Ruby API, with Bundler activated for +# the docs/ Gemfile. Used by: +# +# * _profile/profile.rb -- wraps this run in ruby-prof +# * profile-rbspy.bat -- spawns ruby.exe with this script +# under rbspy +# +# Invoking ruby.exe directly (rather than `bundle exec jekyll build`) +# avoids the rbspy-on-Windows issue where its CreateProcess-based +# launcher can't resolve `bundle.cmd` / `bundle.bat` shims. + +require "bundler/setup" +require "jekyll" +require "jekyll/commands/build" + +Jekyll::Commands::Build.process({}) diff --git a/docs/_profile/profile.rb b/docs/_profile/profile.rb new file mode 100644 index 0000000..ffeff92 --- /dev/null +++ b/docs/_profile/profile.rb @@ -0,0 +1,59 @@ +# frozen_string_literal: true +# +# Run a Jekyll build under ruby-prof and write callgrind + flat / graph +# summaries into _profile/out/. +# +# Usage (from docs/): +# bundle exec ruby _profile/profile.rb +# Or via the wrapper: +# profile-rubyprof.bat +# +# ruby-prof is instrumentation-based (Ruby TracePoint), so the run is +# noticeably slower than a normal build -- expect ~3-5x wall time. +# That's the trade for complete coverage with no sampling bias. 
+# +# Output files (under _profile/out/): +# callgrind.out.jekyll-build -- KCachegrind / QCachegrind input +# jekyll-build.flat.txt -- top methods by self-time +# jekyll-build.graph.txt -- callers/callees per method +# +# Measure mode is WALL_TIME by default (includes I/O waits, which is +# what we want for a build that writes thousands of files). Switch to +# PROCESS_TIME if you specifically want CPU-only numbers. + +require "fileutils" +require "ruby-prof" + +OUT_DIR = File.expand_path("out", __dir__) +FileUtils.mkdir_p(OUT_DIR) + +profile = RubyProf::Profile.new( + measure_mode: RubyProf::WALL_TIME, + exclude_common: true, +) +profile.start + +begin + load File.expand_path("build.rb", __dir__) +ensure + result = profile.stop + + RubyProf::CallTreePrinter.new(result).print( + path: OUT_DIR, + profile: "jekyll-build", + ) + + File.open(File.join(OUT_DIR, "jekyll-build.flat.txt"), "w") do |f| + RubyProf::FlatPrinter.new(result).print(f, min_percent: 0.5) + end + + File.open(File.join(OUT_DIR, "jekyll-build.graph.txt"), "w") do |f| + RubyProf::GraphPrinter.new(result).print(f, min_percent: 0.5) + end + + puts + puts "ruby-prof output written to #{OUT_DIR}" + puts " callgrind.out.jekyll-build -- open in KCachegrind / QCachegrind" + puts " jekyll-build.flat.txt -- top methods by self-time" + puts " jekyll-build.graph.txt -- callers/callees per method" +end diff --git a/docs/assets/css/print.css b/docs/assets/css/print.css index 929f5eb..8c4245c 100644 --- a/docs/assets/css/print.css +++ b/docs/assets/css/print.css @@ -135,7 +135,8 @@ article.part-divider .part-number { margin: 0 0 1.2em; } -article.part-divider h1 { +article.part-divider h1, +article.part-divider .part-title-silent { font-size: 32pt; font-weight: 700; border: none; @@ -223,7 +224,8 @@ article.chapter-divider { padding-top: 30%; } -article.chapter-divider h2 { +article.chapter-divider h2, +article.chapter-divider .chapter-title-silent { font-size: 24pt; font-weight: 700; border: none; diff --git 
a/docs/book.html b/docs/book.html index 136bcae..9fa0604 100644 --- a/docs/book.html +++ b/docs/book.html @@ -4,73 +4,16 @@ sitemap: false --- {%- comment -%} - Whitespace primitives. `sp` is one space; `nl` is one newline, - captured between the opening and closing capture tags so the LF is - preserved verbatim (trim markers strip only outside-the-capture - whitespace). - - Inter-span whitespace patterns: PagedJS drops bare text-node - whitespace between sibling elements when it splits a pre across - pages, mashing tokens and collapsing line breaks. Wrapping each - inter-span whitespace run in a span.w element makes it a structured - child that survives the split. - - Variants, longest first so each consumes its bytes before a shorter - pattern can fragment them: - SP NL SP NL -- blank line between a code-line ending (trailing - space outside the last token's span) and the next. - NL SP NL -- blank line between a comment-line ending (trailing - space already inside the comment span) and the next. - SP NL -- code line directly followed by the next line. - NL -- comment line directly followed by the next line. + The inter-span whitespace patterns the chapter-body transform + needs for pagedjs (each whitespace run between adjacent code + spans wrapped in a `` so the page splitter + doesn't collapse it) used to be declared here as Liquid variables + and consumed by `_includes/book-chapter-body.html`. They now live + inline in `_plugins/book-chapter-transform.rb`'s + `WHITESPACE_PATTERNS` constant -- see that plugin's header + comment for the longest-first rationale. {%- endcomment -%} -{%- assign sp = " " -%} -{%- capture nl %} -{% endcapture -%} - -{%- assign p1_search = '' | append: sp | append: nl | append: sp | append: nl | append: '\n \n99% of code block lines. 
-{%- endcomment -%} -{%- assign s4 = ' ' -%} -{%- assign s8 = ' ' -%} -{%- assign s12 = ' ' -%} -{%- assign s16 = ' ' -%} - -{%- assign p4i4_search = '' | append: nl | append: s4 | append: 'twinBASIC Documentation

{%- endcomment -%} {%- if site.data.book.front_matter -%} {%- for fm in site.data.book.front_matter -%} - {%- if fm.page -%} - {%- assign fm_chapters = site.pages | where: "url", fm.page -%} - {%- elsif fm.prefixes -%} - {%- assign fm_chapters = "" | split: "" -%} - {%- for prefix in fm.prefixes -%} - {%- assign matched = site.pages | where_exp: "p", "p.url contains prefix" -%} - {%- assign fm_chapters = fm_chapters | concat: matched -%} - {%- endfor -%} - {%- assign fm_chapters = fm_chapters | sort: "url" -%} - {%- else -%} - {%- assign fm_chapters = "" | split: "" -%} - {%- endif -%} - {%- for chapter in fm_chapters -%} + {%- comment -%} + Chapter list precomputed in `_plugins/book-resolve-chapters.rb` + at `:site, :pre_render`. The Liquid `include book-collect-matches` + + `where_exp` + `sort_by_nav_order` chain that used to live here + folded into the Ruby resolver -- see the plugin's header comment + for the savings rationale. + {%- endcomment -%} + {%- for chapter in fm._chapters -%} {%- if chapter.url == '/' -%} {%- assign fm_anchor = 'ch-' | append: fm.title | downcase | replace: ' ', '-' -%} {%- else -%} @@ -161,9 +99,13 @@

twinBASIC Documentation

{%- endif -%} {%- for part in site.data.book.parts -%} -
+

Part {{ roman[forloop.index0] }}

+{%- if part.no_outline_entry %} +

{{ part.title }}

+{%- else %}

{{ part.title }}

+{%- endif -%} {%- if part.subtitle -%} {%- assign subtitle_html = part.subtitle | markdownify | remove: '

' | remove: '

' | strip %}

{{ subtitle_html }}

@@ -173,24 +115,16 @@

{{ part.title }}

{%- endif %}
{%- comment -%} - Gather the part's chapters once. A part has either `page:` (single - absolute URL, exact match) or `prefixes:` (list of URL prefixes, - contains-match per prefix, union sorted). Pre-2.4 the loop only - handled `prefixes:`; `page:` was added so one-chapter parts like - the FAQ don't have to invent a folder. + Flat-part chapter list precomputed in + `_plugins/book-resolve-chapters.rb` at `:site, :pre_render`. + The plugin applies the same selector schema (page / pages / + nav_page / nav_pages / no_descent + landing_page-first ordering + + sort_by_nav_order) the Liquid chain used to drive, in one Ruby + pass per entry. Chaptered parts (`part.chapters`) get their + chapter lists resolved separately under `part.chapters[*]._chapters` + -- the part itself has no _chapters in that case. {%- endcomment -%} - {%- if part.page -%} - {%- assign chapters = site.pages | where: "url", part.page -%} - {%- elsif part.prefixes -%} - {%- assign chapters = "" | split: "" -%} - {%- for prefix in part.prefixes -%} - {%- assign matched = site.pages | where_exp: "p", "p.url contains prefix" -%} - {%- assign chapters = chapters | concat: matched -%} - {%- endfor -%} - {%- assign chapters = chapters | sort: "url" -%} - {%- else -%} - {%- assign chapters = "" | split: "" -%} - {%- endif -%} + {%- assign chapters = part._chapters -%} {%- comment -%} 1.6a: state for sub-page detection. Index pages (URL ending in `/`) sort before their siblings under ASCII order, so a single-slot @@ -210,10 +144,47 @@

{{ part.title }}

{%- if part.foreword_page -%} {%- assign fw_chapters = site.pages | where: "url", part.foreword_page -%} {%- for chapter in fw_chapters -%} - {%- include book-chapter-body.html - chapter=chapter - article_class_override='part-foreword' - skip_sub_page_detection=true -%} + {%- if part.no_heading_shift -%} + {%- include book-chapter-body.html + chapter=chapter + article_class_override='part-foreword' + skip_sub_page_detection=true + skip_base_heading_shift=true -%} + {%- else -%} + {%- include book-chapter-body.html + chapter=chapter + article_class_override='part-foreword' + skip_sub_page_detection=true -%} + {%- endif -%} + {%- endfor -%} + {%- endif -%} + + {%- comment -%} + Part landing: a regular page article emitted between the part + divider (and any foreword) and the first chapter. Renders with + normal page styling -- unlike `foreword_page` which gets the + `part-foreword` named-page treatment -- and the plugin strips + its source H1 so the part divider's H1 stays the part's only + top-level outline entry. `skip_sub_page_detection=true` keeps + the landing's trailing-slash URL from making subsequent chapter + dividers / chapter pages look like its sub-pages. Supported + for both flat parts (handled below in the flat-part loop) and + chaptered parts (emitted right here, before the chaptered + loop). + {%- endcomment -%} + {%- if part.chapters and part.landing_page -%} + {%- assign ld_chapters = site.pages | where: "url", part.landing_page -%} + {%- for chapter in ld_chapters -%} + {%- if part.no_heading_shift -%} + {%- include book-chapter-body.html + chapter=chapter + skip_sub_page_detection=true + skip_base_heading_shift=true -%} + {%- else -%} + {%- include book-chapter-body.html + chapter=chapter + skip_sub_page_detection=true -%} + {%- endif -%} {%- endfor -%} {%- endif -%} @@ -243,53 +214,68 @@

{{ part.title }}

{%- else -%} {%- assign chapter_divider_id = 'chd-' | append: ch_entry.title | downcase | replace: ' ', '-' -%} {%- endif -%} -
+
+{%- if ch_entry.no_outline_entry %} +

{{ ch_entry.title }}

+{%- else %}

{{ ch_entry.title }}

+{%- endif -%} {%- if ch_entry.subtitle %}

{{ ch_entry.subtitle }}

{%- endif %}
       {%- comment -%}
-        Build this chapter's page list: landing first (so it acts as
-        the chapter intro right after the divider), then prefix-
-        matched content sorted by URL with the landing filtered out
-        to avoid double emission when it also matches a prefix
-        (typical: /tB/Packages/Foo/ landing matches prefix
-        /tB/Packages/Foo/).
+        Chapter page list precomputed in
+        `_plugins/book-resolve-chapters.rb`: landing_page first (if
+        set), then prefix-swept rest with landing filtered out, sorted
+        by `sort_by_nav_order`. See the plugin's header comment for
+        the selector schema and the savings rationale.
       {%- endcomment -%}
-      {%- assign ch_chapters = "" | split: "" -%}
-      {%- if ch_entry.landing_page -%}
-        {%- assign ch_landing = site.pages | where: "url", ch_entry.landing_page -%}
-        {%- assign ch_chapters = ch_chapters | concat: ch_landing -%}
-      {%- endif -%}
-      {%- if ch_entry.prefixes -%}
-        {%- assign ch_rest = "" | split: "" -%}
-        {%- for prefix in ch_entry.prefixes -%}
-          {%- assign matched = site.pages | where_exp: "p", "p.url contains prefix" -%}
-          {%- assign ch_rest = ch_rest | concat: matched -%}
-        {%- endfor -%}
-        {%- assign ch_rest = ch_rest | sort: "url" -%}
-        {%- if ch_entry.landing_page -%}
-          {%- assign ch_landing_url = ch_entry.landing_page -%}
-          {%- assign ch_rest = ch_rest | where_exp: "p", "p.url != ch_landing_url" -%}
-        {%- endif -%}
-        {%- assign ch_chapters = ch_chapters | concat: ch_rest -%}
-      {%- endif -%}
+      {%- assign ch_chapters = ch_entry._chapters -%}
       {%- assign current_index_url = '' -%}
       {%- assign current_index_kind = 'class' -%}
       {%- assign current_index_name = '' -%}
       {%- for chapter in ch_chapters -%}
-        {%- include book-chapter-body.html chapter=chapter extra_heading_shift=true -%}
+        {%- if part.no_heading_shift and ch_entry.no_heading_shift -%}
+          {%- include book-chapter-body.html chapter=chapter skip_base_heading_shift=true -%}
+        {%- elsif part.no_heading_shift -%}
+          {%- include book-chapter-body.html chapter=chapter skip_base_heading_shift=true extra_heading_shift=true -%}
+        {%- elsif ch_entry.no_heading_shift -%}
+          {%- include book-chapter-body.html chapter=chapter -%}
+        {%- else -%}
+          {%- include book-chapter-body.html chapter=chapter extra_heading_shift=true -%}
+        {%- endif -%}
       {%- endfor -%}
     {%- endfor -%}
   {%- else -%}
     {%- assign current_index_url = '' -%}
     {%- assign current_index_kind = 'class' -%}
     {%- assign current_index_name = '' -%}
+    {%- comment -%}
+      The part landing is the part's intro content, not an index for
+      its sibling chapters -- pass `skip_sub_page_detection=true` so
+      its URL ending in `/` doesn't make subsequent top-level chapters
+      look like its sub-pages. Sub-page detection still kicks in for
+      the per-chapter folder indexes that arrive later in the
+      iteration (each gathered together with its leaves by
+      `sort_by_nav_order`'s folder grouping).
+    {%- endcomment -%}
     {%- for chapter in chapters -%}
-      {%- include book-chapter-body.html chapter=chapter -%}
+      {%- assign is_part_landing = false -%}
+      {%- if part.landing_page and chapter.url == part.landing_page -%}
+        {%- assign is_part_landing = true -%}
+      {%- endif -%}
+      {%- if part.no_heading_shift and is_part_landing -%}
+        {%- include book-chapter-body.html chapter=chapter skip_base_heading_shift=true skip_sub_page_detection=true -%}
+      {%- elsif part.no_heading_shift -%}
+        {%- include book-chapter-body.html chapter=chapter skip_base_heading_shift=true -%}
+      {%- elsif is_part_landing -%}
+        {%- include book-chapter-body.html chapter=chapter skip_sub_page_detection=true -%}
+      {%- else -%}
+        {%- include book-chapter-body.html chapter=chapter -%}
+      {%- endif -%}
     {%- endfor -%}
   {%- endif -%}
 {%- endfor %}
diff --git a/docs/profile-rbspy.bat b/docs/profile-rbspy.bat
new file mode 100644
index 0000000..80160f2
--- /dev/null
+++ b/docs/profile-rbspy.bat
@@ -0,0 +1,11 @@
+@rem Profile a Jekyll build with rbspy (sampling profiler, Windows-native).
+@rem
+@rem Outputs _profile/out/jekyll-build.speedscope.json -- drag into
+@rem https://www.speedscope.app/ for the timeline / sandwich / left-heavy
+@rem views (closest in spirit to the Firefox profiler UI).
+@rem
+@rem rbspy.exe is downloaded into _profile/ (gitignored) -- see
+@rem _profile/profile.rb's header for the project's profiling notes.
+@rem Sampling rate: 99 Hz (rbspy default).
+@if not exist _profile\out mkdir _profile\out
+@_profile\rbspy.exe record --format speedscope --file _profile\out\jekyll-build.speedscope.json -- ruby _profile\build.rb %*
diff --git a/docs/profile-rubyprof.bat b/docs/profile-rubyprof.bat
new file mode 100644
index 0000000..6c10e5d
--- /dev/null
+++ b/docs/profile-rubyprof.bat
@@ -0,0 +1 @@
+@bundle exec ruby _profile/profile.rb %*
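`render_title` in `seo-precompute.rb` runs kramdown and then replays the `strip_html | normalize_whitespace | escape_once` chain in plain Ruby. The sketch below isolates only the post-markdown part, together with the full-title and canonical-path rules, so it runs without Jekyll or Liquid installed. Several assumptions apply. The four constants are hand-copied to match Liquid 4's `StandardFilters` definitions and should be checked against the installed gem. `strip_collapse_escape`, `full_title` and `canonical_input` are hypothetical helper names. The input is taken to be markdownify's HTML output.

```ruby
# Assumed copies of Liquid::StandardFilters constants (Liquid 4.x); the
# plugin itself references the real ones instead of redefining them.
STRIP_HTML_BLOCKS = Regexp.union(
  %r{<script.*?</script>}m,
  /<!--.*?-->/m,
  %r{<style.*?</style>}m
)
STRIP_HTML_TAGS = /<.*?>/m
HTML_ESCAPE = {
  "&" => "&amp;", ">" => "&gt;", "<" => "&lt;", '"' => "&quot;", "'" => "&#39;"
}.freeze
# Matches a quote or angle bracket, or an "&" that does not already start
# an entity such as "&amp;" or "&#39;".
HTML_ESCAPE_ONCE_REGEXP = /["><']|&(?!([a-zA-Z]+|(#\d+));)/

# strip_html | normalize_whitespace | escape_once, applied to HTML that
# markdownify has already produced (the kramdown step is out of scope).
def strip_collapse_escape(html)
  stripped = html.gsub(STRIP_HTML_BLOCKS, "").gsub(STRIP_HTML_TAGS, "")
  collapsed = stripped.gsub(/\s+/, " ").strip
  collapsed.gsub(HTML_ESCAPE_ONCE_REGEXP, HTML_ESCAPE)
end

# "<page title> | <site title>", collapsing when the two are equal.
def full_title(page_title, site_title)
  page_title == site_title ? page_title : "#{page_title} | #{site_title}"
end

# Drop a trailing /index.html before the URL is made absolute.
def canonical_input(url)
  url.sub(%r{/index\.html\z}, "/")
end

puts strip_collapse_escape("<p>Tom's \"Guide\"</p>\n")
puts full_title("Intro", "Docs")
puts canonical_input("/tB/Core/index.html")
```

An entity that is already escaped, such as `&amp;`, passes through unchanged, which is the "once" in `escape_once`. A bare `&` is still escaped.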