Code block fencing: 2.7x fewer fenced blocks than docling across 38 papers #1

@sentientsergio

We ran a full evaluation of tomd against docling on the 38 PDF papers in the 2026-02 pre-Croydon mailing. Full report with side-by-side markdowns: https://github.com/cppalliance/paperlint-eval/blob/main/tomd-eval/report.md

tomd retains 476 code block fences across the corpus vs docling's 1,270 (2.7x gap). This is the primary blocker for adoption in the paperlint evaluation pipeline, where evidence quotes must be findable as substrings in the extracted text.
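To illustrate why missing fences matter for the substring requirement, here is a minimal sketch. The function names `count_fenced_blocks` and `quote_is_findable` are hypothetical and are not part of tomd, docling, or paperlint. The counting rule (one count per opening fence) is an assumption, since the report's exact counting method is not restated here.

```python
import re

# Opening or closing fence: up to 3 spaces of indent, then 3+ backticks or tildes.
FENCE_RE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def count_fenced_blocks(markdown: str) -> int:
    """Count fenced code blocks by counting opening fences (assumed metric)."""
    count = 0
    open_fence = None  # marker of the currently open fence, if any
    for line in markdown.splitlines():
        m = FENCE_RE.match(line)
        if not m:
            continue
        marker = m.group(1)
        if open_fence is None:
            open_fence = marker
            count += 1
        elif (
            marker[0] == open_fence[0]
            and len(marker) >= len(open_fence)
            and line.strip() == marker  # a closing fence carries no info string
        ):
            open_fence = None
    return count


def quote_is_findable(quote: str, extracted: str) -> bool:
    """Exact substring check, as the pipeline requirement is described above."""
    return quote in extracted


# Hypothetical extractions of the same snippet, for illustration only.
fenced = "Example:\n\n```cpp\nint x = 0;\n```\n"
bold_inline = "Example:\n\n**int** x = 0;\n"
quote = "int x = 0;"

print(count_fenced_blocks(fenced), quote_is_findable(quote, fenced))
print(count_fenced_blocks(bold_inline), quote_is_findable(quote, bold_inline))
```

In the second extraction the inline markup splits the code text, so a quote taken verbatim from the paper no longer matches.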

Specific papers where code is affected:

  • P3984R0: code examples are rendered as bold inline text instead of fenced blocks. Compare with the docling output.
  • P3181R1: multi-threaded code examples are fragmented into inline code spans. Compare with the docling output.
  • P3596R0: 226 fences in docling, 4 in tomd.
  • P4003R0: 196 fences in docling, 4 in tomd. Code identifiers are wrapped in :::wording blocks.
  • P3844R4: 54 fences in docling, 0 in tomd.
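For triage, the per-paper figures above can be ranked by how many fences are missing. This is only a sketch using the counts quoted in this issue; the helper names `missing_fences` and `retention` are hypothetical.

```python
# (docling fences, tomd fences) as quoted in this issue.
fence_counts = {
    "P3596R0": (226, 4),
    "P4003R0": (196, 4),
    "P3844R4": (54, 0),
}


def missing_fences(docling: int, tomd: int) -> int:
    """Number of fences docling emits that tomd does not."""
    return docling - tomd


def retention(docling: int, tomd: int):
    """Fraction of docling's fences that tomd keeps; None if docling has none."""
    return tomd / docling if docling else None


# Largest gap first.
ranked = sorted(
    fence_counts.items(),
    key=lambda item: missing_fences(*item[1]),
    reverse=True,
)

for paper, (docling, tomd) in ranked:
    kept = retention(docling, tomd)
    print(f"{paper}: missing {missing_fences(docling, tomd)}, retained {kept:.1%}")

# Corpus-wide ratio from the totals quoted above.
print(f"corpus ratio: {1270 / 476:.1f}x")
```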

Fixing code block detection would substantially close the gap with docling and remove the main blocker to pipeline adoption.
