Reproducer
Tested against tomd master at commit ad567e3.
Reproduce
tomd p3977r0.pdf --outdir out/
awk '{print length, NR}' out/p3977r0.md | sort -rn | head -5
The longest line in the output is several thousand characters and contains the entire region from Definition 5.1 through the end of section II concatenated.
Symptom
Numbered definitions and examples that should be separate paragraphs are concatenated, with every word in body text wrapped in single backticks. Excerpt:
**Definition** **5.5.** `A` `contract` `is` **disconnecting** `if` `and` `only` `if` `neither` `the` `primary` `nor` `secondary` `domains` `are` `empty,` `and` `for` `at` `least` `one` `element` `of` `the` `secondary` `domain` `1.` `A` `call` `is` `made` `that` `attempts` `to` `end` `the` `program;` `or` `2.` `Program` `execution` `continues` `indefinitely` `without` `return` `control` `to` `the` `caller` **Example** **5.5.a.** `A` `version` `of` `float` `sqrt(float)` `which,` `for` `negative` `numbers,` `is` `specified` `to` `call` `std::abort` `has` `a` `disconnecting` `contract.` ...
The pattern is: bold for the keyword (**Definition**, **Example**) and identifier/Latin label, then every word of the body wrapped in single backticks, with successive definitions and examples joined onto the same line.
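The pattern above can be checked mechanically. The following is a minimal sketch, not part of tomd or paperlint: it flags long lines in which most whitespace-separated tokens are single-backtick spans. The function names, `min_tokens`, and `threshold` are hypothetical and the default values are untuned guesses.

```python
import re

# A token that is exactly one inline code span, e.g. `word` or `empty,`
CODE_SPAN = re.compile(r"^`[^`]+`$")


def backtick_ratio(line: str) -> float:
    """Fraction of whitespace-separated tokens that are single-backtick spans."""
    tokens = line.split()
    if not tokens:
        return 0.0
    wrapped = sum(1 for token in tokens if CODE_SPAN.match(token))
    return wrapped / len(tokens)


def suspicious_lines(text: str, min_tokens: int = 20, threshold: float = 0.8):
    """Return (line_number, ratio) for long lines dominated by code spans.

    min_tokens and threshold are illustrative assumptions, not tuned values.
    """
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line.split()) >= min_tokens:
            ratio = backtick_ratio(line)
            if ratio >= threshold:
                hits.append((number, ratio))
    return hits
```

Running `suspicious_lines` over the contents of out/p3977r0.md should report the concatenated definitions line if it matches the excerpt, since only the bold keywords and labels escape the backtick wrapping.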
Expected
Prose remains prose. Numbered definitions and examples are separate paragraphs/blocks. Words are not wrapped in inline code spans. docling on the same PDF produces clean prose paragraphs with proper line breaks between definitions.
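To make the expected shape concrete, here is a rough sketch that turns a line like the excerpt into separate blocks of plain prose. It only illustrates the target output and is not a proposed fix inside tomd; the function name and both regular expressions are assumptions. It also strips every single-word code span, so it would wrongly unwrap spans that are genuinely code.

```python
import re

# Split on whitespace that precedes a bold Definition/Example keyword.
KEYWORD_BOUNDARY = re.compile(r"\s+(?=\*\*(?:Definition|Example)\*\*)")
# A single word wrapped in backticks, with no whitespace inside the span.
WRAPPED_WORD = re.compile(r"`([^`\s]+)`")


def unwrap_and_split(line: str) -> list[str]:
    """Return one plain-prose block per Definition/Example in the line."""
    blocks = KEYWORD_BOUNDARY.split(line.strip())
    return [WRAPPED_WORD.sub(r"\1", block) for block in blocks]
```

Applied to a shortened version of the excerpt, this yields one block starting with `**Definition** **5.5.**` and a second starting with `**Example** **5.5.a.**`, each with the body as ordinary words.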
Impact
- Discovery treats the region as code, not prose. Word-level grammar and spelling defects inside become invisible to LLM scanning.
- In paperlint pipeline-in-the-loop runs, three real grammar/spelling findings that docling identified in this region were missed in the tomd-pipeline run:
  - "Program execution continues indefinitely without return control to the caller" (missing word "returning")
  - "the adverb reasonably is used where the adjective reasonable is required"
  - "the C++ standard macro FE_INVALID is written as FE INVALID"
Uncertainty signal
p3977r0.prompts.md is written for this paper (54KB), but it covers reconciliation regions starting from page 0 — it does not surgically point at the definitions/examples region as the problem area.
Hypothesis on root cause
This appears to be one of three symptoms of the same classifier-confidence bug; see the two companion issues filed alongside this one (bullets becoming deep headings; bibliography lists flattened). All three involve over-aggressive structural classification of ambiguous content.