Papers with color formatting and strikethrough in the PDF (P4003R0, P4007R0) produce output where most code identifiers are wrapped in <del> tags inside :::wording-remove blocks. tomd is correctly detecting real PDF formatting that other extractors ignore — P4007R0 has 292 strikethrough spans and 422 colored spans in just the first 5 pages.
The issue is readability: when every std::execution, coroutine_handle<>, set_value, paper reference, and library name is wrapped in <del>, the output is hard to read and hard to search.
Possible approaches:
- A flag to control strikethrough detection sensitivity
- Emitting detected formatting as metadata/annotations rather than inline <del> tags
- A threshold: only mark as <del> when the strikethrough is on a contiguous block, not individual code tokens

P4003R0 tomd: https://github.com/cppalliance/paperlint-eval/blob/main/tomd-eval/tomd/p4003r0.md
P4003R0 docling: https://github.com/cppalliance/paperlint-eval/blob/main/tomd-eval/docling/p4003r0.md
P4007R0 tomd: https://github.com/cppalliance/paperlint-eval/blob/main/tomd-eval/tomd/p4007r0.md
P4007R0 docling: https://github.com/cppalliance/paperlint-eval/blob/main/tomd-eval/docling/p4007r0.md
Full evaluation report: https://github.com/cppalliance/paperlint-eval/blob/main/tomd-eval/report.md (see Check 2b and Limitations §3).
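The threshold approach listed above could look roughly like the following minimal Python sketch. It assumes the extractor already has each text span along with a boolean strikethrough flag. That is an assumption about tomd's internals, and `Span`, `render_line` and `min_run_chars` are hypothetical names, not part of tomd. The `min_run_chars` parameter also plays the role of the sensitivity flag: setting it to 0 wraps every struck span, which is close to the behavior described above.

```python
from dataclasses import dataclass


@dataclass
class Span:
    # Hypothetical span record: the text plus whether the PDF marked it struck through.
    text: str
    struck: bool


def render_line(spans, min_run_chars=40):
    """Wrap struck text in <del> only when consecutive struck spans form a run
    of at least min_run_chars non-whitespace-trimmed characters; shorter runs
    (e.g. isolated code tokens) are emitted as plain text."""
    out = []
    i = 0
    while i < len(spans):
        if not spans[i].struck:
            out.append(spans[i].text)
            i += 1
            continue
        # Collect the maximal run of consecutive struck spans starting at i.
        j = i
        while j < len(spans) and spans[j].struck:
            j += 1
        run = "".join(s.text for s in spans[i:j])
        if len(run.strip()) >= min_run_chars:
            out.append(f"<del>{run}</del>")
        else:
            out.append(run)  # too short to be a real removal: drop the markup
        i = j
    return "".join(out)


print(render_line([Span("Use ", False), Span("set_value", True), Span(" here.", False)]))
print(render_line([Span("This whole sentence was removed from the wording.", True)]))
```

The first call prints the line without markup because `set_value` is an isolated struck token. The second call keeps the `<del>` because the struck run is a full sentence. A character count is only one possible measure of "contiguous block". A fraction of the paragraph, or a run that spans several lines, would be equally plausible choices.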