XMLReader::readOuterXml() returns empty string for DOCTYPE nodes since libxml2 2.13.0 #21876

@lahwaacz

Description

The following code:

<?php
$svg = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
  "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" width="400" height="300">
  <rect width="400" height="300" fill="white"/>
</svg>
XML;

$reader = new XMLReader();
$reader->XML($svg, null, LIBXML_NOERROR | LIBXML_NOWARNING | LIBXML_NONET);
$reader->setParserProperty(XMLReader::SUBST_ENTITIES, true);

while (@$reader->read()) {
    if ($reader->nodeType == XMLReader::DOC_TYPE) {
        $outer = $reader->readOuterXml();
        $inner = $reader->readInnerXml();
        echo "  readOuterXml(): " . ($outer !== '' ? '"' . substr($outer, 0, 120) . '"' : '[EMPTY STRING]') . "\n";
        echo "  readOuterXml() length: " . strlen($outer) . "\n";
        echo "  readInnerXml(): " . ($inner !== '' ? '"' . substr($inner, 0, 120) . '"' : '[EMPTY STRING]') . "\n";
        echo "  readInnerXml() length: " . strlen($inner) . "\n";
    }
}
$reader->close();

Results in this output with old libxml2 (<= 2.12.9):

readOuterXml(): "<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">"
readOuterXml() length: 98
readInnerXml(): [EMPTY STRING]
readInnerXml() length: 0

But the behavior is different with libxml2 >= 2.13.0:

readOuterXml(): [EMPTY STRING]
readOuterXml() length: 0
readInnerXml(): [EMPTY STRING]
readInnerXml() length: 0

Root cause

The libxml2 commit "reader: Rework xmlTextReaderRead{Inner,Outer}Xml" (2024-04-22, released in v2.13.0) introduced xmlTextReaderDumpCopy(), which serializes reader nodes into a buffer. This function skips XML_DTD_NODE entries, whereas the old code handled them explicitly via xmlCopyDtd() and serialized DTD nodes correctly.
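To tell which side of the 2.13.0 boundary a given PHP build is on, the loaded libxml2 version can be compared against it. This is a minimal sketch: `libxml_drops_doctype()` is a hypothetical helper name, and the comparison assumes `LIBXML_LOADED_VERSION` starts with the usual numeric form (2.13.0 as `21300`, 2.12.9 as `21209`), possibly followed by a suffix.

```php
<?php
// Hypothetical helper (not part of PHP): true when the given libxml2 version
// string, in the numeric form PHP exposes (e.g. "21209" or "21300"), is at
// or above 2.13.0, the release this report identifies as changing behavior.
function libxml_drops_doctype(string $loadedVersion): bool
{
    // Assumption: the string may carry a suffix after the digits; the int
    // cast keeps only the leading number.
    return (int) $loadedVersion >= 21300;
}

echo libxml_drops_doctype(LIBXML_LOADED_VERSION)
    ? "libxml2 >= 2.13.0: readOuterXml() on a DOCTYPE node is expected to be empty\n"
    : "libxml2 < 2.13.0: readOuterXml() on a DOCTYPE node is expected to return the declaration\n";
```

The check only reports which libxml2 is loaded; it does not confirm the behavior itself, so running the reproducer above remains the reliable test.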

Is this an upstream libxml2 regression or intended behavior? Does PHP need a workaround for XMLReader?
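Until that is settled, one possible userland fallback is to rebuild the declaration from the DOM rather than from XMLReader. The sketch below assumes the DOM extension's `doctype` properties are unaffected by the reader change, which this report has not verified. `doctype_declaration()` is a hypothetical helper, it ignores any internal subset, and it assumes the identifiers contain no double quotes.

```php
<?php
// Hypothetical helper, not part of PHP or MediaWiki: rebuilds a DOCTYPE
// declaration from DOMDocumentType properties instead of relying on
// XMLReader::readOuterXml(). Returns '' when there is no DOCTYPE.
function doctype_declaration(string $xml): string
{
    $doc = new DOMDocument();
    // LIBXML_NONET forbids network access; without LIBXML_DTDLOAD the
    // external DTD is not fetched at all.
    if (!@$doc->loadXML($xml, LIBXML_NONET) || $doc->doctype === null) {
        return '';
    }

    $doctype = $doc->doctype;
    $decl = '<!DOCTYPE ' . $doctype->name;
    if ($doctype->publicId !== '') {
        $decl .= ' PUBLIC "' . $doctype->publicId . '" "' . $doctype->systemId . '"';
    } elseif ($doctype->systemId !== '') {
        $decl .= ' SYSTEM "' . $doctype->systemId . '"';
    }
    // Assumption: any internal subset is deliberately left out of the result.
    return $decl . '>';
}
```

A caller could try `readOuterXml()` first and fall back to this helper only when the result is empty on a DOC_TYPE node. That costs a second parse of the document, which may matter for large uploads.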

This issue affects at least MediaWiki (Uploading SVG file generated by matplotlib fails with libxml2 >= 2.13.0): https://phabricator.wikimedia.org/T399990

Reproducer

Two self-contained scripts that build libxml2 from source at specific versions inside a Fedora 41 Docker container:

  • host_php.sh — host-side driver (usage: ./host_php.sh [good|bad|both])
  • container_php.sh — runs inside the container, builds libxml2, runs the PHP test

Usage:

./host_php.sh both

Attachments: container_php.sh, host_php.sh

PHP Version

PHP 8.3.x / 8.4.x (any version linked against libxml2 >= 2.13.0)

Operating System

Arch Linux, also confirmed on Alpine edge, Debian sid, and inside Fedora 41 containers with custom-built libxml2.
