
libxml2 2.12.x compatibility issue (found on AlmaLinux 10.x) #595

@jesltc

Description


TL;DR: LTFS is incompatible with libxml2 2.12 (found on AlmaLinux). The index XML is written without an encoding declaration, so the volume cannot be mounted. I used Claude Code to find the issue. Patch attached.

Details:
I built LTFS on AlmaLinux 10.2 successfully, but ran into a runtime issue when mounting a tape that was freshly formatted with the same LTFS build. The failure appears to be related to XML parsing of the index:

/usr/local/bin/ltfs -o devname=/dev/sg0 /mnt/tape
a10c LTFS14000I LTFS starting, LTFS version 2.4.5.1 (Prelim), log level 2.
a10c LTFS14058I LTFS Format Specification version 2.4.0.
a10c LTFS14104I Launched by "/usr/local/bin/ltfs -o devname=/dev/sg0 /mnt/tape".
a10c LTFS14105I This binary is built for Linux (x86_64).
a10c LTFS14106I GCC version is 14.3.1 20251022 (Red Hat 14.3.1-4).
a10c LTFS17087I Kernel version: Linux version 6.12.0-211.7.3.el10_2.x86_64 (mockbuild@x64-builder03.almalinux.org) (gcc (GCC) 14.3.1 20251022 (Red Hat 14.3.1-4), GNU ld version 2.41-63.el10.alma.1) #1 SMP PREEMPT_DYNAMIC Tue May 19 12:46:58 EDT 2026 i386.
a10c LTFS17089I Distribution: AlmaLinux release 10.2 (Lavender Lion).
a10c LTFS17089I Distribution: NAME="AlmaLinux".
a10c LTFS17089I Distribution: AlmaLinux release 10.2 (Lavender Lion).
a10c LTFS17089I Distribution: AlmaLinux release 10.2 (Lavender Lion).
a10c LTFS14063I Sync type is "time", Sync time is 300 sec.
a10c LTFS17085I Plugin: Loading "sg" tape backend.
a10c LTFS17085I Plugin: Loading "unified" iosched backend.
a10c LTFS14095I Set the tape device write-anywhere mode to avoid cartridge ejection.
a10c LTFS30209I Opening a device through sg-ibmtape driver (/dev/sg0).
a10c LTFS30250I Opened the SCSI tape device 0.0.0.0 (/dev/sg0).
a10c LTFS30207I Vendor ID is IBM     .
a10c LTFS30208I Product ID is ULTRIUM-HH9     .
a10c LTFS30214I Firmware revision is T3Q1.
a10c LTFS30215I Drive serial is 1097010764.
a10c LTFS30285I The reserved buffer size of /dev/sg0 is 1048576.
a10c LTFS30294I Setting up timeout values from RSOC.
a10c LTFS17160I Maximum device block size is 1048576.
a10c LTFS11330I Loading cartridge.
a10c LTFS30252I Logical block protection is disabled.
a10c LTFS11332I Load successful.
a10c LTFS17157I Changing the drive setting to write-anywhere mode.
a10c LTFS11005I Mounting the volume.
a10c LTFS30252I Logical block protection is disabled.
a10c LTFS17018E XML parser: unsupported encoding '(null)'.            <<-------------
a10c LTFS17016E Cannot parse index direct from medium (-5012).
a10c LTFS11194W Cannot read index: failed to read and parse XML data (-5012).
a10c LTFS11024E Cannot mount volume: read index failed on the index partition.
a10c LTFS14013E Cannot mount the volume.
a10c LTFS30252I Logical block protection is disabled.

I don't know anything about libxml2, so I had Claude Code take a look. It found an issue in the XML writer/reader routines when building against newer versions of libxml2. From Claude: "libxml2 2.12 changed UTF-8 to native handling. xmlFindCharEncodingHandler("UTF-8") now returns NULL (no handler needed). xmlTextWriterStartDocument() only writes encoding="UTF-8" to the XML prolog when writer->out->encoder != NULL, which is never true for UTF-8 in libxml2 2.12+. LTFS writes valid UTF-8 data to tape but without the encoding declaration, causing mount to fail at read time."
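The following self-contained model makes the quoted mechanism concrete. It is not libxml2's code. `model_start_document()` is a hypothetical stand-in that follows the quoted description of `xmlTextWriterStartDocument()`: it appends the `encoding` attribute only when an encoder is present. Passing `NULL` as the encoder name reproduces the prolog that LTFS ends up writing under libxml2 2.12+, according to the description above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical model of the XML-declaration logic quoted above; NOT
 * libxml2's implementation. encoder_name stands in for
 * writer->out->encoder->name and is NULL when no encoder object exists.
 * Returns the snprintf() result.
 */
static int model_start_document(char *buf, size_t len, const char *encoder_name)
{
    if (encoder_name != NULL) {
        /* An encoder object exists, so the encoding attribute is emitted. */
        return snprintf(buf, len, "<?xml version=\"1.0\" encoding=\"%s\"?>\n",
                        encoder_name);
    }
    /* No encoder object (native UTF-8, per the description): attribute dropped. */
    return snprintf(buf, len, "<?xml version=\"1.0\"?>\n");
}
```

Calling `model_start_document(buf, sizeof buf, NULL)` fills `buf` with a declaration that has no `encoding` attribute. The reader then sees exactly this prolog on the tape.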

I am including the full patch it generated below. I don't know whether I'll get to a PR, but I might; I'm a bit rusty :) and don't have much time right now. The patch works for me: after rebuilding with it, I was able to format and mount properly. I also don't know whether it is incompatible with older libxml2 versions, but at a quick look it seems like it should be OK.

Full patch and Claude's description:

From: libxml2 2.12 compatibility fix
Subject: [PATCH] Fix XML encoding declaration for libxml2 2.12+ compatibility

In libxml2 2.12, UTF-8 encoding is handled natively without a conversion
handler object. As a result, xmlFindCharEncodingHandler("UTF-8") returns
NULL, and xmlTextWriterStartDocument() only writes the encoding="UTF-8"
attribute in the XML declaration when writer->out->encoder != NULL.

Since LTFS creates its output buffers with a NULL encoder (correct for
UTF-8 passthrough), the encoding attribute is silently dropped from every
XML index written to tape. The tape format itself is otherwise correct --
the data is valid UTF-8 -- but the missing declaration causes mount
failure.

On reading, xmlTextReaderConstEncoding() returns NULL because the XML
declaration contains no encoding attribute. The existing reader check
treats NULL as an error:

LTFS17018E XML parser: unsupported encoding '(null)'.
LTFS11024E Cannot mount volume: read index failed on the index partition.
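The reader-side condition can be isolated as a pure function. The two predicates below mirror the old and patched `if` conditions in `_xml_parser_init()`. Their names are hypothetical and chosen for illustration. Each returns 1 when the encoding string reported by `xmlTextReaderConstEncoding()` would be accepted. The two versions differ only in how they handle `NULL`.

```c
#include <stddef.h>
#include <string.h>

/* Old check: the original code rejects when (! encoding || strcmp(encoding, "UTF-8")). */
static int old_encoding_ok(const char *encoding)
{
    return !(encoding == NULL || strcmp(encoding, "UTF-8") != 0);
}

/* Patched check: only an explicit declaration other than "UTF-8" is rejected. */
static int patched_encoding_ok(const char *encoding)
{
    return !(encoding != NULL && strcmp(encoding, "UTF-8") != 0);
}
```

Given `NULL`, the old predicate returns 0, which corresponds to the LTFS17018E path, and the patched one returns 1. An explicit `"ISO-8859-1"` is still rejected by both. Both comparisons remain case-sensitive, exactly as in the original code.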

Two fixes:

  1. Writer (xml_writer_libltfs.c): In _xml_write_schema(), replace the
    xmlTextWriterStartDocument() call with xmlTextWriterWriteRaw() to
    write the XML declaration explicitly, bypassing the encoder != NULL
    guard in libxml2's implementation.

  2. Reader (xml_reader_libltfs.c): In _xml_parser_init(), treat a NULL
    encoding as UTF-8 rather than rejecting it. Per the XML specification,
    absence of an encoding declaration implies UTF-8 for UTF-8-encoded
    content. This also makes tapes formatted before the writer fix
    mountable without reformatting.

Affected platforms: any build against libxml2 >= 2.12.0, which includes
RHEL/AlmaLinux 10, Fedora 40+, and anything else shipping libxml2 2.12
or later.

Tested: AlmaLinux 10.2, libxml2-2.12.5-10.el10.x86_64,
LTFS 2.4.5.1 built from master (LinearTapeFileSystem/ltfs),
IBM ULTRIUM-HH9 (LTO-9) drive.

 src/libltfs/xml_reader_libltfs.c |  5 ++++-
 src/libltfs/xml_writer_libltfs.c | 10 ++++++++--
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/src/libltfs/xml_reader_libltfs.c b/src/libltfs/xml_reader_libltfs.c
--- a/src/libltfs/xml_reader_libltfs.c
+++ b/src/libltfs/xml_reader_libltfs.c
@@ _xml_parser_init() @@
 	/* reject this XML file if it isn't UTF-8 */
 	encoding = (const char *)xmlTextReaderConstEncoding(reader);
-	if (! encoding || strcmp(encoding, "UTF-8")) {
+	/* libxml2 2.12+ returns NULL for UTF-8: it is the native encoding and
+	 * has no conversion handler. NULL encoding means the XML default (UTF-8)
+	 * per the XML specification; only reject an explicit non-UTF-8 declaration. */
+	if (encoding != NULL && strcmp(encoding, "UTF-8") != 0) {
 		ltfsmsg(LTFS_ERR, 17018E, encoding);
 		return -LTFS_XML_WRONG_ENCODING;
 	}

diff --git a/src/libltfs/xml_writer_libltfs.c b/src/libltfs/xml_writer_libltfs.c
--- a/src/libltfs/xml_writer_libltfs.c
+++ b/src/libltfs/xml_writer_libltfs.c
@@ _xml_write_schema() @@
-	ret = xmlTextWriterStartDocument(writer, NULL, "UTF-8", NULL);
-	if (ret < 0) {
+	/* xmlTextWriterStartDocument only writes encoding="UTF-8" in the XML
+	 * declaration when writer->out->encoder != NULL. In libxml2 2.12+,
+	 * UTF-8 is handled natively without an encoder object, so
+	 * xmlFindCharEncodingHandler("UTF-8") returns NULL and the attribute
+	 * is silently dropped. Write the XML declaration explicitly to
+	 * maintain LTFS spec compliance across all libxml2 versions. */
+	ret = xmlTextWriterWriteRaw(writer, BAD_CAST "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
+	if (ret < 0) {
 		ltfsmsg(LTFS_ERR, 17057E, ret);
 		return -1;
 	}
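After rebuilding, one way to confirm the writer fix is to check that the XML declaration of a newly written index now carries the encoding. The helper below is a hypothetical sketch and is not part of LTFS. It inspects only the text between a leading `<?xml` and the first `?>` in a buffer you supply. How you obtain a copy of the index XML is left open.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of LTFS). Returns 1 if the XML declaration
 * at the very start of `xml` contains encoding="UTF-8" (double or single
 * quotes), 0 if the declaration lacks it, and -1 if there is no complete
 * declaration at offset 0 (a leading BOM also yields -1).
 */
static int prolog_declares_utf8(const char *xml)
{
    const char *end;
    size_t decl_len;
    char decl[256];

    if (strncmp(xml, "<?xml", 5) != 0)
        return -1;                    /* no XML declaration at offset 0 */
    end = strstr(xml, "?>");
    if (end == NULL)
        return -1;                    /* declaration never closed */
    decl_len = (size_t)(end - xml);
    if (decl_len >= sizeof(decl))
        return 0;                     /* implausibly long; treat as not found */
    memcpy(decl, xml, decl_len);
    decl[decl_len] = '\0';            /* search only inside the declaration */

    return strstr(decl, "encoding=\"UTF-8\"") != NULL ||
           strstr(decl, "encoding='UTF-8'") != NULL;
}
```

Searching only inside the declaration avoids false positives from an `encoding` attribute that appears later in the document body.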
