Skip to content

Conversation

@Ch111p
Copy link
Contributor

@Ch111p Ch111p commented Dec 24, 2025

Similar to the issue discussed in #165940 and in some apps I’ve seen at work, we’ve observed dSYM files being generated where a section’s offset exceeds 4GB.

This seems to happen because many DWARF sections produced for DWARF4 end up with an alignment of 0. Before PR #168925 (which added validation for sections with a zero alignment attribute), this could go unnoticed and result in an invalid MachO.

However, such a dSYM is effectively an invalid Mach-O file today. Unless/until Mach-O explicitly defines standards like:
"32-bit section offsets can exceed 4G but must be ordered for 64-bit offset reconstruction", I think the proper approach is to add a new argument to support this.

@llvmbot
Copy link
Member

llvmbot commented Dec 24, 2025

@llvm/pr-subscribers-debuginfo

Author: Ch1p (Ch111p)

Changes

Similar to the issue discussed in #165940 and in some apps I’ve seen at work, we’ve observed dSYM files being generated where a section’s offset exceeds 4GB.

This seems to happen because many DWARF sections produced for DWARF4 end up with an alignment of 0. Before PR #168925 (which added validation for sections with a zero alignment attribute), this could go unnoticed and result in an invalid MachO.

However, such a dSYM is effectively an invalid Mach-O file today. Unless/until Mach-O explicitly defines standards like:
"32-bit section offsets can exceed 4G but must be ordered for 64-bit offset reconstruction", I think the proper approach is to add a new argument to support this.


Full diff: https://github.com/llvm/llvm-project/pull/173503.diff

6 Files Affected:

  • (modified) llvm/tools/dsymutil/DwarfLinkerForBinary.cpp (+1-1)
  • (modified) llvm/tools/dsymutil/LinkUtils.h (+4)
  • (modified) llvm/tools/dsymutil/MachOUtils.cpp (+4-4)
  • (modified) llvm/tools/dsymutil/MachOUtils.h (+1-1)
  • (modified) llvm/tools/dsymutil/Options.td (+6)
  • (modified) llvm/tools/dsymutil/dsymutil.cpp (+2)
diff --git a/llvm/tools/dsymutil/DwarfLinkerForBinary.cpp b/llvm/tools/dsymutil/DwarfLinkerForBinary.cpp
index 1fc5bba602d8b..68756a891d90d 100644
--- a/llvm/tools/dsymutil/DwarfLinkerForBinary.cpp
+++ b/llvm/tools/dsymutil/DwarfLinkerForBinary.cpp
@@ -883,7 +883,7 @@ bool DwarfLinkerForBinary::linkImpl(
       ObjectType == Linker::OutputFileType::Object)
     return MachOUtils::generateDsymCompanion(
         Options.VFS, Map, *Streamer->getAsmPrinter().OutStreamer, OutFile,
-        RelocationsToApply);
+        RelocationsToApply, Options.AllowInvalidMachO);
 
   Streamer->finish();
   return true;
diff --git a/llvm/tools/dsymutil/LinkUtils.h b/llvm/tools/dsymutil/LinkUtils.h
index c333a3d4afee0..d950f5f19474c 100644
--- a/llvm/tools/dsymutil/LinkUtils.h
+++ b/llvm/tools/dsymutil/LinkUtils.h
@@ -123,6 +123,10 @@ struct LinkOptions {
   bool IncludeSwiftModulesFromInterface = false;
   /// @}
 
+  /// Whether to allow generating Mach-O files that exceed certain format
+  /// limitations, such as sections with file offsets greater than 4GB.
+  bool AllowInvalidMachO = false;
+
   LinkOptions() = default;
 };
 
diff --git a/llvm/tools/dsymutil/MachOUtils.cpp b/llvm/tools/dsymutil/MachOUtils.cpp
index fba698896618b..deb2225dc22c6 100644
--- a/llvm/tools/dsymutil/MachOUtils.cpp
+++ b/llvm/tools/dsymutil/MachOUtils.cpp
@@ -324,7 +324,7 @@ static void transferSegmentAndSections(
 // Write the __DWARF segment load command to the output file.
 static bool createDwarfSegment(const MCAssembler& Asm,uint64_t VMAddr, uint64_t FileOffset,
                                uint64_t FileSize, unsigned NumSections,
-                                MachObjectWriter &Writer) {
+                                MachObjectWriter &Writer, bool AllowInvalidMachO) {
   Writer.writeSegmentLoadCommand("__DWARF", NumSections, VMAddr,
                                  alignTo(FileSize, 0x1000), FileOffset,
                                  FileSize, /* MaxProt */ 7,
@@ -340,7 +340,7 @@ static bool createDwarfSegment(const MCAssembler& Asm,uint64_t VMAddr, uint64_t
       VMAddr = alignTo(VMAddr, Alignment);
       FileOffset = alignTo(FileOffset, Alignment);
     }
-    if (FileOffset > UINT32_MAX)
+    if (FileOffset > UINT32_MAX && !AllowInvalidMachO)
       return error("section " + Sec->getName() +
                    "'s file offset exceeds 4GB."
                    " Refusing to produce an invalid Mach-O file.");
@@ -374,7 +374,7 @@ bool generateDsymCompanion(
     llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFS, const DebugMap &DM,
     MCStreamer &MS, raw_fd_ostream &OutFile,
     const std::vector<MachOUtils::DwarfRelocationApplicationInfo>
-        &RelocationsToApply) {
+        &RelocationsToApply, bool AllowInvalidMachO) {
   auto &ObjectStreamer = static_cast<MCObjectStreamer &>(MS);
   MCAssembler &MCAsm = ObjectStreamer.getAssembler();
   auto &Writer = static_cast<MachObjectWriter &>(MCAsm.getWriter());
@@ -586,7 +586,7 @@ bool generateDsymCompanion(
 
   // Write the load command for the __DWARF segment.
   if (!createDwarfSegment(MCAsm, DwarfVMAddr, DwarfSegmentStart, DwarfSegmentSize,
-                          NumDwarfSections, Writer))
+                          NumDwarfSections, Writer, AllowInvalidMachO))
     return false;
 
   assert(OutFile.tell() == LoadCommandSize + HeaderSize);
diff --git a/llvm/tools/dsymutil/MachOUtils.h b/llvm/tools/dsymutil/MachOUtils.h
index 0229647b00f85..d276a21b6de72 100644
--- a/llvm/tools/dsymutil/MachOUtils.h
+++ b/llvm/tools/dsymutil/MachOUtils.h
@@ -59,7 +59,7 @@ bool generateDsymCompanion(
     llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFS, const DebugMap &DM,
     MCStreamer &MS, raw_fd_ostream &OutFile,
     const std::vector<MachOUtils::DwarfRelocationApplicationInfo>
-        &RelocationsToApply);
+        &RelocationsToApply, bool AllowInvalidMachO);
 
 std::string getArchName(StringRef Arch);
 } // namespace MachOUtils
diff --git a/llvm/tools/dsymutil/Options.td b/llvm/tools/dsymutil/Options.td
index 571f90c1e46f5..036094f5eeca5 100644
--- a/llvm/tools/dsymutil/Options.td
+++ b/llvm/tools/dsymutil/Options.td
@@ -108,6 +108,12 @@ def fat64: F<"fat64">,
 def update: F<"update">,
   HelpText<"Updates existing dSYM files to contain the latest accelerator tables and other DWARF optimizations.">,
   Group<grp_general>;
+
+def allow_invalid_macho: F<"allow-invalid-macho">,
+  HelpText<"Allow generating Mach-O files that exceed certain format limitations, such as sections with file offsets greater than 4GB."
+           "Use with caution, as such files may not be compatible with all tools.">,
+  Group<grp_general>;
+
 def: Flag<["-"], "u">,
   Alias<update>,
   HelpText<"Alias for --update">,
diff --git a/llvm/tools/dsymutil/dsymutil.cpp b/llvm/tools/dsymutil/dsymutil.cpp
index a71c57c60593c..0a3af1bdb0cec 100644
--- a/llvm/tools/dsymutil/dsymutil.cpp
+++ b/llvm/tools/dsymutil/dsymutil.cpp
@@ -315,6 +315,8 @@ static Expected<DsymutilOptions> getOptions(opt::InputArgList &Args) {
   Options.LinkOpts.Fat64 = Args.hasArg(OPT_fat64);
   Options.LinkOpts.KeepFunctionForStatic =
       Args.hasArg(OPT_keep_func_for_static);
+  Options.LinkOpts.AllowInvalidMachO =
+      Args.hasArg(OPT_allow_invalid_macho);
 
   if (opt::Arg *ReproducerPath = Args.getLastArg(OPT_use_reproducer)) {
     Options.ReproMode = ReproducerMode::Use;

@github-actions
Copy link

github-actions bot commented Dec 24, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@Ch111p Ch111p force-pushed the feature/add-dsymutil-flag branch from b3be6e6 to 9ca9117 Compare December 25, 2025 19:13
@Ch111p
Copy link
Contributor Author

Ch111p commented Jan 5, 2026

Hello @JDevlieghere ! Could you help me review this PR?

@JDevlieghere
Copy link
Member

I had some trouble following the PR description, but here's what I think is going on. Please correct me if I misunderstood anything.

  • We have binaries for which the linked DWARF exceeds the Mach-O 4GB limit (within the same slice) and results in an invalid Mach-O file.
  • We have a check in dsymutil to prevent this, but previously the check was (incorrectly) guarded by the alignment being non-zero. You fixed that in [dsymutil] Add missing validation for zero alignment section #168925.
  • You discovered that alignment of zero is pretty common, so with the previous fix, this error becomes a lot more prevalent (and previously would go unnoticed).

Separately, as long as the actual offset is less than UINT32_MAX and sections are ordered, this can be made to work, even if the computed offset exceeds it, but as you point out that's not standard and currently only supported by LLVM's libObject (as of #165940).

Presumably, the motivation for this new flag is to support the latter, rather than just provide an escape hatch for your previous fix? Because without that you're just generating Mach-Os you can't read. Assuming that's the case, then I think a new flag to allow this is reasonable given:

  1. We explicitly confirm the requirement for this to work (i.e the offset is <4GB, which may be implicitly enforced by it being a uint32_t) and the sections are ordered (which I assume is always the case in dsymutil). If there's nothing to check, a comment that states the prerequisites is fine.
  2. We change the name of the flag to be more descriptive and update the help to be more explicit about what it does and how it can be used, so that if someone wants to use it, they can add support to their tools for it. It should still be off by default and mention that not all tools support this.

@Ch111p
Copy link
Contributor Author

Ch111p commented Jan 5, 2026

Sorry for the ambiguous PR description. And yes, your understanding matches what I’m seeing.

  1. We explicitly confirm the requirement for this to work (i.e the offset is <4GB, which may be implicitly enforced by it being a uint32_t) and the sections are ordered (which I assume is always the case in dsymutil). If there's nothing to check, a comment that states the prerequisites is fine.

Agreed. I’ll add a comment at the relevant point in the code.

  1. We change the name of the flag to be more descriptive and update the help to be more explicit about what it does and how it can be used, so that if someone wants to use it, they can add support to their tools for it. It should still be off by default and mention that not all tools support this.

Agreed. I’ll rename the flag to be more descriptive and update the help text.

HelpText<"Updates existing dSYM files to contain the latest accelerator tables and other DWARF optimizations.">,
Group<grp_general>;

def allow_slice_section_header_offset_overflow : F<"allow-slice-section-header-offset-overflow">,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Let's drop the slice part. We should include that in the description (to disambiguate with the fat64 header flag) but I think it's a bit too verbose for this option. I'd drop it form the variable name too (i.e AllowSectionHeaderOffsetOverflow).

Comment on lines 108 to 120
def update: F<"update">,
HelpText<"Updates existing dSYM files to contain the latest accelerator tables and other DWARF optimizations.">,
Group<grp_general>;

def allow_slice_section_header_offset_overflow : F<"allow-slice-section-header-offset-overflow">,
HelpText<"Allow emitting Mach-O where, within a single slice, section header offsets (section.offset, 32-bit) exceed the 4GB limit (non-standard). "
"This option overrides dsymutil's per-slice 4GB check when writing section headers. "
"Use with caution: some tools may not be able to read the output.">,
Group<grp_general>;

def: Flag<["-"], "u">,
Alias<update>,
HelpText<"Alias for --update">,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about:

Allow emitting a section offset (32-bit) that, when applied, would exceed the Mach-O 4GB limit. This leverages the fact that the offset can be computed as a 64-bit value if the sections are ordered (always the case for dsymutil). Note that this generates an invalid Mach-O and requires explicit support by the tools that will need to consume it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's much cleaner! I’ll update the help text and drop the 'slice' part.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants