Add installation of data-files build artifacts into Python package by jakelishman · Pull Request #574 · PyO3/setuptools-rust

jakelishman · 2026-02-22T16:15:49Z

This allows a `RustExtension` to "own" some data files, produced programmatically as part of its `build.rs` script, and for these to be installed along with the built extension module somewhere into the Python package tree. The motivating example use is an extension module that provides a C API, and generates the header file needed to access it and a function-pointer table in a `PyCapsule` as part of its build script; the header file ideally should be installed as a regular file inside the Python wheel, so the package can be used as a build dependency.

I've written a basic example to use as a test case, because I didn't want to jump immediately to writing something like the cbindgen-based setup that motivated this work from my side. I got repeatedly bitten by TOML 1.0 (which is what Python's `tomllib` standard-library module handles) not supporting newlines in inline tables when trying to write the `data-files` key! Happily, there's syntax that works, even with the array-of-tables form there.

I'm not convinced by the logic around universal2 handling, but I also can't entirely see the use case for generated files and universal2 other than only needing one copy, so maybe what I've done is overkill.

I was able to use this logic entirely successfully with my own downstream package (Qiskit/qiskit#15711, though the code there is much more involved).

Close #563.


davidhewitt

Thanks, a bunch of thoughts posted as comments below. Good to see this can all come together, though I guess we need to think a bit about the config.

An alternative design I could think of, which would give users more control, would be to use an "in-tree" build backend (https://peps.python.org/pep-0517/#in-tree-build-backends). Maybe we could define some public API "hooks" into the setuptools-rust build, which in-tree backends could use to e.g. parse the cargo messages themselves and do custom logic? That might be more flexible than trying to fit this option directly into setuptools-rust itself...

davidhewitt · 2026-02-26T13:33:07Z

examples/data-files/pyproject.toml

+# Keys correspond to files/directories in the Rust extension's build directory `OUT_DIR`.
+# Values are Python packages that the corresponding file or directory should be placed inside.
+"my_file.txt" = "data_files"
+"dir" = "data_files._data"


I wonder if the value should be the final filename inside the directory? e.g. data_files/my_file.txt, data_files/data/dir. Not sure about the Python dotted path syntax here.

I agree it's a bit weird as-is. I'd prototyped it this way because "find the installation directory of this Python package" is a built-in function to the setuptools machinery, and I was concerned that if I did it in terms of file structure, it would be harder to associate the relative position without at least having to first split off the top level into an implicit Python package to locate the installation location.
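To make the dotted-path convention concrete, here is a minimal sketch of how an entry in the `data-files` table could be turned into a destination path. The helper `destination_for` and the `build/lib` root are hypothetical names chosen for illustration; this is not the code in the PR, and it ignores any custom package-directory mapping that the real setuptools machinery would honour.

```python
from pathlib import Path, PurePosixPath


def destination_for(install_root: Path, package: str, source_name: str) -> Path:
    """Hypothetical helper: place `source_name` inside the directory of `package`."""
    # "data_files._data" becomes <install_root>/data_files/_data
    package_dir = install_root.joinpath(*package.split("."))
    # Keep only the final path component, so "dir" lands at <package_dir>/dir.
    return package_dir / PurePosixPath(source_name).name


# The table from examples/data-files/pyproject.toml, as a plain dict.
data_files = {"my_file.txt": "data_files", "dir": "data_files._data"}

root = Path("build/lib")  # assumed install root, for illustration only
planned = {src: destination_for(root, pkg, src) for src, pkg in data_files.items()}

for src, dst in planned.items():
    print(f"OUT_DIR/{src} -> {dst.as_posix()}")
```

Under this reading, the value names a package rather than a file, which is why the destination filename is always taken from the key.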

davidhewitt · 2026-02-26T13:33:56Z

examples/data-files/pyproject.toml

+[[tool.setuptools-rust.ext-modules]]
+target = "data_files._lib"
+[tool.setuptools-rust.ext-modules.data-files]


Possible sticking point here - ext-modules is a list, but ext-modules.data-files is a table? How does this interact if there are multiple ext-modules?

I'm relatively sure that the data-files table attaches to the most recent entry in the ext-modules list - I think this is just the single-item case of it.

I actually wanted to write this example as

    [[tool.setuptools-rust.ext-modules]]
    target = "data_files._lib"
    data-files = {
        "my_file.txt" = "data_files",
        "dir" = "data_files._data",
    }

but using an inline table with linebreaks inside it only arrived in TOML 1.1 late last year, and the Python tomllib doesn't handle it yet.

Sorry, to finish the thought - so the options you can write in TOML 1.0 today are

    [[tool.setuptools-rust.ext-modules]]
    target = "pkg_a"
    data-files = { "my_file.txt" = "pkg_a", "my_file2.txt" = "pkg_a.sub" }

    [[tool.setuptools-rust.ext-modules]]
    target = "pkg_b"
    data-files = { "file3" = "pkg_b" }

or

    [[tool.setuptools-rust.ext-modules]]
    target = "pkg_a"
    [tool.setuptools-rust.ext-modules.data-files]
    "my_file.txt" = "pkg_a"
    "my_file2.txt" = "pkg_a.sub"

    [[tool.setuptools-rust.ext-modules]]
    target = "pkg_b"
    [tool.setuptools-rust.ext-modules.data-files]
    "file3" = "pkg_b"

and these parse to the same thing:

    import tomllib

    a = """
    [[tool.setuptools-rust.ext-modules]]
    target = "pkg_a"
    [tool.setuptools-rust.ext-modules.data-files]
    "my_file.txt" = "pkg_a"
    "my_file2.txt" = "pkg_a.sub"

    [[tool.setuptools-rust.ext-modules]]
    target = "pkg_b"
    [tool.setuptools-rust.ext-modules.data-files]
    "file3" = "pkg_b"
    """

    b = """
    [[tool.setuptools-rust.ext-modules]]
    target = "pkg_a"
    data-files = { "my_file.txt" = "pkg_a", "my_file2.txt" = "pkg_a.sub" }

    [[tool.setuptools-rust.ext-modules]]
    target = "pkg_b"
    data-files = { "file3" = "pkg_b" }
    """

    assert tomllib.loads(a) == tomllib.loads(b)

davidhewitt · 2026-02-26T13:35:57Z

setuptools_rust/extension.py

+        universal2_data_files_from: If there are ``data_files`` to copy over during a
+            ``universal2`` build, take them from this location.  By default, this uses
+            only the AArch64 build.


I'm unsure how I feel about this; it seems to me that content generated by the build.rs is likely to be target-specific. A conservative option might be to refuse to build multiple targets if data_files is set, i.e. require thin macOS builds for each platform. (Not sure how this interacts with multiple ext-modules.)

I'm quite content with forbidding universal2 builds entirely, or at least until there's a further user request for it.

Fwiw, in my particular use-case, the build files aren't target-specific (they're platform-agnostic header files), but that's just one case.

davidhewitt · 2026-02-26T13:36:58Z

setuptools_rust/extension.py

+        if self.data_files and len(self.target) > 1:
+            raise ValueError(
+                "using 'data_files' with multiple targets is not supported"
+            )


Exactly this makes sense to me. We should check how the config interacts with multiple ext-modules, and I'd be in favour of disallowing universal2 with data files unless someone has a really good reason why they need it.

I will check the multiple ext-modules, but I don't expect that particular one to be a problem, other than the user defining files that conflict between RustExtensions and making themselves build-order dependent. I don't think we could meaningfully restrict data-files across different RustExtension definitions because they don't see each other.

Just to make sure (because I got mixed up here too) - the self.target referred to here is potentially multiple Rust binaries to build and install, rather than multiple target tuples (it's a different duplication to the universal2 case).
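The distinction can be illustrated with a small stand-in class. `ExtensionSketch` is hypothetical and is not the real `RustExtension`; it assumes that a string target is normalised to a one-entry mapping and that a mapping with several entries means several binaries, and the binary and package names in the second call are made up. Only the check itself is taken from the diff above.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class ExtensionSketch:
    """Hypothetical stand-in for RustExtension, for illustration only."""

    target: Union[str, Dict[str, str]]
    data_files: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Assumption: a bare string is normalised to a single-entry mapping.
        if isinstance(self.target, str):
            self.target = {"": self.target}
        # The check from the diff: several binaries plus data files is an error.
        if self.data_files and len(self.target) > 1:
            raise ValueError(
                "using 'data_files' with multiple targets is not supported"
            )


# One target with data files is accepted.
ok = ExtensionSketch("data_files._lib", {"my_file.txt": "data_files"})

# Two binaries with data files is rejected, independently of universal2.
try:
    ExtensionSketch({"bin_a": "pkg.a", "bin_b": "pkg.b"}, {"my_file.txt": "pkg"})
    rejected = False
except ValueError as exc:
    rejected = True
    print(exc)
```

Note that this check says nothing about target tuples; a universal2 restriction would need a separate test at build time.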

jakelishman · 2026-02-26T15:26:31Z

Thanks for the tip on in-tree build backends - I hadn't come across those. If we went that route, I think there are more internal details of setuptools-rust that we'd want to expose as user-facing APIs (e.g. a `cargo_build_this()` using our discovered environment for cargo, which presents structured data on the out directories, etc.).
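As a rough sketch of what "structured data on the out directories" could look like, the following parses the line-delimited JSON that `cargo build --message-format=json` emits and collects the `out_dir` of each `build-script-executed` message. The function name `out_dirs_from_messages` and the sample messages are invented for illustration; no such hook exists in setuptools-rust today.

```python
import json
from typing import Dict, Iterable


def out_dirs_from_messages(lines: Iterable[str]) -> Dict[str, str]:
    """Map each package id to the OUT_DIR reported for its build script."""
    out_dirs: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        # Cargo may interleave non-JSON output; skip anything that is not an object.
        if not line.startswith("{"):
            continue
        message = json.loads(line)
        if message.get("reason") == "build-script-executed":
            out_dirs[message["package_id"]] = message["out_dir"]
    return out_dirs


# Made-up messages in the shape cargo emits; real ones carry more fields.
sample = [
    json.dumps({"reason": "compiler-artifact", "package_id": "example 0.1.0"}),
    json.dumps(
        {
            "reason": "build-script-executed",
            "package_id": "example 0.1.0",
            "out_dir": "/tmp/target/debug/build/example-abc/out",
        }
    ),
    json.dumps({"reason": "build-finished", "success": True}),
]

found = out_dirs_from_messages(sample)
print(found)
```

An in-tree backend given this mapping could then decide for itself which generated files to copy and where.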

One thing I thought about after I wrote this is the name `data-files`: my initial use case is just C header files, but actually I quite likely want to code-gen a Python file containing `ctypes` bindings to the same C API, which is generated from the same structured data as the build script was already using. Perhaps `generated-files` is a better name?

This was referenced Feb 22, 2026

Add qiskit.capi module Qiskit/qiskit#15711

Open

Options for extracting data files from a RustExtension's build-script OUT_DIR #563

Open

jakelishman wants to merge 1 commit into PyO3:main from jakelishman:data-files
