Add installation of data-files build artifacts into Python package#574
Add installation of data-files build artifacts into Python package#574jakelishman wants to merge 1 commit intoPyO3:mainfrom
Conversation
This allows a `RustExtension` to "own" some data files, produced programmatically as part of its `build.rs` script, and for these to be installed along with the built extension module somewhere into the Python package tree. The motivating example use is an extension module that provides a C API, and generates the header file needed to access it and a function-pointer table in a `PyCapsule` as part of its build script; the header file ideally should be installed as a regular file inside the Python wheel, so the package can be used as a build dependency.
davidhewitt
left a comment
There was a problem hiding this comment.
Thanks, a bunch of thoughts posted as comments below. Good to see this can all come together though I guess we need to think a bit about the config.
I wonder, an alternative design I could think of which would give user more control would be to maybe use an "in-tree" build backend https://peps.python.org/pep-0517/#in-tree-build-backends - maybe we could define some public API "hooks" into setuptools-rust build which in-tree backends could use to e.g. parse the cargo messages themselves and do custom logic? That might be more flexible than trying to fit this option directly into setuptools-rust itself...
| # Keys correspond to files/directories in the Rust extension's build directory `OUT_DIR`. | ||
| # Values are Python packages that the corresponding file or directory should be placed inside. | ||
| "my_file.txt" = "data_files" | ||
| "dir" = "data_files._data" |
There was a problem hiding this comment.
I wonder if the value should be the final filename inside the directory? e.g. data_files/my_file.txt, data_files/data/dir. Not sure about the Python dotted path syntax here.
There was a problem hiding this comment.
I agree it's a bit weird as-is. I'd prototyped it this way because "find the installation directory of this Python package" is a built-in function to the setuptools machinery, and I was concerned that if I did it in terms of file structure, it would be harder to associate the relative position without at least having to first split off the top level into an implicit Python package to locate the installation location.
| [[tool.setuptools-rust.ext-modules]] | ||
| target = "data_files._lib" | ||
| [tool.setuptools-rust.ext-modules.data-files] |
There was a problem hiding this comment.
Possible sticking point here - ext-modules is a list, but ext-modules.data-files is a table? How does this interact if there are multiple ext-modules?
There was a problem hiding this comment.
I'm relatively sure that the data-files table attaches to the most recent entry in the ext-modules list - I think this is just the single-item case of it.
I actually wanted to write this example as
[[tool.setuptools-rust.ext-modules]]
target = "data_files._lib"
data-files = {
"my_file.txt" = "data_files",
"dir" = "data_files._data",
}but using an inline table with linebreaks inside it only arrived in TOML 1.1 late last year, and the Python tomllib doesn't handle it yet.
There was a problem hiding this comment.
Sorry, to finish the thought - so the options you can write in TOML 1.0 today are
[[tool.setuptools-rust.ext-modules]]
target = "pkg_a"
data-files = { "my_file.txt" = "pkg_a", "my_file2.txt" = "pkg_a.sub" }
[[tool.setuptools-rust.ext-modules]]
target = "pkg_b"
data-files = { "file3" = "pkg_b" }or
[[tool.setuptools-rust.ext-modules]]
target = "pkg_a"
[tool.setuptools-rust.ext-modules.data-files]
"my_file.txt" = "pkg_a"
"my_file2.txt" = "pkg_a.sub"
[[tool.setuptools-rust.ext-modules]]
target = "pkg_b"
[tool.setuptools-rust.ext-modules.data-files]
"file3" = "pkg_b"and these parse to the same thing:
import tomllib
a = """
[[tool.setuptools-rust.ext-modules]]
target = "pkg_a"
[tool.setuptools-rust.ext-modules.data-files]
"my_file.txt" = "pkg_a"
"my_file2.txt" = "pkg_a.sub"
[[tool.setuptools-rust.ext-modules]]
target = "pkg_b"
[tool.setuptools-rust.ext-modules.data-files]
"file3" = "pkg_b"
"""
b = """
[[tool.setuptools-rust.ext-modules]]
target = "pkg_a"
data-files = { "my_file.txt" = "pkg_a", "my_file2.txt" = "pkg_a.sub" }
[[tool.setuptools-rust.ext-modules]]
target = "pkg_b"
data-files = { "file3" = "pkg_b" }
"""
assert tomllib.loads(a) == tomllib.loads(b)| universal2_data_files_from: If there are ``data_files`` to copy over during a | ||
| ``universal2`` build, take them from this location. By default, this uses | ||
| only the AArch64 build. |
There was a problem hiding this comment.
I'm unsure how I feel about this; it seems to me that content generated by the build.rs is likely to be target-specific. A conservative option might be to refuse to build multiple targets if data_files is set, i.e. require thin macOS builds for each platform. (Not sure how this interacts with multiple ext-modules.)
There was a problem hiding this comment.
I'm quite content with forbidding universal2 builds entirely, or at least until there's a further user request for it.
Fwiw, in my particular use-case, the build files aren't target-specific (they're platform-agnostic header files), but that's just one case.
| if self.data_files and len(self.target) > 1: | ||
| raise ValueError( | ||
| "using 'data_files' with multiple targets is not supported" | ||
| ) |
There was a problem hiding this comment.
Exactly this makes sense to me, we should check how the config interacts with multiple ext-modules, and I'd be in favour of disallowing universal2 with data files unless someone has a really good reason why they need it.
There was a problem hiding this comment.
I will check the multiple ext-modules, but I don't expect that particular one to be a problem, other than the user defining files that conflict between RustExtensions and making themselves build-order dependent. I don't think we could meaningfully restrict data-files across different RustExtension definitions because they don't see each other.
Just to make sure (because I got mixed up here too) - the self.target referred to here is potentially multiple Rust binaries to build and install, rather than multiple target tuples (it's a different duplication to the universal2 case).
|
Thanks for the tip on in-tree build backends - I hadn't come across those. If we went that route, I think there's more internal details of One thing I thought about after I wrote this is the name |
This allows a
RustExtensionto "own" some data files, produced programmatically as part of itsbuild.rsscript, and for these to be installed along with the built extension module somewhere into the Python package tree. The motivating example use is an extension module that provides a C API, and generates the header file needed to access it and a function-pointer table in aPyCapsuleas part of its build script; the header file ideally should be installed as a regular file inside the Python wheel, so the package can be used as a build dependency.I've written in a basic example to use as a test case because I didn't want to just to immediately writing something like the
cbindgen-based setup that motivated this work from my side. I got repeatedly bitten by TOML 1.0 (what Python'stomllibstdlib library can handle) not supporting newlines in inline tables when trying to write thedata-fileskey! But happily there's syntax that works, even with the array-of-tables form there.I'm not convinced by the logic around
universal2handling, but I also can't entirely see the use case for generated files and universal2 other than only needing one copy, so maybe what I've done is overkill.I was able to use this logic entirely successfully with my own downstream package (Qiskit/qiskit#15711, though the code there is much more involved).
Close #563.