We generate Python bindings using Pybind11. You can try the latest version of it, but note that at the time of writing, the latest Pybind has a bug that makes it crash on Python exit when enums are bound; the commit 741d86f (or older) is known to work, so consider using that.
You're expected to have a basic understanding of how to work with Pybind. I recommend going through the basic Pybind tutorial, compiling at least one test module, and making sure you can import it in Python.
Python modules are shared libraries (sometimes with a customized extension: typically `.pyd` instead of `.dll` on Windows, and `.so` elsewhere). When using Pybind, you create a `.cpp` file with a function that gets called on module import (using the `PYBIND11_MODULE` macro; the name passed to it must match the module filename minus the extension), where you register all your functions/classes using the Pybind API (see the link above).
Pybind is a header-only library, so you'll need to clone it and add it to your include paths. You also need to add Python to the include paths (on Linux use `pkg-config --cflags python3-embed` or `... python-3.??-embed`). You also need some platform-dependent linker flags: on Windows, you must link `pythonXY.lib`. On Linux, you don't link anything (and rely on the default behavior of not checking for undefined references when building shared libraries). On Mac, you don't link anything either, and additionally pass `-Xlinker -undefined -Xlinker dynamic_lookup`.
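As a sanity check before involving MRBind at all, it's worth compiling a trivial Pybind module by hand. A Linux sketch (assumptions: `example.cpp` contains a minimal `PYBIND11_MODULE(example, m)` body, and `path/to/pybind11` is wherever you cloned Pybind — both are placeholders):

```shell
clang++ -std=c++17 -shared -fPIC \
    -Ipath/to/pybind11/include \
    $(pkg-config --cflags python3-embed) \
    example.cpp -o example$(python3-config --extension-suffix)

# Should import silently if everything is set up correctly:
python3 -c 'import example'
```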
Modules generated with Pybind need to be recompiled on every OS, for every minor Python version (X.Y, e.g. 3.13) that you want to support. We have a Pybind fork that produces Python-version-agnostic modules, but I suggest getting the upstream version to work first, and then thinking about portability.
On Windows, the modules must be built using MSVC or Clang in MSVC-compatible mode (we only support the latter); modules built with MinGW will not work with the official Python releases.
We don't generate Pybind code (C++ code calling Pybind) directly. Instead we generate target-language-agnostic macros.
E.g. given:

```cpp
void foo();
void bar();
```

We generate something along the lines of:

```cpp
#define MRBIND_HEADER
MB_FUNC(foo)
MB_FUNC(bar)
```

But much more complex.
Then for each target language using this format (currently only Python), we have a header that defines `MB_FUNC()` and the other macros (in the case of Python, they expand to Pybind calls). `MRBIND_HEADER` needs to be defined to the name of that header.
It seemed like a good idea at the time. (c)
Previously the parser output format was set to JSON for testing (`-o parse_result.json`), but the Python backend requires a different format: `--format=macros`. This generates a `.cpp` file. (See above for more info.)
Then you compile this file. Only the Clang compiler is supported, and it must be the same version that you used to compile MRBind (the generated code contains some complex templating that other compilers may choke on).
If the target library is compiled with a different compiler, and you get compatibility issues (undefined references), consult the ABI compatibility page.
Compile with the following flags:

- All flags needed for Pybind, as explained earlier.
- `-std=c++20` or newer.
- `-I.` — We're adding the current directory to the include search path, because at one point we need to do `#include __FILE__`. This becomes optional if you pass an absolute path to the source file to the compiler.
- `-DMRBIND_HEADER='<mrbind/targets/pybind11.h>'` (use the appropriate quotes for your shell, `'...'` in Bash).
- `-Ipath/to/mrbind/include` to find the above file, where `path/to/mrbind` is a path to this repository.
- `-DMB_PB11_MODULE_NAME=MyModule` — Set this to your module name. It should match the filename of the compiled module, minus the extension.
- `-DMB_DEFINE_IMPLEMENTATION` — When compiling multiple source files, exactly one of them should have this defined. More on that below.
- `-DPYBIND11_COMPILER_TYPE=... -DPYBIND11_BUILD_ABI=...` — To prevent cross-talk between your modules and those built by other people, define both of these macros to your library name, preferably with a leading underscore, e.g. `-DPYBIND11_COMPILER_TYPE='"_mylib"' -DPYBIND11_BUILD_ABI='"_mylib"'`. (The double quotes are needed in the macro value; the single quotes are the shell's quoting.) Normally Pybind internals do cross-talk between modules that use the same compiler and ABI, but there's little reason to do this between modules from different vendors, and not disabling it has caused issues for us in the past. The values of these macros are logged during compilation as `Pybind internals magic: ...`.
- If your Clang comes from MSYS2 but you're building in MSVC-compatible mode, add `--target=x86_64-pc-windows-msvc -rtlib=platform -D_DLL -D_MT` (same as what you passed to the parser before), plus `-Xclang --dependent-lib=msvcrt` in Release mode, or `-Xclang --dependent-lib=msvcrtd -D_DEBUG` in Debug mode.
There are more optional knobs to tune, but this should work.
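For reference, the flags above combined into one Linux invocation might look like this (a sketch, not a verified command; `generated.cpp`, `MyModule`, and the paths are placeholders):

```shell
clang++ -std=c++20 -shared -fPIC -I. \
    $(pkg-config --cflags python3-embed) \
    -Ipath/to/pybind11/include \
    -Ipath/to/mrbind/include \
    -DMRBIND_HEADER='<mrbind/targets/pybind11.h>' \
    -DMB_PB11_MODULE_NAME=MyModule \
    -DMB_DEFINE_IMPLEMENTATION \
    -DPYBIND11_COMPILER_TYPE='"_mylib"' -DPYBIND11_BUILD_ABI='"_mylib"' \
    generated.cpp -o MyModule$(python3-config --extension-suffix)
```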
Try importing the resulting module and using `help(...)` and Tab completion to navigate around the contents.
Python bindings aren't checked for completeness, neither at build time nor when importing the Python module.
This means, e.g., that if some of your functions accept a third-party type that's not in the bindings, you won't get any errors until you try to call them. (In those cases you'll notice that the `help()` pages for the offending functions show C++ type names for parameters and/or the return type, e.g. `std::vector<Blah>` instead of the Python-ified `mylib.std_vector_Blah`.)
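One quick way to spot such leaks is to scan the generated docstrings for raw C++ type names. A minimal sketch (the docstring below is a made-up example of what `help()` might show for an incompletely bound function):

```python
import re

# Hypothetical docstring, as it might look for a function whose parameter type
# didn't make it into the bindings:
doc = "foo(arg0: std::vector<Blah>) -> None"

# Raw C++ type names in the signature are a sign of a missing binding.
leaked = re.findall(r"std::[\w:<>]+", doc)
print(leaked)  # ['std::vector<Blah>']
```

In real use you'd loop over the module's functions (e.g. via `inspect.getdoc`) and flag any whose signatures match.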
A good way to check the bindings for completeness is generating stubs for them.
Stubs are the .pyi files that Python IDEs use to provide code completion for modules. If you plan to distribute your modules, you should generate those.
Use the `pybind11-stubgen` utility to generate them, the same as you would with pure Pybind11 bindings.
This also has the side effect of testing the bindings for completeness: `pybind11-stubgen` will complain if something is missing (but will still produce usable stubs, so this doesn't have to be a blocker).
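For example (assuming the compiled module is named `MyModule` and is importable from the current directory):

```shell
# Writes the .pyi stubs into ./stubs/; warnings about unresolved or missing
# names point at incomplete bindings.
PYTHONPATH=. pybind11-stubgen MyModule -o stubs
```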
The generated `.cpp` file can be huge for large inputs, and compiling it can easily exhaust your RAM. The solution is to compile it in parts.
Define `-DMB_NUM_FRAGMENTS=N` with the desired number of parts ("fragments"), then compile the `.cpp` file that many times with different values of `-DMB_FRAGMENT=i` (where `i` goes from 0 to N-1). If you take a look at the generated `.cpp` file, you'll see `#ifdef`s used to split the contents evenly between fragments.
Only one fragment (preferably the 0th one) should define `-DMB_DEFINE_IMPLEMENTATION`.
Then link the N resulting object files together into the final module.
Don't compile all fragments in parallel, as that will use as much RAM, if not more. Compile them sequentially, a few at a time.
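The whole fragment dance might be sketched like this (placeholders throughout; `COMMON_FLAGS` stands for the compilation flags from the previous section, and the loop is fully sequential here for simplicity):

```shell
COMMON_FLAGS="-std=c++20 -fPIC ..."  # the flags from the previous section
N=4
for i in $(seq 0 $((N - 1))); do
    impl=""
    [ "$i" -eq 0 ] && impl="-DMB_DEFINE_IMPLEMENTATION"  # exactly one fragment
    clang++ $COMMON_FLAGS -DMB_NUM_FRAGMENTS=$N -DMB_FRAGMENT=$i $impl \
        -c generated.cpp -o "frag$i.o"
done
clang++ -shared frag*.o -o MyModule$(python3-config --extension-suffix)
```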
Also see Improving compilation time.
It is possible to run the parser several times and then link the results together, but our testing has shown that parsing the entire input as one big header is faster, so processing each individual header like this isn't a good idea.
Each generated `.cpp` file can be compiled using a different number of fragments.
When linking together multiple generated files, only one fragment across all the files must define `-DMB_DEFINE_IMPLEMENTATION`.
There are a few additional macros that you can define to tune the bindings:
- `-DMB_PB11_STRIPPED_NAMESPACES='"MyLib","MyLib.Nested"'` — The default behavior is to copy the namespaces from the input. If your library uses `namespace MyLib { void foo(); }`, then in Python you'll get `MyModule.MyLib.foo()`. Having the top-level namespace (`MyLib`) in Python is usually pointless; this macro lets you remove it. Set it to a comma-separated list of quoted namespace names. If you need to remove nested namespaces, separate their components with `.` instead of `::`, as they're spelled in Python (see the example above). Here `'...'` is the shell's quoting, not a part of the syntax.
- `-DMB_PB11_ADJUST_NAMES='"..."'` — This is closely related to the previous macro. It is used to tweak names like `MyModule.std_vector_MyLib_Foo` (originally `std::vector<MyLib::Foo>`) into e.g. `MyModule.std_vector_Foo`. The macro takes a string literal containing a `;`-separated list of `sed`-style regex replacement rules, where each rule is `s/A/B/g` (replaces regex `A` with string `B`), or without `g` (to act only once instead of multiple times). These rules apply to C++ type names. For example, `'"s/\\bMyLib:://g"'` will strip the `MyLib::` namespace, as in the example above.
- `-DMB_PB11_MERGE_STL_TL_EXPECTED` — If you mix `std::expected` and `tl::expected` on different platforms (depending on what's available), defining this should make the names more consistent across platforms, by stripping `std::` and `tl::` from `expected`.
- `-DMB_PB11_ENABLE_CXX_STYLE_CONTAINER_METHODS` — When binding the C++ standard containers, add some additional C++-style methods to them, in addition to the Python-style ones.
- `-DMB_PB11_MODULE_DEPS='"foo", "bar"'` — Add other Python modules as dependencies. If you want to import another Python module at startup as a dependency, pass its name to this macro. It accepts a list of quoted module names. `'...'` here is the shell's quoting and not a part of the syntax.
- Adding aliases — Python lets you add aliases for things like functions, types, and even class members, simply using what looks like variable assignment. E.g. given `struct Vec3 {float x, y, z;};`, which binds to `mylib.Vec3` in Python, you could do `mylib.Vec3.foo = mylib.Vec3.x`, and then `foo` would be usable as an alternative name for `x` in every instance of the class. MRBind exposes a way to create those aliases, by calling `MRBind::pb11::RegisterCustomAlias("alias", "target");` at startup. You can create a `.cpp` file with the following contents, and then compile it as a part of your module:

  ```cpp
  #include MRBIND_HEADER

  static const auto MRBIND_UNIQUE_VAR = []{
      #define ALIAS(alias, target) MRBind::pb11::RegisterCustomAlias(#alias, #target)
      ALIAS( Vec3.foo, Vec3.x ); // Make `Vec3.foo` an alias for `Vec3.x`.
      ALIAS( Vec3.bar, Vec3.y ); // Make `Vec3.bar` an alias for `Vec3.y`.
      return nullptr;
  }();
  ```

  Notice that `.` is used as the separator here instead of `::`, Python style. Also notice that the module name `mylib.` is omitted.
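To build intuition for how an `MB_PB11_ADJUST_NAMES` rule like `s/\bMyLib:://g` acts on C++ type names, here's a sketch that mirrors the semantics with Python's `re` (the actual rule engine is MRBind's own; this is just an illustration):

```python
import re

# One sed-style rule, s/\bMyLib:://g, split into its pattern and replacement:
pattern, repl = r"\bMyLib::", ""

# Applied to a C++ type name as it would appear before adjustment:
name = "std::vector<MyLib::Foo>"
adjusted = re.sub(pattern, repl, name)
print(adjusted)  # std::vector<Foo>
```

The Python-ified module-level name would then be derived from the adjusted C++ name (`std_vector_Foo` rather than `std_vector_MyLib_Foo`).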
- First of all, pass `--combine-types=cv,ref,ptr,smart_ptr` to the parser. This reduces the amount of duplicate work performed by different fragments by merging type information for similar types at parse time. This doesn't have any downsides, and just improves compilation time.
- Make your big input header a PCH. Note that you typically don't want to feed the same PCH to the parser, even though that's technically possible, because you might want to define some macros that apply only to the parser or only to the compilation, and sharing a PCH prevents that.
- `-DMB_PB11_NO_REGISTER_TYPE_DEPS` — This can dramatically reduce compilation time, at the cost of having to do more manual work. Imagine the following input code:

  ```cpp
  std::vector<std::string> foo();
  std::list<std::string> bar();
  ```

  The job of generating the type bindings (`std::vector<std::string>`, `std::list<std::string>`) is split evenly across fragments. But the problem here is that `std::string` needs a binding too, and since it doesn't appear standalone in this example, we don't know which fragment should handle it. The default behavior is to generate it in every fragment that needs it, which duplicates work between fragments at compile time. If you define this macro, none of the fragments will handle this type, which will give you a runtime error in Python when using those containers.

  How is this not stupid? First of all, this doesn't apply to parsed types; e.g. `std::vector<MyLib::MyClass>` will work fine. Only the types with handwritten bindings (such as the standard library classes) are affected. Second, this is a very rare situation: usually the element type will happen to be mentioned standalone somewhere else, so it will be handled by some other fragment. And lastly, if you do get errors because of this, it's easy to manually poke the offending type to give it a binding. To do that, create an extra header (include it in the big one that you feed to the parser), and in it use the following macros:

  ```cpp
  #define FORCE_REGISTER_TYPE(...) using MR_CONCAT(_mrbind_inst_,__LINE__) __attribute__((__annotate__("mrbind::instantiate_only"))) = __VA_ARGS__
  #define FORCE_REGISTER_PARAM_TYPE(...) __attribute__((__annotate__("mrbind::instantiate_only"))) void MR_CONCAT(_mrbind_inst_,__LINE__)(__VA_ARGS__)
  #define FORCE_REGISTER_RETURN_TYPE(...) __attribute__((__annotate__("mrbind::instantiate_only"))) __VA_ARGS__ MR_CONCAT(_mrbind_inst_,__LINE__)()
  ```

  Now having `FORCE_REGISTER_TYPE(std::string);` in that header would generate the binding for `std::string` even if it's not otherwise mentioned anywhere. The other two macros exist because some types behave differently in different contexts. E.g. using `int *` specifically as a function parameter generates a helper Python class named `MyModule.int_output`, so registering an `int *` parameter needs to be done via `FORCE_REGISTER_PARAM_TYPE(int *)`. Only the parser needs to see those macros; you can `#ifdef` them away for the compilation.