We generate Python bindings using Pybind11. You can try the latest version of it, but note that at the time of writing, the latest Pybind has a bug that makes it crash on Python exit when enums are bound; the commit 741d86f (or older) is known to work, so consider using that.
You're expected to have a basic understanding of how to work with Pybind. I recommend going through the basic Pybind tutorial, compiling at least one test module, and making sure you can import it in Python.
Python modules are shared libraries (sometimes with a customized extension: typically `.pyd` instead of `.dll` on Windows, and `.so` elsewhere). When using Pybind, you create a `.cpp` file with a function that gets called on module import (using the `PYBIND11_MODULE` macro; the name passed to it must match the module filename minus the extension), where you register all your functions/classes using the Pybind API (see the link above).
Pybind is a header-only library, so you'll need to clone it and add it to your include paths. You also need to add Python to the include paths (on Linux use `pkg-config --cflags python3-embed` or `... python-3.??-embed`). You also need some platform-dependent linker flags: on Windows, you must link `pythonXY.lib`. On Linux, you don't link anything (and rely on the default behavior of not checking for undefined references when building shared libraries). On Mac, you don't link anything either, and additionally pass `-Xlinker -undefined -Xlinker dynamic_lookup`.
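As a sanity check before involving MRBind at all, it's worth compiling a trivial Pybind module by hand. A Linux sketch (assumptions: `example.cpp` contains a minimal `PYBIND11_MODULE(example, m)` body, and `path/to/pybind11` is wherever you cloned Pybind — both are placeholders):

```shell
clang++ -std=c++17 -shared -fPIC \
    -Ipath/to/pybind11/include \
    $(pkg-config --cflags python3-embed) \
    example.cpp -o example$(python3-config --extension-suffix)

# Should import silently if everything is set up correctly:
python3 -c 'import example'
```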
Modules generated with Pybind need to be recompiled on every OS, for every minor Python version (X.Y, e.g. 3.13) that you want to support. We have a Pybind fork that produces Python-version-agnostic modules, but I suggest getting the upstream version to work first, and then thinking about portability.
On Windows, the modules must be built using MSVC or Clang in MSVC-compatible mode (we only support the latter); modules built with MinGW will not work with the official Python releases.
We don't generate Pybind code (C++ code calling Pybind) directly. Instead we generate target-language-agnostic macros.
E.g. given:

```cpp
void foo();
void bar();
```

We generate something along the lines of:

```cpp
#define MRBIND_HEADER
MB_FUNC(foo)
MB_FUNC(bar)
```

But much more complex.
Then for each target language using this format (currently only Python), we have a header that defines `MB_FUNC()` and the other macros (in the case of Python, they expand to Pybind calls). `MRBIND_HEADER` needs to be defined to the name of that header.
It seemed like a good idea at the time. (c)
Previously the parser output format was set to JSON for testing (`-o parse_result.json`), but the Python backend requires a different format: `--format=macros`. This generates a `.cpp` file. (See above for more info.)
Then you compile this file. Only the Clang compiler is supported, and it must be the same version that you used to compile MRBind (the generated code contains some complex templating that other compilers may choke on).
If the target library is compiled with a different compiler, and you get compatibility issues (undefined references), consult the ABI compatibility page.
Compile with the following flags:

- All flags needed for Pybind, as explained earlier.
- `-std=c++20` or newer.
- `-I.` — We're adding the current directory to the include search path, because at one point we need to do `#include __FILE__`. This becomes optional if you pass an absolute path to the source file to the compiler.
- `-DMRBIND_HEADER='<mrbind/targets/pybind11.h>'` (use the appropriate quotes for your shell, `'...'` in Bash).
- `-Ipath/to/mrbind/include` to find the above file, where `path/to/mrbind` is a path to this repository.
- `-DMB_PB11_MODULE_NAME=MyModule` — Set this to your module name. It should match the filename of the compiled module, minus the extension.
- `-DMB_DEFINE_IMPLEMENTATION` — When compiling multiple source files, exactly one of them should have this defined. More on that below.
- `-DPYBIND11_COMPILER_TYPE=... -DPYBIND11_BUILD_ABI=...` — To prevent cross-talk between your modules and those built by other people, define both of these macros to your library name, preferably with a leading underscore, e.g. `-DPYBIND11_COMPILER_TYPE='"_mylib"' -DPYBIND11_BUILD_ABI='"_mylib"'`. (The double quotes are needed in the macro value; the single quotes are the shell's quoting.) Normally Pybind internals do cross-talk between modules that use the same compiler and ABI, but there's little reason to do this between modules from different vendors, and not disabling it has caused issues for us in the past. The values of these macros are logged during compilation as `Pybind internals magic: ...`.
- If your Clang comes from MSYS2 but you're building in MSVC-compatible mode, add `--target=x86_64-pc-windows-msvc -rtlib=platform -D_DLL -D_MT` (same as what you passed to the parser before), plus `-Xclang --dependent-lib=msvcrt` in Release mode, or `-Xclang --dependent-lib=msvcrtd -D_DEBUG` in Debug mode.
There are more optional knobs to tune, but this should work.
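For reference, the flags above combined into one Linux invocation might look like this (a sketch, not a verified command; `generated.cpp`, `MyModule`, and the paths are placeholders):

```shell
clang++ -std=c++20 -shared -fPIC -I. \
    $(pkg-config --cflags python3-embed) \
    -Ipath/to/pybind11/include \
    -Ipath/to/mrbind/include \
    -DMRBIND_HEADER='<mrbind/targets/pybind11.h>' \
    -DMB_PB11_MODULE_NAME=MyModule \
    -DMB_DEFINE_IMPLEMENTATION \
    -DPYBIND11_COMPILER_TYPE='"_mylib"' -DPYBIND11_BUILD_ABI='"_mylib"' \
    generated.cpp -o MyModule$(python3-config --extension-suffix)
```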
Try importing the resulting module and using `help(...)` and Tab completion to navigate around the contents.
Python bindings aren't checked for completeness, neither at build time nor when importing the Python module.
This means, e.g., that if some of your functions accept a third-party type that's not in the bindings, you won't get any errors until you try to call them. (In those cases you'll notice that the `help()` pages for the offending functions show C++ type names for parameters and/or the return type, e.g. `std::vector<Blah>` instead of the Python-ified `mylib.std_vector_Blah`.)
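One quick way to spot such leaks is to scan the generated docstrings for raw C++ type names. A minimal sketch (the docstring below is a made-up example of what `help()` might show for an incompletely bound function):

```python
import re

# Hypothetical docstring, as it might look for a function whose parameter type
# didn't make it into the bindings:
doc = "foo(arg0: std::vector<Blah>) -> None"

# Raw C++ type names in the signature are a sign of a missing binding.
leaked = re.findall(r"std::[\w:<>]+", doc)
print(leaked)  # ['std::vector<Blah>']
```

In real use you'd loop over the module's functions (e.g. via `inspect.getdoc`) and flag any whose signatures match.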
A good way to check the bindings for completeness is generating stubs for them.
Stubs are the .pyi files that Python IDEs use to provide code completion for modules. If you plan to distribute your modules, you should generate those.
Use the `pybind11-stubgen` utility to generate them, the same as you would with pure Pybind11 bindings.
This also has the side effect of testing the bindings for completeness: `pybind11-stubgen` will complain if something is missing (but will still produce usable stubs, so this doesn't have to be a blocker).
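For example (assuming the compiled module is named `MyModule` and is importable from the current directory):

```shell
# Writes the .pyi stubs into ./stubs/; warnings about unresolved or missing
# names point at incomplete bindings.
PYTHONPATH=. pybind11-stubgen MyModule -o stubs
```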
The generated `.cpp` file can be huge for large inputs, and compiling it can easily exhaust your RAM. The solution is to compile it in parts.
Define `-DMB_NUM_FRAGMENTS=N` with the desired number of parts ("fragments"), then compile the `.cpp` file that many times with different values of `-DMB_FRAGMENT=i` (where `i` goes from 0 to N-1). If you take a look at the generated `.cpp` file, you'll see `#ifdef`s used to split the contents evenly between fragments.
Only one fragment (preferably the 0th one) should define `-DMB_DEFINE_IMPLEMENTATION`.
Then link the N resulting object files together into the final module.
Don't compile all fragments in parallel, as that will use as much RAM, if not more. Compile them sequentially, a few at a time.
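The whole fragment dance might be sketched like this (placeholders throughout; `COMMON_FLAGS` stands for the compilation flags from the previous section, and the loop is fully sequential here for simplicity):

```shell
COMMON_FLAGS="-std=c++20 -fPIC ..."  # the flags from the previous section
N=4
for i in $(seq 0 $((N - 1))); do
    impl=""
    [ "$i" -eq 0 ] && impl="-DMB_DEFINE_IMPLEMENTATION"  # exactly one fragment
    clang++ $COMMON_FLAGS -DMB_NUM_FRAGMENTS=$N -DMB_FRAGMENT=$i $impl \
        -c generated.cpp -o "frag$i.o"
done
clang++ -shared frag*.o -o MyModule$(python3-config --extension-suffix)
```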
Also see Improving compilation time.
It is possible to run the parser several times and then link the results together, but our testing has shown that parsing the entire input as one big header is faster, so processing each individual header like this isn't a good idea.
Each generated `.cpp` file can be compiled using a different number of fragments.
When linking together multiple generated files, only one fragment across all the files must define `-DMB_DEFINE_IMPLEMENTATION`.
There are a few additional macros that you can define to tune the bindings:
- `-DMB_PB11_STRIPPED_NAMESPACES='"MyLib","MyLib.Nested"'` — The default behavior is to copy the namespaces from the input. If your library uses `namespace MyLib { void foo(); }`, then in Python you'll get `MyModule.MyLib.foo()`. Having the top-level namespace (`MyLib`) in Python is usually pointless; this macro lets you remove it. Set it to a comma-separated list of quoted namespace names. If you need to remove nested namespaces, separate their components with `.` instead of `::`, as they're spelled in Python (see the example above). Here `'...'` is the shell's quoting, not a part of the syntax.
- `-DMB_PB11_ADJUST_NAMES='"..."'` — This is closely related to the previous macro. It is used to tweak names like `MyModule.std_vector_MyLib_Foo` (originally `std::vector<MyLib::Foo>`) into e.g. `MyModule.std_vector_Foo`. The macro takes a string literal containing a `;`-separated list of `sed`-style regex replacement rules, where each rule is `s/A/B/g` (replaces regex `A` with string `B`), or without `g` (to act only once instead of multiple times). These rules apply to C++ type names. For example, `'"s/\\bMyLib:://g"'` will strip the `MyLib::` namespace, as in the example above.
- `-DMB_PB11_MERGE_STL_TL_EXPECTED` — If you mix `std::expected` and `tl::expected` on different platforms (depending on what's available), defining this should make the names more consistent across platforms, by stripping `std::` and `tl::` from `expected`.
- `-DMB_PB11_ENABLE_CXX_STYLE_CONTAINER_METHODS` — When binding the C++ standard containers, add some additional C++-style methods to them, in addition to the Python-style ones.
- `-DMB_PB11_MODULE_DEPS='"foo", "bar"'` — Add other Python modules as dependencies. If you want to import another Python module at startup as a dependency, pass its name to this macro. It accepts a list of quoted module names. `'...'` here is the shell's quoting and not a part of the syntax.
- Adding aliases — Python lets you add aliases for things like functions, types, and even class members, simply using what looks like variable assignment. E.g. given `struct Vec3 {float x, y, z;};`, which binds to `mylib.Vec3` in Python, you could do `mylib.Vec3.foo = mylib.Vec3.x`, and then `foo` would be usable as an alternative name for `x` in every instance of the class. MRBind exposes a way to create those aliases, by calling `MRBind::pb11::RegisterCustomAlias("alias", "target");` at startup. You can create a `.cpp` file with the following contents, and then compile it as a part of your module:

  ```cpp
  #include MRBIND_HEADER

  static const auto MRBIND_UNIQUE_VAR = []{
      #define ALIAS(alias, target) MRBind::pb11::RegisterCustomAlias(#alias, #target)
      ALIAS( Vec3.foo, Vec3.x ); // Make `Vec3.foo` an alias for `Vec3.x`.
      ALIAS( Vec3.bar, Vec3.y ); // Make `Vec3.bar` an alias for `Vec3.y`.
      return nullptr;
  }();
  ```

  Notice that `.` is used as the separator here instead of `::`, Python style. Also notice that the module name `mylib.` is omitted.
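To build intuition for how an `MB_PB11_ADJUST_NAMES` rule like `s/\bMyLib:://g` acts on C++ type names, here's a sketch that mirrors the semantics with Python's `re` (the actual rule engine is MRBind's own; this is just an illustration):

```python
import re

# One sed-style rule, s/\bMyLib:://g, split into its pattern and replacement:
pattern, repl = r"\bMyLib::", ""

# Applied to a C++ type name as it would appear before adjustment:
name = "std::vector<MyLib::Foo>"
adjusted = re.sub(pattern, repl, name)
print(adjusted)  # std::vector<Foo>
```

The Python-ified module-level name would then be derived from the adjusted C++ name (`std_vector_Foo` rather than `std_vector_MyLib_Foo`).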
- First of all, pass `--combine-types=cv,ref,ptr,smart_ptr` to the parser. This reduces the amount of duplicate work performed by different fragments by merging type information for similar types at parse time. This doesn't have any downsides, and just improves compilation time.
- Make your big input header a PCH. Note that you typically don't want to feed the same PCH to the parser, even though that's technically possible, because you might want to define some macros that apply only to the parser or only to the compilation, and sharing a PCH prevents that.
- `-DMB_PB11_NO_REGISTER_TYPE_DEPS` — This can dramatically reduce compilation time, at the cost of having to do more manual work. Imagine the following input code:

  ```cpp
  std::vector<std::string> foo();
  std::list<std::string> bar();
  ```

  The job of generating the type bindings (`std::vector<std::string>`, `std::list<std::string>`) is split evenly across fragments. But the problem here is that `std::string` needs a binding too, and since it doesn't appear standalone in this example, we don't know which fragment should handle it. The default behavior is to generate it in every fragment that needs it, which duplicates work between fragments at compile time. If you define this macro, none of the fragments will handle this type, which will give you a runtime error in Python when using those containers.

  How is this not stupid? First of all, this doesn't apply to parsed types; e.g. `std::vector<MyLib::MyClass>` will work fine. Only the types with handwritten bindings (such as the standard library classes) are affected. Second, this is a very rare situation: usually the element type will happen to be mentioned standalone somewhere else, so it will be handled by some other fragment. And lastly, if you do get errors because of this, it's easy to manually poke the offending type to give it a binding. To do that, create an extra header (include it in the big one that you feed to the parser), and in it use the following macros:

  ```cpp
  #define FORCE_REGISTER_TYPE(...) using MR_CONCAT(_mrbind_inst_,__LINE__) __attribute__((__annotate__("mrbind::instantiate_only"))) = __VA_ARGS__
  #define FORCE_REGISTER_PARAM_TYPE(...) __attribute__((__annotate__("mrbind::instantiate_only"))) void MR_CONCAT(_mrbind_inst_,__LINE__)(__VA_ARGS__)
  #define FORCE_REGISTER_RETURN_TYPE(...) __attribute__((__annotate__("mrbind::instantiate_only"))) __VA_ARGS__ MR_CONCAT(_mrbind_inst_,__LINE__)()
  ```

  Now having `FORCE_REGISTER_TYPE(std::string);` in that header would generate the binding for `std::string` even if it's not otherwise mentioned anywhere. The other two macros exist because some types behave differently in different contexts. E.g. using `int *` specifically as a function parameter generates a helper Python class named `MyModule.int_output`, so registering an `int *` parameter needs to be done via `FORCE_REGISTER_PARAM_TYPE(int *)`. Only the parser needs to see those macros; you can `#ifdef` them away for the compilation.