November 11, 2022 by Tobias Hunger
Rust and C++ Interoperability
In this blog post, I want to explore both directions of integration between Rust and C++ and present some tools we use in Slint.
This blog post is based on a presentation I gave at EuroRust 2022 in Berlin. Slides are available, as is the video recording.
Here at Slint we work on a UI toolkit written in Rust. A UI toolkit is useful for other languages and ecosystems in addition to the one it was written in, so Slint comes with C++ and even JavaScript APIs. Those APIs must of course feel fully native to developers using those languages. For this reason we have a strong interest in how to provide native-feeling APIs to Rust code for users in the C++ world.
Slint can (optionally) make use of existing C++ code to integrate into the different operating system environments. This includes topics like widget styling, accessibility, and more. This is why we also care about exposing existing C++ code into the Rust world.
If you need an open source C or C++ library in your Rust project, have a look at crates.io or lib.rs first: Maybe somebody else has already done the work for you?
For readers with a C++ background
As a Rustacean I use "safe" in the Rust sense: Code is safe if the Rust compiler has made sure all the properties needed to enforce memory safety are met. As the Rust compiler can not parse C++ code and check those properties there, all C++ code is unsafe by definition. This doesn't mean that the "unsafe" C++ code triggers undefined behavior or performs invalid memory accesses, just that it could.
You don't need to know Rust for this post, but one concept you will run into is Rust macros. They are different from C macros. A Rust macro is a function written in Rust that accepts a stream of tokens as input and produces a stream of tokens as output. The compiler runs this function at compile time whenever it encounters the macro in code, passing in the current stream of tokens and replacing it with the generated stream. This mechanism makes for powerful macros that are still "hygienic": They won't change the meaning of the code around them.
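To make that a bit more concrete, here is a minimal sketch of a procedural macro. The names are placeholders, and such a function has to live in its own crate marked as a proc-macro crate:

use proc_macro::TokenStream;

// A Rust (procedural) macro is just a function from tokens to tokens.
// The compiler calls it at compile time and splices the returned tokens
// into the program in place of the macro invocation.
// (Requires `proc-macro = true` in the crate's Cargo.toml.)
#[proc_macro]
pub fn my_macro(input: TokenStream) -> TokenStream {
    // A trivial body: emit the input unchanged.
    input
}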
Language level integration
Let's first look at language level integration: How to make Rust call code written in C++ and the other way around.
The Rust compiler can not understand C++ code. This makes it necessary to tell the Rust compiler about code you want to use on the C++ side. A bit of glue code is needed: Language bindings. Bindings define functions and data types available on the C++ side in a way that the Rust compiler can understand. Once a binding is available, Rust code can use those bindings to call code on the C++ side. The same is of course also true in the other direction: The C++ compiler also needs language bindings to tell it about code available on the Rust side.
This means you can not mix and match C++ and Rust code, but need defined interfaces to cross from one language into the other.
Challenges
All we need to do is to generate some bindings and everything is smooth sailing from there on out. How hard can that be?
There are a number of challenges:
- The two languages we want to map to each other have very different concepts. Rust has a different macro system than C++. C++ has inheritance, while Rust uses a system of traits instead (and these two concepts do not map directly onto each other). Rust has lifetimes, something foreign to C++. C++ templates and Rust generics address similar problems, but approach them differently. All these mismatches make it hard to map between the two languages.
- Rust does not have a defined Application Binary Interface (ABI): This means the Rust compiler is free to change how it represents data types or function calls in the binary output it generates. Of course that makes it challenging to exchange data in binary form. The situation on the C++ side isn't too different: The ABI is compiler defined. This is why you can not mix libraries generated with MSVC and GCC. The least common denominator is the C foreign function interface (FFI). This provides a stable binary interface, but it also limits the interface to what can be expressed in the C programming language. Despite this limitation, C FFI is the backbone most inter-language communication (not only between Rust and C++) is built upon.
- Both languages have data types to express concepts like strings of text, but the internal representation of these data types differs. For example, both languages offer a way to represent a dynamic sequence of elements of the same type stored next to each other: std::vector in C++ or Vec in Rust. Both define a vector as a pointer to some memory, a capacity, and a length. But what type does the pointer have? How does the data pointed to need to be aligned in memory? What type represents capacity and length? In which order are pointer, capacity, and length stored? Any mismatch in these or other details makes it impossible to map one language's type to the other language's conceptually similar type.
- Even if the data structure happens to match: Different languages may have different requirements on the data stored in those data types. For example, a string needs to be valid UTF-8 in Rust, while to C++ it's just a sequence of bytes - the programmer surely knows what encoding to use. This means it's always safe to pass a string from Rust to C++ (assuming all the little details about the string types in the standard libraries happen to match), but passing a string from C++ to Rust might trigger a panic (see the sketch below).
- Another problem comes in the form of inlined code. This code isn't directly callable with just the binary. Instead it's inserted wherever the inlined code is used. This requires the compiler to be able to compile the code in question: The Rust compiler can obviously not inline C++ code and neither can the C++ compiler inline Rust code. This is a widely used technique: In C++ all templates are effectively inline code.
All this makes it hard to generate bindings to mediate between Rust and C++.
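To make the string example concrete, here is a minimal sketch (the helper name is made up) of what the Rust side has to do when it receives a C string from C++: the UTF-8 check can fail, so the conversion can not be infallible.

use std::ffi::CStr;
use std::os::raw::c_char;
use std::str::Utf8Error;

// Hypothetical helper: borrow a NUL-terminated C string coming from C++
// as a Rust &str. Rust strings must be valid UTF-8, so the conversion
// returns a Result instead of assuming the bytes are valid.
unsafe fn c_string_to_str<'a>(ptr: *const c_char) -> Result<&'a str, Utf8Error> {
    CStr::from_ptr(ptr).to_str()
}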
Automatic binding generation
In an ideal world no bindings are needed. This is not possible for the combination of Rust and C++, so let's look at the next best thing: generating bindings automatically from existing Rust source files or C++ header files. This is what automatic binding generation is about.
Even though it's hard to create good language bindings automatically, it's still valuable to have generators. They get you started. There are options for both directions: Making Rust code available to C++ as well as the other way around.
The most widely used binding generators are bindgen and cbindgen.
bindgen
Bindgen parses header files and generates Rust bindings. This works well for C code, but is not perfect for C++ code. By default bindgen skips any construct it can not generate bindings for. This way it produces as many bindings as it can.
In practice bindgen needs configuration to work for any real-world C++ project. You will include and exclude types as needed, or mark types as opaque: This means they can be passed to Rust from C++, and back from Rust to C++, but the Rust side can not interact with those types in any way. You might also need to add C(++) helper functions that enable access to functionality not visible to bindgen by default.
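Such configuration typically lives in a build.rs script. A minimal sketch might look like this (the header and type names are placeholders; see the bindgen documentation for the full set of options):

// build.rs -- minimal bindgen sketch, assuming a `bindgen` build-dependency.
fn main() {
    let bindings = bindgen::Builder::default()
        // The C/C++ header we want Rust bindings for (placeholder name).
        .header("wrapper.h")
        // Only generate bindings for the things we actually need.
        .allowlist_type("my_lib_.*")
        .allowlist_function("my_lib_.*")
        // Treat this type as an opaque blob of bytes on the Rust side.
        .opaque_type("my_lib_internal_state")
        .generate()
        .expect("failed to generate bindings");

    let out_dir = std::path::PathBuf::from(std::env::var("OUT_DIR").unwrap());
    bindings
        .write_to_file(out_dir.join("bindings.rs"))
        .expect("failed to write bindings");
}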
Typically bindgen is used to generate a low-level crate (for C++ users: a library in a package manager) with a name ending in -sys. These -sys crates tend to be full of unsafe calls into the C or C++ library they wrap. Since Rust is all about building safe wrappers around unsafe code, you typically write another crate with safe wrappers around the -sys crate, which then drops the -sys suffix from its name.
Note that this process isn't unlike how C++ developers provide safe wrappers around C libraries. Of course the -sys level is not needed there, as C++ can just consume the C headers directly.
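As a rough sketch of that layering (all names here are made up), collapsed into one file for illustration:

// Sketch of the usual -sys / safe-wrapper split. In reality these would
// be two separate crates: a hypothetical foo-sys and foo.
mod foo_sys {
    extern "C" {
        // Raw declaration, as bindgen would generate it from a C header.
        pub fn foo_version_major() -> u32;
    }
}

// The safe wrapper the `foo` crate would expose on top of foo-sys.
pub fn version_major() -> u32 {
    // Safety: in this sketch foo_version_major has no preconditions
    // and is always safe to call.
    unsafe { foo_sys::foo_version_major() }
}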
cbindgen
Cbindgen covers the other direction: It parses Rust code and generates C or C++ headers from it.
Cbindgen looks at code specifically marked up by the developer as compatible with the C FFI using the #[repr(C)] attribute. Typically developers create a module (often called ffi) in their Rust project and collect all the #[repr(C)] types and extern "C" functions they want to expose in this module. This process isn't unlike how C++ developers write a C-level interface to their C++ code.
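A minimal sketch of such an ffi module might look like this (the names are placeholders); cbindgen would turn it into a C or C++ header declaring the Point struct and the point_length function:

// A hypothetical ffi module, written so that cbindgen can generate a header.
pub mod ffi {
    // #[repr(C)] gives the struct a C-compatible memory layout.
    #[repr(C)]
    pub struct Point {
        pub x: f64,
        pub y: f64,
    }

    // #[no_mangle] + extern "C" make the function callable from C and C++.
    #[no_mangle]
    pub extern "C" fn point_length(p: Point) -> f64 {
        (p.x * p.x + p.y * p.y).sqrt()
    }
}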
When to use binding generators
Binding generators work best when you have code with a stable interface in one language and want to make that code available to the other language. Typically the code exists in the form of a library.
This is how we use binding generation in Slint: We generate bindings from our stable Rust API. We then extend the generated code on the C++ side to make the code nicer to interact with from C++, (partially) hiding the generated code behind a hand-crafted facade.
How to use binding generators
Binding generators can be run once and have the generated bindings put under version control. This only works reliably though for code with very stable interfaces.
Binding generators should generate bindings at build time. This does of course require integration into the build system of choice.
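For cbindgen, for example, build-time generation can be a small build.rs step along these lines (a sketch; the output path is a placeholder):

// build.rs -- regenerate the C++ header on every build,
// assuming a `cbindgen` build-dependency in Cargo.toml.
fn main() {
    let crate_dir = std::env::var("CARGO_MANIFEST_DIR").unwrap();

    cbindgen::Builder::new()
        .with_crate(crate_dir)
        // Emit a C++ header; use Language::C for a plain C header.
        .with_language(cbindgen::Language::Cxx)
        .generate()
        .expect("failed to generate header")
        .write_to_file("include/my_lib.h");
}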
Semi-automatic binding generation
Semi-automatic binding generation works by having one custom piece of code or configuration define the interface between the two languages. This is then turned into a set of bindings for both Rust and C++, on top of an automatically generated C FFI interface hidden between the two sets of bindings.
The advantage is that more abstractions on top of the C FFI interface are possible, making the generated bindings more comfortable to use.
The cxx crate
A popular option is the cxx crate. Other options exist and either build on top of cxx or offer similar functionality. cxx promises safe and fast bindings.
The safety is limited to the bindings themselves: The code called through those bindings is of course still unsafe. This is a nice property nonetheless, as you can be sure that the generated code isn't introducing problems of its own. You can concentrate on debugging the "other side" of the bindings instead of looking into the generated code.
To ensure the bindings' safety, cxx generates static asserts and checks function and type signatures.
To keep the bindings fast, cxx makes sure no data is copied in the bindings - nor is any conversion done. This leads to types from one language bleeding into the other. For example, a std::string on the C++ side turns into a CxxString in Rust. This makes the generated bindings feel somewhat foreign to developers.
What does this look like? You need a module in your Rust code that defines both sides of the interface. Here is an example taken from the documentation of cxx:
#[cxx::bridge]
mod ffi {
    struct Metadata {
        size: usize,
        tags: Vec<String>,
    }

    extern "Rust" {
        type MultiBuf;

        fn next_chunk(buf: &mut MultiBuf) -> &[u8];
    }

    unsafe extern "C++" {
        include!("demo/include/blob_store.h");

        type Client;

        fn new_client() -> UniquePtr<Client>;
        fn put(&self, parts: &mut MultiBuf) -> u64;
    }
}
- You need to mark the module with #[cxx::bridge]. This triggers a Rust macro to process this code. Inside the module (called ffi in this case), data types available to both C++ and Rust get defined.
- An extern "Rust" section is next. This lists types and functions defined on the Rust side that should be exposed to C++. cxx notices that the first argument to next_chunk is a mutable reference to the MultiBuf data type. It models MultiBuf as a class on the C++ side and makes next_chunk a member function of that class.
- An unsafe extern "C++" section defines data types and functions available on the C++ side which should be usable from Rust. cxx looks for information relevant to Rust here: You need to express lifetime information as well as whether a function is safe to call or not. In this case both new_client and put are safe. This information is relevant for the Rust side but has no effect on the C++ code that gets wrapped.
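From regular Rust code, the C++ side of this bridge is then used like an ordinary module. A rough sketch of a call site, assuming MultiBuf is the Rust type behind the bridge's type MultiBuf declaration:

// Somewhere else in the Rust crate that contains the bridge above.
fn upload(buf: &mut MultiBuf) -> u64 {
    // new_client() is implemented in C++; UniquePtr behaves much like
    // std::unique_ptr and dereferences to the underlying Client.
    let client = ffi::new_client();
    // Calls the C++ member function Client::put through the bridge.
    client.put(buf)
}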
When to use cxx?
It works best when you can control both sides of the API - for example when you want to factor out some code from an existing C++ implementation into a new library written in Rust. cxx is ideal here since it defines a matching set of bindings and the C FFI interface between them in one go.
Don't generate bindings
A third option is to use the cpp crate in Rust to write C++ code inline. Let's look at a (shortened) Rust member function notify, taken from the Slint source code:
fn notify(&self) {
    let obj = self.obj;
    cpp!(unsafe [obj as "Object*"] {
        auto data = queryInterface(obj)->data();
        rust!(rearm [data: Pin<&A11yItemData> as "void*"] {
            data.arm_state_tracker();
        });
        updateA11y(Event(obj));
    });
}
When I first saw this in Rust code, it blew my mind. What does this piece of code do?
- A local variable obj, holding a reference to the member variable obj (of type &c_void), is created.
- The cpp! macro (all callable macros in Rust end in `!`) processes all the code up to the closing parenthesis at the end of the notify function. This macro implicitly declares an unsafe C++ function returning void, which takes one argument called obj of type Object*. The macro expects obj to be defined in the surrounding Rust code. The body of this C++ function is the code between the curly braces.
- While in the C++ world, we interact with obj to extract some information, which we then store in a local variable data. This data is of course only visible inside the C++ function we have just defined implicitly. The surrounding Rust code can not see it.
- In the next line we use the rust! (pseudo-)macro. This switches back into the Rust language.
- This rust! macro creates another (Rust) function called rearm, which takes an argument data of type Pin<&A11yItemData>. This argument must exist in the surrounding C++ code, and we expect it to have the type void* there. We need to give type definitions for both C++ and Rust here, as the cpp crate can unfortunately not find the type on the C++ side. The body of that Rust function contains data.arm_state_tracker(); and returns nothing. The macro also creates the necessary bindings to call the new rearm function from C++. Once the rust! pseudo-macro has generated this code, it replaces itself with C++ code calling the rearm function through the generated C++ bindings.
- Back in the C++ function created by the cpp! macro, we have some more C++ code, updateA11y(Event(obj));, and reach the end of the body of the implicitly created C++ function. Once the cpp! macro has generated all its code, it replaces itself with a call to the C++ function it generated, via the Rust binding it created for it.
After all the macros are expanded, we have two new functions generated, including the necessary bindings to call them. The final notify function seen by the Rust compiler is just the definition of the obj variable followed by a call to some binding taking this obj as argument.
This approach doesn't avoid the generation of bindings, so the title of this section is misleading. It does handle a big part of the binding generation implicitly, though. Of course you still need to generate bindings for data types you want to access in both Rust and C++. The cpp crate has more macros to help with that.
How does this work?
The macros shipped by the cpp crate do generate all the code. You do need build system integration to build and link the generated C++ code.
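That integration is a small build.rs using the companion cpp_build crate; a minimal sketch, assuming the inline C++ lives in src/lib.rs:

// build.rs -- let the cpp crate extract and compile the inline C++ code,
// assuming a `cpp_build` build-dependency in Cargo.toml.
fn main() {
    // Scans the given file (and its modules) for cpp! invocations,
    // compiles the extracted C++, and tells cargo how to link it.
    cpp_build::build("src/lib.rs");
}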
When to use the cpp crate?
In Slint we use the cpp crate to interact with C++ GUI toolkits that have a stable API. It works great for this use case.
Summary
You have a wide range of options to integrate C++ and Rust code, but you always need to generate language bindings. This indirection avoids a tight coupling between the languages and opens up more design spaces for Rust to explore, but it also makes a seamless integration of Rust and C++ impossible.
Build system integration
Once you have a project that combines Rust and C++ code, you need to build both the Rust and the C++ parts, and merge both together into one consistent binary. Let's take a short look at what's necessary to build a cross-language project.
cargo, the official Rust build system, is the only supported way to build Rust code. You already have a build system for your C++ code. Typically that build system isn't trivial, so don't try to reimplement it in cargo. Integrate the two build systems with each other instead.
Let's start by looking at Cargo.
Cargo
Having Cargo as the main build tool driving your project build is great if you have a little C++ code in a bigger Rust context. The typical use case is generating bindings around C and C++ code.
Cargo can run arbitrary code at build time. It looks for a file called build.rs next to the Cargo.toml file. If a build.rs file exists, cargo builds and executes this file during the build process. The build.rs file can inform the rest of the build process by printing instructions to cargo on stdout. Check the cargo documentation for details.
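These instructions are plain println! lines with a cargo: prefix; for example (library name and path are placeholders):

// Inside build.rs: a few of the instructions cargo understands.
fn main() {
    // Re-run this script only when the header changes.
    println!("cargo:rerun-if-changed=wrapper.h");
    // Link the native library `foo` ...
    println!("cargo:rustc-link-lib=foo");
    // ... and tell the linker where to find it (placeholder path).
    println!("cargo:rustc-link-search=native=/usr/local/lib");
}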
build.rs is normal Rust code and may use any crate specified as a build-dependency in the Cargo.toml file!
When working with C and C++ code, the cc crate is interesting. It allows you to drive a C or C++ compiler from within build.rs. This is ideal for building a few simple files. For bigger C or C++ projects you probably want to run the project's build system directly. The cmake crate comes in handy here. It drives the typical CMake configure, build, install workflow and exposes the CMake build targets to cargo afterwards.
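A build.rs using the cc crate can be as short as this (file and library names are placeholders):

// build.rs -- compile a small amount of C++ directly,
// assuming a `cc` build-dependency in Cargo.toml.
fn main() {
    cc::Build::new()
        .cpp(true)                     // compile as C++, not C
        .file("src/native/helper.cpp") // placeholder source file
        .compile("helper");            // produces libhelper.a and links it
    println!("cargo:rerun-if-changed=src/native/helper.cpp");
}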
Other build systems have similar support crates, or can be driven via lower-level crates that run arbitrary commands, like xshell.
CMake
I use CMake as one example of a build system widely used for C and C++ projects. Similar support is available for other build tools; some even claim to support Rust natively -- often by running the Rust compiler directly (unsupported by Rust!).
Using the existing C++ build system to drive the entire build is ideal when you have a little Rust code in a bigger C++ project. A typical use case is replacing some small part of a project with code written in Rust or using a Rust library.
The Corrosion project provides cargo integration for CMake. A simple CMakeLists.txt file building a Rust example library and linking to it would look like this:
cmake_minimum_required(VERSION 3.15)
project(MyCoolProject LANGUAGES CXX)
find_package(Corrosion REQUIRED)
corrosion_import_crate(MANIFEST_PATH rust-lib/Cargo.toml)
add_executable(cpp-exe main.cpp)
target_link_libraries(cpp-exe PUBLIC rust-lib)
- You start out with the usual two lines in any CMake project, defining the minimum CMake version required to build the project followed by the project name and the programming languages CMake needs to build. Note that you don't mention Rust there.
- The find_package(Corrosion REQUIRED) line asks CMake to include the Corrosion support and fail if it isn't found. You could also use FetchContent to download Corrosion as part of your build instead.
- Now that Corrosion is available, you can ask it to build Rust code using corrosion_import_crate, pointing it to an existing Cargo.toml file. Corrosion builds this Rust project and exposes all its build targets to CMake.
- The last two lines in the example build a C++ binary and link it to the Rust code.
Slint uses the Corrosion project to enable C++ developers to use the Slint library in C++ code without having to bother with Rust too much.
I hope this gives you a good starting place for your project integrating C++ and Rust code - or that you at least found some option you weren't aware of before. Please feel free to reach out with questions in the discussion on GitHub.
Slint is a Rust-based toolkit for creating reactive and fluent user interfaces across a range of targets, from embedded devices with limited resources to powerful mobile devices and desktop machines. Supporting Android, Windows, Mac, Linux, and bare-metal systems, Slint features an easy-to-learn domain-specific language (DSL) that compiles into native code, optimizing for the target device's capabilities. It facilitates collaboration between designers and developers on shared projects and supports business logic development in Rust, C++, JavaScript, or Python.