Editor’s note: This is the third post in our series on building an iOS app in Rust.
Welcome to Part 3 of our “Building an iOS App in Rust” series. This post
continues to build on the tools and details we set up in Part 1
and Part 2, so please refer back to those as needed. Part 1
walks you through setting up a cross-compiler built from the latest “unstable”
Rust compiler. The code in this post uses a feature that was only recently
stabilized, so if you are following along with your own compiler toolchain,
note that you’ll need Rust 1.4 or newer.
Another major change is that we’re now using Swift 2, which added the
ability to pass Swift functions to C APIs that expect function pointers. We
no longer need to drop down into Objective-C to wrap our Rust code.
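In Part 2, we ended with passing strings between Rust and Swift. Our
implementation worked but was not ideal in either direction:
- When we passed a Swift String into Rust via the c_string_to_rust function,
  Rust received only a pointer that was valid for the duration of that call;
  Rust could not take ownership of the string.
- get_string_from_rust worked only because the string it returned was static;
  Rust had no way to hand Swift ownership of data it allocated at runtime.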
Both of these issues would be complete showstoppers for a real application. We
need a way to pass much more complex data than strings, and we’ll need to be
able to correctly and safely manage the ownership of that data.
As in Part 2, we will not be going over every line of code in detail, but all
of the code for this post is available on GitHub.
Rust is a great language for a number of reasons, but the one feature where it
really shines is how it deals with ownership (who is responsible for
allocating and deallocating resources held by an instance) and borrowing
(loaning out temporary access to an instance without giving away ownership).
Much has been written about this, with more detail than I can offer here. If
you want to learn more, check out the excellent Fearless Concurrency
with Rust blog post as well as the Rust book’s sections
on Ownership, References and
Borrowing and Lifetimes.
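As a small illustration of the two ideas in pure Rust (no FFI yet), here is a sketch with hypothetical helper names: passing a String by value moves ownership into the callee, which frees it on return, while passing &str only lends access and the caller keeps ownership.

```rust
// Illustration only: hypothetical helpers showing a move vs. a borrow.

/// Takes ownership of `s`. The String's heap buffer is freed when `s`
/// goes out of scope at the end of this function.
fn consume(s: String) -> usize {
    s.len()
}

/// Borrows `s` immutably; the caller keeps ownership.
fn peek(s: &str) -> usize {
    s.len()
}

/// Returns (length seen via the borrow, length seen after the move).
fn ownership_demo() -> (usize, usize) {
    let owned = String::from("hello");
    // Borrow: `owned` remains usable afterwards.
    let borrowed_len = peek(&owned);
    // Move: ownership passes to `consume`; using `owned` after this line
    // would be a compile-time error.
    let moved_len = consume(owned);
    (borrowed_len, moved_len)
}
```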
In pure Rust, the compiler makes sure you obey ownership rules that ensure all
resources will be cleaned up at the proper time. The same ownership rules
provide thread safety guarantees that we’ll discuss in a later post. When we’re
working with the foreign function interface (FFI) layer, we have to be much
more careful, as we’re interacting with a system outside of the Rust compiler’s
knowledge, so we don’t get nearly as much help.
The most basic things we want to understand are (a) how long an object we’re
passing between languages will be valid (e.g., in c_string_to_rust, the pointer
is valid only for the duration of the function call), and (b) who is
responsible for cleaning it up (e.g., in c_string_to_rust, Swift frees the
string’s memory). For the rest of this post, we’re going to look at how to pass
ownership of an object across the boundary: creating an object in Rust and
giving responsibility for its cleanup to Swift, and vice versa.
Most of the Rust types we’re going to create will not boil down to a primitive
or even a simple structure like RustByteSlice. For example, we could build a
Rust structure like this, which represents a label and some associated data:
// Automatically derive an implementation of the `Debug` trait so we can
// print instances of NamedData for debugging purposes.
#[derive(Debug)]
pub struct NamedData {
    name: String,
    data: Vec<i32>,
}
When we wrote the C header file for RustByteSlice, we put the C-compatible
definition of RustByteSlice directly in the header. We can’t do that here,
though: we don’t know the in-memory layout of NamedData, because we don’t know
the internal data layouts of String or Vec<i32> (nor should we need to). If we
want to create instances of this type and give them to Swift, we can create a C
interface for creation and destruction like this:
// Forward declare a struct but never specify its fields.
// We will only work with pointers to named_data.
struct named_data;
// Create a new instance of `named_data`.
// The caller is responsible for passing the returned pointer to
// named_data_destroy, or memory will be leaked.
struct named_data *named_data_new(void);
// Free a `named_data` instance returned by `named_data_new`.
void named_data_destroy(struct named_data *data);
The Swift side can’t use a struct named_data * directly; its bridged type,
COpaquePointer, makes that obvious. We’ll need to add functions to access the
properties we want to expose:
// Get the name of a `named_data`. The returned byte slice is valid until
// the `named_data` instance is destroyed.
struct RustByteSlice named_data_get_name(const struct named_data *data);
// Get the number of elements stored in `data`.
size_t named_data_count(const struct named_data *data);
On the Rust side, let’s start by implementing the Drop trait for NamedData.
The drop function is called when a value goes out of scope; it’s analogous to
a destructor in C++ or deinit in Swift (although any Rust type, struct or enum,
can implement Drop, whereas there’s no way to have a deinit for Swift structs
or enums). We don’t actually need to do anything, but we can add a print
statement for our own understanding:
impl Drop for NamedData {
    fn drop(&mut self) {
        println!("{:?} is being deallocated", self);
    }
}
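To see exactly when drop runs, here is a self-contained sketch. The Tracked type and the DROP_COUNT counter are hypothetical stand-ins for NamedData and its print statement; an atomic counter is used so the effect can be checked rather than read off the console.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counter standing in for the println! in NamedData's drop.
static DROP_COUNT: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for NamedData.
struct Tracked;

impl Drop for Tracked {
    fn drop(&mut self) {
        // Runs automatically when a `Tracked` value goes out of scope.
        DROP_COUNT.fetch_add(1, Ordering::SeqCst);
    }
}

/// Returns how many drops happened while the value was alive, and how
/// many had happened once its scope ended.
fn drop_demo() -> (usize, usize) {
    let before = DROP_COUNT.load(Ordering::SeqCst);
    let while_alive;
    {
        // Named binding (not `_`), so the value lives until the closing brace.
        let _tracked = Tracked;
        while_alive = DROP_COUNT.load(Ordering::SeqCst) - before;
    } // `_tracked` goes out of scope here and `drop` runs.
    let after_scope = DROP_COUNT.load(Ordering::SeqCst) - before;
    (while_alive, after_scope)
}
```

Note the binding name: `let _ = Tracked;` would drop the value immediately, whereas `let _tracked = Tracked;` keeps it alive to the end of the block.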
To implement named_data_new, we need to create an instance of NamedData on the
heap (if we created it on the stack, it would be destroyed as soon as
named_data_new returns), then return a raw pointer to the instance. Rust’s
standard library provides the Box type for heap-allocating instances.
Typically Rust’s RAII semantics would cause a Box-allocated instance to be
deallocated when the box goes out of scope, but we can use Box::into_raw to
cause the instance to be forgotten by the RAII system.
#[no_mangle]
pub extern fn named_data_new() -> *mut NamedData {
    // Create an instance of NamedData.
    let named_data = NamedData{
        name: "some data".to_string(),
        data: vec![1, 2, 3, 4, 5],
    };
    // Put named_data into a Box, which moves it onto the heap.
    let boxed_data = Box::new(named_data);
    // Convert our Box<NamedData> into a *mut NamedData. Rust is no longer
    // managing the destruction of boxed_data; we must (at some point in the
    // future) convert this pointer back into a Box<NamedData> so it can be
    // deallocated.
    Box::into_raw(boxed_data)
}
This will leak memory if we don’t later give the instance back to a Rust Box
for deallocation, but that’s exactly what we want! We created a heap-allocated
NamedData, got a pointer to it that we can give to Swift, and Rust will not
deallocate it out from under us. We can implement named_data_destroy to take a
raw pointer to a NamedData, put it back into a Box, and let that Box fall out
of scope, causing the instance to be deallocated.
#[no_mangle]
pub unsafe extern fn named_data_destroy(data: *mut NamedData) {
    // Convert a *mut NamedData back into a Box<NamedData>.
    // This function is unsafe because the Rust compiler can't know
    // whether data is actually pointing to a boxed NamedData.
    //
    // Note that we don't actually have to do anything else or even
    // give the new Box a name - when we convert it back to a Box
    // and then don't use it, the Rust compiler will insert the
    // necessary code to drop it (deallocating the memory).
    let _ = Box::from_raw(data);
}
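Put together, these functions form a simple ownership hand-off. The following sketch mirrors named_data_new, named_data_count and named_data_destroy with a hypothetical Payload type (and spells out `extern "C"`), counting drops so we can check that the value survives until the destroy call and is freed exactly once.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many Payload values have been dropped (illustration only).
static DESTROYED: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for NamedData.
struct Payload {
    data: Vec<i32>,
}

impl Drop for Payload {
    fn drop(&mut self) {
        DESTROYED.fetch_add(1, Ordering::SeqCst);
    }
}

/// Mirrors named_data_new: move the value to the heap and opt out of
/// automatic cleanup by turning the Box into a raw pointer.
extern "C" fn payload_new() -> *mut Payload {
    Box::into_raw(Box::new(Payload { data: vec![1, 2, 3, 4, 5] }))
}

/// Mirrors named_data_count: borrow through the pointer, no ownership change.
/// The caller must pass a live pointer obtained from payload_new.
unsafe extern "C" fn payload_count(p: *const Payload) -> usize {
    unsafe { (*p).data.len() }
}

/// Mirrors named_data_destroy: rebuild the Box so dropping it frees the
/// Payload. The caller must pass a pointer from payload_new, exactly once.
unsafe extern "C" fn payload_destroy(p: *mut Payload) {
    drop(unsafe { Box::from_raw(p) });
}
```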
Implementing the two accessor functions is simpler; the only new bit here is
converting a raw *const NamedData into a Rust reference (&NamedData), which
requires an unsafe block (because we have to dereference the pointer, and the
Rust compiler can’t know whether the pointer is actually valid):
use libc::size_t;
#[no_mangle]
pub extern fn named_data_get_name(named_data: *const NamedData) -> RustByteSlice {
    let named_data = unsafe { &*named_data };
    RustByteSlice::from(named_data.name.as_ref())
}
#[no_mangle]
pub extern fn named_data_count(named_data: *const NamedData) -> size_t {
    let named_data = unsafe { &*named_data };
    named_data.data.len() as size_t
}
There is an interesting point to make here about the use of unsafe. The Rust
compiler is extremely strict about safety, but it is sometimes necessary to use
unsafe to implement particular details. We should be careful about what kind of
API we present, though. These functions are primarily intended to be called
from outside of Rust, but they could still be called by other Rust code. To the
Rust compiler, these functions appear to be safe: it should be memory safe to
call them with any possible *const NamedData. But our implementation does not
check for NULL, in particular (nor could it detect a dangling pointer). This is
very bad: we have an unsafe function masquerading as a safe one. To fix this,
we can move unsafe up from just a block around the pointer dereference to a
marker on the entire function:
#[no_mangle]
pub unsafe extern fn named_data_get_name(named_data: *const NamedData) -> RustByteSlice {
    let named_data = &*named_data;
    RustByteSlice::from(named_data.name.as_ref())
}
#[no_mangle]
pub unsafe extern fn named_data_count(named_data: *const NamedData) -> size_t {
    let named_data = &*named_data;
    named_data.data.len() as size_t
}
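Marking the functions unsafe is the right call, because no check inside the function can prove that a non-NULL pointer is valid. If you also want the NULL case to be harmless, one option (our assumption, not something the project code does) is `<*const T>::as_ref`, which returns None for NULL. Data and data_count_or_zero below are hypothetical names:

```rust
// Hypothetical stand-in for NamedData.
struct Data {
    values: Vec<i32>,
}

/// Returns 0 for a NULL pointer, otherwise the element count.
///
/// Still `unsafe`: a non-NULL pointer must point at a live `Data`, which
/// the compiler cannot verify. The NULL check removes one failure mode only.
unsafe extern "C" fn data_count_or_zero(p: *const Data) -> usize {
    // `as_ref` yields `None` for NULL and `Some(&Data)` otherwise.
    match unsafe { p.as_ref() } {
        Some(d) => d.values.len(),
        None => 0,
    }
}
```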
Now our functions are correctly marked for other Rust consumers: if they want
to call these functions, they can only do so from other unsafe code. We also
make use of a fairly common Rust idiom: let named_data = &*named_data; creates
a new binding for the named_data name, which shadows the old named_data. The
new named_data has type &NamedData; the shadowed one, which we can no longer
(and need not) access, had type *const NamedData.
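Here is the same shadowing idiom in isolation, with a hypothetical local value standing in for the pointer Swift would pass in:

```rust
/// Hypothetical example of the shadowing idiom used in the accessors above.
fn shadow_demo() -> i32 {
    let value = 42;
    // Start with a raw pointer, as an FFI function would receive it.
    let value_ref: *const i32 = &value;
    // Shadow it: same name, new type (&i32). The raw-pointer binding is
    // no longer reachable by name from here on.
    let value_ref: &i32 = unsafe { &*value_ref };
    *value_ref + 1
}
```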
On the Swift side, we want to guarantee we’re not going to forget to pair every
call to named_data_new with a call to named_data_destroy, so we’ll create a
RustNamedData wrapper class with an appropriate deinit. This is also a
convenient place to put calls to the other accessor functions:
class RustNamedData {
    private let raw: COpaquePointer
    init() {
        raw = named_data_new()
    }
    deinit {
        named_data_destroy(raw)
    }
    var name: String {
        let byteSlice = named_data_get_name(raw)
        return byteSlice.asString()!
    }
    var count: Int {
        return named_data_count(raw)
    }
}
We can create an instance of this class, print some properties, and see the
destruction happen as we expect:
let namedData = RustNamedData()
print("namedData.name = \(namedData.name)")
print("namedData.count = \(namedData.count)")
// Output from running the above snippet:
// namedData.name = some data
// namedData.count = 5
// NamedData { name: "some data", data: [1, 2, 3, 4, 5] } is being deallocated
Embedding Rust into another language is relatively straightforward. There are
no concerns about any runtime support like a garbage collector, so we just have
to give the host language (Swift) a way to create and destroy instances. Going
the other direction is a little trickier thanks to ARC.
Passing ownership of Swift objects down to Rust is problematic. We saw earlier
how to pass an ephemeral pointer to a String; we could easily do the same with
other Swift types by using the withUnsafePointer Swift function. However,
pointers created by withUnsafePointer are only valid for the duration of that
function call, and we’re going to need a way to give Rust a more permanent
handle on Swift objects. In Swift 1, we would have had to drop down to
Objective-C to solve this problem. Since Swift 2 added the ability to pass
Swift functions to APIs expecting C function pointers, we no longer need to do
that. We will still get our hands a little dirty, though.
Let’s start with the C interface each side is going to implement:
struct swift_object {
    void *user;
    void (*destroy)(void *user);
    void (*callback_with_int_arg)(void *user, int32_t arg);
};
void give_object_to_rust(struct swift_object object);
The swift_object struct has three fields:
- user is a void *; it will be a pointer to an instance of our Swift object.
- destroy is a C function pointer that Rust will call when it wants to destroy
  user.
- callback_with_int_arg is a C function pointer that Rust can call with a
  32-bit signed integer argument.
Let’s create the Swift side of our Swift object:
class SwiftObject {
    deinit {
        print("SwiftObject being deallocated")
    }
    private func callbackWithArg(arg: Int) {
        print("SwiftObject: received callback with arg \(arg)")
    }
    func sendToRust() {
        let ownedPointer = UnsafeMutablePointer<Void>(Unmanaged.passRetained(self).toOpaque())
        let wrapper = swift_object(
            user: ownedPointer,
            destroy: destroy,
            callback_with_int_arg: callback_with_int_arg)
        give_object_to_rust(wrapper)
    }
}
private func callback_with_int_arg(user: UnsafeMutablePointer<Void>, arg: Int32) {
    let obj: SwiftObject = Unmanaged.fromOpaque(COpaquePointer(user)).takeUnretainedValue()
    obj.callbackWithArg(Int(arg))
}
private func destroy(user: UnsafeMutablePointer<Void>) {
    let _ = Unmanaged<SwiftObject>.fromOpaque(COpaquePointer(user)).takeRetainedValue()
}
deinit and callbackWithArg are straightforward: we just want to see output
when they’re called. Most of the magic happens in sendToRust, so let’s break
that down:
let ownedPointer = UnsafeMutablePointer<Void>(Unmanaged.passRetained(self).toOpaque())
We have an instance of a Swift object (self), and the bridged form of our
swift_object struct is expecting an UnsafeMutablePointer<Void> for its user
field. We have to pass through two intermediate states to get there:
1. Unmanaged is Swift’s way of letting you take more control over memory
   management. When you call Unmanaged.passRetained(self), you get back “an
   unmanaged reference with an unbalanced retain”. This is exactly what we
   want: the reference has now been retained, and we are responsible for
   releasing the object when we (or Rust) are finished with it.
2. Once we have an Unmanaged<SwiftObject>, we can call toOpaque() on it to get
   a COpaquePointer.
3. From the COpaquePointer, we can finally create an UnsafeMutablePointer<Void>,
   the type we need to supply for user.
Now that we’ve converted self into an UnsafeMutablePointer<Void>, we can build
up an instance of the swift_object struct and call our give_object_to_rust
function:
let wrapper = swift_object(
    user: ownedPointer,
    destroy: destroy,
    callback_with_int_arg: callback_with_int_arg)
give_object_to_rust(wrapper)
The destroy and callback_with_int_arg arguments are private functions; let’s
look at those now.
callback_with_int_arg is given a user (the exact same UnsafeMutablePointer<Void>
we just created) and an integer argument. The tricky bit here is converting
user back into a usable SwiftObject; we have to repeat the process we did
above, but in reverse:
// UnsafeMutablePointer<Void> -> COpaquePointer
COpaquePointer(user)
// COpaquePointer -> Unmanaged<SwiftObject>
Unmanaged.fromOpaque(COpaquePointer(user))
// Unmanaged<SwiftObject> -> SwiftObject
Unmanaged.fromOpaque(COpaquePointer(user)).takeUnretainedValue()
Note that we call takeUnretainedValue(), not takeRetainedValue(), because we do
not want to modify the reference count of the underlying SwiftObject.
Now that we have a SwiftObject, we can call methods on it just like normal:
obj.callbackWithArg(Int(arg))
Finally, destroy is a one-liner that is almost identical to the first line of
callback_with_int_arg. The difference, as you probably expect, is that here we
do call takeRetainedValue(). This balances the retain we added with
passRetained, decrementing the reference count on the underlying object and
causing it to be deallocated (assuming Rust was holding the only or last
reference to it).
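Rust has a close analogue of this retained/unretained dance in Arc, which may help if the Unmanaged calls feel abstract. The mapping is our own analogy, not anything Swift or this project defines: Arc::into_raw plays the role of passRetained(...).toOpaque(), reading through the raw pointer plays takeUnretainedValue(), and Arc::from_raw plays takeRetainedValue(). The arc_round_trip function is hypothetical.

```rust
use std::sync::Arc;

// Analogy only (not a Swift API):
//   Arc::into_raw    ~ passRetained(...).toOpaque()  keeps one count alive
//   read via *raw    ~ takeUnretainedValue()         count unchanged
//   Arc::from_raw    ~ takeRetainedValue()           balances the count
/// Returns the strong count after "retain", after "unretained" use,
/// and after "release".
fn arc_round_trip() -> (usize, usize, usize) {
    let obj = Arc::new(String::from("swift object"));
    // "Retain": the clone's count is now owned by the raw pointer.
    let raw: *const String = Arc::into_raw(Arc::clone(&obj));
    let after_retain = Arc::strong_count(&obj);

    // "Unretained" use: read through the pointer without touching the count.
    let len = unsafe { (*raw).len() };
    assert_eq!(len, 12);
    let after_borrow = Arc::strong_count(&obj);

    // "Retained" take: rebuild the Arc, then drop it to release that count.
    drop(unsafe { Arc::from_raw(raw) });
    let after_release = Arc::strong_count(&obj);
    (after_retain, after_borrow, after_release)
}
```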
Now that the Swift side is ready, what does the Rust side look like? First,
let’s define the SwiftObject struct:
use libc::c_void;
#[repr(C)]
pub struct SwiftObject {
    user: *mut c_void,
    destroy: extern fn(user: *mut c_void),
    callback_with_int_arg: extern fn(user: *mut c_void, arg: i32),
}
This shouldn’t be too bad if you’ve made it this far. C function pointers come
in as extern fn types, and we need to make sure the argument and return types
match. (If we wanted to allow these function pointers to be NULL, we would use
Option<extern fn(…)> instead, but we don’t need to do that for this example.)
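Here is a sketch of that nullable variant; MaybeCallback, fire and store_arg are hypothetical. Rust guarantees that `Option<extern "C" fn(...)>` has the same size as a plain function pointer, with None represented as NULL, so a `#[repr(C)]` struct like this still matches a C struct with a nullable function-pointer field.

```rust
use std::os::raw::c_void;

// Hypothetical struct whose callback may be NULL on the C side.
#[repr(C)]
pub struct MaybeCallback {
    user: *mut c_void,
    callback: Option<extern "C" fn(user: *mut c_void, arg: i32)>,
}

/// Calls the callback if one was provided; returns whether it ran.
fn fire(obj: &MaybeCallback, arg: i32) -> bool {
    match obj.callback {
        Some(f) => {
            f(obj.user, arg);
            true
        }
        None => false,
    }
}

/// Hypothetical callback: stores `arg` into the i32 that `user` points to.
extern "C" fn store_arg(user: *mut c_void, arg: i32) {
    unsafe { *(user as *mut i32) = arg };
}
```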
Now for give_object_to_rust. To make things interesting, we’ll start up a
thread, move the SwiftObject onto that thread, sleep for 1 second, and then
issue the callback into Swift. To tell Rust that it is safe for instances of
SwiftObject to be sent across threads, we’ll also need to add an implementation
of the (empty) Send marker trait. (The explanation for this is a little long,
and this post is already too long by half, so I’ll refer you to the Rust book’s
Concurrency chapter if you’re curious.)
use std::thread;
unsafe impl Send for SwiftObject {}
#[no_mangle]
pub extern fn give_object_to_rust(obj: SwiftObject) {
    println!("moving SwiftObject onto a new thread created by Rust");
    thread::spawn(move || {
        thread::sleep(std::time::Duration::from_millis(1000));
        (obj.callback_with_int_arg)(obj.user, 10);
        (obj.destroy)(obj.user);
    });
}
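The following runnable sketch exercises the same pattern without Swift. CallbackObject and add_arg are hypothetical stand-ins for SwiftObject and the Swift callback (destroy is omitted for brevity), and, unlike give_object_to_rust, it joins the spawned thread so the result can be checked. One caveat if you build code like this with the Rust 2021 edition or later: closures capture individual fields, so a closure that only mentions obj.user and obj.callback_with_int_arg captures the raw-pointer field on its own and is not Send, even with the unsafe impl. Binding the whole struct inside the closure avoids that.

```rust
use std::os::raw::c_void;
use std::sync::atomic::{AtomicI32, Ordering};
use std::thread;

// Hypothetical mirror of SwiftObject, minus `destroy`.
#[repr(C)]
pub struct CallbackObject {
    user: *mut c_void,
    callback_with_int_arg: extern "C" fn(user: *mut c_void, arg: i32),
}

// Raw pointers are not Send, so we must assert it manually (and uphold it).
unsafe impl Send for CallbackObject {}

// Hypothetical callback: adds `arg` to the AtomicI32 that `user` points at.
extern "C" fn add_arg(user: *mut c_void, arg: i32) {
    let total = unsafe { &*(user as *const AtomicI32) };
    total.fetch_add(arg, Ordering::SeqCst);
}

/// Like give_object_to_rust, but joins the thread before returning.
fn call_on_background_thread(obj: CallbackObject, arg: i32) {
    let worker = thread::spawn(move || {
        // Capture the whole (Send) struct, not just its !Send field.
        let obj = obj;
        (obj.callback_with_int_arg)(obj.user, arg);
    });
    worker.join().unwrap();
}
```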
If we run this, we’ll find everything working; we get the following log, with a
1-second delay between the first and second lines (the thread sleeps before
issuing the callback):
moving SwiftObject onto a new thread created by Rust
SwiftObject: received callback with arg 10
SwiftObject being deallocated
You can also set a breakpoint in SwiftObject.callbackWithArg and see that the
callback is happening off of the main thread.
However, there is the unsightly bit of having to call obj.destroy manually.
This is Rust; we should not need to do manual resource management! Earlier, we
implemented the Drop trait on NamedData. Let’s try to do the same thing here.
We can move the call to destroy into drop, which means Rust will put the call
at exactly the right place (whenever our SwiftObject falls out of scope):
impl Drop for SwiftObject {
    fn drop(&mut self) {
        (self.destroy)(self.user);
    }
}
#[no_mangle]
pub extern fn give_object_to_rust(obj: SwiftObject) {
    println!("moving SwiftObject onto a new thread created by Rust");
    thread::spawn(move || {
        thread::sleep(std::time::Duration::from_millis(1000));
        (obj.callback_with_int_arg)(obj.user, 10);
        /* (obj.destroy)(obj.user); */
    });
}
This compiles, but with a pretty scary warning:
src/swift_ownership_to_rust.rs:13:1: 17:2 warning: implementing Drop adds hidden state to types, possibly conflicting with `#[repr(C)]`, #[warn(drop_with_repr_extern)] on by default
src/swift_ownership_to_rust.rs:13 impl Drop for SwiftObject {
src/swift_ownership_to_rust.rs:14 fn drop(&mut self) {
src/swift_ownership_to_rust.rs:15 (self.destroy)(self.user);
src/swift_ownership_to_rust.rs:16 }
src/swift_ownership_to_rust.rs:17 }
It turns out the warning is scary-sounding for a reason: running this code now
crashes!
Explaining what’s going on is a little involved, but I’ll try to summarize.
Currently, when you implement Drop on a struct, the Rust compiler inserts a
hidden field into that struct that it uses to track whether or not it needs to
call drop. This is just a limitation of the current implementation; there is
an accepted RFC and an open issue that will address this. For now, however,
this hidden field changes the size of SwiftObject. The Rust compiler is warning
us that we said we wanted an in-memory representation compatible with C, but
that isn’t happening because of this hidden field.
Luckily, fixing this is easy. We can wrap SwiftObject up in a new Rust type and
implement Drop on that type instead. We’ll use Rust’s newtype syntax for a
struct-that-only-exists-to-wrap-another-type:
use std::ops::Deref;
struct SwiftObjectWrapper(SwiftObject);
impl Deref for SwiftObjectWrapper {
    type Target = SwiftObject;
    fn deref(&self) -> &SwiftObject {
        &self.0
    }
}
impl Drop for SwiftObjectWrapper {
    fn drop(&mut self) {
        (self.destroy)(self.user);
    }
}
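To check that a wrapper like this calls destroy exactly once, here is a Rust-only sketch. RawObject, RawObjectGuard and count_destroy are hypothetical stand-ins for SwiftObject, SwiftObjectWrapper and the Swift destroy function; a NULL user pointer is fine here because the stand-in destroy never dereferences it.

```rust
use std::ops::Deref;
use std::os::raw::c_void;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts calls to the hypothetical destroy callback.
static DESTROY_CALLS: AtomicUsize = AtomicUsize::new(0);

// C-compatible struct with no Drop impl of its own.
#[repr(C)]
pub struct RawObject {
    user: *mut c_void,
    destroy: extern "C" fn(user: *mut c_void),
}

// Newtype that owns a RawObject and is responsible for destroying it.
struct RawObjectGuard(RawObject);

impl Deref for RawObjectGuard {
    type Target = RawObject;
    fn deref(&self) -> &RawObject {
        &self.0
    }
}

impl Drop for RawObjectGuard {
    fn drop(&mut self) {
        // Field access goes through Deref to the inner RawObject.
        (self.destroy)(self.user);
    }
}

// Hypothetical stand-in for the Swift `destroy` function.
extern "C" fn count_destroy(_user: *mut c_void) {
    DESTROY_CALLS.fetch_add(1, Ordering::SeqCst);
}

/// Returns the destroy count while the guard is alive and after it drops.
fn guard_demo() -> (usize, usize) {
    let guard = RawObjectGuard(RawObject {
        user: std::ptr::null_mut(),
        destroy: count_destroy,
    });
    let while_alive = DESTROY_CALLS.load(Ordering::SeqCst);
    drop(guard);
    let after = DESTROY_CALLS.load(Ordering::SeqCst);
    (while_alive, after)
}
```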
There’s another new thing here: we implemented the Deref trait. This lets us
freely access the fields (and methods, if there were any) of a
SwiftObjectWrapper’s inner SwiftObject. Finally, we need to update our
give_object_to_rust implementation to wrap the struct Swift gives us in a
SwiftObjectWrapper and move that wrapper onto the background thread:
#[no_mangle]
pub extern fn give_object_to_rust(obj: SwiftObject) {
    println!("moving SwiftObject onto a new thread created by Rust");
    let obj = SwiftObjectWrapper(obj);
    thread::spawn(move || {
        thread::sleep(std::time::Duration::from_millis(1000));
        (obj.callback_with_int_arg)(obj.user, 10);
    });
}
This compiles (and runs) without warning or error, and we no longer have to
worry about calling destroy: Rust will insert the call for us at precisely the
right moment.
We’ve covered a lot in this post! We now know how to pass all kinds of things from Rust to Swift and from Swift to Rust. In
the next couple of posts, we’ll start to take steps towards a more realistic
example and explore two different ways to implement view models in Rust that
can be used from Swift.
All the code, plus the extra stuff necessary to get it running (like an Xcode
project file), is available on GitHub.