Editor’s note: This is the second post in our series on building an iOS app in Rust.
Welcome to Part 2 of our “Building an iOS App in Rust” series! If you
haven’t read Part 1 already, please do that. Running the code in this
post will require you to have set up a Rust toolchain for iOS.
Last time, we built a simple “Hello, World!” library in Rust and successfully
linked it into an iOS app. This time, we’re going to explore more of Rust’s
FFI, or foreign function interface, layer. The Rust book
has an excellent chapter on Rust’s FFI facilities; however, it is
almost exclusively about how to call C libraries from Rust, whereas we want
to go the other direction and call Rust from Swift.
Unlike Part 1, I’m not going to walk you through generating a Rust library and
an iOS project to run all the sample code in this post. The code is hosted
on GitHub, so you can check it out at your leisure.
Note: In Part 1, we scoped out a five-part plan for this series, and Part 2
was supposed to be “Passing Data Between Rust and iOS.” It started to get
a little long, though, so we’ve split the data passing into two. There will
be more than five parts in the end.
API design is hard in any language. We’re throwing a peculiar wrench into
things with this project:
Basically, we have two modern, advanced languages talking to each other… but they
both think they’re talking to C. Great.
At CppCon 2014, Stefanus Du Toit gave an excellent talk called Hourglass
Interfaces for C++ APIs. It’s well worth watching, even
if you don’t speak C++. The talk is largely a sales pitch to C++ library
writers, giving them reasons to create, and advice on creating, C-compatible
wrappers for their APIs. We’re going to follow a lot of his advice when
wrapping Rust code:
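Use fixed-width types such as int32_t and uint8_t in place of int and long.
We would end up doing this anyway: Rust has no default integer type that
changes size with the target platform, like int in C or Int in Swift. Rust
code uses i32 instead, which is always guaranteed to be 32 bits.
Let’s start with exchanging primitive numbers.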
In Part 1 of this series, you already saw a Rust function that returns an
i32, a 32-bit signed integer:
#[no_mangle]
pub extern fn rust_hello_world() -> i32 {
    println!("Hello, I'm in Rust code! I'm about to return 10.");
    10
}
Here are more examples of Rust functions that take and return various
primitives:
// C declaration: int32_t return_int32(void);
#[no_mangle]
pub extern fn return_int32() -> i32 {
    10
}

// C declaration: uint16_t triple_a_uint16(uint16_t x);
#[no_mangle]
pub extern fn triple_a_uint16(x: u16) -> u16 {
    x * 3
}

// C declaration: float return_float(void);
#[no_mangle]
pub extern fn return_float() -> f32 {
    10.0
}

// C declaration: double average_two_doubles(double x, double y);
#[no_mangle]
pub extern fn average_two_doubles(x: f64, y: f64) -> f64 {
    (x + y) / 2.0
}
Rust’s i32 is C’s int32_t is Swift’s Int32. All the normal sizes (8, 16,
32, 64) are available in all three languages, and unsigned variants are as well
(u32 <-> uint32_t <-> UInt32, etc.). Swift’s Float is Rust’s f32, and
Swift’s Double is Rust’s f64.
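As a quick sanity check on that mapping, the following sketch (not part of the original sample code) asks the Rust compiler for the size of each primitive used above. The function name primitive_sizes is made up for illustration.

```rust
use std::mem::size_of;

/// Hypothetical helper: byte sizes of the Rust primitives used above,
/// in the order (i32, u16, f32, f64).
pub fn primitive_sizes() -> (usize, usize, usize, usize) {
    (
        size_of::<i32>(), // C int32_t, Swift Int32
        size_of::<u16>(), // C uint16_t, Swift UInt16
        size_of::<f32>(), // C float, Swift Float
        size_of::<f64>(), // C double, Swift Double
    )
}
```

These sizes are fixed by the Rust language, so the result is the same on every target, which is exactly why fixed-width types make a good FFI vocabulary.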
There’s one last primitive I want to discuss. I said earlier that Rust didn’t
have a default integer type that changes size based on the target platform
(like how Swift’s Int can be either 32 or 64 bits). That’s true, but it does
have something similar that comes up in several important Rust APIs: usize is
a “pointer-sized unsigned integer.” Rust uses usize for things like the
length of a string, or the number of elements in an array.
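To make “pointer-sized” concrete, here is a small sketch. Both function names are hypothetical and exist only for this illustration; the standard library calls they wrap are real.

```rust
use std::mem::size_of;

/// Hypothetical helper: true when `usize` is exactly as wide as a raw
/// pointer on the current target.
pub fn usize_is_pointer_sized() -> bool {
    size_of::<usize>() == size_of::<*const u8>()
}

/// Lengths are reported as `usize`. For a string, the length counts
/// UTF-8 bytes, not characters.
pub fn byte_length(s: &str) -> usize {
    s.len()
}
```

Note that a string’s length is measured in bytes, so a string containing a two-byte character such as “é” is longer than its character count suggests.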
Rust uses an unsigned integer for things like array length; Swift, on the other
hand, uses Int, which is signed. To reconcile this, we’ll pass Rust’s usize
as a C size_t. Because size_t is used in C for things like length, Swift
bridges it in as Int. We’ll need to do a little casting on the Rust side, but
that’s ok. Here is a Rust function that works with size_t, internally
converting to usize:
use libc::size_t;

#[no_mangle]
pub extern fn sum_sizes(x: size_t, y: size_t) -> size_t {
    let x_usize = x as usize;
    let y_usize = y as usize;
    (x_usize + y_usize) as size_t
}
The first line is importing a type, size_t, from the libc crate. libc
provides platform-specific bindings to native C data types, like size_t
(and things like void; that’ll come up in a few minutes).
In the body of the function, we use the as keyword to cast between
size_t and usize. Like Swift, Rust doesn’t do implicit conversions, so we
must be explicit.
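One caveat about as: it never fails, and when the target type is narrower it silently keeps only the low bits. Casting between size_t and usize is harmless because the two have the same width, but for other conversions a checked alternative may be preferable. A minimal sketch of the difference, with hypothetical function names:

```rust
use std::convert::TryFrom;

/// `as` always succeeds; narrowing keeps only the low bits of the value.
pub fn narrow_with_as(x: u16) -> u8 {
    x as u8
}

/// `try_from` reports values that do not fit instead of truncating them.
pub fn narrow_checked(x: u16) -> Option<u8> {
    u8::try_from(x).ok()
}
```

Here narrow_with_as(300) wraps around to 44, while narrow_checked(300) returns None.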
We can call all of these from Swift just like regular functions:
func exercisePrimitives() {
    let a: Int32 = return_int32()
    let b: UInt16 = triple_a_uint16(10)
    let c: Float = return_float()
    let d: Double = average_two_doubles(10, 20)
    let e: Int = sum_sizes(20, 30)
    print("primitives: \(a) \(b) \(c) \(d) \(e)")
}
In the modern Unicode world, strings are surprisingly complex. Rust’s built-in
string types are always guaranteed to hold valid UTF-8 encoded data. This means
you can’t create a string out of arbitrary bytes unchecked; instead, you can
use the std::str::from_utf8 function to attempt to interpret a slice of bytes
as a string. from_utf8 returns a Result<&str, Utf8Error> type: an enum that
is either Ok with an associated &str value or Err with an associated
Utf8Error value.
Here’s a Rust function that uses Rust’s match construct (analogous to Swift’s
switch) to unpack the returned Result:
use std::str;

// Take a slice of bytes and try to print it as a UTF-8 string.
fn print_byte_slice_as_utf8(bytes: &[u8]) {
    match str::from_utf8(bytes) {
        Ok(s) => println!("got {}", s),
        Err(err) => println!("invalid UTF-8 data: {}", err),
    }
}
Note that this function is not marked pub or extern and doesn’t have the
#[no_mangle] attribute. Without pub, it’s a private function: we’re going
to use it internally, but not expose it to Swift.
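To see both arms of the match in action, here is a variant that returns its message instead of printing it, which makes it easy to check. The name describe_bytes is hypothetical; the matching logic is the same as in print_byte_slice_as_utf8.

```rust
use std::str;

/// Hypothetical helper: same match as `print_byte_slice_as_utf8`, but the
/// message is returned as a `String` rather than printed.
pub fn describe_bytes(bytes: &[u8]) -> String {
    match str::from_utf8(bytes) {
        Ok(s) => format!("got {}", s),
        Err(err) => format!("invalid UTF-8 data: {}", err),
    }
}
```

Passing b"hello" takes the Ok arm. Passing the bytes 0xff 0xfe, which can never appear in well-formed UTF-8, takes the Err arm.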
Let’s write a Rust function that takes a raw pointer and a length in bytes,
converts it to a Rust slice, and calls our print_byte_slice_as_utf8 function:
use std::slice;

// C declaration: void utf8_bytes_to_rust(const uint8_t *bytes, size_t len);
#[no_mangle]
pub extern fn utf8_bytes_to_rust(bytes: *const u8, len: size_t) {
    let byte_slice = unsafe { slice::from_raw_parts(bytes, len as usize) };
    print_byte_slice_as_utf8(byte_slice);
}
The function declaration is straightforward. We take two arguments: the first
is a pointer to 8-bit unsigned integers (i.e., bytes), and the second is the
length. This is standard fare for passing around arrays in C.
The first line of the function body is a little hairy; let’s unpack it from the
inside out:
slice::from_raw_parts(bytes, len as usize) passes our two arguments to the std::slice::from_raw_parts function from Rust’s standard library. That function returns a &[u8], a slice of bytes.
unsafe { … } is our first encounter with Rust’s unsafe keyword. One of Rust’s claims to fame is memory safety: in pure, safe Rust, it’s impossible to ever access uninitialized or invalid memory. Unfortunately, when we’re calling Rust from another language, none of those guarantees exist. Someone might pass NULL as the argument to bytes, for example. from_raw_parts is marked as an unsafe function because the compiler can’t guarantee (at compile time) that the pointer and length you’re providing are actually valid. It also doesn’t know how long the pointer will be valid. Because both of these issues are giant holes that could cause our program to crash and burn, we’re forced to wrap the call in an unsafe block to tell the compiler, “I know this isn’t safe, but I want you to go ahead and do it anyway. I promise it’ll be okay.”
The resulting slice is bound to the byte_slice variable. Rust, like Swift, supports type inference; we could’ve written let byte_slice: &[u8] = … if we wanted to.
To call this function from Swift, we need to convert a String into
UTF-8-encoded data, then pass the data pointer and length:
let myString = "Hello from Swift"
let data = myString.dataUsingEncoding(NSUTF8StringEncoding, allowLossyConversion: false)!
utf8_bytes_to_rust(UnsafePointer<UInt8>(data.bytes), data.length)
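Because a caller might pass NULL, a more defensive variant could check the pointer before building the slice. The sketch below is an assumption-laden illustration, not the series’ sample code: the name utf8_bytes_to_rust_checked and the bool return value are made up, len is written as usize (which libc’s size_t is an alias for), and #[no_mangle] is left out so the snippet builds on its own.

```rust
use std::slice;
use std::str;

/// Hypothetical defensive variant of `utf8_bytes_to_rust`.
/// Returns true only if `bytes` is non-null and holds valid UTF-8.
/// (Add `#[no_mangle]` when exporting this from a library.)
pub extern "C" fn utf8_bytes_to_rust_checked(bytes: *const u8, len: usize) -> bool {
    // A null pointer must never reach `from_raw_parts`.
    if bytes.is_null() {
        return false;
    }
    // Still unsafe: we are trusting the caller that `bytes` points to
    // `len` readable bytes for the duration of this call.
    let byte_slice = unsafe { slice::from_raw_parts(bytes, len) };
    match str::from_utf8(byte_slice) {
        Ok(s) => {
            println!("got {}", s);
            true
        }
        Err(err) => {
            println!("invalid UTF-8 data: {}", err);
            false
        }
    }
}
```

The null check removes one failure mode, but it cannot detect a dangling pointer or a wrong length, which is why the unsafe block is still required.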
Rust and Swift both know how to work with old style, null-terminated C strings,
as well. Here’s the Rust side:
use std::ffi::CStr;
use std::os::raw::c_char;

// C declaration: void c_string_to_rust(const char *null_terminated_string);
#[no_mangle]
pub extern fn c_string_to_rust(null_terminated_string: *const c_char) {
    let c_str: &CStr = unsafe { CStr::from_ptr(null_terminated_string) };
    let byte_slice: &[u8] = c_str.to_bytes();
    print_byte_slice_as_utf8(byte_slice);
}
This time, we use the std::ffi::CStr type, which represents a
“borrowed” C string. (We’ll talk more about borrowing and ownership shortly;
for now, think of it as “a string someone else is responsible for
deallocating”.) Like from_raw_parts, CStr::from_ptr is an unsafe function
because the Rust compiler can’t know at compile time whether the argument is
actually a valid null-terminated string.
Once we have a &CStr, we can use its to_bytes() method to get a view into
it as a slice of bytes, and then we can call our same
print_byte_slice_as_utf8 function. Note that no memory copies happen in this
function: the first line just wraps up the pointer into a new Rust type, and
the second line gives us a view into those same bytes.
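The “view, not copy” claim can be checked from Rust itself. In this sketch both function names are hypothetical, and #[no_mangle] is omitted so the snippet stands alone; CStr::from_ptr, to_bytes, and as_ptr are the real standard library calls.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical helper: byte length of a C string, not counting the
/// trailing NUL. Returns 0 for a null pointer.
pub extern "C" fn c_string_byte_len(s: *const c_char) -> usize {
    if s.is_null() {
        return 0;
    }
    // Trusting the caller: `s` must point to a valid null-terminated string.
    let c_str = unsafe { CStr::from_ptr(s) };
    c_str.to_bytes().len()
}

/// Hypothetical check that `to_bytes` is a view: the slice begins at the
/// same address as the underlying C string.
pub fn view_starts_at_same_address(s: &CStr) -> bool {
    s.to_bytes().as_ptr() == s.as_ptr() as *const u8
}
```

On the Rust side, a std::ffi::CString can stand in for the string Swift would pass: create one with CString::new and hand its as_ptr() to c_string_byte_len.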
Calling this function from Swift is trivial: Swift will automatically convert
Strings into C-style strings when you try to pass them to functions that take
const char *:
let myString = "Hello from Swift"
c_string_to_rust(myString)
I mentioned earlier that Rust stores strings encoded as UTF-8. This does not
include a C-style null-terminating byte. The cleanest way to pass a Rust string
out across the FFI boundary is to pass out an array of bytes and a length. We
could write a C interface that needs to return both, but then we have the
always-slightly-unwieldy output pointer type; e.g.,
// Unwieldy: Return a pointer to bytes and fill in `len` with its length.
const uint8_t *get_string_from_rust(size_t *len);
Instead, we’ll define a Rust struct and tell the compiler to use a binary
representation compatible with C:
#[repr(C)]
pub struct RustByteSlice {
    pub bytes: *const u8,
    pub len: size_t,
}
Now our C interface looks like this, where we pass back a small structure
(16 bytes on a 64-bit target) by value:
struct RustByteSlice {
    const uint8_t *bytes;
    size_t len;
};
struct RustByteSlice get_string_from_rust(void);
Finally, the Rust implementation:
#[no_mangle]
pub extern fn get_string_from_rust() -> RustByteSlice {
    let s = "This is a string from Rust.";
    RustByteSlice {
        bytes: s.as_ptr(),
        len: s.len() as size_t,
    }
}
We used type inference, but the type of s is &str; that is, a “borrowed
string” (more on “borrowed” momentarily). Two methods on &str are
.as_ptr(), which returns a pointer to the string’s bytes, and .len(), which
returns the number of bytes in the string. Note that this is different from
Swift, which does not define a String’s length. Swift provides several
different views into the components of a String; see Strings in Swift
2.0 on the official Swift blog for details.
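To see what a caller has to do with the returned struct, here is a round trip written entirely in Rust. It is a sketch under a few assumptions: len is declared as usize rather than libc’s size_t so the snippet has no external dependencies, #[no_mangle] is omitted so it builds standalone, and byte_slice_as_str is a hypothetical helper that plays the role of the Swift caller.

```rust
use std::slice;
use std::str::{self, Utf8Error};

#[repr(C)]
pub struct RustByteSlice {
    pub bytes: *const u8,
    pub len: usize, // stands in for `size_t` in this sketch
}

pub extern "C" fn get_string_from_rust() -> RustByteSlice {
    let s = "This is a string from Rust.";
    RustByteSlice {
        bytes: s.as_ptr(),
        len: s.len(),
    }
}

/// Hypothetical helper: rebuild a string view from the raw parts.
/// Unsafe because the caller must guarantee that `bytes` still points to
/// `len` readable bytes.
pub unsafe fn byte_slice_as_str<'a>(s: &'a RustByteSlice) -> Result<&'a str, Utf8Error> {
    let bytes = unsafe { slice::from_raw_parts(s.bytes, s.len) };
    str::from_utf8(bytes)
}
```

The struct is just a pointer plus a length, so its size is twice the pointer width of the target.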
Did you notice anything unusual about the Rust implementation? Even though
we’re trafficking in raw C pointers, we never had to use the unsafe keyword.
Why not? The Rust compiler can statically guarantee that everything inside the
get_string_from_rust function is safe. It’s perfectly safe to create a raw
pointer to s’s bytes; in general, it’s perfectly safe to create and pass
around raw pointers. Dereferencing pointers is unsafe, but
get_string_from_rust doesn’t do that itself. Presumably whoever calls
get_string_from_rust is going to dereference the returned pointer, but that’s
their problem to deal with.
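The split between creating and dereferencing a raw pointer can be shown in a few lines. Both function names below are hypothetical and exist only for this illustration.

```rust
/// Safe: producing a raw pointer never touches the memory behind it.
pub fn pointer_to_first_byte(s: &str) -> *const u8 {
    s.as_ptr()
}

/// Unsafe: reading through the pointer is only sound if the caller
/// guarantees that it points to a live, readable byte.
pub unsafe fn read_byte(p: *const u8) -> u8 {
    unsafe { *p }
}
```

Calling pointer_to_first_byte needs no special ceremony, while every call to read_byte has to be wrapped in an unsafe block by the caller.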
That does bring up a question, though. How long is the pointer
get_string_from_rust returns going to be valid? Is it really safe for Swift
to dereference it? Does someone need to deallocate it? (How would they? free,
or some other function?)
To answer that, we need the full type of s. A few paragraphs ago, I said it
was &str, but that’s only part of the story. All Rust references have an
associated lifetime. Most of the time you don’t have to work with lifetimes
yourself, as the Rust compiler can infer them. Our s variable actually has a
special lifetime. Because we initialized s with a string literal, its full
type is &'static str. The static lifetime outlives all other lifetimes. For
us, this means that the string s points to will always be valid (similar to a
static const char * string in a C program). It is safe for Swift to access
the pointer we return at any time, as it will always point to the bytes making
up the string This is a string from Rust.
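Writing the lifetime out explicitly lets the compiler enforce the distinction. In this sketch the function names are hypothetical; the commented-out function shows the kind of code the compiler rejects.

```rust
/// A string literal lives in the program's static data, so its full type
/// is `&'static str`.
pub fn static_greeting() -> &'static str {
    "This is a string from Rust."
}

/// Hypothetical helper that only accepts strings valid for the whole
/// program, which makes the returned pointer safe to hand out indefinitely.
pub fn raw_parts_of_static(s: &'static str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

// By contrast, this does not compile, because the `String` is freed when
// the function returns and the reference would dangle:
//
// fn dangling() -> &'static str {
//     let owned = String::from("temporary");
//     &owned
// }
```

Requiring &'static str in the signature means a caller cannot accidentally pass a reference to a short-lived, heap-allocated string.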
So how do we call this function from Swift? The RustByteSlice struct gets
bridged in as this Swift type:
struct RustByteSlice {
    var bytes: UnsafePointer<UInt8>
    var len: Int
}
We can write an extension on this bridged type to convert to an
UnsafeBufferPointer<UInt8> (which will always succeed) and to a String
(which will only succeed if the byte slice contains valid data according to the
expected encoding):
extension RustByteSlice {
    func asUnsafeBufferPointer() -> UnsafeBufferPointer<UInt8> {
        return UnsafeBufferPointer(start: bytes, count: len)
    }

    func asString(encoding: NSStringEncoding = NSUTF8StringEncoding) -> String? {
        return String(bytes: asUnsafeBufferPointer(), encoding: encoding)
    }
}
Now we can ask Rust for a String, and print it:
let rustString = get_string_from_rust()
if let stringFromRust = rustString.asString() {
    print("got a string from Rust: \(stringFromRust)")
} else {
    print("Could not parse Rust string as UTF-8")
}
So far, all the data passing we’ve seen is quite limited. We can pass around
primitives without any problem: in Rust and Swift (as well as C and a host of
other languages), primitives are passed by value. Our string-passing functions
are more problematic:
In utf8_bytes_to_rust and c_string_to_rust, Swift owns the memory backing the string. The pointer it gives to Rust is valid for the duration of the function call, but may not be valid afterwards. If Rust were to squirrel away the pointer Swift gives it and try to access it later, Bad Things™ would probably happen.
get_string_from_rust is safe, but only because it’s passing a pointer to a static string.
Both directions are insufficient for building a real library. We need the ability to pass objects that live for longer than one function call, but we need the flexibility to work with dynamic data. Ultimately, we need to be able to reason about the ownership of our objects, and that will be the subject of the next post.
Editor’s note: Be sure to check out Part 3 of the series, where John covers how to pass more complex data than strings, and how to correctly and safely manage ownership of that data.