One of the things that excites me the most about Swift is the additional toolset it provides to write code with fewer bugs. This is not just pie-in-the-sky thinking: earlier today, I fixed a bug (that I had introduced) in an app written in Objective-C; that bug would not have been possible to introduce in the first place with the stronger static typing of Swift.
There are a lot of features in Swift that fall into this camp: typed arrays and dictionaries, optionals and more. But perhaps even more intriguing are the possibilities that we can build on top of those features to change “things I have to worry about” in Objective-C into “things I can let the compiler worry about” in Swift.
Let’s tackle something that’s always messy: thread safety.
Suppose we’re writing a class that includes, among other things, an array of things (Array<T>) and a timestamp (NSDate) of when that array was last modified. For the sake of brevity, let’s limit the class to just “append an item to the array” and “get the last-modified timestamp”:
class ArrayTracker<T> {
    private var things: [T] = []
    private var lastModified: NSDate?
    // ... various other properties

    // Append an item to the array, returning a tuple of the modification
    // time we just saved and a count of the number of things in the
    // array.
    func appendToThings(item: T) -> (NSDate, Int) {
        things.append(item)
        lastModified = NSDate()
        // lastModified was set on the previous line, so unwrapping is safe
        return (lastModified!, things.count)
    }

    // Get the timestamp of when the array was last modified.
    // Returns nil if the array has never been modified.
    func lastModifiedDate() -> NSDate? {
        return lastModified
    }
    // ... various other methods
}
This covers the basic interface of our array tracker: we can append something to the array (getting back the new “last modified” time and the new number of things in the array), and we can get the “last modified” time (if there is one). You could imagine several other interesting things: get the last item in the array (doesn’t change lastModified), remove an item from the array (does change lastModified), etc.
But now here’s the catch: We want ArrayTracker to be thread-safe, and to allow multiple concurrent readers, but only one writer at a time (and a writer should get exclusive access: all readers are blocked while a writer is active). First up, we need a lock.
We want a readers-writer lock, which is a lock that can be acquired by multiple readers simultaneously, but can only be acquired by a single writer. There are lots of different ways to implement such a lock (on top of GCD, or using low-level atomics, or a variety of other means), but let’s not get bogged down in the details; that can be left as an exercise for you, dear reader. Instead, we’ll define a protocol that describes the interface we want our lock to satisfy. Sticking with the theme of letting the compiler do things for us, let’s avoid having lock() and unlock() methods that we have to remember to call at the right times, and instead have the lock implementation run a block that we provide:
protocol ReadWriteLock {
    // Get a shared reader lock, run the given block, and unlock
    mutating func withReadLock(block: () -> ())

    // Get an exclusive writer lock, run the given block, and unlock
    mutating func withWriteLock(block: () -> ())
}
These functions are marked as mutating because one could imagine some particular lock implementation being a struct with some internal state that needed to be modified in order to take and release locks. Assuming we have a lock implementation that satisfies this protocol (we’ll call it MyLock), what does our thread-safe version of ArrayTracker look like? Omitting things that haven’t changed:
class ArrayTracker<T> {
    // ... existing properties
    private var lock: ReadWriteLock = MyLock()

    func lastModifiedDate() -> NSDate? {
        var date: NSDate?
        // withReadLock runs the block it's given synchronously, so we
        // don't need a strong capture of self; use unowned
        lock.withReadLock { [unowned self] in
            date = self.lastModified
        }
        return date
    }

    func appendToThings(item: T) -> (NSDate, Int) {
        // we know we're going to set these before we return them, but we
        // don't have a reasonable default value; we'll use
        // implicitly-unwrapped optionals
        var date: NSDate!
        var count: Int!
        lock.withWriteLock { [unowned self] in
            self.things.append(item)
            self.lastModified = NSDate()
            date = self.lastModified
            count = self.things.count
        }
        return (date, count)
    }
    // ... rest of class
}
So far, so good. Now our two methods (plus many more that have been elided—you’re still keeping them in mind, right?) are thread-safe. However, the implementations look a little messy: in both, we have to create local variables, assign to them from inside the block, then return them. There’s got to be a better way.
Everything we’ve done so far could have been done almost exactly the same way in Objective-C (aside from things being a generic array), but now let’s move on to something we can’t do. Instead of having to capture values within the “lock blocks,” what if we give those blocks the ability to return arbitrarily-typed things? Let’s modify our lock protocol:
protocol ReadWriteLock {
    // Get a shared reader lock, run the given block, unlock, and return
    // whatever the block returned
    mutating func withReadLock<T>(block: () -> T) -> T

    // Get an exclusive writer lock, run the given block, unlock, and
    // return whatever the block returned
    mutating func withWriteLock<T>(block: () -> T) -> T
}
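The lock implementation itself is left as an exercise, but for readers who want something to experiment with, here is one minimal sketch of a MyLock built on a concurrent GCD queue, with writers running as barrier blocks. It is written against current Swift rather than the beta used in this post, so the block parameter is declared with `_` to keep the trailing-closure call sites unchanged. The queue label is arbitrary, and this is only one of many possible implementations, not necessarily the one the post has in mind.

```swift
import Foundation

protocol ReadWriteLock {
    mutating func withReadLock<T>(_ block: () -> T) -> T
    mutating func withWriteLock<T>(_ block: () -> T) -> T
}

// Assumption: a GCD-backed lock. The queue is a reference type, so these
// methods do not need to be `mutating`; non-mutating methods still
// satisfy the protocol's mutating requirements.
struct MyLock: ReadWriteLock {
    private let queue = DispatchQueue(label: "mylock.sketch",
                                      attributes: .concurrent)

    // Plain sync blocks on a concurrent queue may run alongside each other.
    func withReadLock<T>(_ block: () -> T) -> T {
        return queue.sync(execute: block)
    }

    // A barrier block waits for earlier blocks to finish and runs alone.
    func withWriteLock<T>(_ block: () -> T) -> T {
        return queue.sync(flags: .barrier, execute: block)
    }
}

var lock = MyLock()
var counter = 0
lock.withWriteLock { counter += 1 }
let doubled = lock.withReadLock { counter * 2 }
```

This sketch is not reentrant: acquiring the lock again from inside a write block, for example, would deadlock.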
Now we can clean up our class:
func lastModifiedDate() -> NSDate? {
    // return the result of the call to withReadLock...
    return lock.withReadLock { [unowned self] in
        // ... which is the date that we want
        return self.lastModified
    }
}

func appendToThings(item: T) -> (NSDate, Int) {
    return lock.withWriteLock { [unowned self] in
        self.things.append(item)
        self.lastModified = NSDate()
        return (self.lastModified!, self.things.count)
    }
}
Much better! We no longer have to declare local variables before the “lock blocks,” set them inside and then return them.
Now we have a nice, clean way of protecting access to data behind a lock. That’s great; there’s a lot to be said for readable code, and our thread-safe versions of these methods are only two lines longer than the original, unsafe versions (and one of those lines is just an extra closing brace). However, let’s go back to what we really want to accomplish: how can we get the compiler to enforce things that, in Objective-C, we typically have to reason out ourselves? Having pretty locking mechanisms is great, but we still have to worry about the locking. We have to make sure we never access things or lastModified outside of a lock. If the class is big or has a lot of other moving parts, that can become difficult to keep track of. What we really want is to get the compiler to enforce that we only access those data while we are holding the lock.
We want to make lastModified and things impossible to access without locking, which will require moving them out of ArrayTracker and into something else. Let’s define that something else:
// Protector holds onto an item of type T, and only allows access to it
// from within a "lock block"
class Protector<T> {
    private var lock: ReadWriteLock = MyLock()
    private var item: T

    // initialize an instance with an item
    init(_ item: T) {
        self.item = item
    }

    // give read access to "item" to the given block, returning whatever
    // that block returns
    func withReadLock<U>(block: (T) -> U) -> U {
        return lock.withReadLock { [unowned self] in
            return block(self.item)
        }
    }

    // give write access to "item" to the given block, returning whatever
    // that block returns
    func withWriteLock<U>(block: (inout T) -> U) -> U {
        return lock.withWriteLock { [unowned self] in
            return block(&self.item)
        }
    }
}
Whew! Let’s unpack the signature of withReadLock:
- withReadLock<U> is a generic method inside of a generic class. That means there are two generic types involved: T from our class, and U from this method.
- (block: (T) -> U) means withReadLock takes as its sole parameter a block that takes a T (in particular, our protected item) and returns anything at all.
- -> U says that the return type is U; that is, we return whatever the block returns. This is the same trick we used earlier to get our lock protocol to return whatever the block we give it returns.
withWriteLock is almost the same, with the difference that the argument to block is inout, meaning the block is allowed to modify it.
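To make the two generic parameters concrete, the following sketch shows how T is fixed once per Protector instance while U is inferred separately at each call. It is written for current Swift and, as an assumption made only so the example compiles on its own, swaps ReadWriteLock and MyLock for a concurrent GCD queue with barrier writes.

```swift
import Foundation

// Assumption: a stand-in Protector that uses a concurrent queue in place
// of the post's ReadWriteLock / MyLock.
final class Protector<T> {
    private let queue = DispatchQueue(label: "protector.sketch",
                                      attributes: .concurrent)
    private var item: T

    init(_ item: T) { self.item = item }

    func withReadLock<U>(_ block: (T) -> U) -> U {
        return queue.sync { block(self.item) }
    }

    func withWriteLock<U>(_ block: (inout T) -> U) -> U {
        return queue.sync(flags: .barrier) { block(&self.item) }
    }
}

let numbers = Protector([1, 2, 3])                           // T is [Int]

let count: Int = numbers.withReadLock { $0.count }           // U is Int
let firstText: String = numbers.withReadLock { "\($0[0])" }  // U is String
numbers.withWriteLock { $0.append(4) }                       // U is Void
let newCount = numbers.withReadLock { $0.count }             // U is Int again
```

Each call site picks its own U from what the block returns, so one Protector can hand back counts, strings, tuples, or nothing at all without any extra declarations.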
Time to replace lastModified and things with a protected version:
// Let's define a struct to hold our protected data. This should probably
// be embedded inside ArrayTracker, but that doesn't build in Beta 4.
private struct Protected<T> {
    var lastModified: NSDate?
    var things: [T] = []

    init() {
    }
}

class ArrayTracker<T> {
    // Put an instance of our protected data inside a Protector
    private let protector = Protector(Protected<T>())
    // ... other properties, but no longer "lock", "lastModified",
    // or "things"

    func lastModifiedDate() -> NSDate? {
        return protector.withReadLock { protected in
            return protected.lastModified
        }
    }

    func appendToThings(item: T) -> (NSDate, Int) {
        return protector.withWriteLock { protected in
            protected.things.append(item)
            protected.lastModified = NSDate()
            return (protected.lastModified!, protected.things.count)
        }
    }
    // ... rest of class
}
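As a rough end-to-end check of the design, here is a self-contained sketch in current Swift that appends from many threads at once and then reads the count. Two things are assumptions for the sake of a runnable example: Protector uses a concurrent GCD queue with barrier writes in place of ReadWriteLock and MyLock, and the append method takes an unlabeled argument.

```swift
import Foundation

// Assumption: stand-in Protector backed by a concurrent queue.
final class Protector<T> {
    private let queue = DispatchQueue(label: "protector.demo",
                                      attributes: .concurrent)
    private var item: T

    init(_ item: T) { self.item = item }

    func withReadLock<U>(_ block: (T) -> U) -> U {
        return queue.sync { block(self.item) }
    }

    func withWriteLock<U>(_ block: (inout T) -> U) -> U {
        return queue.sync(flags: .barrier) { block(&self.item) }
    }
}

struct Protected<T> {
    var lastModified: NSDate?
    var things: [T] = []
}

final class ArrayTracker<T> {
    private let protector = Protector(Protected<T>())

    func lastModifiedDate() -> NSDate? {
        return protector.withReadLock { $0.lastModified }
    }

    func appendToThings(_ item: T) -> (NSDate, Int) {
        return protector.withWriteLock { protected in
            let now = NSDate()
            protected.things.append(item)
            protected.lastModified = now
            return (now, protected.things.count)
        }
    }
}

let tracker = ArrayTracker<Int>()

// Hammer the tracker from several threads at once.
DispatchQueue.concurrentPerform(iterations: 1000) { i in
    _ = tracker.appendToThings(i)
}

// One more append from this thread; every earlier append should be counted.
let (_, total) = tracker.appendToThings(-1)
print(total) // 1001
```

Without the write lock, concurrent calls to append on the same array would be a data race and could crash or lose items; here every append runs exclusively.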
Now we’ve freed up some mental space! It’s no longer possible for us to accidentally access lastModified or things without obtaining the lock, so we don’t have to think about it at all. Not only that, but because Protected is a struct (and therefore has value semantics), it’s now a compile-time error to try to modify the protected item inside of a read lock:
// WRONG: Incorrectly try to append to things with just a read lock
func appendToThingsWithReadLock(item: T) {
    protector.withReadLock { protected -> () in
        // This line fails to compile:
        // "Immutable value of type [T] only has mutating members
        // named 'append'"
        protected.things.append(item)

        // This line refuses to compile too:
        // "Cannot assign to 'lastModified' in 'protected'"
        protected.lastModified = NSDate()
    }
}
The protection against incorrectly using a read lock and modifying the protected value isn’t perfect. If Protected were a class instead of a struct, the last example would build without complaint (but be thread-unsafe). If Protected were a struct but contained properties that were instances of classes (i.e., reference types), we could call methods on those properties that might modify them (which would also be thread-unsafe). However, we’ve at least significantly reduced the things we have to manage ourselves.
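The gap described above can be seen in a small sketch. It uses a cut-down Protector with only the read path, backed by a concurrent GCD queue as an assumption, plus two hypothetical payload types, Box (a struct) and Counter (a class), that are not part of the original example.

```swift
import Foundation

// Assumption: read-only slice of Protector, enough to show the caveat.
final class Protector<T> {
    private let queue = DispatchQueue(label: "protector.caveat",
                                      attributes: .concurrent)
    private var item: T

    init(_ item: T) { self.item = item }

    func withReadLock<U>(_ block: (T) -> U) -> U {
        return queue.sync { block(self.item) }
    }
}

struct Box { var value = 0 }            // value type (hypothetical)
final class Counter { var value = 0 }   // reference type (hypothetical)

let boxed = Protector(Box())
// The block receives an immutable copy, so this would not compile:
// boxed.withReadLock { $0.value += 1 }
let boxedValue = boxed.withReadLock { $0.value }

let shared = Protector(Counter())
// This compiles: the block receives a reference, and the object behind it
// is mutated while only a read lock is held.
shared.withReadLock { counter in counter.value += 1 }
let sharedValue = shared.withReadLock { $0.value }
```

The compiler only guards the value it hands to the block, so keeping protected data in structs made of value types is what makes the read-lock guarantee meaningful.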
One of the most difficult problems we have as software developers is managing complexity, and Swift gives us a lot of tools to lean on the compiler for help. In this example, we could have followed most of the same steps in Objective-C, but without generics, the most natural way of writing the Protector class would have tied it tightly to ArrayTracker.
Swift, much more than Objective-C, is going to reward careful, considered API design. It’s exciting to be on the ground floor: making full use of the language is going to require unlearning some patterns we’re used to, and figuring out idiomatic replacements is going to take some time and creativity.