
Want to learn more about what’s really happening inside those square brackets? Read the entire Inside the Bracket series.
In our last episode, you saw a real application of Objective-C metadata introspection, looking at the information generated by the compiler and made available by the Objective-C runtime. Now it’s time for some fun – changing things!
You can manipulate classes at runtime. One way you can do this is by creating them entirely from your code, adding methods and instance variables. Once it’s built, the new class is a first-class citizen as far as the Objective-C runtime is concerned. That’s a pretty rare thing to do, so instead I’ll just concentrate on changing existing classes.
Remember @dynamic? It’s one of the directives that tells the compiler what to do with a property. If you tell the compiler “hey, this property is dynamic”, the compiler will do nothing else. It won’t make an instance variable and it won’t emit any accessor methods. Someone is responsible for hooking up those methods before they’re used. You can do that with the class_addMethod function.
Here’s our victim class – you can get the final version of the code from this gist:
@interface Glorn : NSObject
@end
@implementation Glorn
@end
Not a whole lot there, just the basic support machinery you get from NSObject. You know what happens when you send an object an unexpected message:
name = [glorn larvoid];
This wonderful message:
-[Glorn larvoid]: unrecognized selector sent to instance 0x1021142b0
Easy enough to fix! First, make a function that takes the usual first two Objective-C parameters, self and the selector:
static NSString *Larvoid (id self, SEL _cmd) {
return @"Zogg";
} // Larvoid
This is our IMP, which is a pointer to an Objective-C implementation. IMPs are the unit of currency for referencing code. When you ask a class for the code behind a method, you’ll get an IMP. When you change the code behind a method, you give it an IMP.
Get the class and add the method, passing the IMP, the selector, and a type signature:
Class glornClass = [Glorn class];
BOOL success = class_addMethod (glornClass, @selector(larvoid),
(IMP)Larvoid, "@@:"); // returns an object; takes self and _cmd
Assuming it succeeded, you can call the method:
name = [glorn larvoid];
NSLog (@"Hello my name is %@", name);
And you get:
Hello my name is Zogg
Pretty easy – you just need to make a function, figure out its type signature, and then class_addMethod it. There is an easier way, too.
imp_implementationWithBlock is not in the official documentation (which hasn’t been updated since 2010), but is in the <objc/runtime.h> header. Instead of making a function, you give it a block that will be passed a self pointer (but no selector), and then any subsequent arguments:
IMP blockImp =
imp_implementationWithBlock ((__bridge void *)
^(id self, NSUInteger count) {
for (NSUInteger i = 0; i < count; i++) {
NSLog (@"Hello #%lu, %@", (unsigned long)i, name);
}
});
The __bridge cast is to keep ARC happy because the function takes a void* for its block, presumably because there’s no common block type you’ll be passing. You get back a full-fledged IMP. imp_implementationWithBlock wraps the block in a trampoline function and returns it to you. For more of the grungy details, check out Bill Bumgarner’s blog post.
Now you need the type encoding for the new method. It starts with “v@:” (returning void, taking an object argument for self, then the selector), followed by the type of NSUInteger. Because NSUInteger can be different sizes on different platforms, you can’t hardcode its part of the string like I’m doing for the other two arguments. Instead, construct the encoding at runtime, call class_addMethod, and free the buffer:
char *encoding;
asprintf(&encoding, "v@:%s", @encode(NSUInteger));
success = class_addMethod (glornClass, @selector(greetingWithCount:),
blockImp, encoding);
free (encoding);
I’m using asprintf to try to break myself of the habit of “oh, I’ll just make a 512 byte character buffer on the stack and snprintf into that”, which can bite you later if that chunk of code gets used for something real.
Now you can call any arbitrary Glorn to get a repeated greeting:
[glorn greetingWithCount: 5];
And get the expected output. You’ll notice it captured name from the surrounding scope.
Hello #0, Zogg
Hello #1, Zogg
Hello #2, Zogg
Hello #3, Zogg
Hello #4, Zogg
Remember back in part 4 where I talked about some methods that get called when the Objective-C runtime can’t find a method for a given selector, and then used them to forward on to other objects? There’s another mechanism you can use in this circumstance. +resolveInstanceMethod: gives you the opportunity to add a method to a class before all hell breaks loose. It actually happens when someone asks “does this class respond to this selector.” You can look around, make decisions, add methods if need be to say “why yes. Yes it does!”
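To make the “second chance” idea concrete, here is a minimal sketch in plain C. It is a toy model, not the Objective-C runtime: the names method_table, resolve_method, and send_message are made up for illustration, and a selector is just a string. It assumes selectors are string literals, so the table can keep the pointer it is handed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-ins: a selector is its name, and an implementation receives
   the selector it was invoked under (playing the part of _cmd). */
typedef const char *Selector;
typedef void (*Implementation)(Selector cmd);

struct MethodEntry { Selector sel; Implementation imp; };

#define MAX_METHODS 16
struct MethodEntry method_table[MAX_METHODS];
int method_count = 0;
int announce_count = 0;   /* how many times a generated method has run */

Implementation lookup(Selector sel) {
    for (int i = 0; i < method_count; i++)
        if (strcmp(method_table[i].sel, sel) == 0) return method_table[i].imp;
    return NULL;
}

/* The body shared by every method that gets added on demand. */
void announce(Selector cmd) {
    announce_count++;
    printf("you just called %s\n", cmd);
}

/* Plays the role of +resolveInstanceMethod:. Returns 1 if it added a method. */
int resolve_method(Selector sel) {
    if (strchr(sel, ':')) return 0;              /* don't handle arguments */
    if (method_count == MAX_METHODS) return 0;   /* table is full */
    method_table[method_count].sel = sel;
    method_table[method_count].imp = announce;
    method_count++;
    return 1;
}

/* Plays the role of a message send. Returns 1 if the message was handled. */
int send_message(Selector sel) {
    Implementation imp = lookup(sel);
    if (!imp && resolve_method(sel)) imp = lookup(sel);   /* second chance */
    if (!imp) {
        printf("unrecognized selector %s\n", sel);
        return 0;
    }
    assert(method_count > 0);   /* something must be in the table by now */
    imp(sel);
    return 1;
}
```

The first send_message("keep") misses, lets the resolver add an entry, and then runs it. A second send finds the entry directly, so the resolver is only consulted once per selector.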
Try sending the Glorn some more messages:
[glorn keep];
[glorn watching];
[glorn theSkies];
This will croak at runtime unless you do something. For this class, make all unknown selectors that take no arguments create a new method, and have that new method print out the name of the selector. Otherwise, bail out and tell the runtime that you couldn’t resolve the instance method.
+ (BOOL) resolveInstanceMethod: (SEL) selector {
// Don't want to handle anything that takes arguments.
if (strchr(sel_getName(selector), ':')) return NO;
IMP blockImp =
imp_implementationWithBlock ((__bridge void *) ^(id self) {
NSLog (@"you just called %@",
NSStringFromSelector(selector));
});
class_addMethod ([self class], selector, blockImp, "v@:");
return YES;
} // resolveInstanceMethod
Notice that the block has captured the value of the selector that’s passed in. No need to copy the selector’s name to an NSString or a char buffer. That’s what makes imp_implementationWithBlock really powerful – you can take advantage of all of blocks’ various behaviors. Now, when you call these methods:
[glorn keep];
[glorn watching];
[glorn theSkies];
It prints out:
you just called keep
you just called watching
you just called theSkies
you just called _doZombieMe
Whoa! What’s that last one? The first time I got my +resolveInstanceMethod: working, I saw that zombie-me method and laughed. I was totally not expecting it. It comes from NSObject’s dealloc, presumably for Zombie debugging support. Here’s the stack when it happens:
#0 +[Glorn resolveInstanceMethod:] (self=0x100003518, _cmd=0x7fff8cf433d3,
selector=0x7fff8b59ec3c) at addstuff.m:23
#1 0x00007fff8a38db9b in _class_resolveMethod ()
#2 0x00007fff8a38b35e in lookUpMethod ()
#3 0x00007fff8a38da5c in class_respondsToSelector ()
#4 0x00007fff8b44d051 in -[NSObject dealloc] ()
#5 0x00000001000016f6 in AddSomeMethods () at addstuff.m:93
If you’re actually following along at home, and you’re using ARC, you should have gotten compiler errors like this:
addstuff.m:56:17: error: no visible @interface for 'Glorn' declares
the selector 'larvoid' name = [glorn larvoid];
With ARC you can no longer casually send random messages to objects that the compiler hasn’t heard of, such as -larvoid. The compiler has to know the types involved so it can do proper memory management. This is just a warning in non-ARC land, so if you want to casually send messages, you can compile a file with -fno-objc-arc.
For the code here, I kind of cheated and put a category on Glorn to say “ok, here’s what these methods really look like”. I used something similar in Part 4 without explanation, just to get the code to compile. Here’s the category for completeness:
@interface Glorn (HushTheCompiler)
- (NSString *) larvoid;
- (NSString *) greetingWithCount: (NSUInteger) count;
- (void) keep;
- (void) watching;
- (void) theSkies;
@end
Not only can you add methods at runtime, you can change the code that runs at runtime. You can patch your own stuff into existing classes. I know this is what everyone is here for. “Duuuuude, show me the swizzle!”
Like my previous bit about DTrace, I originally wanted to write about method swizzling because I got to use it to great effect in Krëndler, but you kind of had to know all this other stuff to appreciate what’s going on when you swizzle methods.
Back in the old, old days, when memory was measured in K and processor clock speeds were measured in units of 0.001 gigahertz, the original Mac used a pretty clever scheme for calling functions inside of the Mac Toolbox. The Toolbox was a huge (64K) library of code that made every Mac a Mac. Needed to allocate memory? Wanted to create a window and draw into it? You called into the Toolbox.
Rather than using jump instructions (at the cost of four to six bytes) to call a function like GetNextEvent(), the original Mac used “A traps”, two-byte instructions that started with a leading 1010 nybble, which is 0xA. When the processor hit one of these instructions, the machine would stop, look in a table in memory for the address to jump to, and then resume execution there, kind of like an interrupt vector. You called the Toolbox a lot, and saving two or four bytes on each call is a major win just in program compactness. But it also came with added flexibility.
Here’s the normal use case: GetNextEvent is trap A970 (the nostalgic can find a complete list in part 3B of this page). The processor sees A970 in the instruction stream, stops, indexes into the lookup table, gets the address in ROM to jump to, and then jumps.
This table lives in RAM. There was no memory protection on the original Mac, so we could write anything, anywhere. (Let me tell you that was fun, for limited values of “fun”). What would happen if we saved that address from the table, stashed it elsewhere in memory, and put a pointer to our own code in the table? You get this:
Now when GetNextEvent’s A-Trap is encountered, the system stops, looks in the table, finds the address of WhackEvents, and calls it. Poof! We’ve just added a global event filter. It could examine the event returned from the Toolbox and decide to return it untouched. Or maybe it would modify the event. Or perhaps just drop the event on the floor entirely and wait for another one. This is what was known as “trap patching”. Trap patching was also used by Apple to fix bugs in the ROM – the system loads some code into RAM and twiddles with the trap table to point to the fix.
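The save-the-old-address, install-your-own dance can be sketched in plain C with a table of function pointers. This is only a model under loud assumptions: the Event struct, the one-entry trap_table, RomGetNextEvent, and the choice to filter out events whose what field is 1 are all invented for illustration. None of it is real Toolbox code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: a tiny event record and a one-entry trap table. */
typedef struct { int what; } Event;
typedef int (*TrapHandler)(Event *event);

enum { TRAP_GET_NEXT_EVENT = 0, TRAP_COUNT = 1 };

/* Stands in for the ROM routine: always reports an event with what == 1. */
int RomGetNextEvent(Event *event) {
    event->what = 1;
    return 1;
}

/* The table lives in ordinary writable memory, like the Mac's RAM table. */
TrapHandler trap_table[TRAP_COUNT] = { RomGetNextEvent };

/* Where the patch stashes the address it displaced. */
TrapHandler saved_get_next_event = NULL;

/* The patch: call the original, then drop what == 1 events on the floor. */
int WhackEvents(Event *event) {
    int got_event = saved_get_next_event(event);
    if (got_event && event->what == 1) {
        event->what = 0;   /* turn it into a "nothing happened" event */
        return 0;
    }
    return got_event;
}

/* Save the old table entry, then point the table at our code. */
void install_patch(void) {
    saved_get_next_event = trap_table[TRAP_GET_NEXT_EVENT];
    trap_table[TRAP_GET_NEXT_EVENT] = WhackEvents;
}

/* What the processor does on an A-trap: index the table and jump. */
int dispatch_trap(int trap, Event *event) {
    assert(trap >= 0 && trap < TRAP_COUNT);
    return trap_table[trap](event);
}
```

Callers keep going through dispatch_trap and never learn that the table entry changed, which is the whole trick. The patch can still reach the original because it kept the old pointer.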
Trap patching was a rite of passage for all Mac programmers to do at least once. My first experimental patch was FrameRect, A8A1, which I figured was called enough to actually be hit. I made it SysBeep to say “I’m here!” I quickly realized just how often FrameRect gets called, even in a do-nothing program. Man that was loud! I had to power off the machine to get control back.
This is what method swizzling does, but for Objective-C. You can stick your code inside someone else’s class and alter the course of program flow.
Trap-patching is a very powerful technique, as is method swizzling. But it’s also very fragile. An OS update might render your patch obsolete. It may change the way something behaves so that all the assumptions you made about the environment of the call (state of registers, variables on the stack, other functions on the stack, and so on) might be wrong. Your now-broken code might be nice enough to instantly lock up the machine, or it might be mean enough to scribble a couple of bytes into the user’s data, corrupting it for a later point in time. Trap-patching was fraught with peril, and if you didn’t have to do it to work around a fatal bug, it was best if you never even heard of the concept.
Before getting into swizzling, I want to make that same admonition. Don’t use it for real code in real programs used by real people. The risk of stuff breaking is rarely worth it. So far in my career I’ve used swizzling for writing some unit tests (say wanting to patch out part of some godforsaken singleton), but have never shipped any.
Now, for programming competitions where you’re showing off to your friends, or if your product is fundamentally an evil hack and your users expect it to break under minor OS updates, it’s fine. It can also be a useful exploration and debugging tool.
The canonical method swizzling function is kind of subtle. You’ll first see the simple (and wrong) way to do it, see why it’s wrong, and then see the right way.
To start off with, here are some sample classes. The complete code can be found at this gist. The base class has two utility methods, and a cover method that invokes those two:
@interface BaseClass : NSObject
- (void) hornswoggle;
- (void) bamboozle;
- (void) doStuff; // Calls hornswoggle and bamboozle to do its work.
@end
@implementation BaseClass
- (void) doStuff {
[self hornswoggle];
[self bamboozle];
} // doStuff
- (void) hornswoggle {
NSLog (@"BaseClass hornswoggle");
}
- (void) bamboozle {
NSLog (@"BaseClass bamboozle");
}
@end // BaseClass
You take a BaseClass, send it -doStuff, which causes it to call -hornswoggle and -bamboozle.
Then there’s a subclass that overrides both of those, does some work, and then calls the superclass:
@interface FirstBegat : BaseClass
// implements hornswoggle and bamboozle
@end
@implementation FirstBegat
- (void) hornswoggle {
NSLog (@"FirstBegat hornswoggle calling");
[super hornswoggle];
}
- (void) bamboozle {
NSLog (@"FirstBegat bamboozle calling");
[super bamboozle];
}
@end // FirstBegat
Make a FirstBegat, and tell it to doStuff:
FirstBegat *first = [FirstBegat new];
[first doStuff];
And you’ll see this output (I added some spaces to see the flow a little better):
FirstBegat, do stuff!
FirstBegat hornswoggle calling
BaseClass hornswoggle
FirstBegat bamboozle calling
BaseClass bamboozle
This is polymorphism 101. doStuff is in the BaseClass, but it’s causing code to be run in FirstBegat because the Objective-C runtime machinery is poking around FirstBegat’s class looking in its code pile for things stored under the name “hornswoggle” and “bamboozle”. (If this didn’t make any sense, it probably means you skipped the preliminaries.)
To make life interesting, here’s a second class. It subclasses FirstBegat, but only overrides hornswoggle. It inherits FirstBegat’s bamboozle.
@interface SecondBegat : FirstBegat
// implements hornswoggle (not bamboozle)
@end
@implementation SecondBegat
- (void) hornswoggle {
NSLog (@"SecondBegat hornswoggle calling");
[super hornswoggle];
}
@end // SecondBegat
Making and calling a second begat shows the override of hornswoggle:
SecondBegat *second = [SecondBegat new];
[second doStuff];
...
SecondBegat, do stuff!
SecondBegat hornswoggle calling
FirstBegat hornswoggle calling
BaseClass hornswoggle
FirstBegat bamboozle calling
BaseClass bamboozle
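Not surprising. Here’s the class hierarchy: NSObject at the top, then BaseClass, then FirstBegat, then SecondBegat, each a subclass of the one before.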
Now, let’s perform some violence!
One of the fundamental utilities for swizzling methods is method_exchangeImplementations. You first get two Method objects from the class by using class_getInstanceMethod and then pass them to method_exchangeImplementations. Their implementations, their IMPs, will be swapped. Exchanging hornswoggle and bamboozle for FirstBegat would cause [first hornswoggle] to print out “bamboozle”, and vice versa. Here’s the first crack at a swizzling function:
void SwizzleMethodBadly (Class clas, SEL originalSelector, SEL newSelector)
{
Method originalMethod =
class_getInstanceMethod (clas, originalSelector);
Method newMethod =
class_getInstanceMethod (clas, newSelector);
if (originalMethod && newMethod) {
method_exchangeImplementations (originalMethod, newMethod);
}
} // SwizzleMethodBadly
You call this with a class and two selectors. The methods are looked up, then swapped assuming they both exist. Their signatures should be the same; that is, they should take the same kinds of arguments and return the same kind of value. There’s no error checking, so if you exchange @selector(description) with @selector(initWithBitmapDataPlanes:pixelsWide:pixelsHigh:bitsPerSample:samplesPerPixel:hasAlpha:isPlanar:colorSpaceName:bitmapFormat:bytesPerRow:bitsPerPixel:), you get what you deserve.
So, let’s hack FirstBegat. Remember that it implements both hornswoggle and bamboozle. (This is a very important detail.) The hack looks like this:
@interface FirstBegat (HackNSlash)
+ (void) hck_hijackMethods;
@end
@implementation FirstBegat (HackNSlash)
+ (void) hck_hijackMethods {
SwizzleMethodBadly ([self class], @selector(hornswoggle),
@selector($hackFirstBegat_Hornswoggle));
}
- (void) $hackFirstBegat_Hornswoggle {
NSLog (@"I'm in ur FirstBegat doin ur %s", sel_getName(_cmd));
// Actually calls original implementation.
[self $hackFirstBegat_Hornswoggle];
}
@end
Even though we’re swizzling the methods badly, this is the coding technique I use even when swizzling them correctly. This adds a new method, does some work, and then calls the original method. Enjoy this. Revel in the horror. What’s going on?
First, the machinery. SwizzleMethodBadly, and soon enough SwizzleMethod, exchanges two methods in a single class. If you want to whack FirstBegat’s class, you need to have a method that’s inside of FirstBegat’s code pile. A category is a great way to do that. Declare the hck_hijackMethods method (and remember to be safe and always prefix your categories):
@interface FirstBegat (HackNSlash)
+ (void) hck_hijackMethods;
@end
And then implement it:
+ (void) hck_hijackMethods {
SwizzleMethodBadly ([self class], @selector(hornswoggle),
@selector($hackFirstBegat_Hornswoggle));
}
All it does is swap the selector hornswoggle with something with a (more) bizarre name. What’s up with that name? The dollar sign is accepted as an identifier character by the C compilers used for Objective-C (GCC and Clang), along with the usual alphanumerics and the underscore. Not many people know about the dollar sign, and those that do typically use it to indicate that something weird or hacky is coming up. In this case, the method name $hackFirstBegat_Hornswoggle tells me that a) “$” something gross is coming up and it’s most likely a swizzle, b) the class that’s being attacked, and c) some indication of the method being changed. That way it’s pretty obvious that this method is used in a swizzling situation and what it does. You could just as easily have called it fluffyBunny and things would work the same.
Now for the method that gets swizzled. This is what will be invoked when someone sends the -hornswoggle message. The first two lines are straightforward – the method signature and the work being done. In this case, printing a message:
- (void) $hackFirstBegat_Hornswoggle {
NSLog (@"I'm in ur FirstBegat doin ur %s", sel_getName(_cmd));
The last line calls the original implementation. This code isn’t replacing the original code, just augmenting it.
[self $hackFirstBegat_Hornswoggle];
}
But wait, you say. Isn’t this a recursive call? We’re inside of $hackFirstBegat_Hornswoggle, and now you’re calling itself again!
Remember that the methods have been exchanged. The class’s selector-method dictionary now looks like this: hornswoggle points at the code written in $hackFirstBegat_Hornswoggle, and $hackFirstBegat_Hornswoggle points at the original hornswoggle code.
Remember how things work: [self $hackFirstBegat_Hornswoggle] tells the Objective-C runtime to access self’s class, dig into the code map, look up $hackFirstBegat_Hornswoggle, find the code there, and jump to it. Thanks to method_exchangeImplementations, this selector is currently pointing to the old code, and so we’re calling the original code here, even though it’s been filed under our new name. Yeah, it looks pretty weird, but that’s how it works.
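If the not-actually-recursive call still feels like a magic trick, here is a small plain C model of it. The assumptions are loud ones: the “class” is just a two-entry table of names and function pointers, and find_method, exchange_implementations, and send_message are illustrative names rather than runtime functions. The point it shows is that the lookup happens by name at call time, so after the swap the hack’s name leads to the original code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef void (*Implementation)(const char *cmd);
struct MethodEntry { const char *selector; Implementation imp; };

void send_message(const char *selector);   /* defined below */

int original_calls = 0;
int hack_calls = 0;

/* The code originally filed under "hornswoggle". */
void original_hornswoggle(const char *cmd) {
    (void)cmd;
    original_calls++;
    printf("FirstBegat hornswoggle calling\n");
}

/* The code originally filed under "$hackFirstBegat_Hornswoggle". */
void hack_hornswoggle(const char *cmd) {
    hack_calls++;
    printf("I'm in ur FirstBegat doin ur %s\n", cmd);
    /* Looks recursive, but the name is looked up again at call time. */
    send_message("$hackFirstBegat_Hornswoggle");
}

struct MethodEntry first_begat_methods[] = {
    { "hornswoggle",                 original_hornswoggle },
    { "$hackFirstBegat_Hornswoggle", hack_hornswoggle     },
};

struct MethodEntry *find_method(const char *selector) {
    size_t count = sizeof first_begat_methods / sizeof first_begat_methods[0];
    for (size_t i = 0; i < count; i++)
        if (strcmp(first_begat_methods[i].selector, selector) == 0)
            return &first_begat_methods[i];
    return NULL;
}

/* Models method_exchangeImplementations: swap the code, keep the names. */
void exchange_implementations(struct MethodEntry *a, struct MethodEntry *b) {
    Implementation tmp = a->imp;
    a->imp = b->imp;
    b->imp = tmp;
}

/* Models a message send: find the entry by name, then jump to its code. */
void send_message(const char *selector) {
    struct MethodEntry *method = find_method(selector);
    assert(method != NULL);
    method->imp(selector);
}
```

Exchange the two entries and then send "hornswoggle": the hack runs once, then the original runs once, and that’s it. Skip the exchange and send the hack’s own name instead, and you really do get infinite recursion.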
And speaking of “works”, it seems to work. After swizzling the methods, and telling first to doStuff, you can see the new code being run:
FirstBegat, do stuff!
I'm in ur FirstBegat doin ur hornswoggle
FirstBegat hornswoggle calling
BaseClass hornswoggle
FirstBegat bamboozle calling
BaseClass bamboozle
Woo! Ship it!
There’s just one problem. class_getInstanceMethod does its job too well. If this function can’t find the method in the given class, it’ll go up the inheritance chain and find the first implementation it can. If you method_exchangeImplementations with that Method, you’ve now put your code into an unexpected class.
Imagine swizzling a method on NSButton, intending to only affect buttons, but instead you whack NSView. Now every single view has your swizzle code, potentially leading to hilarious results. This is something that could bite you with OS updates. Say Apple did a refactoring of a class you’ve swizzled and removed the need for that class to override a method. Your working swizzle, which was using this buggy swizzle implementation, is now broken.
Want proof? Remember that SecondBegat overrides hornswoggle, but not bamboozle. If we SwizzleMethodBadly SecondBegat’s bamboozle, it will whack FirstBegat’s version. Here’s the swizzling:
@interface SecondBegat (HackNSlash)
+ (void) hck_hijackMethods;
@end
@implementation SecondBegat (HackNSlash)
+ (void) hck_hijackMethods {
SwizzleMethodBadly ([self class], @selector(bamboozle),
@selector($hackSecondBegat_Bamboozle));
}
- (void) $hackSecondBegat_Bamboozle {
NSLog (@"I'm in ur SecondBegat doin ur %s", sel_getName(_cmd));
}
@end
So, we’ve changed SecondBegat. Not a FirstBegat in sight. But now, send first the doStuff message:
[first doStuff];
...
FirstBegat, after bad swizzling, do stuff!
I'm in ur FirstBegat doin ur hornswoggle
FirstBegat hornswoggle calling
BaseClass hornswoggle
I'm in ur SecondBegat doin ur bamboozle
Whoa^2. The SecondBegat code is now being run. And because the $hackSecondBegat method didn’t “send super”, the FirstBegat and BaseClass code is now completely cut off.
Here is the canonical way to swizzle methods:
void SwizzleMethod (Class clas, SEL originalSelector, SEL newSelector) {
Method originalMethod =
class_getInstanceMethod (clas, originalSelector);
Method newMethod =
class_getInstanceMethod (clas, newSelector);
BOOL addedMethod =
class_addMethod (clas, originalSelector,
method_getImplementation(newMethod),
method_getTypeEncoding(newMethod));
if (addedMethod) {
class_replaceMethod (clas, newSelector,
method_getImplementation(originalMethod),
method_getTypeEncoding(originalMethod));
} else {
method_exchangeImplementations (originalMethod, newMethod);
}
} // SwizzleMethod
There’s a bit of subtlety involved, so time for a line-by-line. Here’s the call:
SwizzleMethod ([self /* SecondBegat */ class],
@selector(bamboozle),
@selector($hackSecondBegat_Bamboozle));
The first two lines get the Method objects that underlie those selectors in the SecondBegat class.
Method originalMethod =
class_getInstanceMethod (clas, originalSelector); // bamboozle
Method newMethod =
class_getInstanceMethod (clas, newSelector); // $hackSecond...
newMethod will be code that fer-sher lives in SecondBegat because we’re passing in the selector we’re swapping with ($hackSecond). originalMethod could be a method that lives in SecondBegat. It might live in FirstBegat, or even BaseClass or NSObject.
How to tell? class_addMethod fails if there’s already a method for that selector in its code pile. So, try adding the $hackSecond method to SecondBegat’s class under the bamboozle selector:
BOOL addedMethod =
class_addMethod (clas, // SecondBegat
originalSelector, // @selector(bamboozle)
method_getImplementation(newMethod), // $hackSecond...
method_getTypeEncoding(newMethod));
If this succeeds, it means that SecondBegat did not have a bamboozle of its own, but it does now (pointing to $hackSecond). If it fails, nothing happens, and we know that SecondBegat did, in fact, have its own bamboozle.
if (addedMethod) {
class_replaceMethod (clas, // SecondBegat
newSelector, // $hackSecond...
method_getImplementation(originalMethod), // bamboozle
method_getTypeEncoding(originalMethod));
So if we added the method, that means that now SecondBegat has a new -bamboozle that points to the $hackSecond method. To complete the swizzle, $hackSecond needs to point to the original bamboozle code. class_replaceMethod says “hey class! Whatever you have this selector pointing to in your code pile, replace it with this method”. In this case, replacing $hackSecond with the original -bamboozle. Done. This is the code path that this particular swizzle will take.
Now for the other case, where the method add failed because the class already had a -bamboozle method. Just exchange the two:
} else {
method_exchangeImplementations (originalMethod, // bamboozle
newMethod); // $hackSecond
}
And the trick, she is done.
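To see both branches in action without a Mac handy, here is the same add-then-replace-or-exchange logic as a plain C model. Treat it as a sketch under assumptions: own_method, add_method, replace_method, and swizzle are made-up stand-ins that mimic class_addMethod and class_replaceMethod as described above, the classes hold a single method each, and the NULL check at the top of swizzle is an extra that SwizzleMethod itself does not have.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*Implementation)(void);
struct Method { const char *selector; Implementation imp; };

#define MAX_METHODS 4
struct Class {
    struct Class *superclass;
    struct Method methods[MAX_METHODS];
    int method_count;
};

void first_bamboozle(void) {}
void second_hack_bamboozle(void) {}

struct Class FirstBegat  = { NULL, { { "bamboozle", first_bamboozle } }, 1 };
struct Class SecondBegat = { &FirstBegat,
    { { "$hackSecondBegat_Bamboozle", second_hack_bamboozle } }, 1 };

/* Search only this class's own code pile. */
struct Method *own_method(struct Class *cls, const char *selector) {
    for (int i = 0; i < cls->method_count; i++)
        if (strcmp(cls->methods[i].selector, selector) == 0)
            return &cls->methods[i];
    return NULL;
}

/* Models class_getInstanceMethod: this class first, then its ancestors. */
struct Method *get_instance_method(struct Class *cls, const char *selector) {
    for (; cls != NULL; cls = cls->superclass) {
        struct Method *m = own_method(cls, selector);
        if (m) return m;
    }
    return NULL;
}

/* Models class_addMethod: refuses if the class itself has the selector. */
int add_method(struct Class *cls, const char *selector, Implementation imp) {
    if (own_method(cls, selector)) return 0;
    assert(cls->method_count < MAX_METHODS);
    cls->methods[cls->method_count].selector = selector;
    cls->methods[cls->method_count].imp = imp;
    cls->method_count++;
    return 1;
}

/* Models class_replaceMethod: overwrite the class's own entry, or add one. */
void replace_method(struct Class *cls, const char *selector, Implementation imp) {
    struct Method *m = own_method(cls, selector);
    if (m) m->imp = imp;
    else add_method(cls, selector, imp);
}

void swizzle(struct Class *cls, const char *original, const char *replacement) {
    struct Method *original_method = get_instance_method(cls, original);
    struct Method *new_method = get_instance_method(cls, replacement);
    if (!original_method || !new_method) return;   /* extra safety check */
    Implementation original_imp = original_method->imp;
    Implementation new_imp = new_method->imp;

    if (add_method(cls, original, new_imp)) {
        /* The original lived in an ancestor, so leave the ancestor alone. */
        replace_method(cls, replacement, original_imp);
    } else {
        /* Both entries belong to cls, so a plain exchange is safe. */
        original_method->imp = new_imp;
        new_method->imp = original_imp;
    }
}
```

The first swizzle of SecondBegat takes the add branch and leaves FirstBegat’s table untouched. Swizzling a second time takes the exchange branch, because by then SecondBegat has its own bamboozle entry.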
I hope I’ve hammered the point home that this is a powerful, potentially dangerous tool. A Sawzall is a powerful tool, but it could be dangerous if you accidentally cut a conduit apart in your home. But there are some legitimate uses out here in application land. There might be a really bad bug in a new version of the toolkit that has no other workaround. You might want to change the behavior of a helper class during a unit test. You might be tracking down a bug and be wondering “what data is really flowing through this Cocoa method” – swizzle in a spy to print out what it’s seeing, maybe modify it, and then send control on to the existing code.
This wraps up my tour of the low-level goodies that exist in the Objective-C runtime. I didn’t cover everything – there’s just not enough time. Also, a reason a lot of this exists is to bridge to other languages, which is a pretty esoteric topic even for me. There are practical uses for some of this stuff, and I’m a believer that knowledge is power, no matter how skanky the hacks are to obtain that knowledge. Just be professional in what you ship to paying customers, whether they’re paying in money or their time.
18 months ago MarkD started writing for the Big Nerd Ranch blog. 70 some-odd articles (and some are pretty odd) and 110,000 words later, we’re sending him away for a well-deserved blogging vacation. Don’t worry, he’ll be back in the fall.