Swift Regex Deep Dive
iOS MacOur introductory guide to Swift Regex. Learn regular expressions in Swift including RegexBuilder examples and strongly-typed captures.
This is the first of a four-parter. You can find the other parts here: two, three, four
My day-to-day work is building Mac and iOS software. That means I spend a fair amount of time inside of the Usual Tools for that. Xcode. Interface Builder. Instruments. gdb
and lldb
. There’s another tool out there that many programmers don’t think to reach for, and that’s DTrace. Just the other day I needed to see how an app I inherited was working with NSUndoManager
so I could fix some weird behavior. With DTrace, I quickly found out where every NSUndoManager
was created (turned out to be four), and more importantly, where each one was being used, complete with stack traces.
You can do that with debuggers. You can set breakpoints with regular expressions. And perhaps add some scripts, or Xcode breakpoint settings for particular calls. You could also write some lldb
python script to do some magic. Sounds like a lot of work. My DTrace thing? One command:
# dtrace -n 'objc57447:NSUndoManager*::entry { printf("%x", arg0); ustack(20,0); }'
-n 'objc57447:NSUndoManager:-init:return { printf("%x", arg1); ustack(20, 0); }'
OK, so that looks like gibberish, like many command lines someone pulls out of their shell history and posts saying “Look at me! Look how wonderful I am!” That being said, DTrace is a superbly powerful tool, and is useful for more than system administrators and kernel developers. Before I can really dig into this command, and the usefulness of DTrace for Objective-C developers, I’ll cover a little bit of DTrace first. Later on I’ll cover my undo manager story. Also my colleague at the Ranch Adam Preble hinted about using DTrace in Objective-C in an earlier posting. We’ll dig into exactly what he did to track down that problem with DTrace.
What is this “DTrace” thing? It stands for “Dynamic Tracing”, a way you can attach “probes” to a running system and peek inside as to what it is doing. It was created by Sun for Solaris, and was ported to the Mac in the Mac OS X 10.5 “Leopard” time frame. DTrace is not available on iOS, but you can use it in the simulator.
Imagine being able to say “whenever malloc is called in Safari, record the amount of memory that’s been asked for.” Or, “whenever anyone opens Hasselhoff.mov on the system, tell me the app that’s doing so.” Or, “show me every message being sent to this particular object.” Or, “show me where every NSUndoManager
has been created, and tell me the address of that object in memory.”
DTrace lets you do this by specifying probes. A probe describes a function, or a set of functions, to intercept. You can say “when some function is about to run, do some sequence of steps”, or “when some function exits, tell me what the return value is, and where in that function it actually returned.” What’s neat about DTrace is that this happens dynamically (hence the name). You don’t have to do anything special to participate. Any function or method in your code can be traced.
You command DTrace by writing code in the “D Language”, which is not related to that other D Language. DTrace’s D is a scripting language that has a strong C flavor, so much of C is available to you. It gets compiled by the dtrace command into bytecode, shipped to the kernel, where it runs. It’s run in a safe mode to minimize impact on the system, so there’s no looping , branch statements, or floating point operations.
A D script is composed of a sequence of clauses which look like this:
syscall::read:entry
/execname != "Terminal"/
{
printf ("%s called read, asking for %d bytesn", execname, arg2);
}
There’s four parts to a DTrace clause. The first line is a probe description:
syscall::read:entry
It’s a four-tuple (don’t programming nerds love terms like that?), that specifies four things:
A provider. This is the entity that provides access to classes of functions. A common provider is syscall, which lets you trace Unix system calls like read and write system-wide. There’s another provider called pid that lets you trace functions inside of a particular program. You’ll see that in part 2. The objc provider lets you poke around in Objective-C land, and you’ll see that in part 3. There’s a lot of other providers available, too.
A module. This depends on the provider. Some providers don’t have modules. Some providers let you specify a particular shared library in the module. Perhaps you only wanted to trace some functions inside of libxm2.dylib – you could put that in the module.
A function. This is the name of the function you want to pay attention to.
A name. The name also depends on the provider, but most let you specify entry
, to run some D code before the function starts running or return
to run some D code after the function has exited.
The parts of the probe description are separated by colons. You have to have all the colons, even if a spot is empty. Leaving an entry acts like a wildcard. Speaking of wildcards, you can use *
and ?
in the style of shell globbing : *
matches zero or more characters, and ?
matches exactly one.
So what does the probe description mean?
syscall::read:entry
It means: for the syscall provider, any module, we’re interested in the read
function. Specifically, right before it starts running. Now, whenever that happens (someone calls read), the rest of the clause will be evaluated.
Next up is the predicate, delimited by two slashes. It’s optional. The expression is evaluated, and if it evaluates to a true (non-zero) value, the D code gets run. The predicate above is:
/execname != "Terminal"/
This means, if the current executable name (execname
is a built-in variable) is Terminal, don’t do anything. The dtrace
command will be printing stuff out in the Terminal, so you know it’s going to be reading stuff. No need to see it clutter up the output for this particular example.
The last part, the code in braces, are known as the actions. The actions are run when the probe description matches a firing function, and the predicate evaluates to true. In this case, the actions just print out some stuff:
{
printf ("%s called read, asking for %d bytesn", execname, arg2);
}
In particular, it prints out the name of the executable, and the number of bytes read.
What’s that arg2
? It’s a built-in variable for the third argument to read
. (You can guess there’s also an arg0
, arg1
, and so on). Recall read()’s prototype:
ssize_t read (int fileDescriptor, void *buffer, size_t bytesToRead);
Number of bytes to read is the third argument, so it’s the second index. You could, if you wanted to, have your actions block examine which file descriptor is being used, where the destination buffer is, and how much they want to read.
Now to actually run this stuff. The dtrace
command needs to be run as root – with superuser privileges. Ordinarily you’d use sudo
to run a command as root, but some DTrace behaviors don’t work reliably when run with sudo
, so you can start a root shell with sudo -i
, or give it a shell, like with sudo bash
.
Assuming that script above is saved in a file named read.d
, you would run it like
# dtrace -s read.d
dtrace: script 'read.d' matched 1 probe
CPU ID FUNCTION:NAME
2 115 read:entry WebProcess called read, asking for 32768 bytes
2 115 read:entry WebProcess called read, asking for 32768 bytes
1 115 read:entry mds called read, asking for 147424 bytes
0 115 read:entry fseventsd called read, asking for 189799 bytes
0 115 read:entry Activity Monito called read, asking for 27441 bytes
0 115 read:entry Activity Monito called read, asking for 19253 bytes
0 115 read:entry Activity Monito called read, asking for 11061 bytes
1 115 read:entry Propane called read, asking for 32768 bytes
2 115 read:entry Google Chrome H called read, asking for 1 bytes
^C
When run, dtrace
tells you the number of probes it matched, and then prints out a new line when the probe fires. It tells you what CPU it triggered on, the ID of the probe, the description of the probe that fired, and the text generated by the printf statement.
A lot of stuff is happening on the system, even though it kind of looks like it’s just sitting there. When you’re done watching the reads scroll by, type Control-C to terminate dtrace
dtrace has a man
page the describes the command-line flags, and there are numerous books and online tutorials you can explore. There’s also a nice set of slides from a DTrace Bootcamp. It’s not a Mac-only technology, but a great deal is common between Solaris and the Mac. Here are some commands:
-l
(dash ell) tells dtrace
to list all probes. If there are no other arguments, dtrace lists all the probes it knows about out of the box, without paying attention to any particular running program:
# dtrace -l
You’ll get a lot of output if you actually try it. On my systems there are over 200,000 individual probes. You can redirect and pipe through grep
, or you can redirect it to a file and poke around with your favorite text editor.
You can limit the output to just the probes that match a particular description. -n
lets you specify a probe by “name”, by providing a probe description:
# dtrace -l -n 'syscall::read:entry'
This tells you it found read
:
ID PROVIDER MODULE FUNCTION NAME
115 syscall read entry
You can also use wildcards:
# dtrace -l -n 'syscall::read*:entry'
ID PROVIDER MODULE FUNCTION NAME
115 syscall read entry
225 syscall readlink entry
349 syscall readv entry
901 syscall read_nocancel entry
931 syscall readv_nocancel entry
Lots of different kinds of read
functions.
DTrace is great for one-liners – single commands that do stuff. read.d
from above can be condensed to a single line:
# dtrace -n 'syscall::read:entry /execname != "Terminal"/ { printf ("%s called read, asking for %d bytesn", execname, arg2); }'
Why is this useful, outside of “Hey look at me”? You can experiment entirely on the command line, without having to bounce between and editor an a shell. I find my DTrace stuff tends to be very ad-hoc. “What is this thing doing, right now” and “how do I find out this one piece of information I need, right now.” I tend not to write too many scripts that live in files. It’s a developerism, using little throw-aways to diagnose an immediate problem. Folks who administer machines for a living often have libraries of scripts that monitor and diagnose their systems.
So What. Big Deal. You can see who is reading stuff. Is there any actual use you can put this stuff to, right now, before getting to the pid
and objc
providers later? Why yes!
A friend of mine got an app rejected from the Mac AppStore because his app was (supposedly) attempting to open the Info.plist
in DiskArbitration.framework
in read/write mode. An app really shouldn’t be trying to write to anything in a system framework. Sad thing is, his app wasn’t doing this, at least not explicitly. It just used the framework, including it in Xcode in the usual manner and then calling its API. The framework was being loaded by the usual app startup machinery. It was an unjust accusation, but there’s really no recourse, short of resubmitting the app and hoping it works.
Wouldn’t be nice if you could catch every open
being done by a program, and see what access flags it’s passing? You could take this output and see “oh yeah, for some reason I’m opening this read/write” and do some debugging, or else have some evidence for “I don’t see any wrongdoing”. Sounds like a great application of DTrace.
Together we cooked up a script like this:
syscall::open:entry
/execname == "TextEdit"/
{
printf("Opening %s with flags %xn", copyinstr(arg0), arg1);
}
(OBTW, that copyinstr
action in there copies a string from the running program into dtrace-land so it can be printed.)
We started this script running in dtrace
, and then launched the app. As expected, all of the framework Info.plist opens were with zero flags, which means read-only:
...
0 119 open:entry Opening /System/Library/Keyboard Layouts/AppleKeyboardLayouts.bundle/Contents/Info.plist with flags 0
0 119 open:entry Opening /System/Library/Keyboard Layouts/AppleKeyboardLayouts.bundle/Contents/Info.plist with flags 0
2 119 open:entry Opening /System/Library/Frameworks/AppKit.framework/Resources/Info.plist with flags 0
...
With this output, plus the script, he was able to make a case that no wrongdoing happened. The app was subsequently approved.
Here are some commands to play around with.
See what files are being opened by Time Machine during a backup, including the files being backed up:
# dtrace -n 'syscall::open*:entry /execname == "backupd"/ { printf ("%s", copyinstr(arg0)); }'
Replace “backupd” with “mdworker” to see what Spotlight is indexing on your system.
See every system call being made, what it is, and who’s calling it. This uses another built-in variable, probefunc
, to tell you which function actually triggered. You don’t really know otherwise because wildcarding is in use. There’s also probeprov
, probemod
, and probename
, so you can examine the entire probe description inside of the clause body.
# dtrace -n 'syscall:::entry { printf ("%s called %s", execname, probefunc); }'
See all the processes launching on your system. Especially fun when doing work in the terminal:
# dtrace -n 'proc:::exec-success { trace(execname); }'
Feel free to play around. You can’t damage your system with dtrace, unless you take special pains to do so.
Part 2 features the “pid provider” which lets you peek into programs. Finally, part 3 features the “objc provider” which has goodies applicable to us who play in Cloud-CocoaLand.
Curious about DTrace? Chapter 9 of Advanced Mac OS X Programming: The Big Nerd Ranch Guide is dedicated to DTrace.
Our introductory guide to Swift Regex. Learn regular expressions in Swift including RegexBuilder examples and strongly-typed captures.
The Combine framework in Swift is a powerful declarative API for the asynchronous processing of values over time. It takes full advantage of Swift...
SwiftUI has changed a great many things about how developers create applications for iOS, and not just in the way we lay out our...