Update October 2013 – On 64-bit iOS (device and simulator) BOOL is now actually bool, so the sharp corners have thankfully gone away for that platform. For everything else, though…
Objective-C is actually a pretty old language, dating back to the mid-eighties. As such, it’s got some sharp corners here and there due to limitations of early C. Today’s sharp corner is BOOL.
BOOL seems innocuous enough. It holds a boolean value. We All Know that in C a zero is a false value, and non-zero is a true value. There are even the handy YES and NO macros to represent truth and untruth. It turns out that BOOL is actually just a typedef for a signed char – an eight-bit value. It’s not a first-class boolean type, which is one with compiler support (like bool) to make sure it acts sanely. There are a couple of places where BOOL’s non-first-classitude can cause subtle problems.
YES is Objective-C’s truth value, equal to the value 1. BOOL can actually hold any non-zero value from -128 through 127 (in addition to zero), so there are 255 different flavors of truth. It’s a nice convention that functions and methods always return YES or NO, but you can’t depend on a truth value from one of these to be equal to YES (the value 1). Comparing to YES is generally a bad idea:
if ([kitteh haz: cheezburger] == YES) {
// LOL
}
You have to trust that -haz: returns YES or NO, and no other value. Without looking at the source or disassembly I can’t trust that its truth value is always YES. There is no compiler enforcement that a BOOL return value, or variable, always holds NO or YES.
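To put a number on those "flavors of truth", here is a minimal sketch in plain C. It assumes BOOL is the signed char typedef described above; LegacyBOOL, LEGACY_YES, and the two counting functions are hypothetical names of mine, chosen so the sketch compiles without the Objective-C headers.

```c
/* Assumption: hypothetical stand-ins for an eight-bit BOOL and YES. */
typedef signed char LegacyBOOL;
#define LEGACY_YES ((LegacyBOOL)1)

/* Count how many eight-bit values C treats as true (anything non-zero). */
static int count_truthy(void) {
    int count = 0;
    for (int v = -128; v <= 127; v++) {
        if ((LegacyBOOL)v) count++;
    }
    return count;
}

/* Count how many of those values would also pass an "== YES" check. */
static int count_equal_to_yes(void) {
    int count = 0;
    for (int v = -128; v <= 127; v++) {
        if ((LegacyBOOL)v == LEGACY_YES) count++;
    }
    return count;
}
```

Of the 255 values that count as true, exactly one compares equal to YES, which is why the comparison is so fragile.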
How could a non-YES value find its way coming out of a function? A common C idiom (not necessarily ideal, but common) is to do some kind of math and then use that value to determine a Truth. If the math results in zero, then the Truth is false. If the math results in a non-zero value, then the Truth is true.
Here’s a contrived piece of logic – given two integers, indicate if they’re different. An experienced C programmer might do this:
static BOOL different (int thing1, int thing2) {
    return thing1 - thing2;
} // different
(You can find this code, and other code referenced here at this gist)
If you used a bool return type this function would actually work correctly.
But with Objective-C, the BOOL return value is actually converted to a signed char and returned, so only the low eight bits of the subtraction survive. Comparing to YES will end up causing false negatives.
Here are some uses of this function, comparing to YES:
if (different(11, 10) == YES) printf ("11 != 10\n");
else printf ("11 == 10\n");
if (different(10, 11) == YES) printf ("10 != 11\n");
else printf ("10 == 11\n");
if (different(10, 15) == YES) printf ("10 != 15\n");
else printf ("10 == 15\n");
if (different(512, 256) == YES) printf ("512 != 256\n");
else printf ("512 == 256\n");
You’d hope that every one of these would print out “thing1 != thing2”. But that’s not the case:
11 != 10
10 == 11
10 == 15
512 == 256
Only the first case correctly says that the two numbers are different. If you only had the first case in your unit test, you’d think that things were working correctly. (ship it!)
The actual return values from different are 1, -1, -5, and 0. Only the first one happens to equal YES. And notice the last expression actually evaluated to NO. More on this weirdity in a bit.
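If you want to see those raw values for yourself, here is a minimal sketch in plain C. It assumes BOOL is the signed char typedef described above; LegacyBOOL, LEGACY_YES, different_raw, and report are hypothetical names of mine so the sketch compiles without the Objective-C headers. It also relies on the usual compiler behavior of keeping the low eight bits when an int is narrowed to a signed char, which the C standard leaves up to the implementation.

```c
#include <stdio.h>

/* Assumption: hypothetical stand-ins for an eight-bit BOOL and YES. */
typedef signed char LegacyBOOL;
#define LEGACY_YES ((LegacyBOOL)1)

/* Same logic as different(): the int result is narrowed to eight bits. */
static LegacyBOOL different_raw(int thing1, int thing2) {
    return (LegacyBOOL)(thing1 - thing2);
}

/* Print the raw value and whether it passes each style of check. */
static void report(int thing1, int thing2) {
    LegacyBOOL result = different_raw(thing1, thing2);
    printf("different(%d, %d) = %d, == YES: %s, truthy: %s\n",
           thing1, thing2, result,
           result == LEGACY_YES ? "yes" : "no",
           result ? "yes" : "no");
}
```

Calling report(10, 11) shows a raw value of -1 that is truthy but not equal to YES, while report(512, 256) shows a raw value of 0 on mainstream compilers, so even the plain truth test fails there.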
Because I don’t trust the general programmer population to be aware of this subtlety when using BOOL, I use this idiom for checking BOOL values:
if ([kitteh haz: cheezburger]) {
// LOL
}
Of course, it’s always safe to compare against NO, because there is only one false value in C expressions – zero.
When I write code that returns a BOOL, I return a YES or NO value explicitly, versus doing some kind of arithmetic. You can also rely on logical expressions, though, to return values of zero and one, which happen to map to NO and YES.
I was actually caught unawares by this behavior, and it was a good Learning Experience for me. I was doing a code review at Google, and saw this code:
BOOL something () {
// stuff stuff stuff
return a == b;
}
And I dinged it, saying the return should be return (a == b) ? YES : NO;, because I thought the value of the expression was undefined, and would just evaluate to a C truth or false value. (I blame some buggy compilers in my distant past.)
One of my gurus there, David Philip Oster, countered that it was actually legal, and the value of logical expressions is well-defined. When DPO makes a statement like that, he’s usually correct. So after a couple of minutes of writing some test code and finding chapter and verse in the C99 and C++ standards, I was convinced: logical expressions in C and C++ will always evaluate to a true or false value (an int in C, a bool in C++), which are one and zero, and happily equal to YES and NO.
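Here is a small sketch of leaning on that guarantee, again in plain C. LegacyBOOL, LEGACY_YES, LEGACY_NO, different_fixed, and normalize are hypothetical names of mine standing in for the Objective-C definitions; the only thing taken from the language itself is that comparison and logical operators yield exactly 0 or 1.

```c
/* Assumption: hypothetical stand-ins for BOOL, YES, and NO. */
typedef signed char LegacyBOOL;
#define LEGACY_YES ((LegacyBOOL)1)
#define LEGACY_NO  ((LegacyBOOL)0)

/* != yields exactly 0 or 1, so nothing is lost when it is narrowed. */
static LegacyBOOL different_fixed(int thing1, int thing2) {
    return (LegacyBOOL)(thing1 != thing2);
}

/* !! collapses any non-zero value to 1 before it reaches the BOOL. */
static LegacyBOOL normalize(int value) {
    return (LegacyBOOL)(!!value);
}
```

With these, different_fixed(512, 256) and normalize(256) both come back as 1, so a caller who compares against YES still gets the right answer.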
I still won’t ever compare directly to YES because I don’t have time to review everyone’s code I call.
Another BOOL sharp corner is a variation of the above. Not only can a non-NO BOOL have a non-YES value, sometimes it can be NO.
That sounds scary. How can that happen?
BOOL is a char, which is eight bits. If you try to squeeze a value larger than a char through BOOL, the compiler will happily truncate the upper bits, slicing them off.
So, what does this code print?
BOOL truefalse = (BOOL)256;
printf ("256 -> %d\n", truefalse);
Zero. NO. This “true” value is actually zero. Why? Here it is bitwise: 256 is 0x0100, and a BOOL only has room for the low byte, 0x00.
The compiler happily stripped off the upper byte, leaving zeros. Granted, a casting of a large constant into a BOOL might raise a red flag. But then again, the different function returned the value of a subtraction, truncating the value silently, and didn’t require a cast.
The same code, but using a standard bool type, works fine:
bool stdTruefalse = (bool)256;
printf ("256 -> %d (bool)\n", stdTruefalse);
So if your entire C boolean experience has been with first-class types like bool (lower-case), then the BOOL (upper-case) behavior probably comes as a surprise.
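The contrast between the two types can be sketched side by side in plain C. As before, LegacyBOOL is a hypothetical stand-in of mine for an eight-bit BOOL, the function names are made up for illustration, and the narrowing behavior is what mainstream compilers do rather than something the C standard promises.

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumption: hypothetical stand-in for an eight-bit Objective-C BOOL. */
typedef signed char LegacyBOOL;

/* Narrowing keeps only the low byte, so multiples of 256 become zero. */
static LegacyBOOL as_legacy_bool(int value) {
    return (LegacyBOOL)value;
}

/* Converting to a first-class bool asks "is it non-zero?" instead. */
static bool as_std_bool(int value) {
    return (bool)value;
}

/* Print the hex form, the surviving low byte, and both conversions. */
static void compare_conversions(int value) {
    printf("%d = 0x%04X, low byte 0x%02X, BOOL %d, bool %d\n",
           value, (unsigned)value, (unsigned)value & 0xFFu,
           as_legacy_bool(value), as_std_bool(value));
}
```

For 256 the low byte is 0x00, so the eight-bit version reports 0 while the bool version reports 1.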
The same thing can happen with pointers. (Thanks to Mike Ash for introducing me to this class of errors.) You might think this is safe code:
static NSString *g_name;
static BOOL haveName () {
return (BOOL)g_name;
} // haveName
If g_name is non-nil, it’s a true value, so this function will be returning a true value. It might not be YES, but truth is truth in C, isn’t it? If this function returned bool, it would return the correct value.
Unfortunately, with BOOL, it doesn’t. If the address happens to have a zero lower byte, this function will return zero due to the same slicing behavior.
Here are two cases. The first is the string living at address 0x010CA880.
That returns a true value, 0x80. It’s not YES, but it’s true. So code in general still works. If the address happens to be aligned in memory such that the lower byte is zero, say 0x010CA800, the compiler would slice off the top three bytes, leaving a zero.
So you can see the function works in most cases, except for those times when the string happens to lie at particular addresses. It only fails every now and then. This is the kind of bug that can change from run to run.
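The two addresses above can be run through the same slicing in a small plain C sketch. Real pointers land wherever the allocator puts them, so this uses the addresses as plain integers instead; LegacyBOOL and address_as_bool are hypothetical names of mine, and the low-byte narrowing is again the usual compiler behavior rather than a standard guarantee.

```c
#include <stdint.h>

/* Assumption: hypothetical stand-in for an eight-bit Objective-C BOOL. */
typedef signed char LegacyBOOL;

/* Mimic (BOOL)pointer by narrowing the address to its low byte. */
static LegacyBOOL address_as_bool(uintptr_t address) {
    return (LegacyBOOL)address;
}
```

Feeding it 0x010CA880 leaves the low byte 0x80, which is non-zero and therefore true, while 0x010CA800 leaves 0x00 and reads as false even though the pointer was perfectly valid.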
Luckily clang and current versions of gcc complain if you try to pass a larger integer or pointer through a BOOL, requiring a cast. Hopefully adding the cast will raise some red flags in the programmer writing the code.
The take-away from this? For this kind of test I would either return an explicit YES or NO value:
if (g_name) return YES;
else return NO;
or use a logical expression:
return g_name != nil;
And not depend on automatic pointer->BOOL (or any integer->BOOL) behavior.
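For completeness, here is what the pointer check looks like as a plain C sketch. LegacyBOOL, g_name_sketch, and have_name are hypothetical stand-ins of mine for BOOL, g_name, and haveName, with a C string in place of the NSString.

```c
#include <stddef.h>

/* Assumption: hypothetical stand-in for an eight-bit Objective-C BOOL. */
typedef signed char LegacyBOOL;

/* Hypothetical stand-in for the g_name global. */
static const char *g_name_sketch;

/* The comparison yields 0 or 1, so no address bits ever reach the BOOL. */
static LegacyBOOL have_name(void) {
    return (LegacyBOOL)(g_name_sketch != NULL);
}
```

Because the comparison happens before the narrowing, the answer no longer depends on where the string happens to live in memory.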
So, what’s the point of all of this? Mainly that we, as users of Objective-C, can’t forget the C portion of the language. There are the occasional sharp corners that come from Objective-C’s C heritage that we need to be aware of, such as BOOL being a signed char. Many situations that work correctly with bool, the first-class boolean type, can fail in weird and wonderful ways with BOOL.