Swift Regex Deep Dive
iOS MacOur introductory guide to Swift Regex. Learn regular expressions in Swift including RegexBuilder examples and strongly-typed captures.
Just got back from a weekend at CocoaConf down in Herndon VA. A lot of great sessions from Ranch folks, and from everyone else as well. One of my favorites was the session Chad Sellers from Useful Fruit had on Text, covering the text system from NSString up through NSTextView and back down to Core Text. I learned stuff.
One interesting point Chad brought up was his surprise that more programmers didn’t know about some of the useful built-in features of NSString, instead preferring to fall back on complicated code or extensive use of stringWithFormat:. My guess is that many programmers see how NSString is used in intro books and don’t pursue it any further. I decided to spelunk the NSString header file and play with some of the cooler methods. You can get the code (such as it is) at this gist.
A common operation in scripting languages is splitting and joining a string. Splitting a string slices the string in multiple places, yielding an array of substrings. You supply a delimiter string which is where things will get chopped.
Starting out with a rather sad string:
NSString *splitString = @"hello:-(there:-(everybody";
You can split it up into words by taking out the frowny faces:
NSArray *split = [splitString componentsSeparatedByString: @":-("];
Afterwards you get an array that looks like this:
split: (
hello,
there,
everybody
)
When you join strings, you start with an array of strings and then glue them together with a given character string. Turn that frown upside down with:
NSString *joinString = [split componentsJoinedByString: @":-)"];
Resulting in hello:-)there:-)everybody.
If you’ve ever wanted to turn an array of strings into a comma-separated list by writing a complex loop, making sure you don’t add a comma before the first item or add an extra trailing one, you’ll want to jettison all that code and just use componentsJoinedByString:
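If you want to see the comma-separated case spelled out, here’s a minimal sketch. CommaSeparatedList is a name I made up for illustration; it isn’t part of Foundation.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Foundation call): glue the items together
// with ", " between them. No loop, and no comma bookkeeping at the ends.
static NSString *CommaSeparatedList (NSArray *items) {
    return [items componentsJoinedByString: @", "];
}
```

Hand it an array holding @"apples", @"pears", and @"plums" and you should get back apples, pears, plums. An empty array gives you an empty string.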
You can extract substrings based on character index. Old-school BASIC programmers might remember the LEFT$, RIGHT$, and MID$ functions that extract the leftmost-, rightmost-, and central-N characters from the string. NSString’s substringToIndex:, substringFromIndex:, and substringWithRange: provide the same services.
Here’s a word, which happens to contain three other words. Time to extract them:
NSString *subby = @"whatsoever";
And pull out the words:
NSString *first = [subby substringToIndex: 4]; // LEFT$
NSString *last = [subby substringFromIndex: 6]; // RIGHT$
NSRange range = NSMakeRange (4, 2);
NSString *middle = [subby substringWithRange: range]; // MID$
Which when printed:
NSLog (@"%@ -> %@ %@ %@", subby, first, middle, last);
gives us:
whatsoever -> what so ever
To visualize the indexes used: characters 0 through 3 spell “what”, 4 and 5 spell “so”, and 6 through 9 spell “ever”.
The documentation warns that if you’re dealing with composed characters, you should use rangeOfComposedCharacterSequencesForRange: to avoid splitting them.
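Here’s a rough sketch of what that warning means in practice. SafePrefix is a hypothetical helper of my own, not an NSString method: it behaves like substringToIndex:, but widens the cut so it never lands inside a composed character sequence.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: take roughly the first `length` UTF-16 units, but
// let NSString widen the range so no composed character gets split.
static NSString *SafePrefix (NSString *string, NSUInteger length) {
    if (length == 0) return @"";
    if (length >= string.length) return string;

    NSRange wanted = NSMakeRange (0, length);
    NSRange safe = [string rangeOfComposedCharacterSequencesForRange: wanted];
    return [string substringWithRange: safe];
}
```

Take @"cafe\u0301s", where the accented e is stored as a plain e followed by a combining acute accent. Asking for the first 4 units would strand the accent, so the widened range comes back 5 units long and the e keeps its hat.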
You can ask a string whether it starts with a particular string using hasPrefix:, or whether it ends with one using hasSuffix:. Here’s a tweet:
NSString *tweet = @"@wookiee your pony is looking colorful today!";
And you can do some quick analysis of it:
if ([tweet hasPrefix: @"@"]) NSLog (@" tweet is a reply");
if ([tweet hasSuffix: @"!"]) NSLog (@" tweet is excitable");
if ([tweet hasSuffix: @"?"]) NSLog (@" tweet is questionable");
This would print out
tweet is a reply
tweet is excitable
You don’t need to use stringWithFormat: @"%@%@" to smash two strings together. That format string always reminds me of a three-eyed two-nosed alien. You can append one string to another with stringByAppendingString:
NSString *cookie = @"I made you a cookie. ";
NSString *eated = @"But I eated it";
NSString *cat = [cookie stringByAppendingString: eated];
Which results in:
I made you a cookie. But I eated it (http://icanhascheezburger.files.wordpress.com/2007/01/2000035887522228730_rs.jpg)
Don’t forget that literal NSStrings are objects too. You can send messages to @"strings":
cat = [@"I made you a cookie. " stringByAppendingString: eated];
This is perfectly legal.
You can pull numbers out of strings with digits. Leading whitespace (characters in whitespaceSet) is stripped out, then the digits are interpreted, stopping at the end of the string or when hitting an irrelevant character. Don’t forget there’s NSScanner and NSNumberFormatter for more exact parsing of numbers.
intValue extracts an integer value, floatValue and doubleValue extract floating point values, and so on.
Here’s a set of strings, and a display of their intValues:
NSArray *intValues =
[NSArray arrayWithObjects: @"1", @" 2bork", @"\t \t3greeble5",
@"-12", @"-", @"", nil];
for (NSString *scan in intValues) {
NSLog (@" %@ -> %d", scan, [scan intValue]);
}
Prints out:
1 -> 1
2bork -> 2
3greeble5 -> 3
-12 -> -12
- -> 0
-> 0
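Notice that intValue happily reports 0 for both “-” and the empty string, so you can’t tell a real zero from garbage. If that matters, NSScanner can be stricter. Here’s a minimal sketch; ParseStrictInt is a made-up helper name, and it assumes you want to reject anything with non-whitespace junk after the number.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns YES only when the string holds exactly one
// integer. Surrounding whitespace is tolerated because NSScanner skips
// whitespace and newlines by default.
static BOOL ParseStrictInt (NSString *string, int *outValue) {
    NSScanner *scanner = [NSScanner scannerWithString: string];
    int value = 0;

    if (![scanner scanInt: &value]) return NO;  // no digits found at all
    if (![scanner isAtEnd]) return NO;          // trailing junk, like "bork"

    if (outValue != NULL) *outValue = value;
    return YES;
}
```

With this, @"-12" parses to -12, while @" 2bork", @"-", and @"" are all rejected instead of quietly turning into a number.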
There’s also a boolValue that looks for numerical values, as well as things that kind of look like “yes” and “no”:
NSArray *boolValues =
[NSArray arrayWithObjects:
@"1", @"2", @"y", @" yEs", @"Yes", // good
@"0", @"n", @"\tNo", @"NO", // bad
@"", @"gronk", nil]; // ugly
for (NSString *scan in boolValues) {
NSLog (@" %@ -> %d", scan, [scan boolValue]);
}
Which prints out:
1 -> 1
2 -> 1
y -> 1
yEs -> 1
Yes -> 1
0 -> 0
n -> 0
No -> 0
NO -> 0
-> 0
gronk -> 0
Want to see if two strings have leading characters in common? You can get the common prefix:
NSString *thing1 = @"BigNerdRanch";
NSString *thing2 = @"Bigbooty, John";
NSString *commonPrefix = [thing1 commonPrefixWithString: thing2
options: 0];
Results in a common prefix of “Big”. The options are the usual parameters you pass to methods like this, like NSCaseInsensitiveSearch.
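To see the option earn its keep, here’s a small sketch using a shoutier second string. SharedPrefixIgnoringCase is just an illustrative wrapper name I picked.

```objc
#import <Foundation/Foundation.h>

// Hypothetical wrapper: common prefix of two strings, ignoring case.
// The returned characters are taken from the first string (the receiver).
static NSString *SharedPrefixIgnoringCase (NSString *first, NSString *second) {
    return [first commonPrefixWithString: second
                                 options: NSCaseInsensitiveSearch];
}
```

Comparing @"BigNerdRanch" against @"BIGBOOTY, John" this way should still find “Big”, where passing 0 for the options would stop after the “B”.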
You can force a string to be all upper or lower case, or only capitalize words. Say one of your local script kiddies has taken interest in American History:
NSString *sentence =
@"fOUr scOrE aND sEveN-yEARs aGo ouR FAthers brouGHT Fourth.";
You can regularize the sentence to be all upper- or all lower-case:
NSString *upper = [sentence uppercaseString];
NSString *lower = [sentence lowercaseString];
The results should be obvious. You can also capitalize each word:
NSString *capped = [sentence capitalizedString];
Which results in:
Four Score And Seven-Years Ago Our Fathers Brought Fourth.
Sometimes you get a string from somewhere, say over the network or from a text file, which would be perfect if not for the extraneous white space surrounding it, like extra tabs, spaces or newlines. You can use stringByTrimmingCharactersInSet: to remove the characters in a set. These sets can be any set of characters – you can build your own character sets, or use something provided by Cocoa.
Here’s how to remove whitespace from the ends of the string, but not any that’s located in the middle:
NSString *original = @"\n \t\t hello there \t\n \n\n";
NSString *trimmed =
[original stringByTrimmingCharactersInSet:
[NSCharacterSet whitespaceAndNewlineCharacterSet]];
The resulting string is “hello there”.
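Building your own set isn’t much more work. Here’s a sketch that trims quote marks along with the whitespace; TrimQuotesAndSpace is a hypothetical helper name, and the choice of characters is just for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: strip whitespace, newlines, and quote marks from
// both ends of the string. Anything in the middle is left alone.
static NSString *TrimQuotesAndSpace (NSString *string) {
    NSMutableCharacterSet *junk =
        [NSMutableCharacterSet whitespaceAndNewlineCharacterSet];
    [junk addCharactersInString: @"\"'"];

    return [string stringByTrimmingCharactersInSet: junk];
}
```

So @"  \"hello there\"\n" comes back as hello there, and @"'it's'" comes back as it's, with the apostrophe in the middle left untouched.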
The last stop on this tour is stringByPaddingToLength:... which can be used to extend a string or truncate it (which is what everyone thinks of when padding something…). It takes three parameters: the length to pad the string to (or truncate it to, if the string is longer than the length), a string to pad it with, and the index in the pad string where the padding should start.
Here’s a string:
NSString *original = @"I've got a bad feeling about this ";
You can truncate the string:
NSString *shorter = [original stringByPaddingToLength: 22
withString: nil
startingAtIndex: 0];
Resulting in "I've got a bad feeling"
Or add a bunch of leader dots for a monospaced table of contents:
NSString *leader = [original stringByPaddingToLength: 40
withString: @"."
startingAtIndex: 0];
Yielding
I've got a bad feeling about this ......
(“Leader” is the technical term for that row of dots.)
And you can pad with more than single characters
NSString *longer = [original stringByPaddingToLength: 40
withString: @"(:-"
startingAtIndex: 0];
Giving
I've got a bad feeling about this (:-(:-
You can change the phase, which is where it starts reading the padding string. Starting the pad at index 1 (the colon) changes the resulting string:
NSString *phased = [original stringByPaddingToLength: 40
withString: @"(:-"
startingAtIndex: 1];
It makes a little more sense now:
I've got a bad feeling about this :-(:-(
So, is there a use for the startingAtIndex: parameter aside from getting frowny faces to render correctly? One use (actually, the only use that I could come up with) is if you have a multiple-character leader string and you want it to be in sync from line to line, even if the beginning strings are of different lengths.
Here is a two-character leader, @"-=", being used to pad two different strings:
NSString *thing = @"thing";
NSString *thingy = @"thingy";
NSString *outOfPhaseThing = [thing stringByPaddingToLength: 30
withString: @"-="
startingAtIndex: 0];
NSString *outOfPhaseThingy = [thingy stringByPaddingToLength: 30
withString: @"-="
startingAtIndex: 0];
This outputs:
thing-=-=-=-=-=-=-=-=-=-=-=-=-
thingy-=-=-=-=-=-=-=-=-=-=-=-=
Kind of lame. Would be nice to have both lines be in sync, but they’re out of sync because thingy is one character longer than thing. The solution? Start the padding in a different place for them:
NSString *inPhaseThing = [thing stringByPaddingToLength: 30
withString: @"-="
startingAtIndex: thing.length % 2];
NSString *inPhaseThingy = [thingy stringByPaddingToLength: 30
withString: @"-="
startingAtIndex: thingy.length % 2];
If the string has an even number of characters, the leader starts with the first character of the pad. If the string has an odd number of characters, the leader starts with the second character. Things look a lot nicer now:
thing=-=-=-=-=-=-=-=-=-=-=-=-=
thingy-=-=-=-=-=-=-=-=-=-=-=-=
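The % 2 is really just “string length modulo pad length”, so the same trick should work for a leader of any length. Here’s a sketch under that assumption; PadInPhase is a name I made up, and bailing out on an empty pad string is my own choice to dodge a divide-by-zero.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: pad `string` out to `width` so that every padded
// column c shows pad[c % pad.length], whatever the string's own length.
static NSString *PadInPhase (NSString *string, NSUInteger width, NSString *pad) {
    if (pad.length == 0) return string;  // nothing to pad with

    NSUInteger phase = string.length % pad.length;
    return [string stringByPaddingToLength: width
                                withString: pad
                           startingAtIndex: phase];
}
```

PadInPhase (@"thing", 30, @"-=") reproduces the in-phase line above. With a three-character pad, @"ab" padded to 8 with @"123" gives ab312312 and @"abcd" gives abcd2312, so the last three columns line up as 312 in both.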
There are of course a lot more string-oriented calls. I omitted the obvious ones that get used on a regular basis, such as those that sort, search, compare, and convert. There’s also an NSStringPathExtensions category that adds path manipulation methods like lastPathComponent or stringByDeletingPathExtension. You might want to check those out if you’re doing file system path processing. There are also functions for converting structs into human-readable strings (e.g. NSStringFromCGRect / NSStringFromRect).
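As a quick taste of the path methods, here’s a sketch that chains two of them. BaseName is a hypothetical helper and the path is invented for the example.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: the file name without its directory or extension.
static NSString *BaseName (NSString *path) {
    return [[path lastPathComponent] stringByDeletingPathExtension];
}
```

For @"/Users/wookiee/Documents/pony.png", lastPathComponent yields pony.png, and deleting the extension leaves pony.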
Got any favorite Stupid String Tricks? Leave ‘em in the comments!