The post Swift Regex Deep Dive appeared first on Big Nerd Ranch.
Swift Regex’s new Regex Builder gives us a programmatic way of creating regular expressions. This approach to building often complex regular expressions should appeal to the regex neophyte and aficionado alike. We’ll be digging into Regex Builder to discover its wide-reaching capabilities.
Swift Regex brings first-class support for regular expressions to the Swift language, and it aims to mitigate or outright eliminate many of the downsides of regex. The Swift compiler natively supports regex syntax, which gives us compile time errors, syntax highlighting, and strongly typed captures. Regex syntax in Swift is compatible with Perl, Python, Ruby, Java, NSRegularExpression, and many others.
It should be noted that as of the writing of this article, Swift Regex is still in the open beta period. We’ll be using the Swift Regex found in Xcode 14 beta 6.
Swift Regex supports creating a regular expression in several different ways, each of which is useful for different scenarios. First, let’s take a look at creating a compile-time regular expression.
let regex = /\d/
This regular expression will match a single digit. As is typical in regular expression syntax, the expression can be found between two forward slashes; “/<expression>/”. As you can see, this regular expression is a first-class type in Swift and can be assigned directly to a variable. Because it is a Swift type, Xcode will also recognize this regex and provide both compile time checks and syntax highlighting.
Swift has added robust support for regex to a number of common APIs, and using this regular expression couldn’t be easier.
let user = "{name: Shane, id: 123, employee_id: 456}"
let regex = /name: \w+/
if let match = user.firstMatch(of: regex) {
    print(match.output)
}
Which gives us the output:
name: Shane
You may be tempted to use the regular expression [a-zA-Z]+ in order to match a word here. However, \w+ matches Unicode word characters rather than only the ASCII letters A through Z, so names written with accented or non-Latin letters are matched as well.
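To make the difference concrete, here is a small sketch comparing the two patterns. The sample name Zoë is an assumption chosen for illustration, and the extended #/.../# delimiters are used only so the snippet compiles without the bare-slash regex compiler setting.

```swift
// Hypothetical input: a name containing a non-ASCII letter.
let user = "{name: Zoë, id: 123}"

// ASCII-only character class: stops at the first letter outside A-Z / a-z.
let asciiOnly = #/name: [a-zA-Z]+/#
// \w matches Unicode word characters (letters, digits, underscore).
let wordCharacters = #/name: \w+/#

let asciiMatch = user.firstMatch(of: asciiOnly).map { String($0.output) }
let wordMatch = user.firstMatch(of: wordCharacters).map { String($0.output) }

print(asciiMatch ?? "no match")
print(wordMatch ?? "no match")
```

Under these assumptions, the ASCII-only pattern should stop before the ë while the \w+ pattern should cover the whole name.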
Swift Regex also supports creating regular expressions at runtime. Runtime creation of a regular expression has many uses and can be useful for editors, command line tools, and search just to name a few. The expression syntax is the same as a compile time expression. However, they are created in a slightly different manner.
let regex = try Regex(".*\(searchTerm).*")
This regular expression is looking for a specific search term supplied at runtime. Here the regular expression is created by constructing the Regex type with a String representing the regular expression. The try keyword is used since the Regex initializer throws an error if the supplied regular expression is invalid.
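Since the pattern is only checked at runtime, it can help to handle the thrown error explicitly. The following is a minimal sketch; the helper name makeRegex(from:) and both sample patterns are assumptions made for illustration.

```swift
// Hypothetical helper: returns nil instead of throwing when the pattern is invalid.
func makeRegex(from pattern: String) -> Regex<AnyRegexOutput>? {
    do {
        return try Regex(pattern)
    } catch {
        // The pattern is only parsed at runtime, so syntax errors surface here.
        print("Invalid pattern \"\(pattern)\": \(error)")
        return nil
    }
}

let valid = makeRegex(from: ".*Shane.*")
let invalid = makeRegex(from: "(unclosed")   // unbalanced parenthesis

print(valid != nil)
print(invalid != nil)
```

A command line tool or editor would typically report the error to the user rather than print it, but the shape of the check is the same.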
We can again apply this regex using the firstMatch(of:) function as in our first example. Note that this time our regex captures the line that matches by using a regex capture, ( and ).
let users = """
[
{name: Shane, id: 123, employee_id: 456},
{name: Sally, id: 789, employee_id: 101},
{name: Sam, id: 453, employee_id: 999}
]
"""
let idToSearch = 789
let regex = try Regex("(.*id: \(idToSearch).*)")
if let match = users.firstMatch(of: regex) {
    print(match.output[1].substring ?? "not found")
}
Running the example gives us the following output:
{name: Sally, id: 789, employee_id: 101},
We can gain access to any data captured by the regex via output on the returned Regex.Match structure. Here, output is a type-erased AnyRegexOutput collection whose first item, at index 0, is the entire matched text. Each capture defined in the regex is found at subsequent indexes.
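The indexing can be seen by walking over the output of a runtime regex. This is a sketch under assumed inputs; the pattern with two capture groups is made up for illustration and is not from the example above.

```swift
let line = "{name: Sally, id: 789, employee_id: 101}"

// Two capture groups: the id and the employee_id.
let regex = try Regex(#"id: (\d+), employee_id: (\d+)"#)

var captured: [String] = []
if let match = line.firstMatch(of: regex) {
    // Index 0 is the whole match; the captures follow at indexes 1 and 2.
    for index in match.output.indices {
        captured.append(String(match.output[index].substring ?? ""))
    }
}
print(captured)
```

Each element exposes its text through the optional substring property, which is why the loop falls back to an empty string.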
Regex Builder introduces a declarative approach to composing regular expressions. It opens the regex door to anyone who finds regular expressions difficult to understand, maintain, or create. Regex builder is Swift’s solution to the drawbacks of the regular expression syntax: a DSL for creating regular expressions with type safety while still allowing for ease of use and expressivity. Simply import the new RegexBuilder module, and you’ll have everything you need to create and compose powerful regular expressions.
import RegexBuilder

let regex = Regex {
    One(.digit)
}
This regular expression will match a single digit and is functionally equivalent to our first compile time regex example, /\d/. Here the standard regex syntax is discarded in favor of a declarative approach. All regex operations, including captures, can be represented with RegexBuilder. In addition, when it makes sense, regex literals can be utilized right within the regex builder. This makes for a very expressive and powerful approach to creating regular expressions.
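As a small sketch of mixing the two styles, the builder below combines a plain string, a Capture, and a regex literal. The variable names are assumptions, and the #/.../# delimiters are used so the literal compiles without the bare-slash setting.

```swift
import RegexBuilder

let user = "{name: Shane, id: 123, employee_id: 456}"

let idRegex = Regex {
    "employee_id: "   // a plain string is matched verbatim
    Capture {
        #/\d+/#       // a regex literal used as a builder component
    }
}

var employeeID: Substring? = nil
if let match = user.firstMatch(of: idRegex) {
    let (_, id) = match.output   // (whole match, capture)
    employeeID = id
}
print(employeeID ?? "not found")
```

Because the builder knows there is exactly one capture, the output is a two-element tuple that can be destructured directly.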
Let’s take a deeper look into RegexBuilder. In this example, we will use a regex builder to parse and extract information from a Unix top command.
top -l 1 -o mem -n 8 -stats pid,command,pstate,mem | sed 1,12d
For simplicity, we’ll take the output of running this command and assign it to a Swift variable.
// PID   COMMAND          STATE     MEMORY
let top = """
45360  lldb-rpc-server  sleeping  1719M
2098   Google Chrome    sleeping  1679M-
179    WindowServer     sleeping  1406M
106    BDLDaemon        running   1194M
45346  Xcode            running   878M
0      kernel_task      running   741M
2318   Dropbox          sleeping  4760K+
2028   BBEdit           sleeping  94M
"""
As you can see, the top command outputs structured data that is well suited for use with regular expressions. In our example, we will be extracting the name, status, and size of each item. When considering a Regex Builder, it is useful to break a larger regex down into smaller component parts which are then concatenated by the builder. First, I’ll present the code, and then we’ll discuss how it works.
import RegexBuilder

// 1
let separator = /\s{1,}/

// 2
let topMatcher = Regex {
    // 3
    OneOrMore(.digit)

    // 4
    separator

    // 5
    Capture(
        OneOrMore(.any, .reluctant)
    )
    separator

    // 6
    Capture(
        ChoiceOf {
            "running"
            "sleeping"
            "stuck"
            "idle"
            "stopped"
            "halted"
            "zombie"
            "unknown"
        }
    )
    separator

    // 7
    Capture {
        OneOrMore(.digit)
        // /M|K|B/
        ChoiceOf {
            "M"
            "K"
            "B"
        }
        Optionally(/\+|-/)
    }
}

// 8
let matches = top.matches(of: topMatcher)
for match in matches {
    // 9
    let (_, name, status, size) = match.output
    print("\(name) \t\t \(status) \t\t \(size)")
}
Running the example gives us the following output:
lldb-rpc-server    sleeping    1719M
Google Chrome      sleeping    1679M-
WindowServer       sleeping    1406M
BDLDaemon          running     1194M
Xcode              running     878M
kernel_task        running     741M
Dropbox            sleeping    4760K+
BBEdit             sleeping    94M
Here is a breakdown of what is happening with the code:
1. A regex literal that matches one or more whitespace characters is assigned to the separator variable. We can then use separator within the regex builder in order to match column separators.
2. Create the regex builder by passing a trailing closure to Regex and assign it to topMatcher.
3. Match one or more digits (the PID column) using a CharacterClass. CharacterClass is a struct that conforms to RegexComponent and is similar in function to a CharacterSet. The .digit CharacterClass defines a numeric digit.
4. The separator regex literal is used directly inside the builder to match the whitespace between columns.
5. Captures are added to the Output of the regex and are indexed based on their position within the regex.
6. ChoiceOf is equivalent to a regex alternation (the | regex operator) and cannot have an empty block. You can think of this as matching a single value of an Enum. Use it when there is a known list of values to be matched by the regular expression.
7. This capture combines the digits, the unit, and an optional trailing + or -. The Optionally component can take a regex literal as its parameter.
8. Apply the regex to the input with the matches(of:) function. We assign the returned value to a variable that will allow us to access the regex output and our captured data.
9. The output property of each returned match contains the entire matched text followed by any captured data. Here we are unpacking the output tuple by ignoring the first item (the whole match) and assigning each subsequent item to a variable for easy access.

As you can see from this example, the Swift regex builder is a powerful and expressive way to create regular expressions in Swift. This is just a sampling of its capability. So, next, let’s take a deeper look into the Swift regex builder and its strongly typed captures.
One of the more compelling features of the Swift regex builder is strongly typed captures. Rather than simply returning a string match, Swift Regex can return a strong type representing the captured data.
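As a first taste of a typed capture, the sketch below converts a captured run of digits into an Int inside the regex itself. The input line and the names sizeRegex and megabytes are assumptions made for illustration.

```swift
import RegexBuilder

// Hypothetical input modeled on one row of the top output.
let entry = "2098 Google Chrome sleeping 1679M"

let sizeRegex = Regex {
    Capture {
        OneOrMore(.digit)
    } transform: { digits -> Int? in
        Int(digits)   // Substring -> Int?
    }
    "M"
}

var megabytes: Int? = nil
if let match = entry.firstMatch(of: sizeRegex) {
    // The output tuple is (Substring, Int?): the whole match, then the typed capture.
    megabytes = match.output.1
}
print(megabytes ?? -1)
```

The capture arrives already converted, so the calling code never has to parse the matched text a second time.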
In some cases, especially for performance reasons, we may want to exit early if a regex capture doesn’t meet some additional criteria. TryCapture allows us to do this. The TryCapture Regex Builder component will pass a captured value to a transform closure where we can perform additional validation or value transformation. When the transform closure returns a value, whether the original or a modified version, it is assumed valid, and the value is captured. However, when the transform closure returns nil, matching is signaled to have failed and will cause the regex engine to backtrack and try an alternative path. TryCapture’s transform closure actively participates in the matching process. This is a powerful feature and allows for extremely flexible matching.
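Before the larger example, here is a minimal sketch of a transform that rejects a candidate. The input string and the even-number rule are assumptions chosen only to show a nil result pushing the engine on to a later position.

```swift
import RegexBuilder

// Hypothetical input: we want the first even number.
let input = "ids: 7 12 9"

let evenNumber = Regex {
    TryCapture {
        OneOrMore(.digit)
    } transform: { digits -> Int? in
        // Returning nil fails this attempt, so the engine keeps searching.
        guard let value = Int(digits), value.isMultiple(of: 2) else { return nil }
        return value
    }
}

let firstEven = input.firstMatch(of: evenNumber)?.output.1
print(firstEven ?? -1)
```

One thing to keep in mind with this design: because a nil result triggers backtracking, an eager repetition may be retried with a shorter run of digits, so an input such as 21 could still match on its leading 2.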
Let’s take a look at an example.
In this example, we will use a regex builder to parse and extract information from a Unix syslog command.
syslog -F '$((Time)(ISO8601)) | $((Level)(str)) | $(Sender)[$(PID)] | $Message'
We’ll take the output of running this command and assign it to a Swift variable.
// TIME LEVEL PROCESS(PID) MESSAGE
let syslog = """
2022-06-09T14:11:52-05 | Notice | Installer Progress[1211] | Ordering windows out
2022-06-09T14:12:18-05 | Notice | Installer Progress[1211] | Unable to quit because there are connected processes
2022-06-09T14:12:30-05 | Critical | Installer Progress[1211] | Process 648 unexpectedly went away
2022-06-09T14:15:31-05 | Alert | syslogd[126] | ASL Sender Statistics
2022-06-09T14:16:43-05 | Error | MobileDeviceUpdater[3978] | tid:231b - Mux ID not found in mapping dictionary
"""
Next, we use Swift Regex to extract this data, including the timestamp and a strongly typed severity level, while filtering out processes with an id of less than 1000.
import Foundation
import RegexBuilder

let separator = " | "
let regex = Regex {
    // 1
    Capture(.iso8601(assuming: .current, dateSeparator: .dash))
    // 2
    "-"
    OneOrMore(.digit)
    separator

    // 3
    TryCapture {
        ChoiceOf {
            "Debug"
            "Informational"
            "Notice"
            "Warning"
            "Error"
            "Critical"
            "Alert"
            "Emergency"
        }
    } transform: {
        // 4
        SeverityLevel(rawValue: String($0))
    }
    separator

    // 5
    OneOrMore(.any, .reluctant)
    "["
    Capture {
        OneOrMore(.digit)
    } transform: { substring -> Int? in
        // 6
        let pid = Int(String(substring))
        if let pid, pid >= 1000 {
            return pid
        }
        return nil
    }
    "]"
    separator
    OneOrMore(.any)
}

// 7
let matches = syslog.matches(of: regex)
print(type(of: matches[0].output))
for match in matches {
    let (_, date, status, pid) = match.output
    // 8
    if let pid {
        print("\(date) \(status) \(pid)")
    }
}

// 9
enum SeverityLevel: String {
    case debug = "Debug"
    case info = "Informational"
    case notice = "Notice"
    case warning = "Warning"
    case error = "Error"
    case critical = "Critical"
    case alert = "Alert"
    case emergency = "Emergency"
}
Running the example gives us the following output:
(Substring, Date, SeverityLevel, Optional<Int>)
2022-06-09 19:11:52 +0000 notice 1211
2022-06-09 19:12:18 +0000 notice 1211
2022-06-09 19:12:30 +0000 critical 1211
2022-06-09 19:16:43 +0000 error 3978
Here’s what is happening with the syslog example.
1. The iso8601 static function (new in iOS 16) is called on the Date.ISO8601FormatStyle type. This function constructs and returns a date format style for use by the Swift Regex Capture in converting the captured string into a Date. This Date is then used in the Capture’s output with no further string-to-date conversion necessary.
2. The remaining UTC offset (for example, -05) is matched but not captured.
3. TryCapture is being used to transform a capture’s type. It will convert the matched value into a non-optional type or fail the match.
4. The transform closure will be called upon matching the capture. It is passed the matched substring, which it can then transform to the desired type. In this example, the transform is converting the matched substring into a SeverityLevel enum. The corresponding regex output for this capture becomes the closure’s return type. In the case of a transform on TryCapture, the optional is unwrapped and this type will be non-optional. For a Capture transform, the output is exactly what the closure returns, so it stays optional when the closure returns an optional, as in #6.
5. Repetition behavior can be specified on OneOrMore, ZeroOrMore, Optionally, and Repeat. The .reluctant repetition behavior will match as few occurrences as possible. The default repetition behavior for all repetitions is .eager.
6. The transform converts the captured substring into an Int value. If this value is 1000 or greater, then it is returned from the transform and becomes the capture’s output value. Otherwise, it returns nil for this capture’s output.
7. Apply the regex to the input and store the results in the matches variable.
8. If the pid capture is not nil then print out the data.
9. Define the SeverityLevel enum type, which is used by the transforming capture defined in #3.

Swift Regex is a welcome and powerful addition to Swift. Regex Builder is a go-to solution for all but the simplest of regex needs, and mastering it will be time well spent. The declarative approach of Regex Builder, coupled with compile time regex support giving us compile time errors, syntax highlighting, and strongly typed captures, makes for a potent combination. A lot of thought has gone into the design of Swift Regex, and it shows. Swift Regex will make a worthy addition to your development toolbox, and taking the time to learn it will pay dividends.
The post Custom Operators in Swift Combine appeared first on Big Nerd Ranch.
Despite the usefulness of Combine’s built-in operators, there are times when they fall short. This is when constructing your own custom operators adds needed flexibility to perform often complex tasks in a concise and performant manner of your choosing.
In order to create our own operators, it is necessary to understand the basic lifecycle and structure of a Combine pipeline. In Combine, there are three main abstractions: Publishers, Subscribers, and Operators.
Publishers are value types, or Structs, that describe how values and errors are produced. They allow the registration of subscribers who will receive values over time. In addition to receiving values, a Subscriber can potentially receive a completion, as a success or error, from a Publisher. Subscribers can mutate state, and as such, they are typically implemented as a reference type or Class.
Subscribers are created and then attached to a Publisher by subscribing to it. The Publisher will then send a subscription back to the Subscriber. This subscription is used by the Subscriber to request values from the Publisher. Finally, the Publisher can start sending the requested values to the Subscriber. Depending on the Publisher type, it can keep sending values indefinitely, or it can complete with a success or failure. This is the basic structure and lifecycle used in Combine.
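The lifecycle described above can be sketched with a small hand-written Subscriber. The class name RecordingSubscriber and its events array are assumptions made for illustration; they only record what the subscriber sees.

```swift
import Combine

// Hypothetical subscriber that records each lifecycle event it sees.
final class RecordingSubscriber: Subscriber {
    typealias Input = Int
    typealias Failure = Never

    private(set) var events: [String] = []

    // The publisher hands us a subscription; we use it to request values.
    func receive(subscription: Subscription) {
        events.append("subscription")
        subscription.request(.unlimited)
    }

    // The publisher delivers the requested values one at a time.
    func receive(_ input: Int) -> Subscribers.Demand {
        events.append("value \(input)")
        return .none // no additional demand beyond what was already requested
    }

    // A finite publisher finishes with a completion.
    func receive(completion: Subscribers.Completion<Never>) {
        events.append("completion")
    }
}

let subscriber = RecordingSubscriber()
// Attach the subscriber to a publisher by subscribing to it.
[1, 2, 3].publisher.subscribe(subscriber)
print(subscriber.events)
```

The subscriber is a class here because it mutates its own state, which matches the note above about Subscribers typically being reference types.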
Operators sit in between Publishers and Subscribers where they transform values received from a Publisher, called the upstream, and send them on to Subscribers, the downstream. In fact, operators act as both a Publisher and as a Subscriber.
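To see an operator sitting between an upstream and a downstream, here is a brief sketch using the built-in map operator. The subject, the multiplier, and the variable names are assumptions made for illustration.

```swift
import Combine

// Upstream: a subject we can push values into by hand.
let upstream = PassthroughSubject<Int, Never>()
var received: [Int] = []

// map subscribes to the upstream and publishes transformed values downstream.
let cancellable = upstream
    .map { $0 * 10 }
    .sink { received.append($0) }   // downstream subscriber

upstream.send(1)
upstream.send(2)
print(received)
```

The sink never talks to the subject directly; everything it receives has passed through the operator in the middle.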
Let’s cover two different strategies for creating a custom Combine operator. In the first approach, we’ll use the composition of an existing chain of operators to create a reusable component. The second strategy is more involved but provides the ultimate in flexibility.
In our first example, we’ll be creating a histogram from a random array of integer values. A histogram tells us the frequency at which each value in the sample data set appears. For example, if our sample data set has two occurrences of the number one, then our histogram will show a count of two as the number of occurrences of the number one.
// random sample of Int
let sample = [1, 3, 2, 1, 4, 2, 3, 2]
// Histogram
// key: a unique Int from the sample
// value: the count of this unique Int in the sample
let histogram = [1: 2, 2: 3, 3: 2, 4: 1]
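For reference, the same counting can be sketched in plain Swift without Combine, which makes the expected result easy to check. The name counted is an assumption used to keep this snippet separate from the listing above.

```swift
let sample = [1, 3, 2, 1, 4, 2, 3, 2]

// Count how many times each value occurs in the sample.
let counted = sample.reduce(into: [Int: Int]()) { counts, value in
    counts[value, default: 0] += 1
}

print(counted == [1: 2, 2: 3, 3: 2, 4: 1])
```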
We can use Combine to calculate the histogram from a sample of random Int.
// random sample of Int
// 1
let sample = [1, 3, 2, 1, 4, 2, 3, 2]
// 2
sample.publisher
    // 3
    .reduce([Int:Int](), { accum, value in
        var next = accum
        if let current = next[value] {
            next[value] = current + 1
        } else {
            next[value] = 1
        }
        return next
    })
    // 4
    .map({ dictionary in
        dictionary.map { $0 }
    })
    // 5
    .map({ item in
        item.sorted { element1, element2 in
            element1.key < element2.key
        }
    })
    .sink { printHistogram(histogram: $0) }
    .store(in: &cancellables)
Which gives us the following output.
histogram standard operators:
1: 2
2: 3
3: 2
4: 1
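The pipeline above relies on two names that the post does not define, cancellables and printHistogram. What follows is only a guess at what they might look like, written so the example can be run; formatHistogram is an additional hypothetical helper that makes the formatting easy to check.

```swift
import Combine

// Assumed storage for the subscriptions created by sink.
var cancellables = Set<AnyCancellable>()

// Hypothetical helper: one "key: count" line per histogram entry.
func formatHistogram(_ histogram: [(key: Int, value: Int)]) -> [String] {
    histogram.map { "\($0.key): \($0.value)" }
}

// Hypothetical stand-in for the post's printHistogram(histogram:).
func printHistogram(histogram: [(key: Int, value: Int)]) {
    formatHistogram(histogram).forEach { print($0) }
}

printHistogram(histogram: [(key: 1, value: 2), (key: 2, value: 3)])
```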
Here is a breakdown of what is happening with the code:
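1. Define our sample data set.
2. Get a Publisher of our sample data.
3. Use reduce to bin each unique value and count its occurrences.
4. Convert our Dictionary of binned values into an Array of key/value tuples, e.g. [(key: Int, value: Int)].
5. Sort the array in ascending order by key.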
As you can see, we have created a series of chained Combine operators that calculates a histogram for a published data set of Int. But what if we use this sequence of code in more than one location? It would be really nice if we could use a single operator to perform this entire operator chain. This reuse not only makes our code more concise and easier to understand but easier to debug and maintain as well. So let’s do just that by composing a new operator based on what we’ve already done.
// 1
extension Publisher where Output == Int, Failure == Never {
    // 2
    func histogramComposed() -> AnyPublisher<[(key:Int, value:Int)], Never> {
        // 3
        self.reduce([Int:Int](), { accum, value in
            var next = accum
            if let current = next[value] {
                next[value] = current + 1
            } else {
                next[value] = 1
            }
            return next
        })
        .map({ dictionary in
            dictionary.map { $0 }
        })
        .map({ item in
            item.sorted { element1, element2 in
                element1.key < element2.key
            }
        })
        // 4
        .eraseToAnyPublisher()
    }
}
Here is what this code is doing:
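1. Create an extension on Publisher and constrain its output to type Int.
2. Define a new function on Publisher that returns an AnyPublisher of our histogram output.
3. Run the same chain of histogram operators as before, but on self. We use self here since we are executing on the current Publisher instance.
4. Type-erase the resulting publisher to an AnyPublisher.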
Now let’s use our new Combine operator.
// 1
let sample = [1, 3, 2, 1, 4, 2, 3, 2]
// 2
sample.publisher
    .histogramComposed()
    .sink { printHistogram(histogram: $0) }
    .store(in: &cancellables)
Which gives us the following output.
histogram composed:
1: 2
2: 3
3: 2
4: 1
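Using the new composed histogram operator:
1. Define our sample data set.
2. Get a Publisher of the sample data and call histogramComposed() on it in place of the whole operator chain.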
From the example usage of our new histogram operator, you can see that the code at the point of usage is quite simple and reusable. This is a fantastic technique for creating a toolbox of reusable Combine operators.
Creating a Combine operator through composition, as we have seen, is a great way to refactor existing code for reuse. However, composition does have its limitations, and that is where creating a native Combine operator becomes important.
A natively implemented Combine operator utilizes the Combine Publisher, Subscriber, and Subscription interfaces and relationships in order to provide its functionality. A native Combine operator acts as both a Subscriber of upstream data and a Publisher to downstream subscribers.
For this example, we’ll create a modulus operator implemented natively in Combine. In Swift, the percent sign, %, is the remainder operator. So, for example, 10 % 3 = 1, or 10 modulo 3 is 1 (10 ÷ 3 = 3 remainder 1). The result of % takes the sign of the dividend, so our operator reports the remainder as an absolute value.
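The arithmetic the operator relies on can be checked in a few lines of plain Swift. The constant names are assumptions used only for this illustration.

```swift
let positive = 10 % 3          // 1
let negative = -10 % 3         // -1: the remainder keeps the sign of the dividend
let absolute = abs(-10 % 3)    // 1: the value our operator would emit for -10
let divisible = abs(9 % 3)     // 0: evenly divisible

print(positive, negative, absolute, divisible)
```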
Let’s look at the complete code for this native Combine operator, how to use it, and then discuss how it works.
// 1
struct ModulusOperator<Upstream: Publisher>: Publisher where Upstream.Output: SignedInteger {
    typealias Output = Upstream.Output // 2
    typealias Failure = Upstream.Failure

    let modulo: Upstream.Output
    let upstream: Upstream

    // 3
    func receive<S>(subscriber: S) where S : Subscriber, Self.Failure == S.Failure, Self.Output == S.Input {
        let bridge = ModulusOperatorBridge(modulo: modulo, downstream: subscriber)
        upstream.subscribe(bridge)
    }
}

extension ModulusOperator {
    // 4
    struct ModulusOperatorBridge<S>: Subscriber where S: Subscriber, S.Input == Output, S.Failure == Failure {
        typealias Input = S.Input
        typealias Failure = S.Failure

        // 5
        let modulo: S.Input
        // 6
        let downstream: S
        // 7
        let combineIdentifier = CombineIdentifier()

        // 8
        func receive(subscription: Subscription) {
            downstream.receive(subscription: subscription)
        }

        // 9
        func receive(_ input: S.Input) -> Subscribers.Demand {
            downstream.receive(abs(input % modulo))
        }

        func receive(completion: Subscribers.Completion<S.Failure>) {
            downstream.receive(completion: completion)
        }
    }
} // closes the ModulusOperator extension

// Note: `where Output == Int` here limits the `modulus` operator to
// only being available on publishers of Ints.
extension Publisher where Output == Int {
    // 10
    func modulus(_ modulo: Int) -> ModulusOperator<Self> {
        return ModulusOperator(modulo: modulo, upstream: self)
    }
}
Because the operator takes the absolute value of the remainder, its output is never negative, and when the input is evenly divisible by the modulo it is equal to 0.
Now we can discuss how the native Combine operator code works.
1. Define the new operator as a Publisher with a constraint on the upstream Publisher’s output of type SignedInteger. Remember, our operator will be acting as both a Publisher and a Subscriber. Thus our input, the upstream, must be SignedIntegers.
2. The ModulusOperator output, acting as a Publisher, will be the same as our input (i.e. SignedIntegers).
3. Required function implementation for Publisher. Creates the bridge, a Subscriber that sits between the operator’s upstream Publisher and the downstream Subscriber.
4. In general, a type like ModulusOperatorBridge can act as both a Subscription and a Subscriber. However, simple operators like this one can be a Subscriber without the need of being a Subscription. This is due to the upstream handling lifecycle necessities like Demand. The upstream behavior is acceptable for our operator, so there is no need to implement Subscription. The ModulusOperatorBridge also performs the primary tasks of the modulus operator.
5. The modulo value that each input is divided by.
6. The downstream Subscriber, which the bridge links to the upstream Publisher.
7. Provides a CombineIdentifier for CustomCombineIdentifierConvertible conformance, which is needed when a Subscriber or Subscription is implemented as a structure.
8. Required function implementation for Subscriber. Hands the upstream Subscription to the downstream Subscriber, which uses it to request values and to manage the lifecycle.
9. Receives input as a Subscriber, performs the modulus operation on this input, and then passes it along to the downstream Subscriber. The new demand for data, if any, from the downstream is relayed to the upstream.
10. An extension on Publisher makes our custom Combine operator available for use. The extension is limited to those upstream Publishers whose output is of type Int.

Putting this new modulus operator into action on a Publisher of Int would look like:
[-10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10].publisher
    .modulus(3)
    .sink { modulus in
        print("modulus: \(modulus)")
    }
    .store(in: &cancellables)
modulus: 1
modulus: 0
modulus: 2
modulus: 1
modulus: 0
modulus: 2
modulus: 1
modulus: 0
modulus: 2
modulus: 1
modulus: 0
modulus: 1
modulus: 2
modulus: 0
modulus: 1
modulus: 2
modulus: 0
modulus: 1
modulus: 2
modulus: 0
modulus: 1
As you can see, the modulus operator will act upon a Publisher of Int. In this example, we’re taking the modulus of 3 for each Int value in turn.
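For comparison, the same behavior can also be sketched with the composition strategy from earlier, by wrapping the built-in map operator. The name modulusComposed is hypothetical and is not part of the native operator shown above.

```swift
import Combine

extension Publisher where Output: SignedInteger {
    // Hypothetical name, chosen to avoid clashing with the native `modulus` operator.
    func modulusComposed(_ modulo: Output) -> Publishers.Map<Self, Output> {
        map { abs($0 % modulo) }
    }
}

var results: [Int] = []
let cancellable = [-10, -2, 0, 4, 9].publisher
    .modulusComposed(3)
    .sink { results.append($0) }

print(results)
```

For a transformation this simple, composition is usually the shorter route; the native implementation earns its extra code when an operator needs direct control over subscriptions, demand, or completion.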
Combine is a powerful declarative framework for the asynchronous processing of values over time. Its utility can be extended and customized even further through the creation of custom operators which act as processors in a pipeline of data. These operators can be created through composition, allowing for excellent reuse of common pipelines. They can also be created through direct implementation of the Combine Publisher, Subscriber, and Subscription protocols, which allows for the ultimate in flexibility and control over the flow of data.
Whenever you find yourself working with Combine, keep these techniques in mind and look for opportunities to create custom operators when relevant. A little time and effort creating a custom Combine operator can save you hours of work down the road.