Swift Regex Deep Dive
Our introductory guide to Swift Regex. Learn regular expressions in Swift including RegexBuilder examples and strongly-typed captures.
I was hanging out on the #macdev IRC channel on Freenode the other day when someone asked a question: “static has different meanings based on the context it is placed in, right?” Indeed, it has different meanings. And yet it’s the same. Static is a C Koan.
static controls scope, which is the visibility of an entity. It tells the compiler “Here is this thing that I’m using, but don’t let anyone else know about it.” It communicates that something, like a function or a variable, is an implementation detail and should not be made public.
What constitutes “public?” Stuff outside of a compilation unit.
What’s a compilation unit? It’s a term in C-style languages, which just refers to all the stuff that’s processed during a single invocation of the compiler, whether it’s gcc or clang. This is a typical compiler invocation:
clang -g -Wall -c thing1.m
The command tells the compiler to open up thing1.m, run it through the preprocessor, take that output and run it through the compiler, and save the compiled goodies to an object file named thing1.o. The preprocessed text that’s fed into the compiler is a compilation unit. It is possible for thing1.m to #import thing2.m, but the result is still just one compilation unit. That is (hopefully) a rare occurrence.
static controls the visibility of a symbol outside of the compilation unit that’s being processed. Things prefixed with static are not visible outside of that compilation unit. Things not prefixed with static are visible. That’s pretty much it.
So what does “visible” mean? It means that other code can call it (for functions) or access / change it (for variables). It also means the function or variable can be looked up by name using a function like dlsym. This meaning of visible is orthogonal to the idea that there are debugging symbols and that values are visible inside of the debugger. static has no control over that.
This visibility is also independent of the presence of a function prototype or a variable declaration in a header file. If some other piece of code knows the name of a visible function, it can access the non-static symbol, even if that code doesn’t pull in the proper header file. Or if the function isn’t in any header files at all. It’s what allows us to call private API.
Functions, by default, are visible everywhere, and so can be called from anywhere. Here’s a function in thing1.m:
void VisibleFunction (void) {
    printf ("Hi! I'm visible!\n");
}
It’s totally visible. This one will be hidden:
static void InvisibleFunction (void) {
    printf ("Hi! I'm hidden!\n");
}
Don’t believe me? You can ask nm to display the symbol table for linky-things:
% clang -g -Wall -c thing1.m
% nm thing1.o
0000000000000248 s EH_frame0
0000000000000152 s L_.str
0000000000000000 T _VisibleFunction
0000000000000260 S _VisibleFunction.eh
U _printf
Each line is a different symbol used in the file. You can see VisibleFunction (with a leading underscore, which is Just The Way OS X does things). The T stands for “Defined in the Text section”. VisibleFunction is indeed defined, because it has a body of code. S is for other symbols. In this case, some exception handling jazz. U is for undefined. That means the linker needs to mop up things and put in an address where printf can be found. The lower-case s’s are other symbols, such as another exception handling thing and the string character constants used in the printf’s.
OS X also has libtool which does what nm does, and more. But nm is available on every unix platform, so I’ll be using that.
Notice that InvisibleFunction is not in nm’s output. It’s, well, invisible.
What are the implications of this? You can have multiple static functions with the same name in different compilation units. You don’t have to worry about your LogAllTheThings function being confused with someone else’s LogAllTheThings.
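As a rough sketch of how that plays out in one file, here is a static LogAllTheThings hidden behind a single public entry point. ReportThings and the sample strings are hypothetical, invented for the example. Only the public function has a name other files can link against.

```c
#include <stdio.h>

/* Private helper: static keeps the name inside this compilation unit,
   so another file can define its own LogAllTheThings without a clash. */
static int LogAllTheThings (const char *things[], int count) {
    for (int i = 0; i < count; i++) {
        printf ("thing %d: %s\n", i, things[i]);
    }
    return count;   /* number of things logged */
}

/* Public entry point (hypothetical name): callable from other files. */
int ReportThings (void) {
    const char *things[] = { "alpha", "beta", "gamma" };
    return LogAllTheThings (things, 3);
}
```

Other files call ReportThings and never learn that LogAllTheThings exists, so the helper can be renamed or removed without touching any code outside this file.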
What happens if you leave off the static? The function is no longer private – it’s now global. You could get linker errors if you have two object files that define the same function. Say I have a second file, thing2.m, which is just thing1.m copied and compiled:
% cp thing1.m thing2.m
% clang -g -Wall -c thing2.m
And just to be pedantic, here’s the nm of the file:
% nm thing2.o
0000000000000248 s EH_frame0
0000000000000152 s L_.str
0000000000000000 T _VisibleFunction
0000000000000260 S _VisibleFunction.eh
U _printf
It’s got VisibleFunction defined in its text segment, too. Trying to mash them together gives you a linker error:
% clang -g -Wall *.o -o static
duplicate symbol _VisibleFunction in:
thing1.o
thing2.o
ld: 1 duplicate symbol for architecture x86_64
If the linker detects no conflict, it just means you have a function that’s now visible.
I call variables that are declared outside of any functions “module variables”, to distinguish them from stuff that lives inside of a function. They’re also referred to as global variables, and have a lifespan of the duration of the program. Like with functions, any non-static variables in a compilation unit are visible to other compilation units. Here’s a variable declared like this, added to the top of thing1.m:
int foobage;
An nm of the resulting object file includes a new line:
% nm thing1.o
00000000000002c8 s EH_frame0
000000000000019c s L_.str
0000000000000000 T _VisibleFunction
00000000000002e0 S _VisibleFunction.eh
0000000000000198 C _foobage
U _printf
The capital C stands for a Common section symbol – it’ll be loaded and initialized to zero. Lines with a capital D mean a Data section symbol, which happens if you assign a value on the declaration line. Because this symbol is visible, anyone can access foobage and change it. Putting a static in front of a module variable declaration:
static int invisibleFoobage;
keeps it hidden.
OBTW, static variables are initialized to zero at program launch.
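A small sketch of a static module variable in use, assuming two hypothetical accessor functions, ReadFoobage and BumpFoobage (neither is part of the original thing1.m). Code in other files can only reach the variable through those functions, and the first read leans on the zero-at-launch guarantee.

```c
/* Module variable: static hides it from other compilation units,
   and it starts life as zero without an explicit initializer. */
static int invisibleFoobage;

/* Hypothetical accessors: the only way outside code can see or
   change the value. */
int ReadFoobage (void) {
    return invisibleFoobage;
}

void BumpFoobage (void) {
    invisibleFoobage++;
}
```

ReadFoobage returns 0 before any bump and 1 after a single call to BumpFoobage. Since every change funnels through one function, there’s exactly one place to set a breakpoint when the value goes weird.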
What happens if you leave off the static? It becomes a true global variable. What happens if there’s a conflict, like thing1.o has a visible foobage variable, and thing2.o has its own, distinct visible foobage? The linker will coalesce them. Any changes inside of thing1.o to foobage will, in essence, be visible to the code in thing2.o. Mayhem could possibly ensue, especially if the types aren’t miscible.
Just to make things even more subtly different-but-the-same, we’ve got static variables that live inside of functions:
void VisibleFunction (void) {
    printf ("Hi! I'm visible!\n");
    static int g_force;
    printf ("Firey Phoenix number %d\n", g_force++);
}
Like static module variables, these are global variables. They’re initialized to zero and they hang around for the life of the program. Unlike the static module variables, which have visibility in the entire compilation unit, static function variables only have visibility inside of their function or curly brace scope. Now the only code that can modify g_force is code inside of VisibleFunction. Of course, you can have code that takes the address of g_force and passes it to functions which can then turn around and change the value. But at least vending the pointer is under your control.
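Here’s a minimal sketch of both behaviors, with hypothetical names (NextTicket, VendHighScore) invented for the illustration. The first function shows a function static persisting across calls. The second shows the function choosing to hand out a pointer to its own static.

```c
/* The counter is zero-initialized once and keeps its value between
   calls, but nothing outside NextTicket can name it. */
int NextTicket (void) {
    static int ticket;
    return ticket++;
}

/* Vending the address is a deliberate choice: callers holding the
   pointer can now read and write the function's private storage. */
int *VendHighScore (void) {
    static int highScore;
    return &highScore;
}
```

Three calls to NextTicket in a row return 0, 1, and 2. A caller that writes 99 through the pointer from VendHighScore will read 99 back on the next call, because every call returns the address of the same int.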
What happens if you leave off the static? The variable is just a local variable now. It won’t be initialized to anything sane, and the value won’t persist from function (or method) call to call. Hopefully bugs will manifest themselves quickly.
Why am I harping on the term “Compilation Unit” throughout this whole discussion? Why not just say “visibility in the source file”? The compilation unit is the aggregate of all the code that’s pulled in by the preprocessor, including all the header files that are directly (or indirectly) included. What happens if you have a header like this?
// things.h
void VisibleFunction (void);
int headerVariable;
Every compilation unit will get its own copy of the variable. Ordinarily the linker will sort things out, but you can run into problems when building shared libraries or plugins. Rather than resolving to the running process’s headerVariable at load time, code will happily use storage reserved for the shared library, leading to subtle bugs. I don’t like subtle edge cases.
What happens if the header declares the variable static-styles?
// things.h
void VisibleFunction (void);
static int headerVariable;
This means every compilation unit will get its own copy of headerVariable. This might not be a problem – if you’re expecting multiple compilation units to access the same memory location, you wouldn’t have made it static to start out with.
The downside is you can get a compiler warning about a static variable that’s not used. The compiler is thinking “You go to the trouble of declaring this variable. You say it’s static, so only code in this compilation unit can touch it. But no code actually touches it. It’s superfluous. So you may be doing something wrong.” And it complains:
./things.h:5:12: warning: unused variable 'headerVariable' [-Wunused-variable]
static int headerVariable;
Of course, you are driven to fix all your warnings. You can silence it by prefixing the variable declaration with __unused:
__unused static int headerVariable;
But now I have to ask, “What’s the point?” This is the time I’d make a comment in the code review system or drop a quick email asking “So, what does this really mean? What is it trying to accomplish?”
There are legitimate times you want to expose a global variable, such as Cocoa’s NSString constants that are used as dictionary keys (looking at you, NSURLLocalizedNameKey). If you have a plain old variable in a header file, every object file will get its own copy, leading to extra work for the linker to coalesce things. If you make it static, you can cause warnings.
There’s an additional keyword, extern, which tells the compiler “Hey, this thing? Trust me that it’ll actually be defined in a compilation unit somewhere. Don’t worry. It’ll be there at link time.” So, the declaration in things.h would look like
extern int headerVariable;
This fixes both of the problems. The definition should appear in just one place, and you don’t have any duplication issues to worry about.
Of course, someone, somewhere, will need to have a non-extern declaration of this, otherwise the linker will complain:
Undefined symbols for architecture x86_64:
"_headerVariable", referenced from:
_main in main.o
ld: symbol(s) not found for architecture x86_64
This means you’ve broken the promise that it’ll actually be defined somewhere. Typically, you’ll have a header file that’s the public interface to some .c or .m file. You’d have the extern variables in the header, and a non-extern declaration in the .c or .m file.
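A minimal sketch of that header-plus-implementation split, squeezed into one listing so it compiles on its own. The comments mark where the file boundary would fall in a real project. The file name things.c, the starting value 3, and the BumpHeaderVariable function are all assumptions made for the example.

```c
/* ---- things.h: the public interface every client imports ---- */
extern int headerVariable;        /* a promise, reserves no storage */
void BumpHeaderVariable (void);   /* hypothetical function */

/* ---- things.c: the one place the promise is kept ---- */
int headerVariable = 3;           /* the single real definition */

void BumpHeaderVariable (void) {
    headerVariable++;
}
```

Any file that includes the header sees the same headerVariable: it reads 3 at launch and 4 after one call to BumpHeaderVariable. Delete the definition in the implementation half and the build fails at link time with the undefined-symbol error shown earlier.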
Of course, you really, really should think twice before making variables directly accessible.
So, back to the original question: “static has different meanings based on the context it is placed in, right?” At a high level, it means “consider this to be private”. Where the static is placed controls whether it’s controlling function visibility or variable visibility. There are also function statics, which restrict the visibility even more.
(Dig grungy details like this? You’ll love Advanced Mac OS X Programming: The Big Nerd Ranch Guide. It’s chock full of language and command goodness.)