Bree Stanwyck - Big Nerd Ranch

The Unreasonable Effectiveness of TDD

Fri, 29 Mar 2013 00:51:22 +0000

In 1960, Eugene Wigner published a paper titled “The Unreasonable Effectiveness of Mathematics in the Natural Sciences.” In it, Wigner discusses one of the thorniest and most fundamental questions in physics: Why does much of (apparently totally abstract) mathematics later end up applying so well to physics?

He states, “mathematical concepts turn up in entirely unexpected connections. Moreover, they often permit an unexpectedly close and accurate description of the phenomena in these connections.” The paper drew a huge number of responses, up to and including the claim that this is the case because the universe itself is a Platonic mathematical object whose properties we are discovering over time.

I was reminded of all this by a talk shown at our weekly Lunch and Learn, “The Deep Synergy Between Testability and Good Design” from Michael Feathers (it even has a similar title!). Michael gives a few great, concrete examples of how “hard to test” implies “poorly designed.” For example, he describes the common pain of “I wish I could test this private method” as a hint to extract another class from an “iceberg class” full of private logic.

Why TDD is unreasonably effective

Michael left partially open the question of why testing pains so frequently indicate design problems, or why TDD leads to better design, which got me thinking about “Unreasonable Effectiveness.” I don’t have any grand unified theories of code to propose, but the question is an interesting one. It seems obvious that tests prevent regression and help ensure the correctness of your software, but why should they improve design?

My answer is something like: writing tests forces you to use your code as though you were already maintaining it. It brings design pains to the forefront before the code is even pushed, pains that might otherwise take weeks or months to appear.

Testing also forces the programmer to act as a client of their own code, rather than as someone with intimate knowledge of its interior workings. It’s easy, for example, for a class to accrue more and more direct dependencies on other classes over time, creeping into a god object that becomes a mess to maintain. Using the object when the setup has already been done by previous code can hide the problem. But unit testing a class with an enormous number of dependencies requires setting up (or at least stubbing out) every dependency, which quickly becomes a pain. So, writing tests in this case divorces the class from the context hidden by other code and brings those dependency problems to light.
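
The dependency point is easier to see in code. Here is a minimal sketch, assuming a hypothetical ReportMailer class (the name, its collaborators, and the example data are mine, not from the talk). Because the dependencies are passed in, a test can hand over simple stand-ins, and every extra dependency shows up as one more thing the test has to set up:

```ruby
# Hypothetical example: ReportMailer and its collaborators are
# illustrative names, not taken from the talk.
class ReportMailer
  # Dependencies are passed in rather than looked up internally,
  # so a test can supply simple stand-ins for each one.
  def initialize(renderer:, delivery:)
    @renderer = renderer
    @delivery = delivery
  end

  def send_report(report, to:)
    body = @renderer.call(report)
    @delivery.call(to, body)
  end
end

# In a test, lambdas stand in for the real renderer and mail delivery.
sent = []
mailer = ReportMailer.new(
  renderer: ->(report) { "Report: #{report}" },
  delivery: ->(to, body) { sent << [to, body] }
)
mailer.send_report("Q1 sales", to: "boss@example.com")
```

With two dependencies the setup is painless. If the constructor needed ten, the test setup would be the first place that hurt.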

Of course, you can “cheat” your way out of testing pains and still end up with badly-designed code. Michael Feathers demonstrates the cheat-y solution to his “iceberg class” example: just take the private methods you wish you could test, and make them public. Even then, testing can act as a barometer of code quality. In the case of the iceberg class, the unit spec will grow large and unwieldy from all the private logic that needs testing being stuffed into it.
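
As a rough illustration of the non-cheating fix (extracting a class instead of making private methods public), here is a sketch with made-up names. Invoice, TaxCalculator, and TaxedInvoice are my own assumptions for the example, not Feathers’s code, and amounts are integer cents to keep the arithmetic exact:

```ruby
# Hypothetical names throughout; this is not the example from the talk.

# Before: the tax rule is private, so a spec can only reach it
# indirectly, through #total.
class Invoice
  def initialize(amounts_in_cents)
    @amounts = amounts_in_cents
  end

  def total
    @amounts.sum { |amount| with_tax(amount) }
  end

  private

  def with_tax(amount)
    amount + amount * 8 / 100
  end
end

# After: the rule lives in its own small class with a public method,
# which can be specced directly.
class TaxCalculator
  def initialize(percent)
    @percent = percent
  end

  def with_tax(amount)
    amount + amount * @percent / 100
  end
end

class TaxedInvoice
  def initialize(amounts_in_cents, calculator)
    @amounts = amounts_in_cents
    @calculator = calculator
  end

  def total
    @amounts.sum { |amount| @calculator.with_tax(amount) }
  end
end
```

The spec for TaxCalculator stays small, and the invoice spec no longer has to cover tax edge cases.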

All this seems (to me) to imply that “real” TDD, writing tests first, isn’t strictly necessary for reaping those design benefits. Test-first coding just forces the client perspective. With no written code, the programmer is free to focus on the way code will ideally be used and maintained, rather than considering it guts-first.

The post The Unreasonable Effectiveness of TDD appeared first on Big Nerd Ranch.


Building your API Early with API-First Design

Thu, 15 Nov 2012 16:17:37 +0000

In an increasingly multi-platform web—full of iPhone and Android apps, social media plug-ins, and other third-party developers—building your API first provides unified access to an app’s data. In standard processes, you might, for example, build a web front-end in Rails, then later need a phone app that can access and update the same data. An API built as an afterthought can lead to an ad hoc “gluing” of functionalities together rather than providing the cohesion across platforms that a well-planned one provides.

Defining functionality early

From the developer’s point of view, API-first design encourages defining functionality early and allowing form to follow from it. The presence of API output as raw data means that devs can focus on business logic over interface considerations. This also focuses on the core of an app and its value, and might help put extra bells and whistles into perspective (“Do we really need that?”).

Fast iteration and efficiency

This method also allows _fast iteration_ on new features. Suppose a Widget model has been changed, and now has many Parts instead of one. If the model is critical to the app, a change like this might cause other sweeping changes across an app’s interface. In many situations, the solution is for a developer to mock up a quick-and-dirty interface that’s just good enough for testing. Once the appropriate features have been accepted, a designer might start nearly from scratch to re-implement forms and other elements in a more aesthetically pleasing way. This is inefficient, and can create a dependency because the designer waits for the back-end portion to be finished before starting their work. API responses can be mocked (once a spec is agreed on), meaning development and design teams can work at the same time, with neither stepping on the other’s toes.
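
A mocked response can be as small as a canned JSON document behind a stand-in client. The sketch below assumes a hypothetical spec for the Widget-with-many-Parts change; the field names and the MockWidgetClient class are illustrative, not from a real project:

```ruby
require "json"

# Hypothetical agreed-upon response for fetching one widget, after the
# change that gives a Widget many Parts. Field names are illustrative.
MOCK_WIDGET_RESPONSE = <<~JSON
  {
    "id": 1,
    "name": "Sprocket",
    "parts": [
      { "id": 10, "name": "Gear" },
      { "id": 11, "name": "Axle" }
    ]
  }
JSON

# A stand-in client that returns the canned response instead of making
# an HTTP request, so interface work can proceed before the back end exists.
class MockWidgetClient
  def find(_id)
    JSON.parse(MOCK_WIDGET_RESPONSE)
  end
end

widget = MockWidgetClient.new.find(1)
part_names = widget["parts"].map { |part| part["name"] }
```

Once the real endpoint is ready, a client with the same #find method can be swapped in for the mock.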

Central documentation

Finally, an API built this way helps to centrally document an application. An API contains all of the central business logic that a developer starting out needs to know, reducing the bus number of the project. It also _doesn’t_ contain cruft that’s (mostly) irrelevant to back-end design, e.g., does making a purchase on the site take multiple page-steps? This kind of information should be packaged in the web app, away from the data manipulation of an API.

Some caveats:

Story acceptance with APIs might be difficult for non-technical project managers. Indeed, the ideal of being able to separate back-end and UI features given above might be impossible on some projects.
Using an API separately can introduce overhead, e.g. separate API servers will most likely need to be maintained from the start.



Conferences, Clients, and ROWE

Sun, 08 Jul 2012 11:00:00 +0000

Highgroovers keep up with new trends by attending at least one conference per year. Besides bringing us up to date on what’s shiny, they help us network, learn about the bleeding edge of our field from academics, and gain new perspectives on what we do. But conferences aren’t necessarily vacations, and juggling a conference and a Results-Only Work Environment (ROWE) can be tricky. Read on to see how we handle conferences in a ROWE while keeping our clients happy.

Many of us choose conferences like RailsConf or RubyConf that relate directly to our work, but we also attend startup and business conferences like LessConf, Summer Con, and BarCampNYC.

This year, I chose to attend ICML 2012, an academic conference centered on machine learning. ICML offered a great chance to get in touch with machine learning academia and learn what’s new in big data processing and deep learning, among other things.

Thanks to our “unlimited within reason” vacation policy, conference days don’t have to count as working days. And even if we’re working during a conference, learning and interacting at the conference takes priority over work.

In my case, I chose to work while attending the conference. Doing so meant that I had to figure out how to manage time spent doing work at a conference, and how to make sure clients knew what to expect while I was attending it. Work done during a conference most likely consists of keeping up to date on any communication regarding the project and potentially responding to emergencies, such as production downtime on a project. By updating our availability well in advance and informing our clients at the same time, we eliminate most issues that can pop up when you attend conferences.

On the other hand, a conference can also be taken as vacation where no work gets done (this option works well, for example, on projects with only one assigned developer). Or there’s a full-on “working conference,” which is a tempting option, but it can get tricky when you’re dealing with a hotel’s shoddy WiFi and balancing work with conference time.

Our vacation policy gives some solid advice for ROWE conference-takers: “Don’t spend conference sessions trying to get other work done. That said, if you have huge swaths of down time while on a trip, that can be a pretty awesome time and place to get some work done.”

I am thankful that the client I am working with actively discouraged me from working too hard during conference days, allowing me to focus on presentations and networking with the machine learning community. While I was at the conference, I tried to keep up with my client via chat, and they repeatedly told me to get back to the conference. I was surprised by this, but they know that attending conferences helps us to do better work for them in the future.

ICML 2012 helped me sharpen my skills in a number of ways. I was reminded of the importance of the cycle between business problems, machine learning research, and business solutions in a great talk and paper called Machine Learning That Matters. It also gave me a feel for what’s possible now, and what will be possible, with machine learning algorithms. Clients often ask us questions related to machine learning: They frequently have a pile of data and aren’t quite sure how to extract the info they need. ICML made me more aware of new ways to interpret ML problems so that I can help add business value in these cases.

Learning from the academics at ICML taught me about the ways the field is growing and changing, and made me think about how I can use this knowledge in my work. I may not have been working at full strength during the conference, but I was definitely increasing my knowledge base and learning new skills to use at Highgroove.

What about you? Have you been able to juggle work and attending a conference?



Getting Fancy with ElasticSearch

Thu, 07 Jun 2012 12:00:00 +0000

When an app requires full-text search, developers usually have two major contenders to choose from: Solr and ElasticSearch. Each addresses different use cases, but generally, ElasticSearch performs noticeably better when an app expects frequent reindexing, as is often the case. Gems like Tire make setting up ElasticSearch a breeze, but setting up more advanced indexes and interfacing with ActiveRecord can sometimes be a pain. Read on to see how to make your life easier with ElasticSearch and Tire.

Say an app needs an “omnibox” – a single search input that searches over multiple fields (for example, a user’s name, email address, and/or company). An initial attempt at setting this up in ElasticSearch with Tire would look like this:

After which we could search for users like User.search('Highgroove', load: true) and get the expected response.

But what if we want to allow partial-string searches? This requires some custom analyzers, in this case n-grams over the strings, which match substrings between the given lengths:

This works, but we can do much better than the mess of hashes above. Personally, I prefer to wrap this setup in a YAML file and parse it separately in an initializer:

We’re almost done now; unfortunately, though, adding custom analyzers interferes with ElasticSearch’s ability to search over all indexes in a #search call. Instead, searches have to take the form User.search("name:#{query} OR email:#{query} OR company:#{query}"). We also have to tokenize queries to account for whitespace. When all is said and done, a finished full-text search might look like this:

and we finally have our omni-search by calling this method like User.fulltext_search('groove').
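
The query-building half of that can be sketched in plain Ruby, so it runs without Tire or a search server. Everything here is an assumption for illustration: the fulltext_query helper name, the SEARCH_FIELDS constant, and the choice to require every token to match (AND between tokens) are mine, and the sketch does not escape characters that are special in query-string syntax:

```ruby
# Hypothetical helper: builds only the query string. The result is what
# would be handed to User.search in the app.
SEARCH_FIELDS = %w[name email company].freeze

def fulltext_query(input)
  tokens = input.to_s.split # tokenize on whitespace
  return nil if tokens.empty?

  # Each token may match any field; all tokens must match somewhere.
  tokens.map do |token|
    per_field = SEARCH_FIELDS.map { |field| "#{field}:#{token}" }
    "(#{per_field.join(' OR ')})"
  end.join(" AND ")
end

query = fulltext_query("high groove")
```

Returning nil for blank input leaves it to the caller to decide whether an empty search means “everything” or “nothing.”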

Some final tips and tricks that make life with ElasticSearch that much nicer:

When setting up ElasticSearch on a development machine, it’s easy to mess up the index (for example, trying to run tests that involve ElasticSearch and add non-existent data to the index). Getting rid of this locally is as easy as sending a DELETE command to the ElasticSearch server, which usually looks like curl -XDELETE 'http://localhost:9200/users/', followed by a rake db:setup to re-seed the database and re-index (or User.index.import in Rails console just to re-index).
n-grams can waste memory if you’re not careful; the min_gram and max_gram analyzer settings should be enough to narrow searches down to one record, and no more (a max_gram of 15 over a name is probably wasteful, since very few names share a substring that long).
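
To get a feel for what min_gram and max_gram control, here is a plain-Ruby illustration of the substrings an n-gram analysis produces. It is only a sketch of the idea, not ElasticSearch’s implementation, and the ngrams method name is my own:

```ruby
# Plain-Ruby illustration of n-gram generation; not ElasticSearch's code.
# Returns every substring of `string` whose length is between
# min_gram and max_gram (inclusive), shortest lengths first.
def ngrams(string, min_gram:, max_gram:)
  chars = string.downcase.chars
  (min_gram..max_gram).flat_map do |size|
    chars.each_cons(size).map(&:join)
  end
end

grams = ngrams("groove", min_gram: 3, max_gram: 4)
# grams is ["gro", "roo", "oov", "ove", "groo", "roov", "oove"]
```

Every extra length adds another batch of substrings per indexed value, which is why a generous max_gram gets expensive quickly.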



Writing Readable Ruby

Mon, 27 Feb 2012 12:00:00 +0000

Ruby inherits the philosophy of “there’s more than one way to do it,” or TMTOWTDI, from Perl. Of course, TMTOWTDI is worthless unless at least a handful of those ways can be written clearly not just for the author, but (perhaps more importantly) for future readers and editors. So, how do you make the best use of the many ways Ruby and Rails allow you to do things?

Before Ruby, the experience I had in dynamic, interpreted languages was with Python – a language with a totally opposite motto, “There should be one, and preferably only one, obvious way to do it.” As such, Ruby and Rails were somewhat of a shock. The first thing that comes to mind about loose dynamically-typed languages like Perl and Ruby is usually, “But it’ll be so easy to write bad code!”

And it is! As the author of Eloquent Ruby says, Ruby is a “language for grownups,” meaning that writing ugly, hard-to-maintain code is certainly possible, but this freedom allows for beautifully expressive, concise, and readable code.

Here are some guidelines that I and other Highgroovers use to make sure we achieve the latter whenever we can:

Write functional. Why write code that looks imperative when it doesn’t really change any state? Enumerable methods like sort_by and reduce go a long way towards making code understandable in less time, as does the ubiquitous Symbol#to_proc.
Use method synonyms that make sense. It seems like many newcomers to Ruby are wary of using more than one name for the same method. For example, you can retrieve the number of elements in an array with #count, #length, or #size. Which one you use depends on why the number of elements is needed, and what “sounds right” in the given section of code. Synonyms also make it easier to
Make syntax English-like when reasonable. Readers of _why’s poignant guide know this already, and know how it can make reading code more like a game than a chore. This includes using the right synonyms for a method as above, as well as choosing your own variable and method names cleverly. Why name a method that returns a list of primes less than n like primes(n) when it could just be primes_less_than n?
Review! Of course, it isn’t always possible to stick with these guidelines. Business logic can get messy and working with legacy code can make English-like syntax a pipe dream. But having your code reviewed can reveal which bits can be fixed easily, as well as which bits are completely opaque to a newcomer.
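
A few of these guidelines in miniature. The Person struct and the sample data are made up for illustration, and primes_less_than is just one possible implementation of the method named above:

```ruby
Person = Struct.new(:name, :age)
people = [Person.new("Ada", 36), Person.new("Grace", 45), Person.new("Alan", 41)]

# Write functional: Enumerable methods plus Symbol#to_proc, no manual loops.
names_by_age = people.sort_by(&:age).map(&:name)
total_age    = people.map(&:age).reduce(:+)

# Method synonyms: all three return the same number for an array,
# so pick whichever reads best in context.
head_count = people.size

# English-like naming: trial division is plenty for a sketch.
def primes_less_than(n)
  (2...n).select do |candidate|
    (2..Math.sqrt(candidate)).none? { |divisor| (candidate % divisor).zero? }
  end
end

small_primes = primes_less_than 20
```

Whether the parentheses-free call on the last line reads better is exactly the kind of thing a review will settle.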

As an example of putting these into action, let’s write a method that turns a hypothetical Rails model into a hash of its reports; we need to convert every report to a hash, combine them, and return it. Here’s the straight imperative way:

It gets the job done, but it ain’t pretty; again, the #merge! method seems out of place, since it implies a change in state and the only thing really changing is the return value, which should be expected.

Here’s a more functional way:

This is a bit shorter, and the fact that @reports is the “subject” being method chained and then returned makes it clearer that the return value is just some transformation of the model’s reports. But let’s try to make it almost readable English:

This is much shorter, and translates easily to its purpose: Take this instance’s reports, map them to hashes, and reduce them by merging them together. It’s also nearly point-free since the map and reduce blocks just take method symbols.
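
Here is a sketch of how the three versions described above might look side by side. The Report and Account classes, the #to_h method on reports, and the method names are assumptions made for illustration, not the original listings:

```ruby
# Hypothetical stand-ins for the Rails model and its reports.
class Report
  def initialize(name, value)
    @name = name
    @value = value
  end

  def to_h
    { @name => @value }
  end
end

class Account
  def initialize(reports)
    @reports = reports
  end

  # Imperative: build up a result by mutating it with #merge!.
  def reports_hash_imperative
    result = {}
    @reports.each do |report|
      result.merge!(report.to_h)
    end
    result
  end

  # Functional: @reports is the subject of the method chain.
  def reports_hash_functional
    @reports.map { |report| report.to_h }.reduce({}) { |all, hash| all.merge(hash) }
  end

  # Near-English: map the reports to hashes, reduce them by merging.
  def reports_hash
    @reports.map(&:to_h).reduce({}, :merge)
  end
end

account = Account.new([Report.new(:sales, 10), Report.new(:costs, 4)])
```

Passing {} as the initial value to reduce is my own addition; it makes a model with no reports return an empty hash instead of nil.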

Of course, personal tastes will vary on whether this is the ideal way of representing this or that method. But for me, the guidelines above have led to generally clearer, more comprehensible code during review.

How do you make sure your code is clear and concise?



Why Highgroove has a Personal Trainer

Mon, 20 Feb 2012 12:00:00 +0000

At Highgroove, we have a personal trainer, Cherri, on-staff and on-site, available twice a week to us, scheduled via appointment slots using Google Calendar. Our personal trainer has been with us since December of last year, and we just added more sessions. We have been delighted at the opportunity to get in shape (although, perhaps, temporarily less thrilled when “core day” came around). Personally, having someone motivate us to exercise – someone who thoroughly knows what they are doing – was exactly the motivation I needed to start working out again. But we’ve also realized getting a gym session in during the afternoon has benefits for developing software (along with developing sweet abs).

The sessions are personally catered to each of us, last just thirty intense minutes, and with the gym just downstairs, it’s a quick walk over to start working out. It’s an even quicker walk when Cherri sends you an email, reminding you that your session starts in 1 minute. Because the sessions, short as they are, fall in the middle of the work day (almost all of them are after lunch), you would think they would cause a break in productivity, but the opposite actually happens.

In fact, Lifehacker has posted about the benefits of a workday workout, explaining that it improves energy and alertness as well as productivity. The article fits my and other Highgroovers’ experiences perfectly: working out gives an energy boost that used to be gained by an extra double-shot of espresso, and also gives us time to mentally work out the programming problems wracking our brains. We can return to work refreshed (if a little sweatier) and ready to tackle things from a new perspective. It almost seems counter-intuitive, but after a workout, we have all remarked on the productivity increase. It’s real!

What healthy habits give you a productivity boost?



Classifying Data with Discriminant Analysis

Mon, 21 Nov 2011 12:00:00 +0000

Cluster analysis methods have been gaining popularity as a way of relating pieces of data in large datasets with one another. Examples in social networking are obvious: friends on Facebook cluster into cliques and communities, which cluster into even larger groups. Demographics and other marketing research can also be aided by sorting prospective customers into groups based on preference.

When the clusters are known, and ample training data is available, discriminant analysis is particularly effective at classifying new data. Discriminant analysis methods are built into the R programming language (something we’ve discussed a bit in the past) with a standard package. However, R can be cumbersome to use by itself (and the syntax still seems a bit bizarre to me personally), so I used Rinruby, a gem which gives direct access to R methods and data, to put a nice Ruby wrapper around it. Below is some example code that analyzes a well-known clustering test dataset, the Fisher iris data.

Say our iris data is contained as an array (“training_rows”) of hashes that each look like

(where the species is given as 1, 2, or 3). We can load our data into a DiscriminantAnalysis instance and start predicting in just a few lines:

Every prediction comes as a hash with a confidence score to check the quality of the classification. This code uses a linear analysis (i.e., it separates classes by lines or planes); the pretty scatterplot up top was generated using the iris data with quadratic analysis, which can be achieved in the code above by simply replacing “init_lda_analysis” with “init_qda_analysis”.
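
For intuition about what “separating classes by lines or planes” means, here is a deliberately simplified stand-in in plain Ruby: a nearest-centroid classifier, which is what linear discriminant analysis reduces to when every class is assumed to have the same spherical spread and equal prior probability. It is not what R or the gem computes. The hash keys and the measurements are toy values I made up for illustration, not Fisher’s data:

```ruby
# Toy measurements (NOT Fisher's data); keys and values are illustrative.
training_rows = [
  { sepal_length: 5.0, petal_length: 1.4, species: 1 },
  { sepal_length: 5.2, petal_length: 1.6, species: 1 },
  { sepal_length: 6.0, petal_length: 4.4, species: 2 },
  { sepal_length: 6.2, petal_length: 4.6, species: 2 }
]
FEATURES = %i[sepal_length petal_length].freeze

# Mean of each feature per species: { species => [mean, mean] }.
def centroids(rows)
  rows.group_by { |row| row[:species] }.transform_values do |group|
    FEATURES.map { |feature| group.sum { |row| row[feature] } / group.size }
  end
end

# Predict the species whose centroid is closest to the row
# (squared Euclidean distance).
def predict(centers, row)
  closest = centers.min_by do |_species, center|
    FEATURES.zip(center).sum { |feature, mean| (row[feature] - mean)**2 }
  end
  closest.first
end

centers = centroids(training_rows)
prediction = predict(centers, { sepal_length: 5.1, petal_length: 1.5 })
```

The boundary between two centroids is the set of points equally far from both, which is a straight line in two dimensions and a plane in three.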

R allows for tons of manipulation of the analysis once it’s loaded, some of which has been built in to the DiscriminantAnalysis class (scatterplots, accuracy and significance testing, etc.).

Since we’re all about contributing to the open source community at Highgroove, I’ve packaged these methods into a gem called harlequin (cheesily enough after one of the irises in Fisher’s data). If you have a working copy of R on your machine, you can get the code above running by just adding the proper requires for the gem. Feel free to check it out and help to make it more awesome!

Have you used clustering methods before? How do you deal with heavy-duty data processing in Ruby?



If you aren't getting burned…

Tue, 04 Oct 2011 12:00:00 +0000

Highgroove’s “bias towards action” rallying cry is no secret, and we try to abide by that rule whenever we can, whether we’re deploying or choosing where to go for lunch. An important corollary is that we try to bias towards making mistakes earlier rather than later, too. A fellow Highgroover captured it well by saying: “If you aren’t getting burned, you need to play with fire more.”

Learning what breaks a new system helps developers get a feel for when heavier infrastructure might be required (e.g., implementing background jobs if server-side code starts becoming too intensive). And it’s part of why we encourage new developers to deploy code on the first day: it’s better to learn how to fix production while you’re getting situated than when there might be no one around at the moment (especially in a ROWE).

But finding out when your system doesn’t break can be just as important. I’ve discovered security issues in the past by using unsafe code I “knew” shouldn’t work as part of an initial naïve solution or experiment.

Well-developed test suites, of course, are what make all of this possible, by telling you exactly what has broken and where. It’s also what made me initially come around to TDD; writing tests first means you don’t bias your own thinking to the code you’ve written, making it easier to consider what “breaking” a model or controller (for example) should look like in a test.

Do you encourage experimentation and “failing fast”? How?



Yet Another Benefit of Open Source

Mon, 05 Sep 2011 12:00:00 +0000

Bayesian networks have proven extremely useful for classifying events and documents, reliability analysis, and in many other fields. Essentially, wherever a well-defined chain of causation exists between many pieces of data, a Bayes net can help provide probabilities for the “hidden variables” of a system: in the cases above, for example, the category a document belongs to, the probability a system will fail if a certain component fails, etc.

As part of another project here at Highgroove, I’ve been developing a gem called glymour that learns a Bayes net’s structure automatically, which is important when it becomes impractical to manually define causal relationships (e.g. when taking into account dozens of different variables). Working on an open-source gem while relating it to a larger project has made me understand one of open source’s many benefits: writing open source code is a kind of constrained writing, in which the constraint is to be as general-purpose as possible (within reason).

Writing a piece of open source software – especially, of course, a gem or some other kind of package/plugin – forces the coder to write in a highly modular and iterative way. Beyond the normal considerations of reusable code, DRY, etc., all possible use cases must be considered. For example, though I intend to use glymour mainly with ActiveRecord objects as stores for sample data, working on it as a gem made me quickly realize that others using it might be reading from a file, or from user input, or from any number of other sources. Thus, glymour instead uses a user-defined block for retrieving data, allowing for any of these possibilities. In turn, this made testing much easier (since a simple array could be filled with hashes of test data).
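
The block-for-retrieval idea looks roughly like this. To be clear, this is a hypothetical sketch and not glymour’s actual API: the SampleSource class, its methods, and the test data are all made up to show the pattern:

```ruby
# Hypothetical sketch of the pattern; not glymour's real interface.
class SampleSource
  # The block decides where rows come from: a database query, a file,
  # user input, or (in tests) a plain in-memory array.
  def initialize(&retriever)
    @retriever = retriever
  end

  def rows
    @retriever.call
  end

  # Fraction of rows in which the given variable is truthy.
  def probability_of(variable)
    data = rows
    data.count { |row| row[variable] }.fdiv(data.size)
  end
end

# In a test, a simple array of hashes is all the setup needed.
test_data = [
  { rain: true,  wet_grass: true },
  { rain: false, wet_grass: true },
  { rain: false, wet_grass: false },
  { rain: true,  wet_grass: true }
]
source = SampleSource.new { test_data }
rain_probability = source.probability_of(:rain)
```

In an app, the same class could be given a block that runs a database query, and nothing else would need to change.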

It’s been rewarding working on something intended for the public; we try to work open source as much as we can, and really digging in to an open project has helped me understand why.

Does writing open source software benefit your style? How?



Getting Things Done (When the Mood Strikes)

Mon, 29 Aug 2011 11:00:00 +0000

As the youngest Highgroover, I don’t have as much experience with 9-to-5 type work environments to contrast a ROWE against. But I could tell within my first week here that working without a timeclock allows us to have an incredible amount of flexibility.

A lot of writing about ROWE focuses on the way it handles time off: your weekend can be anytime, you can leave for a movie on a Thursday, etc. This seems to me like a silly way to sell the idea. A student who really wants to pass his classes bases his partying around his studying, not vice versa. So why schedule a work week around your weekend if you really want results? Because of this, when I started working for Highgroove I tried paying attention to how our work environment allowed us to get things done.

My first month here has definitely not disappointed. The main benefits, for me, have involved being able to work when I’m most productive. I’m a night-owl and conversely, my brain is frankly useless at 8 in the morning. Rather than sit and stare at a cup of coffee for an hour, I can start (and end) my day late and use time in the office to really work.

Similarly, we can work right through a period of increased productivity (“flow”) without worrying about hours. A couple of weeks ago, one of these moods struck me at about midnight (on a weekday). Worrying about the clock would most likely mean hastily jotting down some ideas before going to sleep (and hoping the ideas are as fresh the next day), just to be able to come in on time. Instead, I could stay up, code while in the zone, and sleep in knowing I had gotten the results I needed.

I don’t mean to downplay the awesomeness of ROWE’s non-work benefits. But having the ability to work when we’re truly productive makes work itself many times more satisfying (and more fun).

How do you make sure your time “in the zone” coincides with your time spent working?

