The post The Unreasonable Effectiveness of TDD appeared first on Big Nerd Ranch.
In 1960, Eugene Wigner published a paper titled “The Unreasonable Effectiveness of Mathematics in the Natural Sciences.” In it, Wigner discusses one of the thorniest and most fundamental questions in physics: Why does much of (apparently totally abstract) mathematics later end up applying so well to physics?
He states, “mathematical concepts turn up in entirely unexpected connections. Moreover, they often permit an unexpectedly close and accurate description of the phenomena in these connections.” The paper drew a huge number of responses, up to and including the claim that this is the case because the universe itself is a Platonic mathematical object whose properties we are discovering over time.
I was reminded of all this by a talk shown at our weekly Lunch and Learn, “The Deep Synergy Between Testability and Good Design” from Michael Feathers (it even has a similar title!). Michael gives a few great, concrete examples of how “hard to test” implies “poorly designed.” For example, he describes the common pain of “I wish I could test this private method” as a hint to extract another class from an “iceberg class” full of private logic.
Michael left partially open the question of why testing pains so frequently indicate design problems, or why TDD leads to better design, which got me thinking about “Unreasonable Effectiveness.” I don’t have any grand unified theories of code to propose, but the question is an interesting one. It seems obvious that tests prevent regression and help ensure the correctness of your software, but why should they improve design?
My answer is something like: writing tests forces you to use your code as though you were already maintaining it. It brings to the forefront, before the code is even pushed, design pains that might otherwise wait weeks or months to appear.
Testing also forces the programmer to act as a client of their own code, rather than as someone with intimate knowledge of its interior workings. It’s easy, for example, for a class to accrue more and more direct dependencies on other classes over time, creeping into a god object that becomes a mess to maintain. Using the object when the setup has already been done by previous code can hide the problem. But unit testing a class with an enormous number of dependencies requires setting up (or at least stubbing out) every dependency, which quickly becomes a pain. So, writing tests in this case divorces the class from the context hidden by other code and brings those dependency problems to light.
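As a rough sketch of that pain (the `OrderProcessor` class, its collaborators, and the `stub` helper are all hypothetical, invented here for illustration), this is roughly what a test has to build before it can check a single behavior of a dependency-heavy class:

```ruby
# Hypothetical class, invented for illustration: it talks directly to
# four collaborators, so a unit test must supply all four.
class OrderProcessor
  def initialize(inventory, payments, mailer, logger)
    @inventory = inventory
    @payments = payments
    @mailer = mailer
    @logger = logger
  end

  def process(order)
    return :out_of_stock unless @inventory.in_stock?(order)

    @payments.charge(order)
    @mailer.send_receipt(order)
    @logger.info("processed #{order}")
    :ok
  end
end

# Hypothetical helper: build a throwaway stub whose methods
# return canned values no matter what arguments they receive.
def stub(responses)
  object = Object.new
  responses.each do |name, value|
    object.define_singleton_method(name) { |*_args| value }
  end
  object
end

# The setup is already as long as the behavior under test.
processor = OrderProcessor.new(
  stub(in_stock?: true),
  stub(charge: true),
  stub(send_receipt: true),
  stub(info: nil)
)
result = processor.process("order-1")
```

Every new direct dependency adds another stub to this setup, which is exactly the signal that the class is doing too much.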
Of course, you can “cheat” your way out of testing pains and still end up with badly-designed code. Michael Feathers demonstrates the cheat-y solution to his “iceberg class” example: just take the private methods you wish you could test, and make them public. Even then, testing can act as a barometer of code quality. In the case of the iceberg class, the unit spec will grow large and unwieldy from all the private logic that needs testing being stuffed into it.
All this seems (to me) to imply that “real” TDD, writing tests first, isn’t strictly necessary for reaping those design benefits. Test-first coding just forces the client perspective. With no written code, the programmer is free to focus on the way code will ideally be used and maintained, rather than considering it guts-first.
The post Building your API Early with API-First Design appeared first on Big Nerd Ranch.
In an increasingly multi-platform web (full of iPhone and Android apps, social media plug-ins, and other third-party developers), building your API first provides unified access to an app’s data. In standard processes, you might, for example, build a web front-end in Rails, then later need a phone app that can access and update the same data. An API built as an afterthought can lead to an ad hoc “gluing” of functionalities together rather than providing the cohesion across platforms that a well-planned one provides.
From the developer’s point of view, API-first design encourages defining functionality early and allowing form to follow from it. The presence of API output as raw data means that devs can focus on business logic over interface considerations. This also keeps attention on the core of an app and its value, and might help put extra bells and whistles into perspective (“Do we really need that?”).
This method also allows _fast iteration_ on new features. Suppose a Widget model has been changed, and now has many Parts instead of one. If the model is critical to the app, a change like this might cause other sweeping changes across an app’s interface. In many situations, the solution is for a developer to mock up a quick-and-dirty interface that’s just good enough for testing. Once the appropriate features have been accepted, a designer might start nearly from scratch to re-implement forms and other elements in a more aesthetically pleasing way. This is inefficient, and can create a dependency because the designer waits for the back-end portion to be finished before starting their work. API responses can be mocked (once a spec is agreed on), meaning development and design teams can work at the same time, with neither stepping on the other’s toes.
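As a minimal sketch of what mocking a response can look like (the JSON shape, the `MockWidgetApi` class, and the `part_names` helper are assumptions made up for this example, not part of any real spec), interface code can be written against a canned response for a Widget with many Parts long before the real back end exists:

```ruby
require 'json'

# Hypothetical response the teams agreed on for fetching one widget,
# now that a Widget has many Parts.
MOCK_WIDGET_RESPONSE = <<~JSON
  {
    "id": 1,
    "name": "Sprocket",
    "parts": [
      { "id": 10, "name": "Gear" },
      { "id": 11, "name": "Axle" }
    ]
  }
JSON

# Stand-in for the real API client; it returns the canned response
# regardless of which widget is requested.
class MockWidgetApi
  def fetch_widget(_id)
    JSON.parse(MOCK_WIDGET_RESPONSE)
  end
end

# Interface-side code only depends on the agreed-upon shape.
def part_names(api, widget_id)
  api.fetch_widget(widget_id).fetch("parts").map { |part| part.fetch("name") }
end

names = part_names(MockWidgetApi.new, 1)
```

When the real client is ready, it only has to respond to `fetch_widget` with the same shape for `part_names` to keep working.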
Finally, an API built this way helps to centrally document an application. An API contains all of the central business logic that a developer starting out needs to know, reducing the bus number of the project. It also _doesn’t_ contain cruft that’s (mostly) irrelevant to back-end design, e.g., does making a purchase on the site take multiple page-steps? This kind of information should be packaged in the web app, away from the data manipulation of an API. Some caveats:
The post Conferences, Clients, and ROWE appeared first on Big Nerd Ranch.
Highgroovers keep up with new trends by attending at least one conference per year. Besides bringing us up to date on what’s shiny, they help us network, learn about the bleeding edge of our field from academics, and gain new perspectives on what we do. But conferences aren’t necessarily vacations, and juggling a conference and a Results-Only Work Environment (ROWE) can be tricky. Read on to see how we handle conferences in a ROWE while keeping our clients happy.
Many of us choose conferences like RailsConf or RubyConf that relate directly to our work, but we also attend startup and business conferences like LessConf, Summer Con, and BarCampNYC.
This year, I chose to attend ICML 2012, an academic conference centered on machine learning. ICML offered a great chance to get in touch with machine learning academia and learn what’s new in big data processing and deep learning, among other things.
Thanks to our “unlimited within reason” vacation policy, conference days don’t have to count as working days. And even if we’re working during a conference, learning and interacting at the conference takes priority over work.
In my case, I chose to work while attending the conference. Doing so meant that I had to figure out how to manage time spent doing work at a conference, and how to make sure clients knew what to expect while I was attending it. Work done during a conference most likely consists of keeping up to date on any communication regarding the project and potentially responding to emergencies, such as production downtime on a project. By updating our availability well in advance and informing our clients at the same time, we eliminate most issues that can pop up when you attend conferences.
On the other hand, a conference can also be taken as vacation where no work gets done (this option works well, for example, on projects with only one assigned developer). Or there’s a full-on “working conference,” which is a tempting option, but it can get tricky when you’re dealing with a hotel’s shoddy WiFi and balancing work with conference time.
Our vacation policy gives some solid advice for ROWE conference-takers: “Don’t spend conference sessions trying to get other work done. That said, if you have huge swaths of down time while on a trip, that can be a pretty awesome time and place to get some work done.”
I am thankful that the client I am working with actively discouraged me from working too hard during conference days, allowing me to focus on presentations and networking with the machine learning community. While I was at the conference, I tried to keep up with my client via chat, and they repeatedly told me to get back to the conference. I was surprised by this, but they know that attending conferences helps us to do better work for them in the future.
ICML 2012 helped me sharpen my skills in a number of ways. I was reminded of the importance of the cycle between business problems, machine learning research, and business solutions in a great talk and paper called Machine Learning That Matters. It also gave me a feel for what’s possible now, and what will be possible, with machine learning algorithms. Clients often ask us questions related to machine learning: They frequently have a pile of data and aren’t quite sure how to extract the info they need. ICML made me more aware of new ways to interpret ML problems so that I can help add business value in these cases.
Learning from the academics at ICML taught me about the ways the field is growing and changing, and made me think about how I can use this knowledge in my work. I may not have been working at full strength during the conference, but I was definitely increasing my knowledge base and learning new skills to use at Highgroove.
What about you? Have you been able to juggle work and attending a conference?
The post Getting Fancy with ElasticSearch appeared first on Big Nerd Ranch.
When an app requires full-text search, developers usually have two major contenders to choose from: Solr and ElasticSearch. Each addresses different use cases, but generally, ElasticSearch performs noticeably better when an app expects frequent reindexing, as is often the case. Gems like Tire make setting up ElasticSearch a breeze, but setting up more advanced indexes and interfacing with ActiveRecord can sometimes be a pain. Read on to see how to make your life easier with ElasticSearch and Tire.
Say an app needs an “omnibox” – a single search input that searches over multiple fields (for example, a user’s name, email address, and/or company). An initial attempt at setting this up in ElasticSearch with Tire would look like this:
After which we could search for users like `User.search('Highgroove', load: true)` and get the expected response.
But what if we want to allow partial-string searches? This requires some custom analyzers, in this case n-grams over the strings, which match substrings between the given lengths:
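To get a feel for what an n-gram analyzer produces, here is a plain-Ruby sketch. It is not ElasticSearch or Tire code, and `ngrams` is a hypothetical helper written only for this illustration; it simply lists every lowercase substring whose length falls between the two bounds:

```ruby
# Hypothetical helper: list all substrings of `string` whose length is
# between min_gram and max_gram, shortest first.
def ngrams(string, min_gram, max_gram)
  characters = string.downcase.chars
  (min_gram..max_gram).flat_map do |size|
    characters.each_cons(size).map(&:join)
  end
end

grams = ngrams("Groove", 3, 4)
# grams => ["gro", "roo", "oov", "ove", "groo", "roov", "oove"]
```

A query such as "oov" matches because it is one of the indexed grams, which is what makes partial-string search possible.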
This works, but we can do much better than the mess of hashes above. Personally, I prefer to wrap this setup in a YAML file and parse it separately in an initializer:
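A minimal sketch of that approach, under stated assumptions: the YAML layout follows ElasticSearch's index analysis settings, but the filter name `substring`, the analyzer name `partial_match`, and the gram sizes are made up for this example, and the YAML is inlined here where a real app would read it from a file. How the resulting hash is handed to Tire is left out.

```ruby
require 'yaml'

# Hypothetical settings; in an app this would live in its own YAML file.
SETTINGS_YAML = <<~YAML
  analysis:
    filter:
      substring:
        type: nGram
        min_gram: 2
        max_gram: 8
    analyzer:
      partial_match:
        type: custom
        tokenizer: standard
        filter:
          - lowercase
          - substring
YAML

# Parse once (for example, in an initializer) and reuse the plain hash.
settings = YAML.safe_load(SETTINGS_YAML)
max_gram = settings["analysis"]["filter"]["substring"]["max_gram"]
analyzer_filters = settings["analysis"]["analyzer"]["partial_match"]["filter"]
```

Keeping the settings in YAML means the analyzer definitions can be reviewed and changed without touching model code.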
We’re almost done now; unfortunately, though, adding custom analyzers interferes with ElasticSearch’s ability to search over all indexes in a `#search` call. Instead, searches have to take the form `User.search("name:#{query} OR email:#{query} OR company:#{query}")`. We also have to tokenize queries to account for whitespace. When all is said and done, a finished full-text search might look like this:
and we finally have our omni-search by calling this method like `User.fulltext_search('groove')`.
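The query-building half of such a method can be sketched in plain Ruby. The `fulltext_query` helper and the `SEARCH_FIELDS` constant are hypothetical, and joining the per-token clauses with `AND` is an assumption about the desired behavior (every word must match somewhere); the resulting string is what would be passed to `User.search`.

```ruby
# Fields the omnibox searches over, taken from the example above.
SEARCH_FIELDS = %w[name email company].freeze

# Hypothetical helper: split the query on whitespace and require each
# token to match at least one field.
def fulltext_query(query)
  clauses = query.split.map do |token|
    fields = SEARCH_FIELDS.map { |field| "#{field}:#{token}" }.join(" OR ")
    "(#{fields})"
  end
  clauses.join(" AND ")
end

query_string = fulltext_query("high groove")
```

Joining with `OR` instead would return records matching any single word, which is looser but may suit some omniboxes better.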
Some final tips and tricks that make life with ElasticSearch that much nicer:
- `curl -XDELETE 'http://localhost:9200/users/'` deletes the `users` index; follow it with a `rake db:setup` to re-seed the database and re-index (or `User.index.import` in Rails console just to re-index).
- The `min_gram` and `max_gram` analyzer settings should be enough to narrow searches down to one record, and no more (a `max_gram` of 15 over a name is probably wasteful, since very few names share a substring that long).
The post Writing Readable Ruby appeared first on Big Nerd Ranch.
Ruby inherits the philosophy of “there’s more than one way to do it,” or TMTOWTDI, from Perl. Of course, TMTOWTDI is worthless unless at least a handful of those ways can be written clearly not just for the author, but (perhaps more importantly) for future readers and editors. So, how do you make the best use of the many ways Ruby and Rails allow you to do things?
Before Ruby, the experience I had in dynamic, interpreted languages was with Python – a language with a totally opposite motto, “There should be one, and preferably only one, obvious way to do it.” As such, Ruby and Rails were somewhat of a shock. The first thing that comes to mind about loose dynamically-typed languages like Perl and Ruby is usually, “But it’ll be so easy to write bad code!”
And it is! As the author of Eloquent Ruby says, Ruby is a “language for grownups,” meaning that writing ugly, hard-to-maintain code is certainly possible, but this freedom allows for beautifully expressive, concise, and readable code.
Here are some guidelines that I and other Highgroovers use to make sure we achieve the latter whenever we can:
- […] `sort_by` and `reduce` go a long way towards making code understandable in less time, as does the ubiquitous `Symbol#to_proc`.
- […] `#count`, `#length`, or `#size`. Which one you use depends on why the number of elements is needed, and what “sounds right” in the given section of code. Synonyms also make it easier to […]
- […] `n` like `primes(n)` when it could just be `primes_less_than n`?
As an example of putting these into action, let’s write a method that turns a hypothetical Rails model into a hash of its reports; we need to convert every report to a hash, combine them, and return it. Here’s the straight imperative way:
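A minimal sketch of such an imperative version, assuming a hypothetical `Project` class standing in for the Rails model and a hypothetical `Report` struct whose `to_hash` returns a one-entry hash:

```ruby
# Hypothetical report: converts itself to a single-entry hash.
Report = Struct.new(:name, :total) do
  def to_hash
    { name => total }
  end
end

# Hypothetical stand-in for the Rails model.
class Project
  def initialize(reports)
    @reports = reports
  end

  # Build the result step by step, mutating an accumulator.
  def reports_hash
    result = {}
    @reports.each do |report|
      result.merge!(report.to_hash)
    end
    result
  end
end

imperative_hash = Project.new([Report.new(:sales, 10), Report.new(:costs, 4)]).reports_hash
```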
It gets the job done, but it ain’t pretty; again, the `#merge!` method seems out of place, since it implies a change in state and the only thing really changing is the return value, which should be expected.
Here’s a more functional way:
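One way this might look, again with a hypothetical `Project` class and `Report` struct standing in for the Rails model and its reports:

```ruby
# Hypothetical report: converts itself to a single-entry hash.
Report = Struct.new(:name, :total) do
  def to_hash
    { name => total }
  end
end

# Hypothetical stand-in for the Rails model.
class Project
  def initialize(reports)
    @reports = reports
  end

  # Fold the reports into one hash without mutating anything in place.
  def reports_hash
    @reports.inject({}) { |combined, report| combined.merge(report.to_hash) }
  end
end

functional_hash = Project.new([Report.new(:sales, 10), Report.new(:costs, 4)]).reports_hash
```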
This is a bit shorter, and the fact that `@reports` is the “subject” being method chained and then returned makes it clearer that the return value is just some transformation of the model’s reports. But let’s try to make it almost readable English:
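A sketch of that version, with the same hypothetical `Project` and `Report` stand-ins. The empty-hash seed passed to `reduce` is an addition of this sketch, so that a model with no reports returns `{}` rather than `nil`:

```ruby
# Hypothetical report: converts itself to a single-entry hash.
Report = Struct.new(:name, :total) do
  def to_hash
    { name => total }
  end
end

# Hypothetical stand-in for the Rails model.
class Project
  def initialize(reports)
    @reports = reports
  end

  # Reads as: take the reports, map them to hashes, reduce by merging.
  def reports_hash
    @reports.map(&:to_hash).reduce({}, :merge)
  end
end

readable_hash = Project.new([Report.new(:sales, 10), Report.new(:costs, 4)]).reports_hash
```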
This is much shorter, and translates easily to its purpose: Take this instance’s reports, map them to hashes, and reduce them by merging them together. It’s also nearly point-free since the `map` and `reduce` blocks just take method symbols.
Of course, personal tastes will vary on whether this is the ideal way of representing this or that method. But for me, the guidelines above have led to generally clearer, more comprehensible code during review.
How do you make sure your code is clear and concise?
The post Why Highgroove has a Personal Trainer appeared first on Big Nerd Ranch.
At Highgroove, we have a personal trainer, Cherri, on-staff and on-site, available twice a week to us, scheduled via appointment slots using Google Calendar. Our personal trainer has been with us since December of last year, and we just added more sessions. We have been delighted at the opportunity to get in shape (although, perhaps, temporarily less thrilled when “core day” came around). Personally, having someone motivate me to exercise – someone who thoroughly knows what they are doing – was exactly the motivation I needed to start working out again. But we’ve also realized getting a gym session in during the afternoon has benefits for developing software (along with developing sweet abs).
The sessions are tailored to each of us and last just thirty intense minutes, and with the gym just downstairs, it’s a quick walk over to start working out. It’s an even quicker walk when Cherri sends you an email reminding you that your session starts in 1 minute. With such a short workout session scheduled in the middle of the work day (almost all sessions are after lunch), you would think it would cause a break in productivity, but the opposite actually happens.
In fact, Lifehacker has posted about the benefits of a workday workout, explaining that it improves energy and alertness as well as productivity. The article fits my and other Highgroovers’ experiences perfectly: working out gives an energy boost that used to come from an extra double-shot of espresso, and also gives us time to mentally work out the programming problems wracking our brains. We can return to work refreshed (if a little sweatier) and ready to tackle things from a new perspective. It almost seems counter-intuitive, but after a workout, we have all remarked on the productivity increase. It’s real!
What healthy habits give you a productivity boost?
The post Classifying Data with Discriminant Analysis appeared first on Big Nerd Ranch.
Cluster analysis methods have been gaining popularity as a way of relating pieces of data in large datasets with one another. Examples in social networking are obvious: friends on Facebook cluster into cliques and communities, which cluster into even larger groups. Demographics and other marketing research can also be aided by sorting prospective customers into groups based on preference.
When the clusters are known, and ample training data is available, discriminant analysis is particularly effective at classifying new data. Discriminant analysis methods are built into the R programming language (something we’ve discussed a bit in the past) with a standard package. However, R can be cumbersome to use by itself (and the syntax still seems a bit bizarre to me personally), so I used Rinruby, a gem which gives direct access to R methods and data, to put a nice Ruby wrapper around it. Below is some example code that analyzes a well-known clustering test dataset, the Fisher iris data.
Say our iris data is contained as an array (“training_rows”) of hashes that each look like
(where the species is given as 1, 2, or 3). We can load our data into a DiscriminantAnalysis instance and start predicting in just a few lines:
Every prediction comes as a hash with a confidence score to check the quality of the classification. This code uses a linear analysis (i.e., it separates classes by lines or planes); the pretty scatterplot up top was generated using the iris data with quadratic analysis, which can be achieved in the code above by simply replacing “init_lda_analysis” with “init_qda_analysis”.
R allows for tons of manipulation of the analysis once it’s loaded, some of which has been built into the DiscriminantAnalysis class (scatterplots, accuracy and significance testing, etc.).
Since we’re all about contributing to the open source community at Highgroove, I’ve packaged these methods into a gem called harlequin (cheesily enough after one of the irises in Fisher’s data). If you have a working copy of R on your machine, you can get the code above running by just adding the proper `require`s for the gem. Feel free to check it out and help to make it more awesome!
Have you used clustering methods before? How do you deal with heavy-duty data processing in Ruby?
The post If you aren't getting burned… appeared first on Big Nerd Ranch.
Highgroove’s “bias towards action” rallying cry is no secret, and we try to abide by that rule whenever we can, whether we’re deploying or choosing where to go for lunch. An important corollary is that we try to bias towards making mistakes earlier rather than later, too. A fellow Highgroover captured it well by saying: “If you aren’t getting burned, you need to play with fire more.”
Learning what breaks a new system helps developers get a feel for when heavier infrastructure might be required (e.g., implementing background jobs if server-side code starts becoming too intensive). And it’s part of why we encourage new developers to deploy code on the first day: it’s better to learn how to fix production while you’re getting situated than when there might be no one around at the moment (especially in a ROWE).
But finding out when your system doesn’t break can be just as important. I’ve discovered security issues in the past by using unsafe code I “knew” shouldn’t work as part of an initial naïve solution or experiment.
Well-developed test suites, of course, are what make all of this possible, by telling you exactly what has broken and where. It’s also what made me initially come around to TDD; writing tests first means you don’t bias your own thinking to the code you’ve written, making it easier to consider what “breaking” a model or controller (for example) should look like in a test.
Do you encourage experimentation and “failing fast”? How?
The post Yet Another Benefit of Open Source appeared first on Big Nerd Ranch.
Bayesian networks have proven extremely useful for classifying events and documents, reliability analysis, and in many other fields. Essentially, wherever a well-defined chain of causation exists between many pieces of data, a Bayes net can help provide probabilities for the “hidden variables” of a system: in the cases above, for example, the category a document belongs to, the probability a system will fail if a certain component fails, etc.
As part of another project here at Highgroove, I’ve been developing a gem called glymour that learns a Bayes net’s structure automatically, which is important when it becomes impractical to manually define causal relationships (e.g. when taking into account dozens of different variables). Working on an open-source gem while relating it to a larger project has made me understand one of open source’s many benefits: writing open source code is kind of like constrained writing, in which the constraint is to be as general-purpose as possible (and reasonable).
Writing a piece of open source software – especially, of course, a gem or some other kind of package/plugin – forces the coder to write in a highly modular and iterative way. Beyond the normal considerations of reusable code, DRY, etc., all possible use cases must be considered. For example, though I intend to use glymour mainly with ActiveRecord objects as stores for sample data, working on it as a gem made me quickly realize that others using it might be reading from a file, or from user input, or in many other scenarios. Thus, glymour instead uses a user-defined block for retrieving data, allowing for any of these possibilities. In turn, this made testing much easier (since a simple array could be filled with hashes of test data).
It’s been rewarding working on something intended for the public; we try to work open source as much as we can, and really digging into an open project has helped me understand why.
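The block-based design can be sketched as follows. This is not glymour's actual API; `VariableSummary` and its methods are hypothetical names used only to show how a user-defined block decouples the library from where the data lives:

```ruby
# Hypothetical class (not from glymour): it never assumes where rows come
# from or how a value is read; the caller's block decides that.
class VariableSummary
  def initialize(rows, &extractor)
    @rows = rows
    @extractor = extractor
  end

  # Count how often each extracted value occurs across the rows.
  def counts
    @rows.map(&@extractor).each_with_object(Hash.new(0)) do |value, tally|
      tally[value] += 1
    end
  end
end

# In a test, a plain array of hashes is enough; no database needed.
rows = [{ rain: true }, { rain: false }, { rain: true }]
summary = VariableSummary.new(rows) { |row| row[:rain] }
rain_counts = summary.counts
```

With ActiveRecord objects the block might call an attribute reader instead, and the class itself would not change.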
Does writing open source software benefit your style? How?
The post Getting Things Done (When the Mood Strikes) appeared first on Big Nerd Ranch.
As the youngest Highgroover, I don’t have quite as much of a reference point for contrasting a ROWE with a 9-to-5 type work environment. But I could tell within my first week here that working without a timeclock allows us to have an incredible amount of flexibility.
A lot of writing about ROWE focuses on the way it handles time off: your weekend can be anytime, you can leave for a movie on a Thursday, etc. This seems to me like a silly way to sell the idea. A student who really wants to pass his classes bases his partying around his studying, not vice versa. So why schedule a work week around your weekend if you really want results? Because of this, when I started working for Highgroove I tried paying attention to how our work environment allowed us to get things done.
My first month here has definitely not disappointed. The main benefits, for me, have involved being able to work when I’m most productive. I’m a night owl, and my brain is frankly useless at 8 in the morning. Rather than sit and stare at a cup of coffee for an hour, I can start (and end) my day late and use time in the office to really work.
Similarly, we can work right through a period of increased productivity (“flow”) without worrying about hours. A couple of weeks ago, one of these moods struck me at about midnight (on a weekday). Worrying about the clock would most likely mean hastily jotting down some ideas before going to sleep (and hoping the ideas are as fresh the next day), just to be able to come in on time. Instead, I could stay up, code while in the zone, and sleep in knowing I had gotten the results I needed.
I don’t mean to downplay the awesomeness of ROWE’s non-work benefits. But having the ability to work when we’re truly productive makes work itself many times more satisfying (and more fun).
How do you make sure your time “in the zone” coincides with your time spent working?