Four Key Reasons to Learn Markdown
Back-End Leveling UpWriting documentation is fun—really, really fun. I know some engineers may disagree with me, but as a technical writer, creating quality documentation that will...
In my last post, we dove into the question of whether small methods are really a good thing. We looked at some code that fellow nerd Nathan had rewritten to consist of only a few long but clear methods. We followed a few blocks of code I extracted into smaller methods, and we found that method extractions could be good or bad depending on whether they formed a good abstraction.
But method extractions weren’t the only refactoring I performed. I also extracted a few types — specifically, two modules and a class. How did these extractions turn out? One criterion Nathan proposed for evaluating code is that “context switches should be few in number and short in distance.” Switching between types is a much greater “distance” than switching between methods in the same type. Usually that switch requires navigating to another file, and even in the rare case that the types are in the same file, it requires moving to a more distant part of the module hierarchy. When is extracting a type worth this extra effort to navigate?
Let’s start back at the beginning, the primary method of the BenchReport
module:
def self.build(report_date_range)
iteration_date_ranges = calculate_iteration_date_ranges(report_date_range)
users = User.active.preload(:services)
user_schedules = user_schedules_for(users, period: report_date_range)
service_availabilities = calculate_service_availabilities(users, user_schedules, iteration_date_ranges)
create_json(iteration_date_ranges, service_availabilities)
end
As I was reading through the methods that build
calls, the first one that seemed like it might belong in a separate module was create_json
. The first four lines of build
all relate to setting up data that will be used for the report. The create_json
line is different: it renders the report instead.
There’s also a difference in its implementation. Here it is:
def self.create_json(iteration_date_ranges, service_availabilities)
{
report: {
iteration_entries: iteration_date_ranges.map { |iteration_date_range|
{
start_date: iteration_date_range.start_date,
end_date: iteration_date_range.end_date,
service_entries: service_availabilities.keys.sort.map {|service_name|
availabilities = service_availabilities[service_name]
iteration_vals = availabilities.fetch(iteration_date_range){ blank_avail_val }
user_availabilities = iteration_vals.fetch(:availabilities)
user_ids = iteration_vals.fetch(:user_ids).sort
{
service: {name: service_name},
user_availability: user_availabilities.reduce(0, &:+),
user_ids: user_ids,
calculation: calculations_from(user_availabilities),
links: {
calendar: capacity_interval_calendar_path(user_ids: user_ids, start_date: iteration_date_range.start_date, end_date: iteration_date_range.end_date)
},
}
}
}
},
}
}
end
Of the three methods create_json
calls, two of them are called only by create_json
. So there are really two groups of methods in the module: the data generation methods, and the rendering methods. This would suggest that rendering is a separate responsibility that could be extracted.
We can easily perform this extraction by moving the create_json
method (and the methods it calls) into a new Renderer
module:
module BenchReport
module Renderer
def self.create_json(iteration_date_ranges, service_availabilities)
{
report: {
iteration_entries: iteration_date_ranges.map { |iteration_date_range|
{
start_date: iteration_date_range.start_date,
end_date: iteration_date_range.end_date,
service_entries: service_availabilities.keys.sort.map {|service_name|
availabilities = service_availabilities[service_name]
iteration_vals = availabilities.fetch(iteration_date_range){ blank_avail_val }
user_availabilities = iteration_vals.fetch(:availabilities)
user_ids = iteration_vals.fetch(:user_ids).sort
{
service: {name: service_name},
user_availability: user_availabilities.reduce(0, &:+),
user_ids: user_ids,
calculation: calculations_from(user_availabilities),
links: {
calendar: capacity_interval_calendar_path(user_ids: user_ids, start_date: iteration_date_range.start_date, end_date: iteration_date_range.end_date)
},
}
}
}
},
}
}
end
# ...called methods trimmed...
end
end
Calling create_json
at its new location requires only a trivial change in the build
method:
def self.build(report_date_range)
iteration_date_ranges = calculate_iteration_date_ranges(report_date_range)
users = User.active.preload(:services)
user_schedules = user_schedules_for(users, period: report_date_range)
service_availabilities = calculate_service_availabilities(users, user_schedules, iteration_date_ranges)
- create_json(iteration_date_ranges, service_availabilities)
+ Renderer.create_json(iteration_date_ranges, service_availabilities)
end
I liked how the Renderer
extraction isolated responsibilites. Nathan agreed, but seeing a call to another module in the middle of the build
method made him wonder if he needed to look through the rest of the module for other calls to external modules.
As we talked through it, we realized that the build
method (and the BenchReport
as a whole) was really operating at two different levels of abstraction. It performed several low-level operations to gather data, but then it made a high-level call to Renderer
to render it. To get the levels of abstraction equal, we decided to extract the rest of the BenchReport
module into a new module as well, leaving only a slimmed-down build
method in BenchReport
:
module BenchReport
def self.build(report_date_range)
- iteration_date_ranges = calculate_iteration_date_ranges(report_date_range)
-
- users = User.active.preload(:services)
- user_schedules = user_schedules_for(users, period: report_date_range)
- service_availabilities = calculate_service_availabilities(users, user_schedules, iteration_date_ranges)
+ iteration_date_ranges, service_availabilities = Calculator.calculate(report_date_range)
Renderer.create_json(iteration_date_ranges, service_availabilities)
end
-
- def self.calculate_iteration_date_ranges(report_date_range)
- # and the rest of the methods
end
Here’s the main method of the new Calculator
module. Note that whereas the last line of build
was a call to Renderer
, we now create an array to return instead:
module BenchReport
module Calculator
MINIMUM_DISPLAYABLE_AVAILABILITY = 50
def self.calculate(report_date_range)
iteration_date_ranges = calculate_iteration_date_ranges(report_date_range)
users = User.active.preload(:services)
user_schedules = user_schedules_for(users, period: report_date_range)
service_availabilities = calculate_service_availabilities(users, user_schedules, iteration_date_ranges)
[iteration_date_ranges, service_availabilities]
end
# ...
end
end
With this change, all BenchReport
does is delegate to two other modules at the same level of abstraction: Calculator
to generate the data and Renderer
to render it. This seems quite small, but Nathan and I both like this separation of responsibilities.
In addition to these two module extractions, I also extracted a class related to the calculate_service_availabilities
method we focused on in my last post. Here’s that method again (with a few changes from last time due to unrelated minor refactorings):
def self.calculate_service_availabilities(users, user_schedules, iteration_date_ranges)
service_availabilities = {}
users.each do |user|
service = primary_service_name(user)
next if primary_service.nil?
iteration_date_ranges.each do |iteration_date_range|
user_availability = user_availability(user, user_schedules, iteration_date_range)
if displayable?(user_availability)
service_availabilities[primary_service] ||= {}
service_availabilities[primary_service][iteration_date_range] ||= blank_avail_val
service_availabilities[primary_service][iteration_date_range][:availabilities] << user_availability / 100.0
service_availabilities[primary_service][iteration_date_range][:user_ids] << user.id
end
end
end
service_availabilities
end
If you recall, the nested loops and conditionals bothered me, and I tried to extract some methods, but it didn’t work out well. What else could I try?
Well, other than the loops and conditionals, this method’s other responsibility is to initialize and populate the nested hash structure. When I realized that, it occurred to me that this behavior could be moved into an object that wrapped the hash. This would allow us to express what we’re accomplishing (adding a service availability) without exposing how we’re doing it (building up a particular hash structure).
I decided to name this class ServiceAvailabilities
. Here it is:
module BenchReport
class ServiceAvailabilities
MINIMUM_DISPLAYABLE_AVAILABILITY = 50
def initialize
@service_availabilities = {}
end
def add(user, availability, service, iteration_date_range)
return unless displayable?(availability)
service_availabilities[service] ||= {}
service_availabilities[service][iteration_date_range] ||= blank_avail_val
service_availabilities[service][iteration_date_range][:availabilities] << availability / 100.0
service_availabilities[service][iteration_date_range][:user_ids] << user.id
end
def to_h
service_availabilities
end
private
attr_reader :service_availabilities
def displayable?(user_availability)
user_availability >= MINIMUM_DISPLAYABLE_AVAILABILITY
end
def blank_avail_val
{availabilities: [], user_ids: []}
end
end
end
Note all the knowledge we were able to move out of Calculator
:
That’s a lot of knowledge that calculate_service_availabilities
doesn’t need to have anymore. Here are the changes to that method:
def self.calculate_service_availabilities(users, user_schedules, iteration_date_ranges)
- service_availabilities = {}
+ service_availabilities = ServiceAvailabilities.new
users.each do |user|
service = primary_service_name(user)
next if primary_service.nil?
iteration_date_ranges.each do |iteration_date_range|
user_availability = user_availability(user, user_schedules, iteration_date_range)
- if displayable?(user_availability)
- service_availabilities[primary_service] ||= {}
- service_availabilities[primary_service][iteration_date_range] ||= blank_avail_val
- service_availabilities[primary_service][iteration_date_range][:availabilities] << user_availability / 100.0
- service_availabilities[primary_service][iteration_date_range][:user_ids] << user.id
- end
+ service_availabilities.add(user, user_availability, service, iteration_date_range)
end
end
- service_availabilities
+ service_availabilities.to_h
end
That’s pretty small; does calculate_service_availabilities
still know enough to pull its weight? It knows:
ServiceAvailabilities
object to store this dataSo there’s still a good amount here. We’ve split out about half of its responsibilities into another object.
(Note: the fact that the ServiceAvailabilities
abstraction is thrown away after this method, rather than continuing to be used during rendering, is a weakness. If we had decided to keep this refactoring, that’s something we would want to explore further.)
I really liked this extraction, but Nathan didn’t prefer it. Part of this is probably because Nathan was the only one of us who experienced the previous version of the code, where back-and-forth message passing was genuinely difficult to follow. But I think there are some more interesting reasons for our differences too.
Nathan has been experimenting with stateless functions and bare data structures, and he saw this BenchReport
as a great fit for that approach. To Nathan, building up a hash is the whole purpose of Calculator
, which is then rendered as JSON and used to draw a graph. The core of what he liked about his initial design was that the hash structure was so visible in the calculate_service_availabilities
method. He wanted to minimize the context switches required to understand the makeup of the data structure and the steps that populated it with data.
I can understand that view, but lately I’ve been doing an experiment of my own, influenced by authors like Kent Beck, Rebecca Wirfs-Brock, and Uncle Bob Martin. They argue for the advantages of smaller objects that know less than the procedure but more than the behaviorless data constructs. These advantages include hiding implementation, separating out responsibilities, and facilitating reuse and recombination. This emphasis on proactively creating small objects comes from the early, idealistic days of object-oriented design, when the benefits of small objects were more well-understood than the costs. These days, a more moderate recommendation (from authors like Sandi Metz) is to leave code that doesn’t change procedural, but when requirements come in that force you to change it, apply OO design principles to separate out the parts that change for different reasons (the responsibilities).
So, in the end, I look at ServiceAvailabilities
as an experiment in what a smarter data structure could look like — an experiment that shouldn’t go to production yet. Even so, I think Nathan and I have different thresholds for deciding when the benefit of extracting ServiceAvailabilities
outweighs the cost…and that’s okay! Refactoring means that whatever decision we make now, we can always reverse it if necessary when we have more information. Plus, all our conversations about this refactoring have helped me understand class extractions much more deeply, and that makes it all worthwhile.
So, when is it worth it to extract a class or module? The thought process is similar to deciding when to extract a method: extract when you have a good abstraction, one that hides details you don’t need to consider at that point in the code. However, because the cost of context switching between types is so much higher than between methods, the value you get from abstracting away those details also needs to be greater to be worth it. The exact tipping point between the cost and the benefit depends on your wiring and that of your team.
Does your code have long procedures that might make more sense if details were split out? Or are you chasing after details in multiple files that might be better inlined? Do you have behaviorless data constructs that could benefit from adding some intelligence? Or are your classes just hoops to jump through to get to your data? What would it look like to switch your approach on any of the above? Give it a try, see how you feel about it, and then discuss it with someone on your team. That’s the heart of refactoring.
Writing documentation is fun—really, really fun. I know some engineers may disagree with me, but as a technical writer, creating quality documentation that will...
Humanity has come a long way in its technological journey. We have reached the cusp of an age in which the concepts we have...
Go 1.18 has finally landed, and with it comes its own flavor of generics. In a previous post, we went over the accepted proposal and dove...