Wednesday, April 11, 2012

Scaling Agile Development With Lean



The engineering team at Change.org has been developing software with an Agile process since its inception, and that has taken us very far. However, as our team size has increased, we have run into a number of challenges scaling traditional Agile techniques. To address these challenges, we implemented a number of process improvements during the first quarter of this year, based on Lean approaches to product development. This post outlines the major process and tool changes we have experimented with, what we have observed as outcomes thus far, and a few of the improvements we plan to adopt in the second quarter.

Strengths and Weaknesses of “Traditional” Agile

The product development process we used through 2011 was basically a hybrid of Scrum and XP, and it has several key strengths. We have had daily stand-ups and weekly retrospective meetings since the team was first formed. Our engineers are fiercely committed to XP technical practices, including full-time pair programming, test-driven development, and continuous integration. We generally deploy features as soon as they are completed.


However, despite consistent and disciplined use of these Agile techniques, we began experiencing a range of challenges as the team grew. For example, a growing portion of the team’s time was being spent on post-deployment changes or fixes, rather than new feature development. Velocity gradually decreased as a result, and it appeared to take more effort to ship meaningful user-facing features than it had just a few months before with a smaller team. Work committed to at the beginning of a sprint would regularly overflow into the next sprint.


We set a very high bar early on for hiring technical talent, and as a result the engineers at Change.org have an extremely high level of technical competence. So we were pretty confident these were not problems with our technical expertise. The features that were completed by the engineers were solid, well-architected, and very well tested. The source of our problems appeared to be mostly process-related. We decided to enhance our Agile processes with a healthy dose of Lean Thinking.


Small, Cross-functional Teams Focused on Business Goals

The first major process change divided our (at the time) 12-person team of engineers into small groups of four to six. Each team was asked to focus on a specific business goal or KPI (such as virality, new users, or revenue), and the stories assigned to each team relate to that overarching theme. These team goals are directly linked to our core, long-term business strategy. Each team even has its own set of metrics that it uses to gauge its effectiveness over time as it ships features related to its goal.


Also, where we had previously planned our work on a weekly sprint basis, allowing for rather abrupt changes in business priority from one week to the next, we now focus the team’s attention on specific long-running business objectives for a whole quarter. We still have our daily stand-ups and weekly retrospectives, but the notion of a “sprint” is starting to fade away.

The use of small, cross-functional teams is of course prominent within the Scrum community. But the relentless focus on business-related KPIs is, in our reading of the literature, more prominent in Lean thinking.


Constraining Work In Progress (WIP)

A second major change we implemented was to limit the work in progress (WIP). Since our tool was not able to handle WIP constraints, we migrated from Pivotal Tracker to AgileZen. Pivotal Tracker enforces a rather rigid workflow with fixed states, based presumably on Pivotal Labs’ own internal development process. Our process is different from theirs, and we found it hard to get their tool to conform to our way of working.


Transitioning from Scrum to a more nuanced Lean / Kanban process required us to use a tool that supported, at a minimum, these three features:

  1. Configure work “states” according to our unique workflow.
  2. Constrain the work in progress (WIP) allowed for any state in our workflow.
  3. Capture and report Lean metrics such as cycle time, lead time, and throughput, rather than just Scrum-style velocity.

After reviewing nearly 20 Kanban-based tools, we settled on AgileZen. It’s not perfect, but it was definitely the simplest, and it meets all three of the above requirements.
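To make the second and third requirements concrete, here is a minimal Ruby sketch of a WIP-limited workflow state and the three Lean metrics. It does not describe how AgileZen works internally; the `Card` and `Column` names, their fields, and the sample dates are all hypothetical.

```ruby
# Hypothetical card record; the field names are assumptions for this sketch.
Card = Struct.new(:title, :requested_at, :started_at, :finished_at, keyword_init: true)

SECONDS_PER_DAY = 86_400.0

# Lead time: from the moment work is requested until it is delivered.
def lead_time_days(card)
  (card.finished_at - card.requested_at) / SECONDS_PER_DAY
end

# Cycle time: from the moment work actually starts until it is delivered.
def cycle_time_days(card)
  (card.finished_at - card.started_at) / SECONDS_PER_DAY
end

# Throughput: number of cards finished inside a time window.
def throughput(cards, from:, to:)
  cards.count { |card| card.finished_at && card.finished_at >= from && card.finished_at < to }
end

# A workflow state that refuses new cards once its WIP limit is reached.
class Column
  attr_reader :name, :wip_limit, :cards

  def initialize(name, wip_limit:)
    @name = name
    @wip_limit = wip_limit
    @cards = []
  end

  # Returns true if the card was pulled in, false if the column is full.
  def pull(card)
    return false if cards.size >= wip_limit

    cards << card
    true
  end
end

day = ->(d) { Time.utc(2012, 1, d) }
a = Card.new(title: "A", requested_at: day[2], started_at: day[4], finished_at: day[6])
b = Card.new(title: "B", requested_at: day[3], started_at: day[9], finished_at: day[10])

in_progress = Column.new("In progress", wip_limit: 1)
in_progress.pull(a) # accepted
in_progress.pull(b) # rejected until A leaves the column
```

Velocity counts points per sprint; these metrics instead follow each item through time, which is why a tool has to record when a card enters and leaves each state.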

Classes of Service and Explicit Policies

Another major change was to completely abandon the notion of a distinction between features and bugs, with one arbitrarily considered “bad” and the other “good”. All of these items are simply requests to change the software in one way or another. They all provide some amount of value for the business or the end user (or ideally, both) that can be quantified, and they all come with some level of business and technical risk and a cost of development and maintenance.

Instead of characterizing work arbitrarily as bugs, features, or chores, we introduced four or five specific classes of service. We use color coding in AgileZen to reflect each class. Since we use a Kanban pull system for selecting work, these classes of service allow work items to traverse our workflow in order of business value and impact.


Standard: Most items are standard, and are completed in a normal FIFO order.
Fixed Date: Items that have a fixed delivery date are prioritized over standard items.
Expedite: Extremely urgent items are allowed to skip ahead of Standard and Fixed Date items.
Intangible: Mostly “technical debt”, these items are sprinkled throughout the backlog evenly.
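
As a rough illustration of how such policies can drive a pull system, the sketch below orders a backlog by class of service and then by arrival. The `WorkItem` struct, the rank table, and the sample titles are hypothetical, and ranking Intangible alongside Standard is just one assumed way to keep those items sprinkled through the queue.

```ruby
# Hypothetical work item; queued_at is a simple arrival sequence number.
WorkItem = Struct.new(:title, :service_class, :queued_at, keyword_init: true)

# Lower rank is pulled first. Giving Intangible the same rank as Standard is an
# assumption: it keeps those items in FIFO position so they stay interleaved.
PULL_RANK = { expedite: 0, fixed_date: 1, standard: 2, intangible: 2 }.freeze

# Returns the items in the order a team would pull them.
def pull_order(ready_items)
  ready_items.sort_by { |item| [PULL_RANK.fetch(item.service_class), item.queued_at] }
end

backlog = [
  WorkItem.new(title: "Tweak signup copy",    service_class: :standard,   queued_at: 1),
  WorkItem.new(title: "Upgrade test runner",  service_class: :intangible, queued_at: 2),
  WorkItem.new(title: "Campaign launch page", service_class: :fixed_date, queued_at: 3),
  WorkItem.new(title: "Fix broken checkout",  service_class: :expedite,   queued_at: 4),
  WorkItem.new(title: "New share button",     service_class: :standard,   queued_at: 5)
]

titles = pull_order(backlog).map(&:title)
```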

Results Promising So Far

As we reach the end of the first quarter of 2012, we are already seeing positive results in several areas:
  • Focused feature teams: The engineers are feeling more engaged in the overall business goals of the organization, and are able to see direct impacts on our KPIs for each of the features they complete.
  • WIP limits: The throughput of the product team has increased substantially, as engineers are focused on fewer items at the same time.
  • Classes of Service: External stakeholders get faster response times to their urgent requests without the engineering teams feeling like they are constantly interrupted.

We have already decided to implement a number of new improvements to augment everything described above, and we will periodically post updates on what we have learned. As we move into Q2, for example, we are merging our design and implementation phases into one iterative design and engineering cycle, inspired by the Lean UX movement. We have also completely abandoned the concept of "sprints", moving toward a more continuous delivery model. Stay tuned...

Tuesday, March 13, 2012

A simple Ruby idiom for time-constrained batch operations

Sometimes wall-clock time matters. As web developers, we care about user experience and frequently obsess over things like time-to-first-byte and how long it takes for the browser to start drawing the page. We pore over NewRelic RPM graphs and slow transaction traces looking for that edge that will improve user experience. Sometimes we even get low-level and invent handy tools for profiling individual method calls.

Paradoxically, there are certain situations where minimizing individual execution times comes at a high cost. Being "small" in the individual case means you're slow overall. Over the years I've run into one of these points again and again: batch purging of data. We've all run into it: a stakeholder tells you she really needs all that data and needs it stored forever. You oblige and, a few months later, show everyone the 400GB table that has never been queried for more than the last week's data. Conversations ensue, deals are brokered, and finally you get permission to purge everything beyond a current one-month working set. Huzzah!

But what to do with that existing data? Just deleting it in one fell swoop would probably cause a brutal table lock for hours. You could swap out the table and back-fill what is needed, but your ongoing purge operations still need to be handled gracefully and without blocking other write operations for too long.

In this situation, wall-clock time matters in a peculiar way: you need to delete in batches, but you need to find the sweet spot between doing too much or too little in a given batch.

For that, I use a time-constrained variation on the familiar loop construct:



The concept is rather elegant. You create a loop-like block that does work in a given batch size. Rather than using Ruby's built-in loop, use timed_batch_loop and provide a couple of parameters: the target wall clock time, an initial batch size, and a batch stepping factor. Each time your block is executed this code will keep a stopwatch on it. If it finished fast, the batch size for the next pass will be increased. If it took too long, the batch size will be decreased.
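
A helper like that might look roughly like this. It is a minimal sketch reconstructed from the description above, not the original listing: the keyword argument names (`target_seconds`, `initial_batch_size`, `step`, `min_batch_size`) and the floor of one row are assumptions.

```ruby
# Runs the block over and over, like Kernel#loop, yielding the batch size to
# use for this pass and retuning it after timing the pass.
def timed_batch_loop(target_seconds:, initial_batch_size:, step:, min_batch_size: 1)
  batch_size = initial_batch_size

  loop do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield batch_size
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

    if elapsed < target_seconds
      # Finished early: try a bigger batch next time.
      batch_size += step
    elsif elapsed > target_seconds
      # Took too long: back off, but never below the floor.
      batch_size = [batch_size - step, min_batch_size].max
    end
  end
end
```

Because the block is invoked with `yield`, a `break` inside it ends `timed_batch_loop` itself, just as it would end a plain `loop`.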

Here's an example use for purging data as described above:



This example executes the block starting at a batch size of 1000. Each SQL DELETE is timed and the batch size is adjusted relative to the target time of a quarter of a second (in steps of 100). You'll also notice the old trick of checking how many rows were affected by the operation and breaking out of the loop when it deletes fewer rows than the LIMIT specifies. This is done with only a slight abuse of ActiveRecord's update_sql (a delete is an "update"-ish, right?).
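
Here is a runnable sketch of that purge. To keep it self-contained, a hypothetical in-memory `FakeEventsTable` stands in for the database, and its `delete_older_than` method plays the role of the `DELETE ... LIMIT` statement. The compact `timed_batch_loop` included here is likewise an assumed version of the helper, repeated only so the snippet runs on its own.

```ruby
# Assumed, condensed version of the helper so this example is standalone.
def timed_batch_loop(target_seconds:, initial_batch_size:, step:)
  batch_size = initial_batch_size
  loop do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield batch_size
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    batch_size += step if elapsed < target_seconds
    batch_size = [batch_size - step, 1].max if elapsed > target_seconds
  end
end

# Hypothetical stand-in for a table: maps row id => created_at.
class FakeEventsTable
  attr_reader :rows

  def initialize(rows)
    @rows = rows
  end

  # Mimics "DELETE ... WHERE created_at < cutoff LIMIT limit" and returns the
  # number of rows affected, which is what the real statement would report.
  def delete_older_than(cutoff, limit)
    ids = @rows.select { |_id, created_at| created_at < cutoff }.keys.first(limit)
    ids.each { |id| @rows.delete(id) }
    ids.size
  end
end

# 5,000 rows whose created_at equals their id; everything below 4,001 is stale.
table = FakeEventsTable.new((1..5_000).to_h { |id| [id, id] })
cutoff = 4_001

timed_batch_loop(target_seconds: 0.25, initial_batch_size: 1000, step: 100) do |batch_size|
  deleted = table.delete_older_than(cutoff, batch_size)
  # Fewer rows than the limit means nothing stale is left: stop looping.
  break if deleted < batch_size
end
```

Against a real database, the body of `delete_older_than` would be the single SQL DELETE, and the affected-row count would come back from the connection rather than from a Ruby array.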

The advantage of this approach is that it is self-adjusting and adapts to a changing environment. Add a couple of indexes that make the delete operations more costly? The batch sizes shrink. Put some snazzy new disks under your DB? The batch sizes go up. The whole Internet shows up to your site and starts gobbling up IOPS? This purge job gets out of the way and shrinks the work it is doing per-pass.

This idiom is helpful as is, but could certainly be extended. You could improve on the stepping factor by using the over/under percentage as a batch-size multiplier rather than this simple +/- method. You might store the converged-upon best batch size in order to start at the sweet spot on subsequent runs.
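
The multiplier idea could be sketched like this. The function name `scaled_batch_size` and the `min`/`max` bounds are assumptions made for illustration; the bounds are there so that one unusually fast or slow pass cannot swing the batch size to an extreme.

```ruby
# Scales the batch size by how far the last pass was over or under the target:
# a pass that took twice the target halves the batch, and vice versa.
def scaled_batch_size(current, elapsed, target_seconds, min: 1, max: 100_000)
  return current if elapsed <= 0 # nothing measurable, keep the current size

  (current * (target_seconds / elapsed)).round.clamp(min, max)
end

scaled_batch_size(1000, 0.5, 0.25)   # pass took twice the target
scaled_batch_size(1000, 0.125, 0.25) # pass took half the target
```

Compared with fixed steps, this converges in fewer passes when the starting size is far from the sweet spot, at the price of reacting more sharply to a single noisy measurement.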

In the end, you get to run potentially costly batch operations in a manner that your DBA and time-sensitive users will love you for.

Sunday, March 4, 2012

Find slow methods in your Ruby code with method_profiler

This past week, Alain Bloch and I worked on improving the performance of our petition signing flow. The first step of the process was to identify the bottlenecks in the code, and to have a good way of measuring the change as we tweaked things.

Initially, we threw in a little method called measure_it which simply wrapped the code in each method and ran the block, capturing the time before and after and outputting the difference. I started looking at various profiling tools including ruby-prof, perftools.rb, and the built-in Rails profiling. They were all useful but didn't give us exactly the report we wanted.
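
A helper in the spirit of measure_it could be as small as the sketch below. Only the name comes from the description above; the `label` argument, the `out:` parameter, and the `sign_petition` example method are assumptions.

```ruby
# Times the given block, prints the elapsed wall-clock time, and returns the
# block's own result so the wrapped method behaves as before.
def measure_it(label, out: $stdout)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  out.puts format("%s: %.4fs", label, elapsed)
  result
end

# Hypothetical method whose body has been wrapped for measurement.
def sign_petition(name)
  measure_it("sign_petition") do
    "signed by #{name}"
  end
end
```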

I ran with the idea of measure_it and created a gem called method_profiler, which allows you to observe an object, track the wall clock time spent executing its methods, and pretty print the results in a sortable table. We hooked this up to our test suite and could see the impact of changes we made each time we ran the tests.

method_profiler is itself not as low-impact as tools like perftools.rb, as it modifies the observed classes at runtime and aliases its methods to versions timed by the standard library's Benchmark class. But when you need a quick and dirty estimate of the performance of an object's methods, method_profiler does the trick.
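
To show the general technique rather than the gem's actual code or API, here is a minimal sketch of aliasing a class's methods to timed versions with `Benchmark.realtime`. `TinyProfiler`, `PetitionSigner`, and the `_without_profiling` suffix are hypothetical names.

```ruby
require "benchmark"

# Illustrative only: wraps every instance method defined directly on a class
# and records how long each call takes.
class TinyProfiler
  attr_reader :timings

  def initialize(klass)
    @timings = Hash.new { |hash, name| hash[name] = [] }
    timings = @timings # local copy so the wrapper closures can reach it

    klass.instance_methods(false).each do |name|
      untimed = "#{name}_without_profiling"
      klass.class_eval do
        alias_method untimed, name # keep the original under a new name
        define_method(name) do |*args, **kwargs, &block|
          result = nil
          timings[name] << Benchmark.realtime { result = send(untimed, *args, **kwargs, &block) }
          result
        end
      end
    end
  end

  # Rows of [method name, call count, total seconds], slowest first.
  def report
    @timings.map { |name, samples| [name, samples.size, samples.sum] }
            .sort_by { |_name, _calls, total| -total }
  end
end

class PetitionSigner
  def validate(name)
    !name.empty?
  end

  def save(name, notify: false)
    sleep 0.01 # stand-in for slow work
    notify ? "saved and notified #{name}" : "saved #{name}"
  end
end

profiler = TinyProfiler.new(PetitionSigner)
signer = PetitionSigner.new
signer.validate("Ada")
signer.save("Ada", notify: true)
```

The runtime rewriting is exactly what makes this style of profiler heavier than a sampling tool: every call now goes through an extra method and a timer.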

Run gem install method_profiler to get started. The project is open source and available to all via the MIT license. Take a look at the details of how to use it in the README and documentation on GitHub.

Wednesday, February 29, 2012

Dealing with translating attribute names for i18n

While embarking on the vast challenge that is internationalizing a site, you will encounter many unforeseen gotchas. Recently QA found that when an error message was displayed for a certain field in a form, the message itself was translated properly but the field name was not, for instance, "first name no puede estar en blanco". This is easily handled by adding an ActiveRecord translation file for the fields.
However, if you have a sizable app, you could find yourself spending a lot of time digging through all of your models to find which fields have validations and then copying them into a translation file. Enter validation reflection, added in Rails 3. This sweet little nugget lets you see which attributes in your model have validations. I wrote a script to help you find these fields and place them into a properly formatted YAML file. Enjoy!

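# Run inside the app, for example with `rails runner`.
Rails.application.eager_load! # load every model so the list below is complete

all_attributes = {}

# descendants returns direct and indirect subclasses, so no manual walk is needed.
ActiveRecord::Base.descendants.each do |model|
  next if model.abstract_class?

  # Validation reflection: collect every attribute that has a validator.
  # Custom validators that are not tied to attributes are skipped.
  validated = model.validators.select { |v| v.respond_to?(:attributes) }
                   .flat_map(&:attributes).uniq

  attributes = {}
  validated.each do |attribute|
    # Placeholder value; replace it with the translated field name.
    attributes[attribute.to_s] = attribute.to_s
  end
  all_attributes[model.model_name.i18n_key.to_s] = attributes unless attributes.blank?
end

# Rails looks these up under <locale>.activerecord.attributes.<model>.<attribute>.
locale = 'es' # assumption: change this to the locale you are translating
translations = { locale => { 'activerecord' => { 'attributes' => all_attributes } } }

File.open('activerecord.yml', 'w') { |f| f.write(translations.to_yaml) }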

Thursday, February 24, 2011

Cutting Gems for Change

Yesterday we open sourced two libraries around our recent Facebook integration: open_graph and facebook-stub. They're centered around quick development and, more importantly, testing of our Facebook user interaction. Let's take a stroll around these libraries.

OpenGraph

You run a website. You want to acquire new users. I recently heard this Facebook website has a lot of users. They also have a lot of developers. (Side props to my friend Rob Arnold who recently landed a job there.) Back to me. Now back to Change. We decided to integrate Change.org with the Facebook Graph API, allowing users to more easily create accounts on Change and sign petitions about subjects they care about. That is, after all, our main goal: to allow real humans to make an impact on their community.

We evaluated many options for integrating with Facebook: different Ruby libraries, client side only, etc. We decided that nothing offered exactly what we wanted. I like working with "invisible" libraries. Libraries which do magic, and don't tell me about it, until I gem open. All Facebook interaction is done through passing cookies from the browser to the controller. So it was immediately apparent that ActionController must be responsible for initializing our FBOpenGraph object. It then validates the cookie and allows you to call methods which automatically act on the user's behalf.

Testing is critical to the success of any application. So OpenGraph had to be testable. OpenGraph automatically includes methods to stub out your application. In your testing environment it will automatically return my profile information, my friends, etc. (maybe I should change that to Dumbledore or somebody else).


facebook-stub

Did I mention testing already? Well, this is for testing. Facebook provides a JavaScript SDK which provides simple access functions for logging in, getting sessions, etc. We use their JS SDK to allow users to log in to Facebook and create an account with Change.org. We wanted to seamlessly test our JavaScript code against an appropriately responding FB object. So we wrote our own. It is a drop-in replacement which emulates the FB object and provides back-door entry points for setting the state manually. Like a test should. We can run full-stack integration tests through the use of our FB stub.


There's been a lot of documentation reading. Which is a nice change from `gem open`, the usual form of documentation I wind up reading. We now have official Change accounts on gemcutter, so expect more gems for change (GFCs) in the future.

And now for something completely different. Here's a cool Pendulum music video: http://www.youtube.com/watch?v=tEPB7uzKuh4

New Tech Blog

I've never really done much technical blogging. And by much, I mean none. But I'm here to welcome you to the one and only Change.org Engineering Tech Blog of Sunshine. We have a fantastic, knowledgeable technical team, with profiles listed at http://www.change.org/about/team. Stay tuned as we implement social change through the authorship of rubygems, scaling of key-value document stores, innovation of JavaScript plugins, and the general adventures we have with development.