This morning I had the opportunity to chat with software engineers and data scientists at the AI Dev World Conference on a topic I just happen to be v...
In a recent Quora Session, I was asked about the performance metrics that matter most to a business’s success. For a software company, this means both the metrics that tell you about the performance of the business as a whole and the metrics that tell you specifically about the performance of engineering. There’s a pretty stark divide between the two categories. We have a fairly established body of metrics that tell us about the performance of a software business. But for software engineering, it’s been a different, much hazier story. Time to change that.
On the business side, we consider things like annual contract value (ACV), lifetime value (LTV), and customer acquisition cost (CAC). These kinds of metrics have a few things in their favor.
Engineering should work the same way: with a small set of key insights that tell anyone, technical or not, how we’re performing.
Imagine software engineering as a pipeline. Ideas come in, and what goes out is (hopefully) quality software to solve a problem or create opportunities for the business. Framed this way, what business leaders want to know about engineering is: how much do we get through the pipeline, how quickly do we do it, how good is what comes out?
These are the five metrics I use to answer those questions: Backlog Change, Throughput, Cycle Time, Workload Balance, and Defect Ratio.
Taken together, these demonstrate the end-to-end performance of our software pipeline, in language anyone can understand.
With Backlog Change, we’re looking at the number of ideas and requests stacking up at the front of the engineering pipeline. By watching how the backlog is changing (Growing? Shrinking? By how much?) we get a clear view of whether we’re keeping up with demand. More importantly, by following the trend in backlog change month over month or quarter over quarter, we can track and show whether things are getting better or worse.
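The Backlog Change calculation is simple arithmetic: issues created minus issues closed in a period. Here is a minimal sketch; the function name and the monthly counts are hypothetical, and in practice the counts would come from your issue tracker.

```python
def backlog_change(created: int, closed: int) -> int:
    """Net backlog growth for a period.
    Positive means the backlog grew; negative means it shrank."""
    return created - closed

# Hypothetical trend over three months: (month, issues created, issues closed).
months = [("Jan", 120, 100), ("Feb", 110, 115), ("Mar", 105, 130)]
trend = [(m, backlog_change(c, d)) for m, c, d in months]
# A shift from positive to negative change shows the team catching up with demand.
```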
Throughput is a little more involved. It consists of two parts. The first is the quantity of work getting done. This is measured by the number of issues each person completes every month. Per person, because otherwise larger teams would have an advantage over smaller ones. Per month, because it’s easier to understand and compare teams in 30-day increments than it would be with larger time frames.
The second part of Throughput is the complexity of the work. In software engineering, no two issues require identical effort. Traditionally, the complexity of a given piece of work is signaled through something like story points (t-shirt sizes? Fibonacci?). But these abstractions exist to remove the time element. Instead, we determine the actual average time, measured in days elapsed, required to complete an issue. In pipeline terms, this is the Cycle Time, which here works as a proxy for size or complexity. The number of issues completed by each person every month, multiplied by the average Cycle Time, gives you Throughput.
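The two-part Throughput calculation described above can be sketched in a few lines. The team size, issue count, and average Cycle Time below are hypothetical numbers for illustration.

```python
def throughput(issues_completed: int, team_size: int,
               avg_cycle_time_days: float) -> float:
    """Issues completed per person per month, multiplied by the
    average Cycle Time (in days) as a proxy for size/complexity."""
    per_person = issues_completed / team_size
    return per_person * avg_cycle_time_days

# Example: a team of 5 completes 60 issues in a month, averaging 3.5 days each.
# 60 / 5 = 12 issues per person; 12 * 3.5 = 42.0
result = throughput(60, 5, 3.5)
```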
While Cycle Time is a variable in Throughput, I like to see it on its own as well. As a standalone, Cycle Time is essentially our speedometer. How quickly are we completing work, and how is that pace changing over time?
The above three signals address the how much and how fast aspects of our pipeline. What about how good? “Good” here means not only the quality of what we build, but how efficiently the team is working. If we produce bad software fast, that’s still a problem; if we produce good software in a way that’s expensive or unsustainable, that’s also a problem.
For the question of efficiency, I use Workload Balance. Workload Balance uses the 80/20 rule as a means to understand how evenly work is spread across a team. Specifically, Workload Balance evaluates what percent of the available team capacity is involved in delivering 80 percent of the work. It’s a data-driven way of seeing which teams might be underutilized, and where we may be at risk for burnout. It also obviously informs when to hire.
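A minimal sketch of the Workload Balance idea above: given each member's completed-issue count, find what fraction of the team it takes to deliver 80 percent of the work. The function name and the example counts are illustrative, not a prescribed implementation.

```python
def workload_balance(issues_per_member: list[int], share: float = 0.8) -> float:
    """Fraction of the team needed to deliver `share` (default 80%) of
    completed work. A value well below `share` signals that work is
    concentrated in a few people, a risk for burnout."""
    counts = sorted(issues_per_member, reverse=True)  # heaviest contributors first
    target = share * sum(counts)
    delivered = 0
    for members, count in enumerate(counts, start=1):
        delivered += count
        if delivered >= target:
            return members / len(counts)
    return 1.0

# Concentrated team: 2 of 5 people deliver 80% of the work.
workload_balance([40, 30, 5, 3, 2])   # 0.4
# Perfectly even team: 80% of the people deliver 80% of the work.
workload_balance([10, 10, 10, 10, 10])  # 0.8
```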
What about understanding the quality of what we build? This is where Defect Ratio comes in. Defect Ratio is simply the number of closed defects divided by the number of open defects. It tells me whether we’re creating more bugs than we’re closing. If so, there’s a quality issue somewhere in the pipeline.
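The Defect Ratio described above is a single division; the sketch below adds a guard for the zero-open-defects case (an assumption, since the post doesn't specify one). The counts are hypothetical.

```python
def defect_ratio(closed_defects: int, open_defects: int) -> float:
    """Closed defects over open defects. Below 1.0 means bugs are
    being created faster than they're being closed."""
    if open_defects == 0:
        return float("inf")  # nothing open: every known defect is closed
    return closed_defects / open_defects

defect_ratio(45, 60)  # 0.75 -> a quality issue somewhere in the pipeline
```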
You can see the check and balance among these signals. For example, if your Backlog Change is negative (meaning you’re burning through your backlog), your Throughput and Cycle Time are holding steady or improving, and your Workload Balance is even, but your Defect Ratio is worsening (you’re opening more defects than you close), that all indicates speed is beginning to outstrip what the organization can sustain, with quality taking a hit.
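The check-and-balance scenario above can be expressed as a simple flag. This is only an illustrative roll-up under the Defect Ratio definition given earlier (closed over open); the function name and thresholds are not prescribed by the post.

```python
def speed_outstripping_quality(backlog_change: int,
                               defects_opened: int,
                               defects_closed: int) -> bool:
    """Flag the scenario described above: the backlog is shrinking
    while more defects are being opened than closed."""
    return backlog_change < 0 and defects_opened > defects_closed

# Backlog shrank by 15 issues, but 30 defects opened against 20 closed.
speed_outstripping_quality(-15, 30, 20)  # True: quality is taking a hit
```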
Of course, there’s a sixth signal I care about: cost. Specifically, the annual labor cost of engineering. What the above five metrics give me is a way to justify further investment, based on quantified performance. It makes investment discussions pretty straightforward. If we’re getting a lot done, at good speed, with high efficiency and quality, then I can explain to anyone—my CFO, the board, etc.—why it’s time to grow engineering.
Now, while I’m a technical CEO and should be able to read a burndown chart (really, I still can’t), that shouldn’t be a requirement for understanding or explaining how engineering is performing. These five metrics, plus cost, give an intuitive, comparable, portable set of measures to convey the performance of any engineering organization, regardless of its methodologies, stack, or architectural choices, and to any audience.
Founder & CEO