This morning I had the opportunity to chat with software engineers and data scientists at the AI Dev World Conference on a topic I just happen to be v...
Software engineering has more data about the way it works than probably any other organization. We’ve spent the last couple of decades automating nearly every part of the build process, meaning that just about everything we do—working issues, committing code, testing quality, releasing software—is captured in one system or another. The trouble is, much of this activity data gets treated as a forgotten byproduct, like wood shavings on a factory floor.
Now: imagine a world where this data was coupled with the right intelligence. What kinds of questions about engineering performance could we answer? Here are four examples.
A simple enough question. But if you lead an engineering organization, it doesn’t usually have a simple answer. There are too many variables. What kind of work? Bug fixes? Enhancements? KTLO? What do you mean by “well”?
To answer, we have to first demystify what engineering does. Engineering turns ideas into working software. We’re a pipeline: ideas come in, software goes out. As with any pipeline, we want to show how much goes through, how fast, and with what quality.
We can answer how much and how fast using Throughput. Our unit of measure is issues: specifically, the number we complete per person, per month. Because no two issues are identical in effort, we also look at the actual average time, measured in days elapsed, required to complete an issue. In pipeline terms, this is the Cycle Time, which works as a stand-in for size or complexity. Multiply the number of issues completed per person per month by the average Cycle Time, and you have Throughput.
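To make the arithmetic concrete, here is a minimal sketch of that Throughput calculation. The `Issue` record, its field names, and the `throughput` function are hypothetical, invented for illustration; a real delivery system will expose this data differently.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Issue:
    # Hypothetical record: when work started and when the issue was completed.
    started: date
    completed: date


def throughput(issues: list[Issue], people: int, months: int) -> dict[str, float]:
    """Issues completed per person per month, multiplied by average Cycle Time."""
    if not issues or people <= 0 or months <= 0:
        raise ValueError("need at least one issue, one person, and one month")
    issues_per_person_month = len(issues) / people / months
    # Cycle Time: elapsed days from start to completion, averaged over all issues.
    avg_cycle_days = sum((i.completed - i.started).days for i in issues) / len(issues)
    return {
        "issues_per_person_month": issues_per_person_month,
        "avg_cycle_time_days": avg_cycle_days,
        "throughput": issues_per_person_month * avg_cycle_days,
    }
```

Returning the two components alongside the product makes it easier to see whether a change in Throughput came from volume or from Cycle Time.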
The cleanest, clearest answer to quality is Defect Ratio. Defect Ratio is the number of defects closed over the number opened. It tells you whether you’re creating more bugs than you’re closing. If you are, there’s a quality issue somewhere in the pipeline.
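A minimal sketch of the Defect Ratio calculation follows. It assumes both counts cover the same period, and the function names are hypothetical. Under that reading, a ratio below 1 means more bugs are being created than closed.

```python
from typing import Optional


def defect_ratio(closed: int, opened: int) -> Optional[float]:
    """Defects closed divided by defects opened over the same period."""
    if closed < 0 or opened < 0:
        raise ValueError("counts cannot be negative")
    if opened == 0:
        return None  # the ratio is undefined when no defects were opened
    return closed / opened


def creating_more_than_closing(closed: int, opened: int) -> bool:
    """True when more defects were opened than closed in the period."""
    return opened > closed
```

For example, closing 30 defects in a month when 40 were opened gives a ratio of 0.75, which points to a growing defect backlog.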
The trend between these two signals is important. For example, if Throughput is holding steady or improving but Defect Ratio is worsening, this suggests speed is coming at the expense of quality.
Here the data gets a tad more complex. There are a myriad of variables that might impact a team’s ability to deliver its work on time. The type and priority of the work, the people that make up the team, the type and amount of other work on their plate, even the size of their backlog… One thing’s certain. The best way to predict the likelihood of work getting done on time is to analyze past actuals for similar work—something we have based on the aforementioned Cycle Time and Throughput.
All of this takes machine intelligence. We use it to make an automatic prediction of the time remaining to complete the assigned work, based on the variables above. Our predictive model is applied at the lowest, or most granular, issue level. This means that where parent or grouping issues are used (e.g., stories, epics), we traverse the entire issue tree, calculating Forecast Work Months for each child issue, and rolling up from there.
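The roll-up step can be sketched as a simple tree traversal. This is only an illustration: the predictive model itself is not shown, so each leaf's forecast is treated as a given input, and the `IssueNode` type and its field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IssueNode:
    key: str
    # Forecast Work Months from the model; assumed to be set on leaf issues only.
    forecast_work_months: Optional[float] = None
    children: list["IssueNode"] = field(default_factory=list)


def rolled_up_forecast(node: IssueNode) -> float:
    """Sum leaf-level forecasts up through stories, epics, and other parents."""
    if not node.children:
        # A leaf without a forecast contributes nothing to the total.
        return node.forecast_work_months or 0.0
    return sum(rolled_up_forecast(child) for child in node.children)
```

Forecasting only at the leaves and summing upward means a parent's estimate always reflects its current children, rather than a separate number that can drift out of date.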
We also analyze the amount of concurrent work a team has. It’s possible our modeling will suggest they have enough time to finish the project in question by the due date—but only if it’s their top priority. If other work gets in the way, the date is at risk.
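One way to picture the concurrency check is as a comparison of forecast work against available capacity. The sketch below is an assumption-laden simplification, not the actual model: it treats capacity as people times months until the due date, and the function name and return labels are hypothetical.

```python
def on_time_outlook(
    project_work_months: float,
    other_work_months: float,
    people: int,
    months_until_due: float,
) -> str:
    """Compare forecast work with team capacity before the due date."""
    # Simplifying assumption: every person contributes one work month per month.
    capacity = people * months_until_due
    if project_work_months > capacity:
        return "at risk even as top priority"
    if project_work_months + other_work_months > capacity:
        return "on time only if top priority"
    return "on track"
```

With four people and two months to go, a project forecast at six work months fits on its own, but four more work months of concurrent work pushes the total past capacity.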
Which delivery locations are providing the best return? For distributed engineering organizations, this is a question of where to place our not-inexpensive bets.
Geography is one of the cohorts we analyze for performance. In this case, we’re interested in measuring how efficiently people grouped by region perform. Most of the signals we analyze concern coding efficiency: how quickly code moves through the pipeline, and with what quality. We compare this performance against the actual average annual labor cost of the region. (In a recent real-world example, one customer CTO discovered that his top-performing region was Rome, where they had an office of two engineers. Both were promptly promoted.)
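As a rough illustration of comparing performance against cost, the sketch below ranks regions by a performance score per $100,000 of average annual labor cost. The single score, the normalization, and the function name are all assumptions made for the example; the actual analysis draws on several signals.

```python
def rank_regions_by_return(
    regions: dict[str, tuple[float, float]],
) -> list[tuple[str, float]]:
    """Rank regions by performance score per $100k of average annual labor cost.

    `regions` maps a region name to (performance_score, avg_annual_labor_cost).
    """
    scored = [
        (name, score * 100_000 / cost)
        for name, (score, cost) in regions.items()
        if cost > 0  # skip regions with missing or zero cost data
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

A region with a lower raw score can still rank first if its labor cost is proportionally lower, which is exactly the kind of result that is hard to see without the data.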
Maybe you feel like a particular location is underperforming relative to others. Not only can you confirm that it’s actually the case (or discover that it isn’t), but you can answer the more interesting question: Why? This transforms a largely invisible problem into something actionable.
No sane person looks at a month of data in isolation and concludes things are going well or badly. When it comes to understanding performance of any kind, what matters is the trendline. This—the ability to show how performance has changed, month over month, quarter over quarter, year over year—is maybe the hardest nut to crack, even for engineering organizations that take performance measurement seriously.
When we harness the raw activity data of engineering delivery systems, we capture the entire historical record of usage—meaning we can derive engineering performance signals back to the first moment those systems were put in place. This means we can show where and how the entire organization is improving (or struggling) on any of the signals that we surface. Just as importantly, there’s granular data about trends on the location, team and personal level.
Trend information is often more important than a snapshot. When comparing the performance of two teams, for instance, it’s more useful to know that one has been declining and one improving than to be told both teams are roughly the same.
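The two-team comparison can be sketched with a least-squares slope over monthly values. The function name is hypothetical, and the numbers in the usage note are made up purely to illustrate the point.

```python
def trend_slope(values: list[float]) -> float:
    """Least-squares slope of a series of evenly spaced monthly values."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points to fit a trend")
    mean_x = (n - 1) / 2  # months are indexed 0, 1, ..., n - 1
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator
```

Take one team with monthly values of 12, 11, 10, 9, 8 and another with 8, 9, 10, 11, 12. Both average 10, so a snapshot says they are the same, but the first has a slope of -1 per month and the second a slope of +1.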
The ability to answer these four questions gives engineering leaders the information they need to make better strategic decisions, to give better coaching and mentoring, and to advocate for the department in conversations with other business stakeholders. With data, we have fewer conversations that boil down to, “Just trust me.”