This post is adapted from a talk I gave at DevExecWorld in San Francisco, titled: “In a World of Data Science, Why is So Much of Engineering Still Guesswork?”
You’re the leader of a software organization. Someone — maybe the CFO, maybe the CEO, maybe you yourself — asks what should be a simple question: How are we doing?
What’s your answer?
If you run sales, the answer is simple. You’ll talk deal pipeline and revenue growth. If you run marketing, you’ll talk lead acquisition and funnel conversion. If you run finance, you’ll talk P&L and balance sheet.
But what numbers do you use, what data do you point to, to show engineering performance?
To complicate the question a little, let’s assume that the teams across your organization use a variety of methodologies (Scrum, Kanban, etc.), as well as a variety of technologies. And remember: you’re explaining performance to a business audience. If your answer includes words like “story points” or “burndown,” you’ve already lost them.
Having spent most of my career building software companies, this isn’t an idle question for me. The answer, I think, lies in reframing what software delivery is about. At its most essential, software engineering is about taking ideas, engaging talented developers, and turning those ideas into working product. Ideas come in, we move them through our build steps, and what comes out is (hopefully) high quality, timely product.
Put another way, engineering is a pipeline: ideas enter, move through the pipeline of development, and exit as quality software.
So far, so basic. But this framing starts to unlock a way of both describing and measuring engineering performance, in language both business and technology leaders can understand. This is because we already know how the performance of a pipeline should be measured: by what goes in, how quickly it flows through, and the quality of what comes out.
There’s a lot of kinship between the way we measure the performance of a physical pipeline and the way we might measure the performance of a software pipeline. (It’s a big reason why lean manufacturing has had such an influence on the way we think about software delivery.) But the major difference historically has been this: physical pipelines can be instrumented for measurement. A software “pipeline” can’t, at least not without a lot of manual inputs and data gathering. We’re dealing with bits, not atoms.
This is what we set out to change. With Pinpoint we said, What if we could harness all the raw activity data that occurs inside the systems used to build software, and then used data science to turn that raw activity data into performance insights? Among other things, this meant deciding which measures — we call them signals, based on how we derive them — would best reflect pipeline performance.
Here’s what we came up with: five signals covering Backlog (are we closing issues faster than we open them?), Workload Balance (is work spread evenly across the team?), Throughput (how much work we complete), Defect Ratio (how much of our effort goes to fixing defects versus building new value), and Cycle Time (how long work takes from start to completion).
Taken together, these five signals give a fairly comprehensive understanding of how well engineering is performing.
To emphasize: we derive all of these instantly by harnessing the raw activity data in our Jira and GitHub instances and adding our own machine learning and statistical analysis. There’s no manual computation, and nobody running around trying to collate information from different teams or systems.
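To give a flavor of the idea (this is a simplified sketch, not Pinpoint’s actual implementation), signals like Throughput and Backlog change can be computed from nothing more than the open/close timestamps an issue tracker already records. The record layout and field names below are hypothetical:

```python
from datetime import date

# Hypothetical issue-tracker records; real systems like Jira expose
# similar data (created and resolved timestamps per issue).
issues = [
    {"id": 1, "opened": date(2019, 1, 3),  "closed": date(2019, 1, 20)},
    {"id": 2, "opened": date(2019, 1, 10), "closed": None},  # still open
    {"id": 3, "opened": date(2019, 2, 1),  "closed": date(2019, 2, 4)},
    {"id": 4, "opened": date(2019, 2, 2),  "closed": date(2019, 2, 28)},
]

def throughput(issues, start, end):
    """Number of issues completed within the period."""
    return sum(1 for i in issues if i["closed"] and start <= i["closed"] <= end)

def backlog_change(issues, start, end):
    """Issues opened minus issues closed; negative means the backlog shrank."""
    opened = sum(1 for i in issues if start <= i["opened"] <= end)
    return opened - throughput(issues, start, end)

period = (date(2019, 1, 1), date(2019, 2, 28))
print(throughput(issues, *period))      # 3
print(backlog_change(issues, *period))  # 1 (4 opened, 3 closed)
```

The point is simply that the raw events are already there; no one has to fill in a spreadsheet for these numbers to exist.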
Here’s a real example of how illuminating the software pipeline has helped us. When I looked at our pipeline performance over the last six months, I saw this:
Backlog-wise, we were closing more issues than we were opening, which is good. Our Workload Balance needed to be better, but it was trending in the right direction, having improved by 30% over the prior six-month period. Both our Throughput and our Defect Ratio were strong and getting better.
But our Cycle Time—yikes.
Looking across all work types (enhancements, features, bugs, etc.), it was taking us an average of 102 days from start to completion. Worse, that was more than four times as long as in the prior six months! In a startup, innovation and speed are your competitive strengths — at least they should be.
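The Cycle Time figure itself is conceptually simple: the mean number of days between when work starts and when it completes, across every work type. A minimal sketch, with hypothetical item records:

```python
from datetime import date
from statistics import mean

# Hypothetical work items with start and completion dates.
items = [
    {"type": "feature", "started": date(2019, 1, 2),  "completed": date(2019, 5, 1)},
    {"type": "bug",     "started": date(2019, 3, 1),  "completed": date(2019, 3, 20)},
    {"type": "feature", "started": date(2019, 2, 10), "completed": date(2019, 6, 1)},
]

def avg_cycle_time(items):
    """Mean days from start to completion across all work types."""
    return mean((i["completed"] - i["started"]).days for i in items)

print(avg_cycle_time(items))  # 83
```

An average like this is easy to state but hides where the time actually goes, which is why the diagnostic breakdown below mattered so much.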
Digging in a little further, here’s what the product showed:
This is Pinpoint’s breakdown of Cycle Time by work type. The paler blue shows the number of days in development; the darker blue shows days in verification. So the actual bottleneck, across almost all work types, wasn’t in building but in testing.
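Conceptually, that diagnostic splits each item’s cycle time into phases and averages per work type, which is what surfaces a testing bottleneck that a single overall average would hide. A simplified sketch, with hypothetical per-phase durations rather than Pinpoint’s real data model:

```python
from collections import defaultdict

# Hypothetical items: days spent in development vs. in verification.
items = [
    {"type": "feature", "dev_days": 20, "verify_days": 45},
    {"type": "bug",     "dev_days": 5,  "verify_days": 12},
    {"type": "feature", "dev_days": 18, "verify_days": 40},
]

# Accumulate phase totals per work type, then average to find the bottleneck.
totals = defaultdict(lambda: {"dev": 0, "verify": 0, "n": 0})
for i in items:
    t = totals[i["type"]]
    t["dev"] += i["dev_days"]
    t["verify"] += i["verify_days"]
    t["n"] += 1

for work_type, t in totals.items():
    print(work_type, "dev:", t["dev"] / t["n"], "verify:", t["verify"] / t["n"])
```

In this toy data, features average 19 days in development but 42.5 in verification — the same shape of imbalance the breakdown revealed for us.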
In speaking with the teams, I learned we were dealing with a high number of handoffs, especially between our engineering and data science teams, who work to train our models. This led to direct, specific action. For example, we moved from a legacy database architecture (MySQL!) to a real-time data streaming architecture. This meant any team could access the necessary data when and how it was needed, and freed the data science teams to build their own models without various engineering handoffs. We complemented this with further work to improve flow through our pipeline.
The pipeline view let me see where we had room for improvement, and more importantly what needed to be done. In the not-too-distant future, our data science will go one better: instead of me clicking on a diagnostic view to understand why a given signal might be underperforming, I’ll receive a specific, prescriptive action to take to improve whatever it is that’s trending in the wrong direction. But that’s a topic for another post...
Founder & CEO