Pinpoint Engineering

Calculating the cost of legacy code

Technical debt is one of those broad engineering matters that everyone agrees should be addressed, but which contains so many flavors and interpretations that it’s often difficult to get momentum around what, exactly, we’re addressing and why.

Let’s start with what we know, or at least feel instinctively: that older languages are more burdensome to maintain. Whatever they may deliver in stability (having been hardened over their comparatively longer application lives), they’re more brittle, generally more verbose, and as they age, more esoteric, known to an ever-decreasing number of people inside the organization.

The question is, how do you quantify the burden of legacy code?

We decided to examine this through the lens of Cycle Time. Cycle Time is a signal we derive from work and ticket systems like Jira, which measures the average number of days from start of work to completion. Because we also examine the metadata of code commits linked to these work items, we can calculate Cycle Time by code repo and programming language.
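As a rough illustration of that per-language breakdown, here is a minimal Python sketch. The work items, their field layout, and the function names are hypothetical, made up for this example; it is not Pinpoint's actual pipeline, and it assumes each work item has already been tagged with the language of its linked commits.

```python
from collections import defaultdict
from datetime import date

# Hypothetical work items: (ticket id, language of linked commits, start, completion).
# A real data set would come from a ticket system joined to commit metadata.
work_items = [
    ("T-1", "Java", date(2018, 1, 2), date(2018, 1, 30)),
    ("T-2", "Java", date(2018, 2, 1), date(2018, 2, 27)),
    ("T-3", "Go", date(2018, 1, 5), date(2018, 1, 19)),
    ("T-4", "Python", date(2018, 3, 1), date(2018, 3, 17)),
]


def cycle_time_by_language(items):
    """Average days from start of work to completion, grouped by language."""
    durations = defaultdict(list)
    for _ticket_id, language, started, completed in items:
        durations[language].append((completed - started).days)
    return {lang: sum(days) / len(days) for lang, days in durations.items()}


def overall_cycle_time(items):
    """Average days from start of work to completion across all items."""
    durations = [(completed - started).days for _, _, started, completed in items]
    return sum(durations) / len(durations)


by_language = cycle_time_by_language(work_items)
overall = overall_cycle_time(work_items)

# Report how far each language sits from the all-language average.
for language, average in sorted(by_language.items()):
    print(f"{language}: {average:.1f} days ({average - overall:+.1f} vs. overall)")
```

In practice you would also filter to languages with enough linked work items for the average to be meaningful before comparing them.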

For this analysis, we chose one of our customers with a wide variety of programming languages (more than fifty). When we looked at the Cycle Times across the languages with a thousand or more linked work items, there was a clear outlier: Java.

We found that in this particular company, the average Cycle Time for work done in Java was 12 days longer than the average for all languages:

[Figure: Cycle Time example]

This begins to quantify what we instinctively feel: that working in an older language like Java is less efficient than working in newer ones. But to give it real punch, we need to put it in terms business leaders will pay attention to. We need to calculate the cost in dollars.

So: In 2018, this particular company completed roughly 27,000 work tickets, with an average Cycle Time of 16 days per ticket. That means an annual work capacity of 432,000 days (27,000 tickets * 16 days). To convert to dollars, we divide their approximate annual R&D labor cost, $40 million, by that capacity, which gives us roughly $93 per capacity day ($40 million / 432,000 capacity days).

Ten percent of the 27,000 tickets (2,700) were for work in Java codebases. Because our Cycle Time data above show that this work takes on average 12 days longer to complete, this means an extra 32,400 capacity days spent (2,700 tickets * 12 days). Or in dollar terms, an extra $3 million per year:

[Figure: Calculating the Cost of Legacy Code]
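The arithmetic above can be written out as a short Python sketch. The input figures are the ones quoted in this post; the function and variable names are our own invention for illustration, and the model assumes, as the post does, that every capacity day costs the same.

```python
# Figures quoted in the post for this one customer (2018).
TICKETS_PER_YEAR = 27_000
AVG_CYCLE_TIME_DAYS = 16
ANNUAL_RND_LABOR_COST = 40_000_000  # dollars
JAVA_TICKET_SHARE = 0.10            # ten percent of tickets touch Java
JAVA_EXTRA_DAYS = 12                # extra Cycle Time per Java ticket


def legacy_carrying_cost(tickets, avg_cycle_days, labor_cost, share, extra_days):
    """Return (capacity days, cost per day, extra days, extra cost in dollars)."""
    capacity_days = tickets * avg_cycle_days
    # Dollars per capacity day: labor cost divided by capacity, not the reverse.
    cost_per_day = labor_cost / capacity_days
    legacy_tickets = round(tickets * share)
    extra_capacity_days = legacy_tickets * extra_days
    extra_cost = extra_capacity_days * cost_per_day
    return capacity_days, cost_per_day, extra_capacity_days, extra_cost


capacity_days, cost_per_day, extra_capacity_days, extra_cost = legacy_carrying_cost(
    TICKETS_PER_YEAR,
    AVG_CYCLE_TIME_DAYS,
    ANNUAL_RND_LABOR_COST,
    JAVA_TICKET_SHARE,
    JAVA_EXTRA_DAYS,
)

print(f"Annual capacity: {capacity_days:,} days")
print(f"Cost per capacity day: ${cost_per_day:,.2f}")
print(f"Extra days spent on Java work: {extra_capacity_days:,}")
print(f"Estimated annual carrying cost: ${extra_cost:,.0f}")
```

Swapping in your own ticket count, Cycle Time, labor cost, and legacy share gives a first-pass estimate for a different codebase.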

Now, obviously it isn’t possible to turn a bunch of old Java code into something newer overnight. But it is possible, and in fact essential, to know the carrying cost of doing nothing. With that data, we can determine a much truer cost and benefit and decide whether action is warranted.

In this case, action could mean starting to move the logic to a more contemporary language. Alternatively, the action might start by looking at the parts of Cycle Time—the wait time, the work time, the verification time—to understand more about where the bottlenecks occur (a topic we’ll discuss in a future post).

Each company’s history informs its cost of legacy code. What’s yours?

Learn more about how Pinpoint uses machine learning to reduce the impact of technical debt and keep more debt from accruing.
