I’m a creative, visually oriented person, which is why I’m studying statistics with a focus on data visualization. When I saw a chance through the University of Waterloo’s internship office to work on data visualization with Pinpoint, I jumped at it. Not only did I want to work on projects that would move my career forward and complement the work I’m doing in the classroom, but I also wanted to expand my professional experience outside of Canada. Pinpoint was a perfect fit.
My fall internship with Pinpoint was focused on refining and perfecting our Performance Score metric. Pinpoint believes in this particular metric because engineering teams work differently (Agile, Kanban, different languages, repos, etc.) and, as a result, each has a different way of answering “how are we doing?” We want to help teams find a way to talk about performance with their peers and the rest of the business that is consistent across teams and methodologies, without having to change the way they work. This is where the data science modeling I worked on comes in.
The Performance Score will help teams understand what makes them high-performing across a number of data points in order to improve processes and provide training for a more effective engineering organization, similar to what Sabermetrics did for the Houston Astros.
It’s worth noting that Pinpoint’s Performance Score continues to be refined and is not yet in production. I worked on several iterations of a model using data from our own team, each one improving on the one before, and getting closer to something that can fairly and accurately measure overall performance of an engineering organization.
The main focus of my internship was to improve the way we calculate the Performance Score by testing various data science models.
My first takeaway: The final result depends on what you’re looking for.
When I first started working on Performance Score, it was mostly based on the lines of code written. More lines meant a higher score. I quickly realized that this meant that people whose jobs involve writing lots of code will always score higher. In other words, people who do maintenance, specialize in debugging, or review code would always have a lower score simply because the type of work they do is not purely focused on writing code. A manager, for example, would get a low score if we only looked at lines of code written.
The metrics we looked at included Code Ownership, Average Late Days, Cycle Time, New Feature Percent, Rework Percent, Traceability, Total Changes (Changes per Commit × Number of Commits) and Issues Completed. Even though that’s more nuanced than just the number of lines of code written, it’s still heavily skewed towards people whose roles involve writing code rather than reviewing code or making strategic decisions.
The key to any successful metric is to make sure you’re comparing apples to apples. In the next iteration, we started comparing people not across entire companies, but only to members of the same team to account for differing responsibilities on different teams.
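To make the idea concrete, here is a minimal Python sketch of scoring people only against their own teammates. It is not the actual Pinpoint model: the `(person, team, value)` record layout, the sample names and the percent-of-team-maximum rule are all assumptions chosen for illustration.

```python
from collections import defaultdict

def within_team_scores(records):
    """Score each person as a percentage of their team's top metric value.

    `records` is a list of (person, team, value) tuples. Both the layout
    and the percent-of-max rule are illustrative assumptions.
    """
    # Find the highest metric value on each team.
    team_max = defaultdict(float)
    for _, team, value in records:
        team_max[team] = max(team_max[team], value)
    # Express every person's value relative to their own team's maximum.
    return {
        person: 100.0 * value / team_max[team] if team_max[team] else 0.0
        for person, team, value in records
    }

# Hypothetical data: two teams with very different raw output levels.
records = [
    ("ana", "platform", 40.0),
    ("ben", "platform", 20.0),
    ("cy", "web", 5.0),
    ("dee", "web", 10.0),
]
scores = within_team_scores(records)
```

Under this rule, `dee` scores as high as `ana` even though her raw value is a quarter of the size, because each person is measured only against teammates. It also means that anyone on a one-person team automatically scores 100%.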
After every iteration, I would talk with the rest of the data science team and get feedback about what I could do to improve the model further. After the second iteration, the team suggested evaluating the highest-scoring people from each team together, to see how they compared with other high performers in the company. So I created clusters to compare the best, mid-level and lowest performers on each team with their counterparts on other teams. These scores seemed more accurate to me, since this approach allowed us to account for a lot more variables. For example, someone who had just started at the company last week wasn’t being compared to the CEO.
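One way to picture this grouping is the sketch below, which ranks people within each team and then pools the same rank band across teams. The rule used here (splitting each team's ranking into thirds) is an assumption, as are the names and scores; the clustering rule actually used during the internship is not spelled out in this post.

```python
from collections import defaultdict

def assign_tiers(team_scores):
    """Pool people into 'top', 'mid' and 'low' tiers by rank within their team.

    `team_scores` maps team -> {person: score}. Splitting each team's
    ranking into thirds is an illustrative assumption.
    """
    labels = ("top", "mid", "low")
    tiers = defaultdict(list)
    for team, scores in team_scores.items():
        # Highest score first within the team.
        ranked = sorted(scores, key=scores.get, reverse=True)
        n = len(ranked)
        for i, person in enumerate(ranked):
            # 3 * i // n maps rank positions onto the three bands 0, 1, 2.
            tiers[labels[3 * i // n]].append(person)
    return dict(tiers)

# Hypothetical per-team scores.
team_scores = {
    "platform": {"ana": 90, "ben": 60, "cal": 30},
    "web": {"dee": 80, "eli": 50, "fay": 20},
}
tiers = assign_tiers(team_scores)
```

Each tier can then be scored on its own, so top performers are compared with other top performers rather than with everyone in the company.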
The fourth and final iteration was an attempt to account for my own bias in the clustering process. To do so, I created 1,000 random teams and calculated scores on these random, dynamic teams. Everyone was compared to everyone else, randomly and at large scale, to account for whatever bias I might have had in clustering individuals. This also helped me overcome a second problem: teams with just one person, who would get a 100% score and skew the results.
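The random-team idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the team size of three, the percent-of-team-maximum scoring rule, the fixed seed and the sample values are hypothetical, and only the count of 1,000 random teams comes from the work described above.

```python
import random
from statistics import mean

def random_team_scores(values, n_teams=1000, team_size=3, seed=0):
    """Average each person's percent-of-team-max score over random teams.

    `values` maps person -> metric value. The team size, scoring rule
    and seed are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    people = list(values)
    samples = {p: [] for p in people}
    for _ in range(n_teams):
        # Draw a random team without repeating anyone inside it.
        team = rng.sample(people, team_size)
        top = max(values[p] for p in team)
        for p in team:
            samples[p].append(100.0 * values[p] / top if top else 0.0)
    # Average over every random team a person happened to land on.
    return {p: mean(s) if s else None for p, s in samples.items()}

# Hypothetical metric values for five people.
values = {"ana": 50, "ben": 40, "cal": 30, "dee": 20, "eli": 10}
avg = random_team_scores(values)
```

Because every random team has more than one member, nobody gets 100% just for being alone on a team, and no hand-picked grouping decides who is compared with whom.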
While this model has its merits, it still has its areas of opportunity. It is still too heavily focused on people who write code. But the model has improved with each iteration and it’s on the way towards being something useful for when the team is ready to roll it out to production.
Each boxplot color represents a different team.
Each boxplot color represents a different cluster.
What I didn’t expect going into my internship at Pinpoint is that I’d come back to my studies so far ahead. Most of the classes I’m taking now are teaching concepts and techniques that I have already gained hands-on experience with from Pinpoint.
I also better understand the importance of getting accurate data, identifying and removing any data that will produce inappropriately skewed results, and controlling for bias. Even in machine learning there are still a lot of decisions that are made by humans (in this case, how to cluster people by performance level), and it’s important to recognize that our own bias can creep into our models.
Aside from the intense Texas heat, the biggest surprise during my time at Pinpoint was the diversity in people’s backgrounds. I’ve worked at a number of places, and I’m used to being around people who have a Master’s degree or Doctorate in their field of expertise. The first thing that stuck out to me at Pinpoint was that so many people had taken a less obvious road to their careers. For example, at the first retreat, I learned that our Head of UX, Matthew Congrove, used to be a pilot. He wasn’t the only one with an interesting history. I felt like everyone had an unusual, non-traditional backstory, and all of them helped bring different perspectives to how we build the product.
I left Texas with only one regret: I did not get out to explore as much as I would have liked. I really enjoyed the social scene in Austin. Back home in Canada, we generally only go out on Fridays, but it wasn’t uncommon to get dinner or drinks with my Pinpoint colleagues on a Wednesday. I think this speaks to the culture that Pinpoint and the Austin technology community have. While I only got to be a part of this dynamic environment for three months, the impact that Pinpoint has made on me personally and professionally will far outlive my stay in Austin.
Data Science Intern