I’m a creative, visually oriented person, which is why I’m studying statistics with a focus on data visualization. When I saw an opportunity through the University of Waterloo’s internship office to work on data visualization with Pinpoint, I jumped at it. Not only did I want to work on projects that would move my career forward and complement the work I’m doing in the classroom, but I also wanted to expand my professional experience outside of Canada. Pinpoint was a perfect fit.

My fall internship with Pinpoint was focused on refining our Performance Score metric. Pinpoint believes in this particular metric because engineering teams work differently (Agile, Kanban, different languages, repos, etc.) and, as a result, all have a different way of answering “how are we doing?” We want to help teams talk about performance with their peers and the rest of the business in a way that is consistent across teams and methodologies, without having to change the way they work. This is where the data science modeling I worked on comes in.

The Performance Score will help teams understand what makes them high-performing across a number of data points, in order to improve processes and provide training for a more effective engineering organization, similar to what sabermetrics did for the Houston Astros.

It’s worth noting that Pinpoint’s Performance Score continues to be refined and is not yet in production. I worked on several iterations of a model using data from our own team, each one improving on the last and getting closer to something that can fairly and accurately measure the overall performance of an engineering organization.

Visualizing Performance Score 

The main focus of my internship was to improve the way we calculate the Performance Score by testing various data science models. 

My first takeaway: The final result depends on what you’re looking for. 

Iteration 1: Lines of Code

When I first started working on Performance Score, it was mostly based on the lines of code written. More lines meant a higher score. I quickly realized this meant that people whose jobs involve writing lots of code would always score higher. In other words, people who do maintenance, specialize in debugging, or review code would always have a lower score simply because the type of work they do is not purely focused on writing code. A manager, for example, would get a low score if we only looked at lines of code written.
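To make that concrete, here’s a minimal sketch of what a pure lines-of-code score looks like. The names, numbers, and max-based normalization are invented for illustration, not Pinpoint’s actual data or code:

```python
# Hypothetical sketch of a lines-of-code-only score.
# People and line counts are made up to show the problem.
commits = {
    "alice": 4200,   # feature developer
    "bob":   3100,   # feature developer
    "carol":  600,   # mostly code review and debugging
    "dave":   150,   # engineering manager
}

max_lines = max(commits.values())
scores = {person: lines / max_lines * 100 for person, lines in commits.items()}

for person, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{person}: {score:.0f}")
# carol and dave land at the bottom no matter how well they actually
# perform -- the metric only sees lines written.
```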

Iteration 1

The metrics we looked at included Code Ownership, Average Late Days, Cycle Time, New Feature Percent, Rework Percent, Traceability, Total Changes (changes per commit × number of commits), and Issues Completed. Even though that’s more nuanced than just the number of lines of code written, it’s still heavily skewed towards people whose roles involve writing code rather than reviewing code or making strategic decisions.
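Here’s a rough sketch of how metrics like these can be rolled into a single score. The column names, sample values, min-max scaling, and equal weighting are my own illustrative assumptions, not the exact model:

```python
import pandas as pd

# Illustrative per-person metrics; names and values are made up.
df = pd.DataFrame({
    "person":           ["alice", "bob", "carol", "dave"],
    "code_ownership":   [0.40, 0.30, 0.20, 0.10],
    "avg_late_days":    [1.0, 3.0, 0.5, 2.0],    # lower is better
    "cycle_time_days":  [2.0, 4.0, 1.5, 3.0],    # lower is better
    "total_changes":    [4200, 3100, 600, 150],
    "issues_completed": [30, 25, 12, 5],
}).set_index("person")

LOWER_IS_BETTER = {"avg_late_days", "cycle_time_days"}

def min_max(col: pd.Series) -> pd.Series:
    """Scale a metric to [0, 1], flipping it when lower is better."""
    scaled = (col - col.min()) / (col.max() - col.min())
    return 1 - scaled if col.name in LOWER_IS_BETTER else scaled

normalized = df.apply(min_max)
df["score"] = normalized.mean(axis=1) * 100   # equal weights for simplicity
print(df["score"].round(1))
```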

Iteration 2: Teams

The key to any successful metric is making sure you’re comparing apples to apples. In the next iteration, we started comparing people not across the entire company, but only to members of their own team, to account for differing responsibilities on different teams.
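In code, the change amounts to a groupby: rank each person against their teammates instead of the whole company. A sketch with invented people, teams, and scores:

```python
import pandas as pd

# Hypothetical raw scores with team labels.
df = pd.DataFrame({
    "person": ["alice", "bob", "carol", "dave", "erin", "frank"],
    "team":   ["backend", "backend", "platform", "platform", "qa", "qa"],
    "raw":    [82.0, 74.0, 55.0, 48.0, 61.0, 58.0],
})

# Company-wide percentile rank: everyone competes with everyone.
df["company_rank"] = df["raw"].rank(pct=True)

# Team-relative percentile rank: apples to apples within a team.
df["team_rank"] = df.groupby("team")["raw"].rank(pct=True)

print(df)
```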

Iteration 2 - Teams

Iteration 3: Performance Clusters

After every iteration, I would talk with the rest of the data science team and get feedback on how to improve the model further. After the second iteration, it was suggested that I evaluate the highest-scoring people from each team together, to see how they compared with the other high performers in the company. So I created clusters that compared the best, mid-level, and lowest performers from each team with their counterparts on other teams. These scores seemed more accurate to me, since clustering let us account for many more variables. For example, someone who had joined the company the week before was no longer being compared to the CEO.
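Here’s a minimal sketch of that kind of clustering using scikit-learn’s KMeans with k = 3 (best, mid-level, lowest). The scores are simulated and the whole setup is an assumption for illustration, not the model we actually ran:

```python
import numpy as np
from sklearn.cluster import KMeans

# Simulated team-relative scores for twelve people.
rng = np.random.default_rng(0)
scores = np.concatenate([
    rng.normal(85, 5, 4),   # likely "best" performers
    rng.normal(60, 5, 4),   # mid-level
    rng.normal(35, 5, 4),   # lowest
]).reshape(-1, 1)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scores)

# Relabel clusters so 0 = lowest mean score and 2 = highest.
order = np.argsort(kmeans.cluster_centers_.ravel())
relabel = {int(old): new for new, old in enumerate(order)}

for score, raw_label in zip(scores.ravel(), kmeans.labels_):
    print(f"score {score:5.1f} -> cluster {relabel[int(raw_label)]}")
# People are then compared only within their own cluster, so a
# brand-new hire isn't measured against the CEO.
```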

Iteration 3 - Performance Clusters

Clusters

Iteration 4: Removing Bias

The fourth iteration tried to account for my own bias in the clustering process. To do so, I created 1,000 random teams and calculated scores on these random, dynamic teams. Everyone was compared with everyone else, randomly and at a large scale, to wash out whatever bias I might have introduced when clustering individuals. This also solved a second problem: one-person teams, whose sole member would always get a 100% score and skew the results.
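Sketched in code, the idea looks something like this: sample many random teams, rank each person within every random team they land on, and average those ranks. The team size, person count, and scores here are invented:

```python
import random
from collections import defaultdict

random.seed(42)

# Invented raw scores for everyone in the company.
people = {f"p{i}": random.uniform(20, 95) for i in range(30)}

N_TEAMS, TEAM_SIZE = 1000, 5   # team size > 1 avoids the 100%-score problem
ranks = defaultdict(list)

for _ in range(N_TEAMS):
    team = random.sample(list(people), TEAM_SIZE)
    # Percentile rank within this one random team (0..1).
    ordered = sorted(team, key=people.get)
    for pos, person in enumerate(ordered):
        ranks[person].append(pos / (TEAM_SIZE - 1))

# Average rank across every random team a person appeared in.
final = {p: sum(r) / len(r) for p, r in ranks.items()}
top = max(final, key=final.get)
print(f"highest average rank: {top} ({final[top]:.2f})")
```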

Iteration 4 - Removing Bias

Iteration 5: Overall Score

While this model has its merits, it still has room for improvement. It remains too heavily focused on people who write code. But the model has improved with each iteration, and it is on its way to being something useful when the team is ready to roll it out to production.

Final Iteration - Overall Score (each boxplot color represents a team)

Boxplot representation of teams (each boxplot color represents a cluster)
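For the curious, here’s roughly how boxplots like the ones above can be produced with matplotlib. The teams, scores, and colors are invented stand-ins:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)

# Invented overall scores for three teams.
teams = {
    "backend":  rng.normal(70, 10, 8),
    "platform": rng.normal(60, 12, 6),
    "qa":       rng.normal(65, 8, 5),
}

fig, ax = plt.subplots()
box = ax.boxplot(list(teams.values()), labels=list(teams.keys()),
                 patch_artist=True)
for patch, color in zip(box["boxes"], ["#4c72b0", "#dd8452", "#55a868"]):
    patch.set_facecolor(color)   # one color per team, as in the figures
ax.set_ylabel("Overall Performance Score")
ax.set_title("Overall score by team")
plt.show()
```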

What I’ve Learned

What I didn’t expect going into my internship at Pinpoint was that I’d come back to my studies so far ahead. Most of the classes I’m taking now teach concepts and techniques I have already gained hands-on experience with at Pinpoint.

I also better understand the importance of getting accurate data, identifying and removing any data that would inappropriately skew results, and controlling for bias. Even in machine learning, a lot of decisions are still made by humans (in this case, how to cluster people by performance level), and it’s important to recognize that our own bias can creep into our models.

Some Surprises

Aside from the intense Texas heat, the biggest surprise during my time at Pinpoint was the diversity of people’s backgrounds. I’ve worked at a number of places, and I’m used to being around people who have a master’s degree or doctorate in their field of expertise. The first thing that stuck out to me at Pinpoint was how many people had taken a less obvious road to their careers. For example, at the first retreat, I learned that our Head of UX, Matthew Congrove, used to be a pilot. He wasn’t the only one with an interesting history. It felt like everyone had a non-traditional backstory, which helped bring different perspectives to how we build the product.

I left Texas with only one regret: I didn’t get out and explore as much as I would have liked. I really enjoyed the social scene in Austin. Back home in Canada, we generally only go out on Fridays, but it wasn’t uncommon to get dinner or drinks with my Pinpoint colleagues on a Wednesday. I think that speaks to the culture that Pinpoint and the Austin technology community have built. While I was only part of this dynamic environment for three months, the impact Pinpoint has made on me personally and professionally will far outlive my stay in Austin.
