Imagine this: You’re in your first management role, leading a team of data scientists and ML engineers at a high-growth start-up. After months of building and experimenting, your team has finally shipped a new ML-driven product feature, and it is improving user experiences and engagement. Then, one morning, you wake up to a deluge of Slack notifications, support tickets, and frantic e-mails. Your team of highly skilled data scientists must spend the next 12 hours manually reading thousands of news articles to contain the damage from a failing NLP fact extraction pipeline before more users notice the impact.

I still get chills when recalling these incidents, which happened more frequently than I’d like to admit during my time at Mattermark. So it may not surprise you that, upon transitioning into VC, the first category I sought to invest in was ML performance management tools. What is more surprising is that I did not make an investment in this category for over 5 years.

When I first became an investor, I sought to validate that the challenges my team experienced when maintaining our models in production were shared by others. I talked to hundreds of ML practitioners, all of whom had similar stories about the fire drills that occurred when they had to fix an issue associated with ML-driven applications – and the costs, in terms of dollars and credibility, that they had to bear as a result of those incidents. Many also shared that their efforts to pursue new projects or iterate on existing ML-driven products were completely handicapped by concerns about the reliability of their ML pipelines. Without tools to build and manage models in complex, dynamic environments and to resolve failures and edge cases, they couldn’t innovate.

Although my conviction in the need for tools to operate ML-driven applications intensified, I wasn’t able to find startups building products that aligned with user needs. Most startups were focused on evaluating model performance metrics like precision, recall, and F1 scores – but ML teams cared far more about understanding the impact of model performance on user experience (measured by engagement, retention, etc.). Other startups developed tools to facilitate debugging by evaluating feature importance – but these tools were worthless if you didn’t know when your ML-driven applications were failing their users.

I knew what I wanted to invest in – but I just couldn’t find it…until Josh Tobin and Vicki Cheung decided to build Gantry. Josh and Vicki knew exactly what to build. They have stronger empathy for data scientists and ML engineers than almost any other founding team I’ve ever met. This shouldn’t be surprising: Josh founded the Full Stack Deep Learning training program after completing his PhD in ML at Berkeley and working as a research scientist at OpenAI. Vicki developed tools to support ML teams at companies like Duolingo and Lyft and designed OpenAI’s deep learning infrastructure as its first engineer. Through their experience in ML research, Josh and Vicki know what’s possible with AI. Through their experience in industry, they know why it’s far more challenging to build an ML-driven application than it is to train a Transformer on a benchmark dataset. They’re committed to narrowing this gap and unlocking a future where companies build better products and make better decisions with continuous learning systems.

What they see so clearly is that most ML teams are not building models; they’re building applications powered by ML models. For this reason, they need tools to help them detect when the performance of ML-driven applications degrades. What’s more, they recognized that most teams must iterate on their models to maintain or exceed their performance goals. As such, it’s not enough to just highlight when changes in user behavior or the environment negatively impact the performance of ML-driven applications. ML teams need tools that will help them proactively find opportunities to improve their applications. 

With a platform like Gantry, ML teams can finally build great ML products quickly and reliably. They can ship ML-driven applications sooner, knowing that they can rapidly integrate user feedback and other signals that will drive performance gains.

Additionally, Josh, Vicki, and the Gantry team know that ML teams often use several other tools including some designed for software engineering and DevOps. As such, they built Gantry to integrate seamlessly into both tech and ML stacks. With Gantry, ML teams need not change their workflows, behavior, or existing architectures. 

Put simply, Gantry is the tool I wish I had and that I hope to share with so many friends and former colleagues – which is why I’m thrilled to announce our seed and Series A investment in the company.