Spending on observability platforms has gotten out of control. It is often the number two cost behind a company’s main cloud vendor, representing up to 25% of the engineering team’s budget. I saw this firsthand when I was part of the engineering team at Coinbase, where I built the Site Reliability Engineering (SRE) function from scratch.
My time at Coinbase coincided with the height of crypto mania. The company was in hypergrowth, and I’d been charged with scaling the data-intensive application infrastructure for our crypto wallet to support dozens of different blockchains and hundreds of crypto assets. Coinbase had a high bar for security and performance, which ultimately meant that our spending on observability services became one of the biggest line items on my budget. At one point after I stepped back from managing our observability infrastructure, Coinbase’s annual Datadog costs surged to $65M.
While few, if any, other companies reach the level of spending that Coinbase did, our experience of overpaying for things that we didn’t need was far from unique. Why?
- The pricing model for many of these services is notoriously opaque. Between overage fees and penalties, complex bundles that force customers to pay for features they don’t actually need, and spikes in costs from peak usage, it’s often difficult to predict what your costs are going to be on a monthly or annual basis.
- Most engineers don’t have the time to sift through all of the data generated by these platforms, or don’t know which metrics matter most for understanding the health of their applications.
- Datadog, Splunk, New Relic and other platforms only diagnose issues without providing a solution, which often means hours of work from specialist database administrators and performance engineers to fix the problems they identify.
A new approach to performance and reliability
After my experience at Coinbase, I was convinced that there was a need in the market for something new – a tool that could boost performance reliably without crushing costs or manual troubleshooting. This is particularly true in today’s economic climate, when headcounts continue to shrink and companies are trying to manage their spending.
I spent many hours discussing this challenge with my co-founder and former Coinbase colleague, Harry Tormey. After months of validating our ideas by talking to customers and building prototypes, in December of 2023 we decided to leave our full-time jobs and start StepChange Labs. Our aim was simple: to help engineering teams at high-growth companies run faster and scale more reliably, combining our collective engineering experience and the latest in generative AI. Alongside a small team of engineers with decades of experience at companies like Apple, Facebook, Google and Coinbase, we started with a consulting offering that helped clients to optimize their database, application, and infrastructure.
In just under a year since founding, we’ve worked with companies ranging from early-stage startups to mature tech companies to achieve better performance and reliability with more predictable costs. And while we’re proud of these initial results, our aim was always to expand our offering with a turnkey SaaS product that leverages LLMs and would bring the StepChange team’s expertise to more companies via an automated tool. Today, we’re one step closer to realizing that vision with the launch of APX (Application Performance Xcellerator).
Introducing APX
APX is an AI-enabled monitoring platform that automatically detects, diagnoses, and fixes performance issues such as slow queries and bottlenecks. It does this with expert-driven optimizations and regression detection, removing the need for engineers to manually sift through performance dashboards. With APX, teams and enterprises can focus on higher-order development efforts while keeping observability costs and ongoing headcount budgets in check. And most importantly, it helps optimize observability costs, cutting Datadog spend by up to 50 percent.
“As an early stage startup with a lean engineering team, StepChange has been invaluable for us. It’s like having a magic staff engineer in the cloud that keeps us on top of everything so that we can focus on scaling,” said Kyle Noble, Co-Founder of Atlas. “I sleep like a baby knowing StepChange is watching over us and we are excited for the ways APX will continue to improve the reliability of our systems.”
Announcing StepChange’s seed round
In addition to the launch of APX, I’m also proud to share that we have raised $4M in seed funding, which we’ll be using to invest in product development. The round was co-led by Kindred Ventures and The General Partnership, with participation from Ritual Capital and Liquid 2 Ventures. Kindred has backed startups that have scaled their systems globally, such as Uber and Coinbase; startups that have developed new approaches to infrastructure, such as Northflank and Anjuna; and startups innovating at the cutting edge of AI, like Perplexity and Fal. The General Partnership is a new venture platform that brings developers and startups together for accelerated technical progress.
“Nearly every founder who I speak to is trying to run leaner and do more with less. They want to keep their engineering teams focused on product development versus playing a game of constant whack-a-mole with performance issues while paying a lot of money to bloated observability platforms,” said Kindred Ventures Founder & Managing Partner Steve Jang. “We’re excited to back Niall and Harry on their mission to create a truly intelligent automated agent that makes observability services work well at higher performance and lower costs.”
“In his path from builder at TheGP to founder, Niall naturally saw an opportunity in infrastructure modernization that we were proud to support,” said Phin Barnes, Co-founder and Partner at The General Partnership. “Last year, over $400B was spent on system integration, and as data and software platforms grow in complexity, so does this spend. With StepChange, engineering teams don’t have to choose between app performance and sensible spending.”
What’s next for StepChange
I couldn’t be more excited about the future at StepChange. Every day I see the impact that we’re having for our customers, and the launch of APX is poised to accelerate our mission even further. The entire team can’t wait to get this tool into the hands of more high-growth engineering teams so they can scale their organizations to the next level. The stellar group of investors that we’ve attracted is just one more piece of evidence that our mission is resonating with others.
For more information or to try APX, visit https://stepchange.work/products/apx.