The Quiet Crisis in QA: More Code, Same Old Problems
18 Nov 2025
Most people have noticed the explosion of AI-driven software development. Engineers are planning, writing, and shipping code faster than ever. But if AI means more code, shouldn’t it also mean more testing? And if there’s more testing, where are all the breakthrough new quality assurance (QA) companies?
What’s surprising is that there aren’t many. Even A16Z’s “The Trillion Dollar AI Software Development Stack” barely touches on advances in QA and lists no companies in the space. The gap isn’t because people aren’t trying; it’s because they’re finding out how hard QA is to automate.
The discovery
I realized this after starting the Antler accelerator. I joined to build Trailway.ai, an AI agent that runs through your feature before release, finding design problems and bugs. While writing down the QA approach I’d used in past roles, I found it easy to list obvious checks like broken links and buttons. It was a lot harder to define what “quality” and “good” looked like. Issues show up in so many ways that all I could say was “I know it when I see it”.
Antler founders were excited by the idea of AI finding their bugs, yet they struggled to say what they needed. Speaking with more people at companies of different sizes, I found the same pattern: a real problem, but trouble articulating it. It was the early sign of a tarpit idea.1 So I went to work defining the problem more precisely before looking for a solution.
How QA Plays Out in the Real World
This is a simplified view, but it’s helpful to start here. QA exists to make sure your app works the way it’s supposed to. Bugs are what happens when it doesn’t. So the job of QA is to find bugs before your users do.
A bug isn’t just something that doesn’t work. A feature could work but not look the way you meant. Sometimes it’s worse: it works the way you meant, but breaks something else in the process.
The most subtle bugs aren’t technical at all; they come from people misunderstanding each other. A feature works as built, but it doesn’t match what the people defining the experience actually wanted.
When you’re solo, most bugs are caught quickly because you test as you build. Add another person and misunderstandings start creeping in. What to build gets misread, conversations get forgotten, and assumptions drive decisions. The risk shifts from how you build to how you communicate. As teams grow, these challenges multiply. Bug-finding becomes even more about communication and coordination.
If all of this sounds complicated, that’s because it is. But that’s not what people want to hear when they are trying to solve their QA problem.
QA is hard
Folks are now used to vibe-coding platforms like Lovable that make it simple to build and deploy a web app quickly. Just say what you want and watch AI code your site for you. That expectation of simplicity shows up in QA as well. Let’s call it Vibe-QA’ing.
But as most people learn after using vibe-coding tools, software development is hard work and these tools don’t solve that. They’re great for prototypes or simple sites, but struggle as your app inevitably grows.
It’s safe to assume Vibe-QA products will face similar struggles, maybe worse given how nebulous QA is. Even if these products work on small-scale examples, it raises the question: who is their ideal customer?
QA is often an afterthought
Small teams don’t focus on QA. Their aim is growing the company by solving more and more problems for customers, not delivering bug-free software. When needed, they focus on testing critical user paths only, catching bugs quickly, and skipping automation.
Larger teams have bigger stakes. Each engineer only knows a small slice of the product, so bugs spread wider and get more expensive. With paying customers, contractual agreements, and brand reputation on the line, structured QA and automation become the only way to keep quality under control.
That pressure is often why companies finally decide to prioritize QA. Maybe business stakeholders are tired of manual testing all the time. Maybe a painful release process burns out the engineers. Or a major issue hits revenue and testing becomes a top-down mandate. Whatever the reason, the company now has an uphill battle to “do QA”. Fortunately for them, this has been the pattern for a while, so there are a lot of tools that help.
Big players solve most things
To no one’s surprise, companies have been building solutions to improve testing for 10+ years. Some have close to half a billion in revenue. Tools like SmartBear, Tricentis, and BrowserStack are comprehensive testing suites.
That doesn’t mean they work perfectly. It’s just that there’s limited room to re-invent software testing in a breakthrough way.
Most smaller players are re-packaging what the big players already solve
Every smaller player I’ve seen is tackling a different piece of the problem. But, almost every one of those pieces is already owned by the big players.
Some go after test case management. They help teams organize, reuse, and maintain their tests. These tools are about keeping what already works from breaking. It’s a stable market, with old tools like TestRail dominating and newer upstarts like Qase.io finding room around the edges.
BrowserStack Test Case Management Marketing
Others focus on visual testing. They catch unexpected visual changes before they slip through to users. This is done by comparing screens between versions to see if anything looks different. It’s easy to use and surprisingly effective. Meticulous.ai and Chromatic.dev are the main ones here.
BrowserStack Visual Testing Marketing Nov 2025
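To make the mechanism concrete, here is a minimal sketch of a visual check using Playwright’s built-in screenshot assertion. The URL and test name are placeholders I made up; dedicated tools like Chromatic or Meticulous add baseline management, review workflows, and flake handling on top of this basic idea.

```ts
// visual.spec.ts - a minimal visual regression check with Playwright.
// The first run stores a baseline screenshot; later runs fail if the
// rendered page drifts beyond the allowed pixel ratio.
import { test, expect } from '@playwright/test';

test('pricing page matches the approved baseline', async ({ page }) => {
  // Placeholder URL for whatever environment you deploy to.
  await page.goto('https://staging.example.com/pricing');

  // Compare the full page against the stored baseline image.
  // maxDiffPixelRatio tolerates tiny rendering differences (fonts, anti-aliasing).
  await expect(page).toHaveScreenshot('pricing.png', {
    fullPage: true,
    maxDiffPixelRatio: 0.01,
  });
});
```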
Then there’s bug reporting for manual testers. These tools make it easier for people to share what’s wrong, along with all the context an engineer needs to fix it. jam.dev and marker.io are the main ones here.
Jam.dev Marketing
The biggest wave, though, is record-and-playback automation. These tools provide a visual builder that watches your actions in the browser, turns them into a repeatable flow, and lets you add simple checks along the way so the system knows whether each step is working. The result is a set of flow-based tests that sit on top of code-based tests but are far easier to create and maintain. They help you catch regressions in areas of the app that used to work. With LLMs involved, the automation gets even smarter, figuring out what to click even when the UI changes slightly. Startups like RainforestQA and QA Wolf are going after this, along with a wave of YC companies like Momentic, Spur, Stably, Docket, Qualgent, and others.
RainforestQA Marketing
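Footnote 6 notes that these tools often generate Playwright code under the hood. A recorded flow tends to export to something roughly like the sketch below; the URL, selectors, and product names are illustrative assumptions, and the real products layer self-healing selectors and AI-driven assertions on top.

```ts
// checkout.spec.ts - roughly what a recorded "add to cart" flow might export to.
// The URL, selectors, and test IDs are made-up placeholders.
import { test, expect } from '@playwright/test';

test('visitor can add an item to the cart', async ({ page }) => {
  await page.goto('https://staging.example.com');

  // Each step below corresponds to one recorded click.
  await page.getByRole('link', { name: 'Shop' }).click();
  await page.getByRole('button', { name: 'Add to cart' }).first().click();

  // Simple checks the recorder lets you drop in along the way.
  await page.getByRole('link', { name: 'Cart' }).click();
  await expect(page.getByTestId('cart-count')).toHaveText('1');
});
```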
Most of these companies have “AI” mentioned big and bold on their landing pages. But so do the big players.
The new wave of QA companies
A few places are working on novel approaches to QA. They aim to test the whole app with minimal work. They fall in the gray zone of “Vibe-QA”.
One approach is browser agents. These tools build on exciting new breakthroughs in AI models that can use your browser like a real person.2 They explore your web app trying to break things. One of the few places fully focused on this is Propolis. It’s like the “Monte Carlo simulation” of testing: send out many agents and hope one of them finds an issue.
Propolis Demo Video
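Stripped of the LLM, the core loop these agents run isn’t far from a crawler that clicks around and watches for failures. Here is a deliberately simplified sketch in Playwright; the real products pick actions with a model and judge far more than console errors, and the start URL is a placeholder.

```ts
// explore.ts - a toy "agent" that wanders a site and records anything that breaks.
// Real browser agents use an LLM to choose actions and judge what counts as broken;
// this sketch just clicks random links and collects console and page errors.
import { chromium } from 'playwright';

async function explore(startUrl: string, steps = 20): Promise<string[]> {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  const problems: string[] = [];

  page.on('console', (msg) => {
    if (msg.type() === 'error') problems.push(`console error: ${msg.text()}`);
  });
  page.on('pageerror', (err) => problems.push(`page error: ${err.message}`));

  await page.goto(startUrl);
  for (let i = 0; i < steps; i++) {
    const links = await page.locator('a[href]').all();
    if (links.length === 0) break;
    const pick = links[Math.floor(Math.random() * links.length)];
    // Dead or broken links are exactly what we want to notice, so swallow click failures.
    await pick.click().catch(() => {});
    await page.waitForLoadState('domcontentloaded');
  }

  await browser.close();
  return problems;
}

explore('https://staging.example.com').then((problems) => console.log(problems));
```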
Another approach is session replay. These tools capture real users’ sessions directly from your live site and automatically replay them against both the old and new versions of your app to spot any new errors.3 Companies like meticulous.ai and older open-source tools like GoReplay approach testing this way.
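Footnote 3 calls this a re-packaging of shadow deployment. At the HTTP level, the idea reduces to something like the sketch below: replay captured requests against the current and candidate versions and diff what comes back. The base URLs and the recorded-session shape are assumptions for illustration; real tools replay full browser sessions, not bare requests.

```ts
// shadow-replay.ts - replay captured requests against the old and new versions
// of a service and flag any request whose response status diverges.
// Base URLs and the recorded-session shape are illustrative assumptions.
type RecordedRequest = { method: string; path: string; body?: string };

const OLD_BASE = 'https://prod.example.com';
const NEW_BASE = 'https://canary.example.com';

async function send(base: string, req: RecordedRequest): Promise<number> {
  const res = await fetch(base + req.path, { method: req.method, body: req.body });
  return res.status;
}

async function replay(session: RecordedRequest[]): Promise<string[]> {
  const regressions: string[] = [];
  for (const req of session) {
    // Send the same captured request to both versions and compare outcomes.
    const [oldStatus, newStatus] = await Promise.all([
      send(OLD_BASE, req),
      send(NEW_BASE, req),
    ]);
    if (oldStatus !== newStatus) {
      regressions.push(`${req.method} ${req.path}: ${oldStatus} -> ${newStatus}`);
    }
  }
  return regressions;
}

// Example: a tiny captured session pulled from live traffic.
replay([
  { method: 'GET', path: '/api/products' },
  { method: 'POST', path: '/api/cart', body: JSON.stringify({ sku: 'abc-123' }) },
]).then((diffs) => console.log(diffs.length ? diffs : 'no divergence'));
```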
It’s exciting to see new approaches appear. We’ll have to see if they fundamentally change QA. But, either way, it’s clear that choosing and using these tools is complex and becoming its own problem.
It’s simpler to use QA-as-a-Service (QAaaS)
With all these tools and options, someone has to know how to pick and use the right ones. Usually, that means hiring a QA team. But finding people with the right expertise and then managing them isn’t easy.
So instead of building that expertise in-house, many teams outsource it to specialists. Take RainforestQA, for example. They sell a product, but many of their customer success stories mention the RainforestQA team doing much of the heavy lifting.4
QA Wolf took a similar route: they started as an open-source testing framework, then pivoted to a closed-source services company. Now you pay them for services, and their in-house QA team uses their platform to deliver them.5
These companies end up feeling Palantir-esque: they build complex but powerful software, then wrap it in consulting and sell the full package.6 It’s not necessarily a bad thing. These teams can move faster and know the domain inside-out. But it does tilt the balance of power. The closer these companies get to a customer’s workflow, the harder it becomes for that customer to leave for better alternatives.
What’s exciting
What excites me most isn’t breakthrough new technology that re-invents testing. It’s creating exceptional user experiences with the solutions we’ve already seen.
A great example of this is Jam.dev, a bug reporting tool that came seemingly out of nowhere. To their credit, they have hustled through thirteen launches and pivots over five years to achieve success. They don’t even tackle “QA” directly; instead, they focus on making bug reporting effortless for anyone on a team. In doing so, they’ve managed to unbundle a small slice of what the big players offer, turn it into a delightful user experience, and expand their audience beyond engineers to include roles like customer success.
It’s a reminder that there’s still room for creativity. You don’t need transformative technology or a rebuilt QA stack to create new value. Sometimes the win is making one overlooked piece of what exists truly great.
What the future holds
The established players in the QA space will keep doing what they do best: adding AI features that make their tools incrementally better. Think auto-generated test case descriptions, self-healing tests that adapt to small UI changes, and so on. They’re focused on making QA teams faster at the repetitive stuff, not on groundbreaking technological change.
For new companies trying to break in, it continues to be an uphill battle. They’re not just competing against big players. They’re also competing against each other in an increasingly crowded space. Incumbents have a real advantage here: they bundle everything into one product, and since QA isn’t a core differentiator for most businesses, companies usually just want whatever’s easier in the long run. That typically means sticking with the established player.
On top of that, most of these new entrants offer nothing truly differentiated or hard to copy. So, capturing significant market share is really hard.
Some of these new players promise that AI will test everything, which sounds great. But there’s a ceiling they’ll likely hit around intent: knowing what the product should do, which edge cases matter, what users expect, and more. AI can scrape clues from scattered documents, chats, design files, and notes. But it can’t pull insights from hallway conversations, calls, past decisions, or the knowledge people never bothered to write down. These tools might get good at flagging potential problems, but they won’t replace human judgment.
This creates an interesting problem for these tools: they end up being additive to existing workflows rather than replacing them. They add another thing to check, another tool to integrate, rather than making what people already do dramatically faster. And additive is a much harder sell than “this replaces hours of manual work.”
That said, we’ll see real productivity gains. Testing cycles will speed up as agents propose fixes for the issues they find. The dumb stuff, like broken buttons, will get caught earlier, saving everyone time. Bug triaging will move faster because the right debugging data will be captured automatically. And QA teams will be able to push coverage much further by using record-and-playback tools and coding agents to generate test code.
So where does this leave us? I think the perception that QA is falling behind as teams ship faster is a bit misleading. QA tools are getting better too; the growth is just less explosive. We’re seeing incremental progress rather than a revolutionary breakthrough. Maybe the leap that fundamentally changes QA hasn’t arrived yet. Incremental progress just isn’t the kind of thing that makes for great headlines.
1. “Tarpit idea” is a term coined by YC for ideas they see often that get a lot of early positive signals but are so complex they kill the startup. The name comes from tar pits that looked like water to dinosaurs; when the dinosaurs went to drink from them, they got stuck and ultimately died.
2. Gemini Computer Use, Claude Computer Use, OpenAI’s Computer Using Agent, and Amazon Nova Act.
3. This is a re-packaging of a classic testing strategy called dark traffic testing or shadow deployment, where a new version of a service or feature runs in parallel with the current production version to see how it handles real-world live user traffic.
4. My guess is that services are more common than their site lets on.
5. It looked like you could use their product directly, but it’s clear from their marketing materials…
6. These tools are often a mix of the individual tools mentioned earlier, but built in-house; often record-and-playback automation that generates Playwright code.