Guardrails for the slop era: our investment in Moonbounce

Content moderation has long been described as “the worst job in all of tech,” and with the rise of AI-generated content, it’s no longer contained to the largest social platforms; it’s now every app builder’s problem. Today we’re excited to announce our investment in Moonbounce, a real-time, multimodal policy engine, built by a team who understand this problem arguably better than anyone. In short, we believe Moonbounce has a shot to become the foundational governance layer for the AI era.

Content moderation: an age-old problem

While forum moderators have been writing rules about what you can’t put in your username for decades, large-scale trust and safety efforts have mostly been a problem for the biggest platforms on the internet. Facebook, Twitter, YouTube — companies with hundreds of millions of users generating content around the clock — are the ones who had to grapple with it seriously.

At that scale, you can throw massive human and infrastructure capital at the problem: hundreds of thousands of human reviewers, purpose-built machine learning models trained over months, and complex rule engines that flag content based on keyword matching and pattern detection. It worked (poorly, and at enormous cost) for a relatively small number of companies.

Even without the unpredictability of user-generated content in the mix, every application being built with AI today is also, to some degree, non-deterministic. You can’t have a customer support agent quoting unauthorized refund policies, or an image generation service producing explicit content from an innocuous prompt. These also aren’t edge cases you can simply handle with a keyword blocklist.

Every AI app now has a content moderation problem

The outputs of large language models and image and video generation systems based on diffusion models are probabilistic by nature: the same input produces meaningfully different outputs depending on context, prompt variations, and the underlying model’s behavior. What makes these systems so powerful is also what makes them difficult to govern.

Rule-based systems — “flag any message containing these keywords,” “block images matching this hash” — worked ok before the mobile and social web ushered in an era of unpredictable human-centered content generation. You can’t write a finite set of rules for an infinite probability distribution. When a new model version ships, the behavior may drift, and suddenly your filters stop catching what they should.

This is in theory a job for evals, but they don’t really work for high stakes use cases. The prompt / measure / evaluate / iterate loop will indeed help you improve the consistency of your outputs and avoid edge cases, but only edge cases that you find. What your users actually see is only as good as your last eval.

A problem that once required Facebook-scale resources to take seriously is now something every company building with AI has to think about:

If you’re an image generation service, you need to prevent explicit content from appearing when users ask for something innocuous
A customer support agent must stay within the bounds of what a company has approved it to say
An enterprise software product needs its embedded AI to adhere to specific compliance rules

The trajectory of virtually every software product being built right now runs directly through this challenge: the control flow of entire apps (and businesses) are mediated by a stochastic machine.

Few people have lived closer to this problem than Brett Levenson. I first met him at AWS re:Invent in 2018, when he was working on AI and infrastructure at Apple’s Special Projects Group. Not long after, Brett got a call from Zuck personally to become the technical lead on all of Meta’s “business integrity” efforts — fraud and abuse, content moderation, misinformation.

By the time he left, he was a distinguished engineer with 500 people reporting to him — and a belief that he’d figured out how to solve content moderation at scale in a way that simply wasn’t going to happen from inside a company like Meta. So he called us, said he wanted to start a company, and began building what would become Moonbounce with co-founder Ash Bhardwaj. Ash brings to the company a unique background in infrastructure engineering at Apple, and 10+ years in audio hardware for Meyer Sound.

Enter Moonbounce: a policy engine for stochastic systems

Moonbounce is, at its core, a real-time, multimodal policy engine. It can ingest any piece of media — text, image, video, voice — understand it in context, and act on it according to a set of defined policies, all in real-time. What distinguishes it from the generation of tools that preceded it comes down to three components:

Latency: Content moderation has historically been a batch process or something close to it — content goes in, a decision comes back after some delay. We can live with that when the question is whether a post should be surfaced in a feed. It’s a much harder constraint when the question is whether a message should be delivered in a real-time conversation, or whether a single frame in a video stream violates a policy. Moonbounce’s inference pipeline is designed to operate inline, intercepting content in real time without introducing the kind of lag that would degrade the user experience.

Flexibility: The traditional approach required months to train a new model to recognize a new category of harmful content. Moonbounce’s architecture, which fuses a human and machine-readable policy language with LLM-based content understanding, can deploy a new policy in minutes. As the landscape of what counts as harmful shifts, the ability to adapt quickly is the real technical advantage.

Breadth: Moonbounce also understands multimodal content at a level of contextual nuance that simple classifiers can’t match. The canonical example is an image containing a weapon: a still frame might look violent, but the surrounding context (Is it a historical photograph? A video game screenshot? Or an actual threat?) — changes the appropriate response entirely. Getting that right requires something closer to genuine content understanding than keyword matching or hash comparison.

That’s why we see a range of use cases in Moonbounce’s early customers: AI-native companies building image and video generation services, consumer platforms like Tinder that need to intercept inappropriate content in real-time chat, and enterprise software companies that need their AI-mediated workflows to stay within their business rules. Janitor is using Moonbounce to track the characters users create, as well as look for potentially "unacceptable" content (many policies) in real time as users actually chat with their AIs.

The problem is only getting harder

The scope of the problem is expanding faster than the tools available to address it. Deep fakes are becoming indistinguishable from real content, and agentic systems are taking actions with real-world consequences on behalf of users. The volume of AI-generated content on the internet is growing at a rate that makes human review economically untenable at any scale that matters.

Addressing a probabilistic content generation system requires an equally capable classification system on the other side. If you’re deploying AI that can produce an essentially unbounded range of outputs, you need infrastructure that can understand those outputs in real time and enforce the policies you’ve defined. Prompt engineering and static rule systems are not going to get you there. Put it this way: the only thing that can realistically keep up with AI-generated content is AI.

Moonbounce is building toward that infrastructure layer, and we think it becomes foundational for every company building AI-mediated products. We’re thrilled to be partnering with Brett and the Moonbounce team and leading their $12M raise. You can check out their playground, or get in touch with the team here.

Authors

Lenny Pruss

Rebecca Dodd

Editors

Acknowledgments