March 9, 2026

Who's Watching the Vibe Coders?

security · ai · vibe-coding · prompt-injection · software-engineering

AI agents are powerful, neutral, and happy to do what an attacker tells them if the attacker gets their instructions in first. We built the door. We forgot the lock.

Last week I watched an AI agent quietly process a prompt injection attack buried inside an npm package README. The payload was crafted in three languages, obfuscated with multilingual noise, and written specifically to mimic the tool schema of the agent running it. The agent almost executed a shell command it had no business running. Almost.

I caught it because I happened to be looking.

That's the part that keeps me up at night.

We Built the Door. We Forgot the Lock.

The "build an app with a prompt" era didn't come with a security model. We got speed. We got accessibility. We got demos that look like magic. What we didn't get was a serious conversation about what happens when that magic runs unsupervised on real infrastructure, with real data, with real consequences.

Vibe coding, as a concept, isn't inherently bad. Getting ideas out of your head and into running code faster is genuinely useful. The problem is what gets skipped in the process. Traditional software development has decades of accumulated practice around input validation, least-privilege access, dependency auditing, and threat modeling. None of that transfers automatically to a world where you describe what you want and an agent figures out the rest. The agent doesn't know your threat model. It doesn't know which data is sensitive. It just executes.

The attack I described at the top is called a prompt injection. An attacker embeds instructions inside content that an AI agent is likely to fetch and process, like a README, a webpage, or a file. The agent reads it as part of its context and, if it isn't careful, treats those embedded instructions as legitimate commands. In my case, the payload used Armenian and Telugu script mixed with Chinese characters to evade content filters, then attempted to invoke shell_command with a Get-Date probe. Harmless on its own. A reconnaissance move. It was checking whether the injection worked, and whether the agent would execute arbitrary shell commands. That pattern is well-documented in security research. It's not theoretical. It's happening in the wild right now, and most people building with these tools have no idea.
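The multilingual obfuscation described above is detectable with fairly crude heuristics. Here's a minimal sketch of a pre-flight check for fetched content, assuming a hypothetical set of suspicious keywords and treating "mixes three or more Unicode scripts" as a red flag. A real scanner would be far more sophisticated; this only illustrates the shape of the check.

```python
import unicodedata

# Hypothetical keyword list; in practice you'd match against your
# agent's actual tool names and known injection phrasings.
SUSPICIOUS_KEYWORDS = {"shell_command", "ignore previous", "system prompt"}

def scripts_used(text: str) -> set[str]:
    """Collect rough Unicode script buckets (first word of each
    character's Unicode name, e.g. 'LATIN', 'ARMENIAN', 'TELUGU')."""
    buckets = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            buckets.add(name.split(" ")[0])
    return buckets

def looks_injected(text: str) -> bool:
    """Flag content that names agent tooling or mixes many scripts."""
    lowered = text.lower()
    keyword_hit = any(k in lowered for k in SUSPICIOUS_KEYWORDS)
    mixed_scripts = len(scripts_used(text)) >= 3
    return keyword_hit or mixed_scripts
```

A check like this belongs between "agent fetches content" and "content enters the model's context" — it's cheap, and a positive hit is a reason to quarantine, not proof of attack.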

The Regulation Gap Is Real and It's Widening

The Big Beautiful Bill delayed meaningful AI regulation. Meanwhile, the threat landscape isn't waiting. Russia, China, and North Korea have state-sponsored teams actively researching how to weaponize AI systems, including the ones your developers are using right now to build faster. The gap between "what AI can do" and "what governance exists to constrain how it's misused" is not closing. It's opening.

I'm not here to argue about the politics of AI regulation. That's a longer conversation and I genuinely don't think there are clean answers. What I will say is this: when regulation slows down and adoption speeds up, the burden of security shifts entirely to individual developers and organizations. And most of them aren't equipped to carry it. Not because they're incompetent. Because the tools they're using were built to make things easy, not safe. Those are different design goals.

State-sponsored threat actors are patient and methodical in a way that individual developers simply can't match. They're not trying to hack your app directly. They're seeding the supply chain. They're contributing to open source packages. They're writing documentation that contains injected payloads. They're creating convincing but malicious npm packages that sit one typo away from the real thing. And now they have a new vector: the AI agent that a developer trusted to "just handle it." If that agent fetches an external resource, reads a file, or processes any content it didn't generate itself, it's a potential attack surface. Most vibe-coded apps treat every piece of fetched content as trusted. That's a mistake with real consequences.

Your Data Is Downstream of Someone Else's Prompt

When you use an AI-assisted app, your data doesn't just interact with that app. It interacts with everything that app's AI touched while being built and while running. That includes every package, every fetched resource, every external API, and every model call the developer never thought to audit. You're not just trusting the developer. You're trusting their entire unchecked dependency chain.

This is where the "vibe coding" framing gets genuinely dangerous. An app built by someone who described what they wanted and iterated until it looked right has probably never had a single conversation about its attack surface. Not because the developer is negligent. Because the tools encourage speed and the culture rewards shipping. Security review is friction. Friction feels like the enemy of momentum. So it gets skipped. And then your data ends up inside an app that was built with a prompt, deployed with a click, and never once asked: what happens if something I trust is actually malicious?

Here's a concrete scenario. A small startup uses an AI coding agent to build a customer data management tool. The agent, in the course of building it, fetches a popular npm library that has been compromised via a supply chain attack. The agent processes the package, integrates it, and the app ships. The compromised package silently exfiltrates environment variables on startup. The startup's database credentials, API keys, and customer data are now in someone else's hands. Nobody saw it happen. The AI agent didn't flag it. The developer didn't audit it. The deployment pipeline didn't catch it. This isn't a hypothetical. Versions of this have already happened. They'll keep happening because the tools that make development fast don't automatically make it safe.

What We Can Actually Do

The answer isn't to stop using AI tools. The answer is to stop treating them like they're on your side by default. They're powerful and they're neutral. They'll do what you tell them, and they'll do what an attacker tells them if the attacker gets their instructions into the context first. Build like that's true.

There are practical steps that don't require a security team or a compliance budget. Run AI agents with the minimum tools they actually need for a given task. Intercept and review tool calls before they execute, especially shell commands and network requests. Scan fetched content for injection signatures before feeding it back to the model. Treat everything an agent reads from an external source as untrusted until proven otherwise. These aren't exotic practices. They're just discipline applied to a new context.
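The interception step can be sketched in a few lines. This assumes a hypothetical agent loop that surfaces `(tool, args)` pairs before executing anything; the tool names and policy sets are illustrative, not from any particular framework.

```python
# Least privilege: tools the agent may use freely for this task.
ALLOWED_TOOLS = {"read_file", "search_docs"}
# High-risk tools that require human review before executing.
REVIEW_REQUIRED = {"shell_command", "http_request"}

def gate_tool_call(tool: str, args: dict) -> bool:
    """Return True only if this tool call may execute now."""
    if tool in ALLOWED_TOOLS:
        return True
    if tool in REVIEW_REQUIRED:
        # In a real system this would route to a reviewer or a
        # policy engine; here we simply refuse by default.
        print(f"BLOCKED pending review: {tool}({args})")
        return False
    # Anything not explicitly known is denied, not allowed.
    print(f"DENIED unknown tool: {tool}")
    return False
```

The design choice that matters is the last branch: unknown tools are denied, so the safe behavior is the default and every new capability has to be granted deliberately.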

On the architectural side, the most durable approach is to make the agent's tool surface explicit and auditable. In Formicary, my own agent orchestration layer, workers signal intended actions before executing them. The queen can intercept, evaluate against policy, and quarantine suspicious signals. That's not just good security design. It's good systems design. You want observability into what your agents are doing whether or not someone is actively attacking them. Log every tool call with its origin, its arguments, and its result. Review those logs. Treat unexpected tool calls the same way you'd treat an unexpected network request in a traditional app: as a potential indicator of compromise. And if you're building something that other people's data flows through, tell them what AI touched it. They deserve to know.
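The logging discipline above can be made concrete with a small audit record. This is a sketch in the spirit of the signal-then-execute pattern, not Formicary's actual API; the record fields (`origin`, `tool`, `args`, `result`) are my own assumptions about what's worth capturing.

```python
import time

# In-memory audit log; a real system would write append-only storage.
AUDIT_LOG: list[dict] = []

def record_tool_call(origin: str, tool: str, args: dict, result: str) -> dict:
    """Log one tool call with the content that triggered it."""
    entry = {
        "ts": time.time(),
        "origin": origin,   # e.g. which fetched document led here
        "tool": tool,
        "args": args,
        "result": result,
    }
    AUDIT_LOG.append(entry)
    return entry

def unexpected_calls(expected_tools: set[str]) -> list[dict]:
    """Surface calls outside the expected set — the agent-world
    analogue of an unexpected outbound network request."""
    return [e for e in AUDIT_LOG if e["tool"] not in expected_tools]
```

Reviewing `unexpected_calls` against the tool set you intended for a task is exactly the "indicator of compromise" check the paragraph above describes: the agent asking for a shell when you only granted it file reads is a signal worth investigating, attack or not.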

The Deeper Question

We are building a world where the barrier to creating software is almost zero. That's genuinely exciting. It's also genuinely terrifying. Because the barrier to creating insecure software was already low. We just made it lower, faster, and more convincing.

I think about this a lot in the context of who gets hurt when things go wrong. It's not usually the developer who vibe-coded the app. It's the person who trusted the app with their information. The user who didn't know there was an agent involved. The small business that assumed "it's a real app" meant "someone thought about security." The accountability gap between "easy to build" and "safe to use" is a problem we need to name before we can solve it.

I don't have a clean answer for what the regulatory path looks like. I do think disclosure requirements are a reasonable starting point. If an AI agent touched the code or the runtime behavior of an app, users should know. If a model is making decisions about their data, they should have a right to ask what guardrails are in place. That's not anti-AI. That's just basic respect for the people downstream of what we build. The engineers and companies that take this seriously now, before something bad forces the conversation, are the ones who will still have user trust when the reckoning comes. And it's coming.


I caught that injection because I was paying attention. Most people aren't paying attention. Not because they don't care. Because the tools told them they didn't have to.

That's the lie we need to stop telling.
