• Vulnerable U
  • Posts
  • Moltbot, Prompt Injection, and the New Attack Surface of Always‐On AI Agents

Moltbot, Prompt Injection, and the New Attack Surface of Always‐On AI Agents

Moltbot, the artist formerly known as Clawdbot, is the most viral AI agent I’ve seen since ChatGPT dropped. In a matter of days, the community has rushed to install it on local machines, Mac minis, and VPS instances, wiring it into every part of their digital lives. Under the hood it’s powered by models like Claude, but the real appeal isn’t just the model — it’s the architecture and behavior of the agent itself.

Moltbot runs locally and has what feels like an almost photographic memory. It remembers long histories of interaction, including API keys, OAuth tokens, bot tokens, and all kinds of secrets that users eagerly feed it so it can automate more and more of their workflows. People are granting it access to:

  • Email

  • Local file systems

  • SaaS tools (via APIs and tokens)

  • Browsers with cookies and active sessions

  • Messaging platforms (Telegram, WhatsApp, Signal, iMessage) that can send messages on their behalf

On top of that, it’s proactive. A simple heartbeat mechanism wakes it every 30 minutes (or faster if you tweak it) so it can “wake up” and start doing things for you. That’s why people are calling it an “employee” instead of just a chatbot.

This is where the core security issue comes in: prompt injection. Any AI agent that reads untrusted input from the internet, documents, or social media and is allowed to act on your behalf is a massive attack surface. An attacker can leave “breadcrumbs” in tweets, blog posts, or web pages that effectively instruct your agent to exfiltrate secrets, change configurations, or impersonate you. We have no robust, generally accepted solution to prompt injection yet—OpenAI has openly said as much—yet we’re putting these always-on, highly privileged agents directly in the blast radius.

At the same time, a lot of the discourse around Moltbot has been muddled. There have been screenshots of SHODAN “finding thousands of exposed Moltbot UIs,” when in reality the number of truly exposed admin interfaces appears to be in the dozens, not thousands. Headlines about “512 vulnerabilities and eight criticals” have largely been driven by automated scanners flagging design decisions—like storing OAuth tokens in plain text—as “critical” without context. Those tradeoffs may still be bad security practice, but they’re not necessarily zero-day-style shocks. The real risk is in how people deploy and extend the tool.

Real Threats vs. FUD

From my point of view, there are three real problem areas: misconfiguration, skills, and the surrounding malware ecosystem.

First, misconfiguration and exposure. Moltbot’s admin UI is an extremely powerful control surface. On the few machines where that interface was accidentally exposed to the public internet, the situation was legitimately awful, because Moltbot stores secrets in plain text on disk as part of its operating model. If someone can reach that admin surface, they’re basically inside your brain. But the right way to respond isn’t panic headlines—it’s better defaults, clearer docs, and guardrails that make insecure setups harder.

Second, skills are effectively arbitrary, trusted code. Skills started as a way to make AI agents more deterministic and repeatable, especially for coding and complex workflows. In Moltbot’s world, they’ve become an extension mechanism that can reach into services like Gmail, Google Drive, Docs, Sheets, and more. The security researcher who wrote the “What Would Elon Do?” skill demonstrated how fragile the trust model is: he created a marketing-heavy skill description, exploited a flaw in download counting to inflate its popularity, and hid the real logic in additional files inside the skill package. The code it executed could have exfiltrated codebases, SSH keys, and secrets with almost no friction; in his proof of concept, he stopped at a simple ping and a “you’ve been pwned” ASCII art. The point was clear: if you let your agent automatically fetch and install skills you’ve never read, you’re effectively running curl | bash on your brain.

Third, the broader ecosystem is already attracting real malware. We’re seeing fake VS Code extensions that ride on Moltbot’s popularity, pulling down legitimate remote management tools like ScreenConnect and using them as a stealthy way to gain control over machines. These aren’t theoretical risks; there are already dozens of such extensions from the same actor, some with dozens of installs. Add to that the confusion and hype around “info-stealer malware targeting AI agents,” and you get an environment where real threats are mixed with FUD. Some articles are purely speculative—pointing out that a compromised Moltbot box would be “juicy” for an attacker—but that doesn’t mean it’s already happening at scale.

Despite all this, my stance isn’t “never use Moltbot.” It’s that we don’t get to sit this out. Users—developers, founders, knowledge workers—are voting with their feet. They are buying hardware, wiring these agents into their workflows, and treating them as teammates. As security people, we can’t just be the “department of no.” We have to understand these tools deeply, acknowledge the risks honestly, and help design sane architectures and guardrails.

We Go from Here: Guardrails, Responsibility, and Engagement

So where do we go from here?

First, we treat prompt injection as the central design problem, not an afterthought. Any time you let an agent read untrusted content and then act on your behalf, you need to define strict boundaries: what it’s allowed to do, what data it can see, what channels it can speak through, and under what conditions. You should assume that at some point, the agent will read a prompt that did not come from you. Design as if that’s inevitable.

Second, we harden the defaults and the ecosystem:

  • Make it much harder (or impossible) to expose the admin UI publicly.

  • Add self-audit tools so Moltbot can check its own deployment for obviously insecure patterns.

  • Improve documentation so that safe deployment patterns—like isolating the bot in its own Telegram bot or workspace and only responding to your account—are clearly explained and easy to follow.

  • Fix and de-emphasize download counts as a proxy for trust in skill repositories; popularity is not a security signal.

  • Encourage users to treat third-party skills as they would any untrusted code: read them, sandbox them, and don’t let your agent install arbitrary packages sight unseen.

Third, we change how we, as a security community, respond. Dogpiling a solo open source maintainer with scanner output and then demanding rewards is counterproductive. We’ve already seen what happens when that behavior hits mature projects like curl: bug bounty fatigue, lower signal-to-noise, and eventually programs getting shut down. That makes the whole ecosystem less secure.

With Moltbot, the healthier pattern is: test it, threat-model it, file detailed issues, send PRs, and help make it safer. Acknowledge that yes, it’s risky—but so is nearly everything we do on the internet. Our job isn’t to pretend these tools don’t exist; it’s to meet users where they are and help them not shoot themselves in the foot.

If you’re going to use Moltbot or tools like it, do it with eyes open. Limit its permissions; isolate it from your crown jewels; be very careful with skills and extensions; and assume that prompt injection is not a hypothetical. This technology wave isn’t going away. The question is whether security people will participate and shape it—or ignore it until the incident reports land in their queue.