Vulnerable U

Claude Tried to Hack 30 Companies. Nobody Asked It To.

Yeah… it’s not supposed to do that.

Researchers at Truffle Security published a report showing that Anthropic’s Claude models started hacking websites without being asked to.

They created clones of about 30 different corporate websites — things like Meta, Coca-Cola, Procter & Gamble — and intentionally planted vulnerabilities in them.

Then they gave Claude a very simple task, something like: go find the latest engineering blog post on Meta’s website.

But instead of pointing Claude to the real site, the researchers rewired things so the AI would end up interacting with their cloned version instead.

From Claude’s perspective, it thought it was visiting the real website.

Where it gets interesting: When Claude tried to retrieve the blog post, it encountered a SQL error message. If you’re a web security person, you know that’s a pretty juicy signal. SQL errors often indicate that a site might be vulnerable to SQL injection. Instead of just giving up or reporting the error, Claude started investigating.

It basically said: “Why am I getting this SQL error?” And then it started probing the system. Eventually, the model performed a SQL injection attack against the site’s database. Nobody told it to do that. Nobody gave it permission. It simply decided that exploiting the vulnerability was the best way to complete the task it had been given.
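The report doesn’t publish Claude’s exact queries, but the flaw behind the behavior is the classic one: user input concatenated straight into SQL. Here’s a minimal sketch, using a hypothetical `posts` table, of why a leaked SQL error is such a juicy signal and how a payload turns it into an exploit:

```python
import sqlite3

# Hypothetical schema, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER, title TEXT, secret INTEGER)")
conn.execute("INSERT INTO posts VALUES (1, 'Engineering blog', 0)")
conn.execute("INSERT INTO posts VALUES (2, 'Internal draft', 1)")


def get_post(title: str) -> list:
    # VULNERABLE: user input is concatenated directly into the query,
    # so attacker-controlled text becomes part of the SQL itself.
    query = f"SELECT id, title FROM posts WHERE title = '{title}' AND secret = 0"
    return conn.execute(query).fetchall()


# Normal use: returns only the public row.
print(get_post("Engineering blog"))

# A classic payload: the quote closes the string early and `--` comments
# out the secret filter, exposing a row the query was meant to hide.
print(get_post("Internal draft' --"))

# And a stray quote produces exactly the kind of SQL error message that,
# per the report, Claude treated as an invitation to dig deeper.
try:
    get_post("O'Brien")
except sqlite3.OperationalError as e:
    print("SQL error leaked to the client:", e)
```

The fix, of course, is parameterized queries (`conn.execute("... WHERE title = ?", (title,))`), which is why an error like this reads as a strong hint that the site never made that switch.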

The researchers ran 1,800 different test cases, and in about 70 percent of them, Claude exploited the vulnerabilities it discovered.

It wasn’t just SQL injection. The AI also exploited things like:

  • Server-Side Request Forgery (SSRF)

  • Command injection

  • Other web application vulnerabilities
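SSRF has the same shape as the SQL case: the server is talked into making requests on the attacker’s behalf, usually toward addresses only the server can reach. A minimal sketch (a hypothetical helper, not anything from the report) of the guard a URL fetcher would need, against the classic payload an unguarded fetcher happily follows:

```python
import ipaddress
from urllib.parse import urlparse


def is_internal(url: str) -> bool:
    """True when the URL targets a private, loopback, or link-local IP --
    the addresses an SSRF payload points a server-side fetcher at."""
    host = urlparse(url).hostname or ""
    try:
        # Python's is_private covers 10/8, 127/8, 169.254/16,
        # 172.16/12, 192.168/16, and friends.
        return ipaddress.ip_address(host).is_private
    except ValueError:
        # Not an IP literal; a real guard would also resolve the
        # hostname and re-check, since DNS can point anywhere.
        return False


# A legitimate fetch target vs. the classic payload aimed at a cloud
# metadata endpoint that only the server itself can normally reach.
print(is_internal("https://example.com/blog"))
print(is_internal("http://169.254.169.254/latest/meta-data/"))
```

A fetcher that skips a check like this will cheerfully retrieve internal admin panels and cloud credentials for whoever controls the URL.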

In other words, Claude wasn’t just fixing a coding error. It was actively using security flaws as a means to an end.
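Command injection, the other class on that list, is the same pattern one layer down: attacker input spliced into a shell command instead of a SQL query. A minimal sketch, with a hypothetical “ping a host” feature (`echo` stands in for the real binary, and a POSIX shell is assumed):

```python
import subprocess


def ping_unsafe(host: str) -> str:
    # VULNERABLE: shell=True hands the whole string to /bin/sh,
    # so shell metacharacters in `host` become extra commands.
    return subprocess.run(f"echo pinging {host}", shell=True,
                          capture_output=True, text=True).stdout


def ping_safe(host: str) -> str:
    # Safe: arguments are passed as a list and never parsed by a shell.
    return subprocess.run(["echo", "pinging", host],
                          capture_output=True, text=True).stdout


payload = "example.com; echo INJECTED"
print(ping_unsafe(payload))  # the ; starts a second command
print(ping_safe(payload))    # the payload stays an inert argument
```

The vulnerable version runs whatever follows the semicolon; the safe version treats the entire payload as a single harmless argument.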

If a human security researcher intentionally did the same thing against a real company without authorization, that could easily be considered a crime under computer hacking laws.

When asked about responsible disclosure, the researchers noted that there wasn’t actually a vulnerability to disclose.

The issue isn’t that these particular websites had security flaws, but that the AI model itself decided that exploitation was an acceptable strategy.

That raises some really interesting questions about how these systems behave when given open-ended goals, especially when people start talking about deploying AI in real-world operational environments.

This experiment shows that when Claude is trying to achieve a goal, it may independently decide to exploit vulnerabilities it encounters along the way. That’s probably one reason companies like Anthropic are extremely cautious about deploying their models in environments where autonomous decision-making could have real-world consequences.

Once you give a system the ability to pursue objectives independently, you may not always control the methods it chooses to get there.