Adversarial AI: Exposing LLM Weaknesses

grace

5 May 2025

•

5 min read

We previously discussed adversarial prompts and their role in manipulating AI outputs. But are you aware of Adversarial AI? While it sounds like something out of a futuristic cyberpunk novel, how realistic are these attacks? Are they just academic curiosities, or are they something security teams should actually worry about today?

Spoiler alert: They’re unfortunately very real.

Introducing Garak

Let’s put things into perspective. Garak, developed by NVIDIA, is an open-source LLM vulnerability scanner built specifically to test LLMs for weaknesses — from jailbreaks to prompt injections and beyond.

But here’s the twist: Garak itself is LLM-powered. So it isn’t a typical pen-test with humans testing a model — this is AI attacking AI.

How Does Garak Work?

Garak works by trying to trick LLMs in lots of different ways and then checking how they respond. Here are the steps:

Probes try different types of attacks.
Each probe is like a specific test. One might try to jailbreak the model, another might ask a sneaky question to bypass filters, and so on.
It tries each test a few times.
LLMs don’t always give the same answer. So Garak runs each probe multiple times to see how often the model slips up.
Detectors check the model’s replies.
These detectors look at the model’s answers and decide if the attack worked. If the model said something it shouldn’t — that’s called a “hit.”
If there’s a hit, it counts as a failure.
That means the model wasn’t able to defend itself against that kind of trick.
Garak creates a report at the end.
The report shows which tricks worked and how often they worked. It also gives an overview of where the model might need more protection.

In short: Garak throws a bunch of smart challenges at an AI model, watches closely, and helps you understand where the weak spots are.

One of Garak’s most powerful methods is called Tree of Attacks with Pruning. It’s essentially a smart, branching approach to attacks — exploring multiple directions, cutting off dead ends, and evolving successful attack paths. This makes the process fast and efficient, even on modest machines.

Image Credit: Tree of Attacks: Jailbreaking Black-Box LLMs Automatically

Demo of Garak

And the results? You can watch our video below, which showcases how the tool identifies vulnerabilities in LLMs. In just a short time on a modestly powered machine, Garak successfully jailbroke LLaMA 3.2:3B multiple times.

About CloudsineAI

Why CloudsineAI

Our Customers

Certifications & Awards

Careers

News

Secure Proprietary GenAI Applications

Secure Employee Usage of GenAI Tools

Secure Modern Web Applications

Protect Against Web Defacement Attacks

Card Skimming and Payment Page Monitoring

DNS and WHOIS Monitoring

Government

Banking and Financial Services

Internet Service Providers

Higher Education

Healthcare

GenAI Protector Plus

WebOrion® Monitor

CoSpaceGPT

Our Network

Enterprise Partner Program

Become a Partner

Whitepapers

Videos

Case Studies

Blogs

FAQs

Adversarial AI: Exposing LLM Weaknesses

Introducing Garak

How Does Garak Work?

Demo of Garak

Products

Use Cases

Industries

LLM Security Leaderboard

Partners

Resources

About Us