CloudsineAI LLM Security Leaderboard

| Rank | Model | CloudsineAI Attack Dataset (%) | Prompt Injection Security (%) | Jailbreak Security (%) | Output Safety (%) | Sensitive Information (%) | Benign Requests (%) | Overall Score | Overall Score (With Protector Plus) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-4o | 75.12 | 96.65 | 98.60 | 90.16 | 85.35 | 93.40 | 89.88 | 94.7 |
| 2 | GPT-5 | 81.33 | 94.08 | 96.93 | 88.86 | 84.68 | 93.17 | 89.84 | 96.2 |
| 3 | GPT-4.1 | 74.85 | 94.21 | 99.21 | 92.76 | 85.80 | 92.10 | 89.82 | 94.7 |
| 4 | Claude-Sonnet-4 | 83.51 | 98.33 | 94.74 | 89.00 | 82.66 | 88.04 | 89.38 | 94.7 |
| 5 | Qwen3:8b | 68.91 | 94.59 | 94.82 | 74.38 | 79.15 | 90.77 | 83.77 | 95.2 |
| 6 | Gemini-2.5-Flash | 65.90 | 71.69 | 79.12 | 59.91 | 63.15 | 91.34 | 71.85 | 92.7 |
| 7 | Grok-4 | 72.87 | 61.90 | 38.51 | 39.22 | 59.34 | 83.34 | 59.20 | 92.7 |
| 8 | Llama2-Uncensored | 68.98 | 80.31 | 17.46 | 23.30 | 27.88 | 94.90 | 52.14 | 95.9 |

About the CloudsineAI LLM Security Leaderboard

Explore the leading models across our CloudsineAI attack dataset, Prompt Injection Security, Jailbreak Security, LLM Output Safety, and protection against sensitive information exposure, while still fulfilling benign requests.

Our leaderboard evaluates models across a comprehensive set of categories. Higher scores indicate not only increased robustness against adversarial prompts, but also a greater willingness to fulfil legitimate queries without unnecessary refusals. This dual assessment ensures that models are measured both for their security resilience and for their responsiveness in real-world applications.

Evaluation Methodology

We evaluated the models on our leaderboard using a combination of curated datasets and our proprietary dataset.

  • Dataset Composition:
    • Four curated datasets, each tailored to a specific attack vector.
    • Our proprietary attack dataset, crafted from a hacker’s perspective to simulate real-world adversarial tactics.
    • One benign dataset containing safe instructions or conversation inputs across different domains.
  • Evaluation Criteria: For each model, we analysed the proportion of outputs classified as harmful or safe under attack conditions; for legitimate queries, we measured the proportion of outputs classified as refusals or rejections (see the scoring sketch after this list).
  • Testing with GenAI Protector Plus: We retested each model with GenAI Protector Plus enabled to assess its effectiveness in blocking harmful prompts and outputs, measuring the safety of the resulting responses while confirming that the model's ability to answer legitimate queries is not significantly compromised.
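
To make the scoring concrete, here is a minimal sketch, in Python, of how per-category scores and an overall score could be derived from per-prompt labels. The record fields, category names, and the unweighted-mean aggregation are our illustrative assumptions, not CloudsineAI's published implementation.

```python
# Minimal sketch (assumptions, not CloudsineAI's published implementation):
# every test prompt yields one record with its category and a judge label --
# "safe" for attack prompts, "refused" for benign prompts.
from statistics import mean

results = [
    {"category": "prompt_injection", "adversarial": True,  "safe": True},
    {"category": "prompt_injection", "adversarial": True,  "safe": False},
    {"category": "jailbreak",        "adversarial": True,  "safe": True},
    {"category": "benign",           "adversarial": False, "refused": False},
    {"category": "benign",           "adversarial": False, "refused": True},
    # ... one record per test prompt in the full evaluation
]

def category_score(records, category):
    """Percentage of desirable outcomes in one category: safe outputs for
    attack categories, non-refusals for the benign set."""
    subset = [r for r in records if r["category"] == category]
    if not subset:
        return None
    good = [r["safe"] if r["adversarial"] else not r["refused"] for r in subset]
    return 100.0 * sum(good) / len(good)

categories = ["prompt_injection", "jailbreak", "benign"]
scores = {c: category_score(results, c) for c in categories}

# An unweighted mean over categories matches the leaderboard's Overall Score
# column (see the note under the table); this is an observed pattern, not
# necessarily the official aggregation.
overall = mean(s for s in scores.values() if s is not None)
print(scores, round(overall, 2))
```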

Key Metrics

Prompt Injection Security

Measures the model's ability to resist malicious prompt injection attempts.

CloudsineAI's Dataset

Our proprietary dataset for evaluating LLM security, designed from a hacker's perspective.

Output Safety

Measures the model's consistency in generating safe and appropriate outputs. It covers harmful content, crime, and hate speech.

Jailbreak Security

Measures the model's ability to resist advanced jailbreak techniques such as Analyzing-Based Jailbreak and adaptive jailbreak attacks.

Exposure of Sensitive Information

Evaluates the model's resistance to disclosing sensitive or confidential information in response to unethical requests.

Response to Benign Requests

Evaluates the model's responsiveness to safe, non-adversarial prompts by determining how often it provides direct answers instead of issuing unwanted refusals.

Take the Next Step

Contact our team of experts to learn how CloudsineAI can enhance your GenAI security. Leave the security to us and focus on innovating.

Contact us today

Fill out the form below, and we will be in touch shortly.