CloudsineAI LLM Security Leaderboard

| Rank | Model | CloudsineAI Attack Dataset (%) | Prompt Injection Security (%) | Jailbreak Security (%) | Output Safety (%) | Sensitive Information (%) | Benign Requests (%) | Overall Score | Overall Score (With Protector Plus) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-4o | 75.12 | 96.65 | 98.60 | 90.16 | 85.35 | 93.40 | 89.88 | 94.7 |
| 2 | GPT-5 | 81.33 | 94.08 | 96.93 | 88.86 | 84.68 | 93.17 | 89.84 | 96.2 |
| 3 | GPT-4.1 | 74.85 | 94.21 | 99.21 | 92.76 | 85.80 | 92.10 | 89.82 | 94.7 |
| 4 | Claude-Sonnet-4 | 83.51 | 98.33 | 94.74 | 89.00 | 82.66 | 88.04 | 89.38 | 94.7 |
| 5 | Qwen3:8b | 68.91 | 94.59 | 94.82 | 74.38 | 79.15 | 90.77 | 83.77 | 95.2 |
| 6 | Gemini-2.5-Flash | 65.90 | 71.69 | 79.12 | 59.91 | 63.15 | 91.34 | 71.85 | 92.7 |
| 7 | Grok-4 | 72.87 | 61.90 | 38.51 | 39.22 | 59.34 | 83.34 | 59.20 | 92.7 |
| 8 | Llama2-Uncensored | 68.98 | 80.31 | 17.46 | 23.30 | 27.88 | 94.90 | 52.14 | 95.9 |

About the CloudsineAI LLM Security Leaderboard

Explore the leading models across our CloudsineAI attack dataset, Prompt Injection Security, Jailbreak Security, LLM Output Safety, and protection against sensitive information exposure, while still fulfilling benign requests.

Our leaderboard evaluates models across a comprehensive set of categories. Higher scores indicate not only increased robustness against adversarial prompts, but also a greater willingness to fulfil legitimate queries without unnecessary refusals. This dual assessment ensures that models are measured both for their security resilience and for their responsiveness in real-world applications.

Evaluation Methodology

We evaluated the models on our leaderboard using a combination of curated datasets and our proprietary dataset.

  • Dataset Composition:
    • Four curated datasets, each tailored to a specific attack vector.
    • Our proprietary attack dataset, crafted from a hacker’s perspective to simulate real-world adversarial tactics.
    • One benign dataset containing safe instructions or conversation inputs across different domains.
  • Evaluation Criteria: For each model, we analysed the proportion of outputs classified as harmful or safe under attack conditions; for legitimate queries, we measured the proportion of outputs classified as refusals or rejections (see the scoring sketch after this list).
  • Testing with GenAI Protector Plus: We retested each model with GenAI Protector Plus enabled to assess its effectiveness in blocking harmful prompts and outputs, measuring the safety of the resulting responses while confirming that the model's ability to answer legitimate queries is not significantly compromised.
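
To make the scoring concrete, here is a minimal sketch, in Python, of how per-category scores and an overall score could be derived from per-prompt labels. The record fields, category names, and the unweighted-mean aggregation are our illustrative assumptions, not CloudsineAI's published implementation.

```python
# Minimal sketch (assumptions, not CloudsineAI's published implementation):
# every test prompt yields one record with its category and a judge label --
# "safe" for attack prompts, "refused" for benign prompts.
from statistics import mean

results = [
    {"category": "prompt_injection", "adversarial": True,  "safe": True},
    {"category": "prompt_injection", "adversarial": True,  "safe": False},
    {"category": "jailbreak",        "adversarial": True,  "safe": True},
    {"category": "benign",           "adversarial": False, "refused": False},
    {"category": "benign",           "adversarial": False, "refused": True},
    # ... one record per test prompt in the full evaluation
]

def category_score(records, category):
    """Percentage of desirable outcomes in one category: safe outputs for
    attack categories, non-refusals for the benign set."""
    subset = [r for r in records if r["category"] == category]
    if not subset:
        return None
    good = [r["safe"] if r["adversarial"] else not r["refused"] for r in subset]
    return 100.0 * sum(good) / len(good)

categories = ["prompt_injection", "jailbreak", "benign"]
scores = {c: category_score(results, c) for c in categories}

# An unweighted mean over categories matches the leaderboard's Overall Score
# column (see the note under the table); this is an observed pattern, not
# necessarily the official aggregation.
overall = mean(s for s in scores.values() if s is not None)
print(scores, round(overall, 2))
```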

Key Metrics

Prompt Injection Security

Measures the model's ability to resist malicious prompt injection attempts.

CloudsineAI's Dataset

Our proprietary dataset for evaluating LLM security, designed from a hacker's perspective.

Output Safety

Measures the model's consistency in generating safe and appropriate outputs. It covers harmful content, crime, and hate speech.

Jailbreak Security

Measures the model's ability to resist advanced jailbreak techniques such as Analyzing-Based Jailbreak and adaptive jailbreak attacks.

Exposure of Sensitive Information

Evaluates the model's resistance to disclosing sensitive or confidential information in response to unethical requests.

Response to Benign Requests

Evaluates the model's responsiveness to safe, non-adversarial prompts by determining how often it provides direct answers instead of issuing unwanted refusals.

Take the Next Step

Contact our team of experts to learn how CloudsineAI can enhance your GenAI security. Leave the security to us and focus on innovating.

Contact us today

Fill out the form below, and we will be in touch shortly.