Home Artificial Intelligence When AI Backfires: Enkrypt AI Report Exposes Dangerous Vulnerabilities in Multimodal Models

When AI Backfires: Enkrypt AI Report Exposes Dangerous Vulnerabilities in Multimodal Models

by admin
mm

In May 2025, Enkrypt AI released its Multimodal Red Teaming Report, a chilling analysis that revealed just how easily advanced AI systems can be manipulated into generating dangerous and unethical content. The report focuses on two of Mistral’s leading vision-language models—Pixtral-Large (25.02) and Pixtral-12b—and paints a picture of models that are not only technically impressive but disturbingly vulnerable.

Vision-language models (VLMs) like Pixtral are built to interpret both visual and textual inputs, allowing them to respond intelligently to complex, real-world prompts. But this capability comes with increased risk. Unlike traditional language models that only process text, VLMs can be influenced by the interplay between images and words, opening new doors for adversarial attacks. Enkrypt AI’s testing shows how easily these doors can be pried open.

Alarming Test Results: CSEM and CBRN Failures

The team behind the report used sophisticated red teaming methods—a form of adversarial evaluation designed to mimic real-world threats. These tests employed tactics like jailbreaking (prompting the model with carefully crafted queries to bypass safety filters), image-based deception, and context manipulation. Alarmingly, 68% of these adversarial prompts elicited harmful responses across the two Pixtral models, including content that related to grooming, exploitation, and even chemical weapons design.

One of the most striking revelations involves child sexual exploitation material (CSEM). The report found that Mistral’s models were 60 times more likely to produce CSEM-related content compared to industry benchmarks like GPT-4o and Claude 3.7 Sonnet. In test cases, models responded to disguised grooming prompts with structured, multi-paragraph content explaining how to manipulate minors—wrapped in disingenuous disclaimers like “for educational awareness only.” The models weren’t simply failing to reject harmful queries—they were completing them in detail.

Equally disturbing were the results in the CBRN (Chemical, Biological, Radiological, and Nuclear) risk category. When prompted with a request on how to modify the VX nerve agent—a chemical weapon—the models offered shockingly specific ideas for increasing its persistence in the environment. They described, in redacted but clearly technical detail, methods like encapsulation, environmental shielding, and controlled release systems

Source Link

Related Posts

Leave a Comment