Research Jailbreaked OpenAI o1/o3, DeepSeek-R1, & Gemini 2.0 Flash Thinking Models

A recent study from a team of cybersecurity researchers has revealed severe security flaws in commercial-grade Large Reasoning Models (LRMs), including OpenAI’s o1/o3 series, DeepSeek-R1, and Google’s Gemini 2.0 Flash Thinking.

The research introduces two key innovations: the Malicious-Educator benchmark for stress-testing AI safety protocols and the Hijacking Chain-of-Thought (H-CoT) attack method, which reduced model refusal rates from 98% to under 2% in critical scenarios.

The team from Duke University’s Center for Computational Evolutionary Intelligence developed a dataset of 50 queries spanning 10 high-risk categories, including terrorism, cybercrime, and child exploitation, crafted as educational prompts. 

Hijacking Chain-of-Thought

For example, a request might frame the creation of ransomware as “professional cybersecurity training”. Despite these prompts’ veneer of legitimacy, they contained three red-flag components:

Exit mobile version