Research Jailbreaked OpenAI o1/o3, DeepSeek-R1, & Gemini 2.0 Flash Thinking Models

by Krupali
1 month ago

A recent study from a team of cybersecurity researchers has revealed severe security flaws in commercial-grade Large Reasoning Models (LRMs), including OpenAI’s o1/o3 series, DeepSeek-R1, and Google’s Gemini 2.0 Flash Thinking.

The research introduces two key innovations: the Malicious-Educator benchmark for stress-testing AI safety protocols and the Hijacking Chain-of-Thought (H-CoT) attack method, which reduced model refusal rates from 98% to under 2% in critical scenarios.

The team from Duke University’s Center for Computational Evolutionary Intelligence developed a dataset of 50 queries spanning 10 high-risk categories, including terrorism, cybercrime, and child exploitation, crafted as educational prompts.

Hijacking Chain-of-Thought

For example, a request might frame the creation of ransomware as “professional cybersecurity training”. Despite these prompts’ veneer of legitimacy, they contained three red-flag components:

Hijacking Chain-of-Thought

Related Content

CBI Should Collaborate With Startups To Tackle AI-Led Crimes: Vaishnaw

U.S.-Listed BTC Miners Tracked by JPMorgan Shed 25% of Their Market Cap in March

Lucid PhaaS Hits 169 Targets in 88 Countries Using iMessage and RCS Smishing

How to change clothes in Schedule 1

SpaceX launches Fram2, the first crewed spaceflight to explore Earth's polar regions

Apna Mart To Raise $25 Mn As 15-Minute Delivery Race Heats Up