Decoded: Deep Research by OpenAI vs DeepSeek R1 Search + Gemini Deep Research

AI research assistants are flexing their muscles, and the latest benchmarks show both the gains and the flaws.

IdeaData
February 05, 2025

This week in tech, AI research assistants are flexing their muscles, and the latest benchmarks show that OpenAI's newest Deep Research agent is both impressive and flawed. Will AI truly replace human researchers anytime soon? We dive into the data to find out.

OpenAI’s Deep Research Agent: Smarter but still missing common sense
The rise of AI-powered research—how it compares to human experts
Cost vs. Performance: Is premium AI research worth the price?

Let’s get into the details!

AI Research Assistants Are Becoming More Capable

OpenAI's Deep Research agent, powered by a version of OpenAI's o3 model, is showing impressive gains in specialized research tasks. It now achieves 72-73% accuracy in benchmark tests, a large jump from the 15% that GPT-4 scored just nine months ago. However, it still falls short of human experts, who maintain a 92% accuracy rate.

✅ Key Data Points:

GAIA (General AI Assistants) Benchmark: 89% accuracy on basic research tasks, but only 67% on complex regulatory analysis.
Code Competitions: AI ranks in the 90th percentile, while humans dominate the 99th percentile.
Newsletter Analysis: OpenAI’s agent successfully identified high-impact articles at 100% accuracy, outperforming competitors.

📌 Meta Trend: AI’s rapid progression in research automation is part of a broader trend in knowledge work automation.
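
To put the accuracy figures above in perspective, here is a minimal sketch of how much of the distance between GPT-4 and human experts the agent has covered. It assumes the 15%, 72-73%, and 92% figures all come from the same benchmark, and it uses 72% (the lower end of the quoted range). The function and variable names are ours, purely for illustration.

```python
def gap_closed(baseline: float, current: float, target: float) -> float:
    """Return the fraction of the baseline-to-target gap covered by the current score."""
    if target == baseline:
        raise ValueError("target and baseline must differ")
    return (current - baseline) / (target - baseline)


# Figures quoted in this section, in percent.
GPT4_BASELINE = 15.0   # GPT-4's earlier score
DEEP_RESEARCH = 72.0   # assumption: lower end of the quoted 72-73% range
HUMAN_EXPERTS = 92.0   # human expert accuracy

points_gained = DEEP_RESEARCH - GPT4_BASELINE
points_remaining = HUMAN_EXPERTS - DEEP_RESEARCH
share = gap_closed(GPT4_BASELINE, DEEP_RESEARCH, HUMAN_EXPERTS)

print(f"Gained: {points_gained:.0f} points, remaining: {points_remaining:.0f} points")
print(f"Share of the gap to human experts closed: {share:.0%}")
```

On these numbers the agent gained 57 percentage points and sits 20 points below human experts, which is roughly three quarters of the original gap.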

The Cost of AI Research—Does It Pay Off?

[Chart not shown. Source: ARK Invest]

AI research tools are powerful but costly. OpenAI’s Deep Research is available through the ChatGPT Pro plan, which costs $200/month and caps usage at 100 queries, or about $2 per query. That is significantly more than competitors like DeepSeek R1 (free) and Google’s Gemini Advanced ($20/month).

✅ Key Data Points:

DeepSeek R1 struggles with reliability, achieving only 15% accuracy.
Gemini Advanced trails with a 29% accuracy rate, returning false negatives in 100% of newsletter queries.
OpenAI leads in accuracy, but at a steep price.

📌 Meta Trend: The AI arms race is driving up costs, making premium research tools less accessible to smaller businesses and independent researchers.
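
A rough way to weigh price against accuracy is cost per correct answer. The sketch below uses only the OpenAI figures quoted in this issue ($200/month, 100 queries) and assumes, for illustration only, that the 72% benchmark accuracy carries over to everyday queries. No query caps were reported for DeepSeek R1 or Gemini Advanced, so they are left out. The function names are hypothetical.

```python
def cost_per_query(monthly_price: float, monthly_queries: int) -> float:
    """Return the price of a single query if the whole monthly quota is used."""
    if monthly_queries <= 0:
        raise ValueError("monthly_queries must be positive")
    return monthly_price / monthly_queries


def cost_per_correct_answer(per_query: float, accuracy: float) -> float:
    """Return the expected spend per correct answer at a given accuracy (0 to 1)."""
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be in the range (0, 1]")
    return per_query / accuracy


# OpenAI figures quoted in this issue.
per_query = cost_per_query(monthly_price=200.0, monthly_queries=100)

# Assumption: benchmark accuracy (lower end of 72-73%) applies to real queries.
per_correct = cost_per_correct_answer(per_query, accuracy=0.72)

print(f"Cost per query: ${per_query:.2f}")
print(f"Cost per correct answer: ${per_correct:.2f}")
```

At full usage this works out to $2.00 per query and about $2.78 per correct answer. Anyone who uses fewer than 100 queries a month pays more per query.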

AI Struggles with Common Sense and Spatial Reasoning

While AI models excel in structured research tasks, they still struggle with real-world reasoning. OpenAI’s Deep Research agent failed 100% of novel physical simulation problems and had a 42% error rate in mechanical advantage calculations.

✅ Key Data Points:

AI correctly identified 18/20 drug interaction scenarios, showing promise in medical applications.
The agent generated GDPR-compliant legal documents for three EU jurisdictions.
0% success rate in spatial reasoning tests.

📌 Meta Trend: AI’s struggle with physical and common sense reasoning suggests long-term limitations in its ability to replace human intuition.

Operator Wisdom

“These kinds of tools increase my personal capacity so that I can use my time doing other research tasks.”

Reem Anchassi, Bain & Company

~IdeaData A.I. Research Team
