The Terminology Problem
"Reasoning" models don't reason like humans—they generate more internal tokens before responding. "Flash" models aren't dumb—they're optimized for efficiency. The names are marketing, not descriptions.
A Better Mental Model
System 1 (fast, intuitive): Quick pattern matching, immediate responses. Good for most tasks where the answer is recognizable.
System 2 (slow, deliberate): Step-by-step processing, explores multiple paths. Better for novel problems requiring multi-step logic.
(Framing borrowed from Daniel Kahneman's "Thinking, Fast and Slow.")
The Key Insight
Neither mode is "smarter." They are different processing strategies suited to different tasks. Using System 2 for everything wastes tokens and time; using System 1 for complex multi-step logic tends to produce errors.
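One way to picture matching the mode to the task is a small routing function. This is only a sketch: the `Mode` enum, the `choose_mode` helper, and the task labels are hypothetical names invented for illustration, not part of any vendor's API.

```python
from enum import Enum


class Mode(Enum):
    DIRECT = "direct"      # System 1 style: respond immediately
    EXTENDED = "extended"  # System 2 style: generate scratch work first


# Hypothetical task labels, chosen for illustration only.
EXTENDED_TASKS = {
    "multi_step_math",
    "complex_logic",
    "novel_problem",
    "planning",
    "code_architecture",
}


def choose_mode(task_type: str) -> Mode:
    """Default to direct response; escalate only when the task needs deliberation."""
    return Mode.EXTENDED if task_type in EXTENDED_TASKS else Mode.DIRECT
```

Defaulting to the direct mode and escalating only for listed task types reflects the point that most work does not need extended processing; a real system would likely need a richer signal than a single label.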
Direct Response: "Flash" / Standard
Extended Processing: "Reasoning" / Thinking
Why Choose Direct Response
Most queries don't need extended deliberation. The model has seen many similar patterns during training, so it can recognize the right response without "thinking it through."
Cost: Lower token usage = lower cost
Speed: Faster time-to-response
When: Summaries, translations, standard coding tasks, Q&A, content generation
Why Choose Extended Processing
Some problems benefit from exploring multiple approaches before committing. The model generates internal "scratch work" before answering: testing paths, catching errors, and refining answers.
Cost: Higher token usage (thinking tokens count toward usage, even when they are not shown)
Speed: Slower, sometimes significantly
When: Multi-step math, complex logic, novel problems, planning, code architecture
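The cost difference between the two modes can be made concrete with a little arithmetic. The sketch below is an illustration under stated assumptions: the `estimate_cost` helper is hypothetical, the per-million-token prices are made-up placeholders, and it assumes thinking tokens are billed at the output-token rate, which should be checked against your provider's pricing.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    thinking_tokens: int = 0,
    price_in: float = 1.0,   # hypothetical USD per million input tokens
    price_out: float = 4.0,  # hypothetical USD per million output tokens
) -> float:
    """Estimate the USD cost of one request.

    Assumption: thinking tokens are billed like output tokens.
    """
    billed_output = output_tokens + thinking_tokens
    return (input_tokens * price_in + billed_output * price_out) / 1_000_000


# Same prompt and same visible answer length, with and without scratch work.
direct = estimate_cost(input_tokens=1_000, output_tokens=500)
extended = estimate_cost(input_tokens=1_000, output_tokens=500, thinking_tokens=4_000)
```

With these placeholder numbers, the extended request costs several times more than the direct one even though the visible answer is the same length, which is why extended processing is best reserved for the problems that need it.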