🧠 AI Benchmark Quiz

Test yourself against AI across multiple benchmark datasets

📚

MMLU (Massive Multitask Language Understanding)

Academic knowledge across 57 subjects from high school to graduate level. Widely considered saturated: top models score close to the ceiling, so it no longer separates them well.

• 57 academic subjects
• High school to graduate level
• AI Score: ~92% (widely considered saturated)

🔬

GPQA (Graduate-Level Google-Proof Q&A)

Graduate-level science questions designed to be unsearchable. Biology questions remain the most challenging for AI.

• Graduate-level science
• Google-proof questions
• AI Score: ~86% (Diamond subset)

🧮

MATH (Mathematical Reasoning)

Competition mathematics problems that appear widely in pre-training data, so public scores may partly reflect memorization. Many labs now use private datasets for a cleaner signal.

• Competition mathematics
• Multi-step reasoning
• AI Score: ~95% (MATH-500 subset)