Classic Psychology Test Exposes Critical AI Weakness: Accuracy Plummets from 90% to Near Zero as Task Complexity Increases
Researchers tested leading AI models using a classic psychology attention task and uncovered a significant flaw. While the models performed well on short, simple lists, their accuracy collapsed as task length and complexity increased, with some top systems falling from over 90% accuracy to near-complete failure.

Highlights
- Leading AI models scored over 90% on short attention tasks, but their accuracy collapsed to near zero as task length and complexity increased.
- The study used a classic psychology attention test to benchmark AI performance, exposing fundamental limitations not captured by standard AI benchmarks.
- The accuracy collapse raises significant safety concerns for AI-dependent drone applications, including autonomous flight, real-time obstacle avoidance, and beyond-visual-line-of-sight (BVLOS) operations.
- Researchers say the findings reveal inherent constraints in large language models when processing information over extended or complex contexts.
- The results are expected to inform future AI architecture development, with a focus on improving sustained attention and robustness under complex conditions.
Researchers have used a widely recognized psychology attention test to evaluate today's leading AI models, revealing a fundamental flaw in how these systems handle sustained cognitive demands.
Strong Performance on Short Lists, Catastrophic Collapse Under Complexity
The study found that AI models performed impressively on color-naming tasks with short lists, answering correctly with ease. As the lists grew longer and more complex, however, performance deteriorated sharply, and some of the industry's leading AI systems saw their accuracy drop from over 90% to near-complete failure.
Attention Mechanisms Remain a Core Challenge for AI
The findings highlight a fundamental weakness in current AI models: despite strong results on many standard benchmarks, these systems struggle significantly with tasks that require sustained attention and resistance to distraction. This has important implications for applications that rely on AI for prolonged, high-complexity decision-making, such as autonomous drone flight, real-time environmental perception, and obstacle avoidance.
The researchers noted that these results shed light on the inherent limitations of large language models in processing information over extended contexts, and offer a clear direction for improving future AI architectures.


