Any study (n>1000) shows reduction in unassisted problem-solving among heavy AI users by end of 2026?
46% chance

https://joeandseth.substack.com/p/prompt-writing-outsourcing-cognition

Resolution criteria

  • Resolves YES if, by 23:59:59 UTC on December 31, 2026, there is at least one publicly released empirical study with n > 1000 human participants that reports a statistically significant reduction in unassisted problem-solving performance for “heavy AI users” (as defined by the study: e.g., top-quantile usage logs, assigned heavy-use condition, or clearly specified frequent-use threshold) compared to lighter/non-users or to their own baseline when AI is not allowed during the outcome assessment (p < 0.05 or 95% CI excluding zero).

  • “Unassisted problem-solving” = tasks completed without AI access at test time (e.g., proctored exams, reasoning/problem-solving assessments where AI tools are prohibited).

  • n > 1000 refers to unique participants analyzed (not number of items/tasks/observations). Multi-site replications may aggregate across identical protocols if the combined participant count exceeds 1000.

  • Acceptable venues include peer-reviewed journals (e.g., PubMed-indexed) and credible preprints/working papers with methods and results (e.g., arXiv, SSRN). The resolver will link the qualifying study in a market comment.

  • Does NOT count: meta-analyses without an individual qualifying study; outcomes where AI use is allowed; purely attitudinal/self-reported “I feel worse at problem-solving”; studies where “heavy use” is undefined or only inferred without evidence (e.g., detector-only classification without reported error rates/validation).

  • If no such study is found by the deadline, resolves NO.
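The resolution checklist above can be encoded as a short qualification check. A minimal sketch in Python; the `Study` record and its field names are hypothetical conveniences for illustration, not drawn from any real study or dataset:

```python
from dataclasses import dataclass

@dataclass
class Study:
    """Hypothetical record of a candidate study (field names are illustrative)."""
    n_participants: int          # unique participants analyzed, not items/observations
    assessment_ai_free: bool     # AI prohibited at the outcome assessment
    heavy_use_defined: bool      # explicit heavy-use definition (logs, assignment, threshold)
    effect_is_reduction: bool    # direction: heavy users perform worse unassisted
    p_value: float               # reported p for the performance reduction

def qualifies(s: Study) -> bool:
    """Apply this market's resolution criteria to one candidate study."""
    return (
        s.n_participants > 1000      # strictly greater than 1000
        and s.assessment_ai_free
        and s.heavy_use_defined
        and s.effect_is_reduction
        and s.p_value < 0.05         # or, equivalently, 95% CI excluding zero
    )

# Example: a study just under the participant threshold does not qualify,
# no matter how strong its result.
near_miss = Study(n_participants=974, assessment_ai_free=True,
                  heavy_use_defined=True, effect_is_reduction=True,
                  p_value=0.01)
print(qualifies(near_miss))  # False: n must exceed 1000
```

Note that the check is conjunctive: failing any single criterion (sample size, AI-free assessment, defined heavy use, direction, significance) is enough for a study not to count.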

Background

  • A 2025 PNAS field experiment gave high-school students GPT-4 access during practice; performance improved with AI, but when access was removed, those exposed to an unfettered chatbot performed worse than peers who never had AI (a “crutch” effect). The sample was “nearly a thousand,” i.e., close to but not clearly >1000, so it would not by itself meet this market’s threshold. (pubmed.ncbi.nlm.nih.gov)

  • In higher education, an arXiv study found that students identified as GenAI users scored an average of 6.71 points (out of 100) lower on exams than non-users, suggesting potential learning drawbacks; its sample size and measurement details determine eligibility for this market. (arxiv.org)

Considerations

  • Measurement of “heavy AI use” varies (usage logs vs. self-report vs. AI-detector inference); detector-only measures should report validation/error rates to count.

  • Causality: randomized exposure or credible quasi-experiments are stronger than cross-sectional correlations; both can qualify if they meet the criteria and test performance without AI present.

  • Large-n studies are likeliest in K–12/college or large online platforms; look for proctored, AI-prohibited assessments explicitly documented in methods sections (e.g., PubMed-indexed articles or arXiv/SSRN working papers). (jmir.org)
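The "95% CI excluding zero" criterion from the resolution rules can be illustrated with a normal-approximation interval for a difference of group means. A sketch using only the standard library; the summary statistics below are invented to be roughly the shape of the arXiv finding (a gap of several points out of 100), not taken from any cited study:

```python
import math

def ci_excludes_zero(mean_heavy, sd_heavy, n_heavy,
                     mean_light, sd_light, n_light):
    """95% CI (normal approximation) for the mean difference heavy - light.

    Returns (lo, hi, excludes_zero). Inputs are per-group summary
    statistics: mean, standard deviation, and participant count.
    """
    diff = mean_heavy - mean_light
    # Standard error of a difference of independent group means.
    se = math.sqrt(sd_heavy**2 / n_heavy + sd_light**2 / n_light)
    lo, hi = diff - 1.96 * se, diff + 1.96 * se
    return lo, hi, (hi < 0 or lo > 0)

# Hypothetical numbers: heavy users ~6.7 points lower on a 100-point exam.
lo, hi, sig = ci_excludes_zero(68.3, 15.0, 600, 75.0, 15.0, 600)
print(f"95% CI for difference: [{lo:.2f}, {hi:.2f}], excludes zero: {sig}")
```

For the market's purposes, a qualifying study would report an interval like this one that lies entirely below zero (or, equivalently, p < 0.05 for the reduction); a CI straddling zero would not count.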
