This is a clone of Jack Clark's prediction.
Mintaka is a multilingual Q&A dataset recently released by Amazon. SOTA at the time of writing is 31% (finetuned T5).
I haven't been following this, and there's no benchmark page on paperswithcode, so I'm going to resolve based on the results sections of papers citing the original Mintaka paper that are listed on Google Scholar.
@JavierPrieto The most recent paper I found in that search gets 53.1% using ChatGPT (see Table 1). They don't say whether that beats SOTA, but after a cursory glance at some of the other papers I haven't seen anyone claim higher performance, so I'm going to go with this one and reopen if someone finds a better result.
Good paper, but I'm very skeptical of the hand-wavy nature of his claim.
Retrieval transformers are the natural fit, but human agreement was only 82%, and getting to 90% is usually much harder than reaching lower levels (sigmoid-shaped progress curves plus label error/indeterminacy).
Is there evidence that he's good at prediction, or is he more of a promoter?