Preface:
Please read the preface for this type of market and other similar third-party validated AI markets here.
Third-Party Validated, Predictive Markets: AI Theme
Market Description:
Break
As measured by "Break" (the standard leaderboard, not the "high-level" variant) from the AllenAI leaderboards here:
https://github.com/allenai/Break
https://allenai.github.io/Break/blogpost.html
"Significantly better," will be interpreted as meaning 30% better Normalized EM Score than the top post on this leaderboard at the time this market opened, compared to the end of the year, UTC.
https://leaderboard.allenai.org/break/submissions/public
Market Resolution Threshold:
At the time of authoring, the highest EM score is 0.4230, from T5-Large (Tomer Wolfson, Tel Aviv University).
So to qualify as "understanding the meaning of questions significantly better by the end of 2023" for the purposes of this market, there would need to be a submission scoring >= 0.5499 by the end of the year (UTC).
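For reference, the threshold is just the current top score scaled by a factor of 1.3 (a 30% relative improvement), rounded to four decimal places. A minimal sketch of that arithmetic in Python (variable names are my own, not anything from the leaderboard):

    # Resolution threshold: 30% relative improvement over the top
    # Normalized EM score at the time the market opened.
    top_score = 0.4230                      # T5-Large, top of the Break leaderboard at authoring
    threshold = round(top_score * 1.30, 4)  # 30% relative (not absolute) improvement
    print(threshold)                        # 0.5499 -> a submission must score >= this to resolve YES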
Here's what the leaderboard shows today; as it stands, this resolves NO:
New Market:
https://manifold.markets/PatrickDelaney/-will-ai-be-able-to-understand-the
I put together a new related market on AI's capability to avoid misconceptions:
https://manifold.markets/PatrickDelaney/will-ai-be-able-to-avoid-misconcept
@vluzko Ironically, that's why I bought NO: even if there's a model that beats it, there's no guarantee it's submitted.
@PatrickDelaney T5 sucks, it's like three generations behind, which means no one is running their new models on this benchmark. If you ran GPT-3.5 on this it would probably resolve YES, never mind GPT-4.
@vluzko have other models been run against other similar "meaning of words" benchmarks to support your claim?
@PatrickDelaney Do you know of any other "meaning of words" benchmarks we could check? I think the main insight here is that the last non-T5 submission to that leaderboard was almost 2 years ago, so a bet on this market might be more about the chance that researchers choose to submit their model than it is about the overall state of AI "meaning" understanding.
@DanStoyell Yes, you are absolutely right. There's a "map vs. territory" problem here, and I recognize that. So I could either: 1. change this market if we can find a better, more active leaderboard on the same topic, 2. create another market on that more active benchmark, or 3. leave it as-is, since people might still like to speculate on AllenAI, which seems to be the highest SEO-ranked leaderboard for now, and with more money potentially being focused on AI in general, people might start piling into leaderboards more now...?
I am really open to suggestions.
@DanStoyell Why do you think this market is trading at 66% as opposed to closer to NO, where you have bet at this point? Is there some special knowledge that you may not have, or are people speculating without looking as deeply into what the metric is as you are?
@PatrickDelaney I don't really feel like I have super special insight, I'm mostly going off the extrapolated rate of improvement over the last 2 years combined with the lack of submissions. 66% does feel quite high to me given that, but I'm not betting very much because it wouldn't really surprise me at all if a submission did come along that fit the criteria.
@PatrickDelaney I did briefly Google for a more active leaderboard but didn't see anything obvious. Making a market that objectively reflects the actual problem you're trying to get at is definitely very hard.
@DanStoyell I had forgotten that I put together this market as a place to bookmark more leaderboards as they come up, as well as any AI institutions that may have leaderboards I'm not aware of yet (e.g., I need to search those sites more). I know there are also one-off leaderboards out there, maintained by single small groups of researchers. Overall I am dedicated to carving out more markets to try to build a hopefully more accurate snapshot of where AI is going, beyond a lot of the speculative sci-fi nonsense that dominates the conversation right now. After having read more about Google Big Bench, it's really a collection of a ton of separate benchmarks, similar to AllenAI, but many of them don't seem to have public submissions displayed within the repo yet.