🐕 Will AI Be Able to Understand the "Meaning" of Questions Significantly Better By the End of 2023?
52
closes Jan 1
29% chance

Preface:

Please read the preface for this type of market and other similar third-party validated AI markets here.

Third-Party Validated, Predictive Markets: AI Theme

Market Description:

Break

As measured by the "Break" leaderboard (not the high-level variant) from Allen AI, here:

https://github.com/allenai/Break

https://allenai.github.io/Break/blogpost.html

"Significantly better" means a Normalized EM score at least 30% higher than the top submission on this leaderboard at the time this market opened, as measured at the end of the year (UTC).

https://leaderboard.allenai.org/break/submissions/public

Market Resolution Threshold:

At the time of authoring, the highest EM score is 0.4230, held by the T5-Large submission from Tomer Wolfson, Tel Aviv University.

So, to qualify as "Understanding the 'Meaning' of Questions Significantly Better By the End of 2023" for the purposes of this market, a submission would need to score >= 0.5499 (0.4230 × 1.3) by the end of the year, UTC.
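The resolution arithmetic can be checked with a short Python sketch. The function names are hypothetical (they are not part of Manifold or the Allen AI leaderboard), and the sketch assumes "30% better" means multiplying the baseline score by 1.3, which reproduces the 0.5499 threshold. `Decimal` is used so the comparison at the exact threshold is not affected by binary floating-point rounding.

```python
from decimal import Decimal

# Baseline: top Break (non-high-level) Normalized EM score when the market opened
# (T5-Large submission, Tomer Wolfson, Tel Aviv University).
BASELINE_EM = Decimal("0.4230")

# Assumption: "significantly better" = at least 30% higher than the baseline.
IMPROVEMENT = Decimal("0.30")


def resolution_threshold(baseline: Decimal = BASELINE_EM,
                         improvement: Decimal = IMPROVEMENT) -> Decimal:
    """Hypothetical helper: minimum Normalized EM score needed to resolve YES."""
    return baseline * (1 + improvement)


def resolves_yes(best_score_at_year_end: Decimal) -> bool:
    """Hypothetical helper: True if the best score at year end (UTC) meets the threshold."""
    return best_score_at_year_end >= resolution_threshold()


if __name__ == "__main__":
    # Round to the leaderboard's four decimal places for display.
    print(resolution_threshold().quantize(Decimal("0.0001")))
    print(resolves_yes(Decimal("0.5499")), resolves_yes(Decimal("0.5498")))
```

A score of exactly 0.5499 counts, since the market uses ">=".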


Related questions

In 2028, will AI be at least as big a political issue as abortion? (Scott Alexander, 38% chance)
Will AI be a major topic during the 2024 presidential debates in the United States? (Matthew Barnett, 28% chance)
Will Biden sign an executive order primarily focused on AI in 2023? (S G, 50% chance)
Will AI pass the Longbets version of the Turing test by the end of 2029? (Daniel Reeves, 52% chance)
Will an AI get gold on any International Math Olympiad by 2025? (Austin, 31% chance)
Will I observe significant Negative Polarization around AI generated art in 2023? (Lars Doucet, 30% chance)
Will AI outcompete best humans in competitive programming before the end of 2023?
Will there have been a noticeable sector-wide economic effect from a new AI technology by the end of 2023? (Nostradamnedus, 16% chance)
Will AI be a Time Person of the Year in 2023?
Will >$100M be invested in dedicated AI Alignment organizations in the next year as more people become aware of the risk we are facing by letting AI capabilities run ahead of safety? (Bionic, 81% chance)
Will Tyler Cowen agree that an 'actual mathematical model' for AI X-Risk has been developed by October 15, 2023? (Joe Brenton, 9% chance)
Will anyone very famous claim to have made an important life decision because an AI suggested it by the end of 2023? (Isaac, 22% chance)
Will I use an x.ai product during 2023? (jackson polack, 22% chance)
Will a NZ parliamentary party release an artificial intelligence policy prior to the 2023 election?
🐕 Will A.I. Be Able to Make Significantly Better, "Common Sense Judgements About What Happens Next," by End of 2023? (Patrick Delaney, 41% chance)
Will an AI system be known to have resisted shutdown before 2024? (Peter Wildeford, 14% chance)
Google Trends: Will "AI" search term popularity peak again in 2023? (Tomek K 🟡, 46% chance)
Will Science's Top Breakthrough of the Year in 2023 be AI-related? (dp, 40% chance)
Will an AI produce encyclopedia-worthy philosophy by 2026? (Jacob Pfau, 25% chance)
Patrick Delaney sold Ṁ17 of YES

I wanted to try out the Manifold loot box feature, which caused me to bet in my own market here. This was not intentional, and I would not recommend the loot box feature for this reason.

Vincent Luczkow bought Ṁ10 of YES

If T5-Large is the current SOTA, then the benchmark is artificially low because no one is actually trying.

Patrick Delaney

@vluzko Can you expand on that?

Dan predicts NO

@vluzko Ironically, that's why I bought NO: even if there's a model that beats it, there's no guarantee it gets submitted.

Vincent Luczkow predicts YES

@PatrickDelaney T5 sucks; it's like three generations behind, which means no one is running their new models on this benchmark. If you ran got 3.5 on this it would probably resolve, never mind 4.

Patrick Delaney

@vluzko have other models been run against other similar "meaning of words" benchmarks to support your claim?

Patrick Delaney

@vluzko What is got 3.5? Do you mean GPT-3.5? Autocorrect?

Dan predicts NO

@PatrickDelaney Do you know of any other "meaning of words" benchmarks we could check? I think the main insight here is that the last non-T5 submission to that leaderboard was almost 2 years ago, so a bet on this market might be more about the chance that researchers choose to submit their model than it is about the overall state of AI "meaning" understanding.

Vincent Luczkow predicts YES

@PatrickDelaney Yeah, GPT-3.5. Stupid autocorrect.

Patrick Delaney

@DanStoyell Yes, you are absolutely right. There's a "map vs. territory" problem here; I recognize that. So I could: 1. change this market, if we can find a better, more active leaderboard on the same topic; 2. create another market on that more active benchmark; or 3. leave it as is, since people might still like to speculate on AllenAI, which seems to be the highest-SEO-ranked leaderboard for now. With potentially more money now being focused on AI in general, people might start piling into leaderboards more...?

I am really open to suggestions.

Patrick Delaney

@DanStoyell Why do you think this market is trading at 66% rather than closer to NO, where you have bet? Do people have some special knowledge that you may not have, or are they speculating without looking as deeply into the metric as you have?

Dan predicts NO

@PatrickDelaney I don't really feel like I have any special insight; I'm mostly going off the extrapolated rate of improvement over the last 2 years, combined with the lack of submissions. 66% does feel quite high to me given that, but I'm not betting very much because it wouldn't surprise me at all if a qualifying submission came along.

Dan predicts NO

@PatrickDelaney I did briefly Google for a more active leaderboard but didn't see anything obvious. Making a market that objectively reflects the actual problem you're trying to get at is definitely very hard.

Patrick Delaney

@DanStoyell Yeah if you can think of anything else please let me know.

Patrick Delaney

@DanStoyell I had forgotten that I put together this market as a place to bookmark more leaderboards as they come up, as well as any AI institutions that may have leaderboards I'm not aware of yet (i.e., sites I still need to search more thoroughly). I know there are also one-off leaderboards out there, which I have seen, maintained by single small groups of researchers. Overall, I am dedicated to carving out more markets to build a hopefully more accurate snapshot of where AI is going, beyond a lot of the speculative sci-fi nonsense that dominates the conversation right now. After reading more about Google's BIG-bench, I see it's really a collection of many separate benchmarks, similar to AllenAI, but many of them don't seem to have public submissions displayed in the repo yet.

Patrick Delaney

@DanStoyell So over time, as I comb through these, I might find a better answer to our question.

Patrick Delaney

Another relevant market:

Patrick Delaney

See my other relevant market:

Patrick Delaney

@PatrickDelaney Another: