🐕 Will A.I. Be Able to Make Significantly Better, "Common Sense Judgements About What Happens Next," by End of 2023?
Resolved NO on Jan 8

Preface:

Please read the preface for this type of market and other similar third-party validated AI markets here.

Third-Party Validated, Predictive Markets: AI Theme

Market Description:

Resolved by submissions at:

https://leaderboard.allenai.org/visualcomet/submissions/public

Visual Comet

This leaderboard collects evaluations of current AI systems on Visual commonsense tasks that measure both the knowledge that these systems possess as well as their ability to reason with and use that knowledge in context of an event in an image.

Even from a single frame of a still image, people can reason about the dynamic story of the image before, after, and beyond the frame. For example, given an image of a man struggling to stay afloat in water, we can reason that the man fell into the water sometime in the past, the intent of that man at the moment is to stay alive, and he will need help in the near future or else he will get washed away.

An example input question contains the following fields in JSON format:

  {
     "img_fn": "lsmdc_3005_ABRAHAM_LINCOLN_VAMPIRE_HUNTER/3005_ABRAHAM_LINCOLN_VAMPIRE_HUNTER_00.27.43.141-00.27.45.534@0.jpg",
     "movie": "3005_ABRAHAM_LINCOLN_VAMPIRE_HUNTER",
     "metadata_fn": "lsmdc_3005_ABRAHAM_LINCOLN_VAMPIRE_HUNTER/3005_ABRAHAM_LINCOLN_VAMPIRE_HUNTER_00.27.43.141-00.27.45.534@0.json",
     "place": "at a fancy party",
     "event": "1 is trying to talk to the pretty woman in front of him"
  }
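As a rough illustration of how a record like the one above could be read, here is a minimal Python sketch. It assumes only the five fields shown in the example; the `VisualCometQuestion` class and `parse_question` helper are hypothetical names for this sketch and are not part of any VisualComet tooling.

```python
import json
from dataclasses import dataclass

# Field names taken from the example record above.
FIELDS = ("img_fn", "movie", "metadata_fn", "place", "event")


@dataclass(frozen=True)
class VisualCometQuestion:
    """Hypothetical container for one input question."""
    img_fn: str
    movie: str
    metadata_fn: str
    place: str
    event: str


def parse_question(raw: str) -> VisualCometQuestion:
    """Parse a JSON string and keep only the five fields shown in the example."""
    data = json.loads(raw)
    return VisualCometQuestion(**{name: data[name] for name in FIELDS})


EXAMPLE = """
{
  "img_fn": "lsmdc_3005_ABRAHAM_LINCOLN_VAMPIRE_HUNTER/3005_ABRAHAM_LINCOLN_VAMPIRE_HUNTER_00.27.43.141-00.27.45.534@0.jpg",
  "movie": "3005_ABRAHAM_LINCOLN_VAMPIRE_HUNTER",
  "metadata_fn": "lsmdc_3005_ABRAHAM_LINCOLN_VAMPIRE_HUNTER/3005_ABRAHAM_LINCOLN_VAMPIRE_HUNTER_00.27.43.141-00.27.45.534@0.json",
  "place": "at a fancy party",
  "event": "1 is trying to talk to the pretty woman in front of him"
}
"""

question = parse_question(EXAMPLE)
print(question.place, "|", question.event)
```

The `place` and `event` strings are the textual context given alongside the image; a missing field raises `KeyError` in this sketch rather than being silently defaulted.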

Example sets of images (not necessarily related to the above)

Market Resolution Threshold:

  • Note, at the time of authoring this, human performance is 0.5

The top leftmost score on the leaderboard, BLEU 1, is 0.3500 at the time of authoring. We would need to see a score of at least 1.3 × 0.3500 = 0.4550 by the end of the year for this market to resolve YES; otherwise it resolves NO.

Score to Beat -> 0.4550
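The threshold arithmetic can be sketched in Python as follows. This is only an illustration of the rule stated above: the constant and function names are assumptions made for this sketch, and `Decimal` is used so that 1.3 × 0.3500 is not affected by binary floating-point rounding.

```python
from decimal import Decimal

# Values taken from the market description.
BASELINE_BLEU1 = Decimal("0.3500")    # top leaderboard score at time of authoring
REQUIRED_FACTOR = Decimal("1.3")      # a 30% improvement over the baseline


def score_to_beat(baseline: Decimal = BASELINE_BLEU1,
                  factor: Decimal = REQUIRED_FACTOR) -> Decimal:
    """Return the minimum leaderboard score needed for a YES resolution."""
    return baseline * factor


def resolves_yes(best_score: Decimal) -> bool:
    """True if the best leaderboard score meets or exceeds the threshold."""
    return best_score >= score_to_beat()


# Hypothetical leaderboard scores, for illustration only.
for candidate in (Decimal("0.3500"), Decimal("0.4549"), Decimal("0.4550")):
    print(candidate, "YES" if resolves_yes(candidate) else "NO")
```

Under this rule, a score exactly equal to 0.4550 counts as YES, matching the ">=" in the description.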


If a closed-source model is publicly demonstrated to have significantly advanced capabilities in this regard, such that it is very clear that it would reach the score-to-beat, but the model is not scored through visualcomet, how does this resolve?

predicted YES

@AdamK Greetings, I didn't see this comment earlier. To answer your question, this resolves strictly according to VisualComet. I thought that would have been clear from its inclusion in "Third-Party Validated, Predictive Markets: AI Theme."

These markets give very little leeway to things like, "very clear that it would reach score to beat." One person's subjective, "very clear," is another person's, "not very clear."

For reference to see how consistent I'm being in how this is done, you can check out all of the other similar markets which have the little dog emoji next to them here:

https://manifold.markets/browse?s=score&f=resolved&ct=ALL&topic=third-party-validated-predictive-ma-6bab86c0b8b0

There was one market where I even had to create a new market because the original market was doomed by the threshold I used. Pretty much all users have given me 5 stars for all of these resolutions.

You can bet on additional third-party validated AI markets here:

https://manifold.markets/browse?s=score&f=open&ct=ALL&topic=third-party-validated-predictive-ma-6bab86c0b8b0

Although I don't officially endorse any that are not mine.

@AdamK If you know of any other metric which is similar to VisualComet, let me know.

@PatrickDelaney I could see this happening real soon. IBM is already developing an AI which can debate and argue with a human, in the hopes that such a dialectic could lead them to make better decisions:

See Project Debater:

https://research.ibm.com/interactive/project-debater/

predicted NO

@Meta_C can it play both sides to achieve neutrality, or is the opponent simply steamrolled?

predicted YES

@ICRainbow Not sure about playing both sides. As for whether it can outperform humans, so far the answer is “No”. That was already a few years ago though, haven’t found more recent debates with this:

https://www.vox.com/future-perfect/2019/2/12/18222392/artificial-intelligence-debate-ibm-san-francisco

predicted NO

@Meta_C if those debates are anything like what they do in debate clubs, then that's quite a rotten format that doesn't show much about actual capabilities.

I'd like a system that is able to conduct and take part in a street epistemology (basically a Socratic dialogue) session. That, of course, isn't too indicative either, but at least we can get an insight into how people and machines think.

