Will A.I. Be Able to Make Significantly Better, "Common Sense Judgements About What Happens Next," by End of 2023?
Market Description:

Resolved by submissions at:


Visual Comet

This leaderboard collects evaluations of current AI systems on visual commonsense tasks that measure both the knowledge these systems possess and their ability to reason with and use that knowledge in the context of an event in an image.

Even from a single frame of a still image, people can reason about the dynamic story of the image before, after, and beyond the frame. For example, given an image of a man struggling to stay afloat in water, we can reason that the man fell into the water sometime in the past, the intent of that man at the moment is to stay alive, and he will need help in the near future or else he will get washed away.

An example input question contains the following fields in JSON format:

     "img_fn": "lsmdc_3005_ABRAHAM_LINCOLN_VAMPIRE_HUNTER/3005_ABRAHAM_LINCOLN_VAMPIRE_HUNTER_00.27.43.141-",
     "metadata_fn": "lsmdc_3005_ABRAHAM_LINCOLN_VAMPIRE_HUNTER/3005_ABRAHAM_LINCOLN_VAMPIRE_HUNTER_00.27.43.141-",
     "place": "at a fancy party",
     "event": "1 is trying to talk to the pretty woman in front of him"
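As a rough illustration of how such a record could be read, here is a minimal Python sketch. It assumes the four fields above are wrapped in braces to form a single JSON object (the wrapping is an assumption, not shown in the original), and `describe` is a hypothetical helper, not part of any Visual Comet tooling. The file names are copied exactly as they appear above, including the trailing "-".

```python
import json

# The four fields from the example, wrapped in braces (assumed) so that
# they form one valid JSON object. File names are copied as-is.
raw = """
{
  "img_fn": "lsmdc_3005_ABRAHAM_LINCOLN_VAMPIRE_HUNTER/3005_ABRAHAM_LINCOLN_VAMPIRE_HUNTER_00.27.43.141-",
  "metadata_fn": "lsmdc_3005_ABRAHAM_LINCOLN_VAMPIRE_HUNTER/3005_ABRAHAM_LINCOLN_VAMPIRE_HUNTER_00.27.43.141-",
  "place": "at a fancy party",
  "event": "1 is trying to talk to the pretty woman in front of him"
}
"""

record = json.loads(raw)


def describe(rec: dict) -> str:
    """Hypothetical helper: join the event and place into one sentence."""
    return f'{rec["event"]} ({rec["place"]})'


print(describe(record))
```

In the "event" text, the leading "1" appears to refer to a person tagged in the image rather than to a count.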

Example sets of images (not necessarily related to the above)

Market Resolution Threshold:

  • Note, at the time of authoring this, human performance is 0.5

The top, leftmost score on the leaderboard (BLEU 1) is 0.3500 at the time of authoring. For this market to resolve YES, that score would need to reach at least 1.3 × 0.3500 = 0.4550 by the end of the year; otherwise it resolves NO.

Score to Beat -> 0.4550
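The threshold arithmetic can be checked with a short Python sketch. The names `BASELINE_BLEU1`, `IMPROVEMENT_FACTOR`, and `resolves_yes` are illustrative assumptions, not part of the market or the leaderboard. `Decimal` is used because the binary floating-point product of 1.3 and 0.35 may not compare exactly equal to 0.455, which matters for a ">=" rule.

```python
from decimal import Decimal

# Values taken from the market description; the names are illustrative.
BASELINE_BLEU1 = Decimal("0.3500")   # top leaderboard score at time of authoring
IMPROVEMENT_FACTOR = Decimal("1.3")  # required relative improvement

# Exact decimal arithmetic avoids floating-point rounding at the boundary.
SCORE_TO_BEAT = BASELINE_BLEU1 * IMPROVEMENT_FACTOR


def resolves_yes(top_score: str) -> bool:
    """Return True if a leaderboard score meets or exceeds the threshold."""
    return Decimal(top_score) >= SCORE_TO_BEAT


print(SCORE_TO_BEAT)
print(resolves_yes("0.4550"), resolves_yes("0.4549"))
```

Scores are passed as strings so that they are parsed exactly as displayed on the leaderboard.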

AdamK bought Ṁ20 of YES

If a closed-source model is publicly demonstrated to have significantly advanced capabilities in this regard, such that it would very clearly reach the score to beat, but the model is not scored through Visual Comet, how does this resolve?

AdamK predicts YES
C bought Ṁ25 of YES

@PatrickDelaney I could see this happening real soon. IBM is already developing an AI that can debate and argue with a human, in the hope that such a dialectic could lead people to make better decisions:

See Project Debater:


IC Rainbow predicts NO

@Meta_C can it play both sides to achieve neutrality, or is the opponent simply steamrolled?

C predicts YES

@ICRainbow Not sure about playing both sides. As for whether it can outperform humans, so far the answer is “No”. That was already a few years ago though, and I haven’t found more recent debates with it:


IC Rainbow predicts NO

@Meta_C if those debates are anything like the ones they do in debate clubs, then that's quite a rotten format that doesn't show much about actual capabilities.

I'd like a system that is able to conduct and take part in a street epistemology (basically a Socratic dialogue) session. That, of course, isn't too indicative either, but at least we could get some insight into how people and machines think.

Patrick Delaney

