Preface:
Please read the preface for this type of market and other similar third-party validated AI markets here.
Third-Party Validated, Predictive Markets: AI Theme
Market Description:
Resolved by submissions at:
https://leaderboard.allenai.org/visualcomet/submissions/public
Visual Comet
This leaderboard collects evaluations of current AI systems on visual commonsense tasks that measure both the knowledge these systems possess and their ability to reason with and use that knowledge in the context of an event in an image.
Even from a single frame of a still image, people can reason about the dynamic story of the image before, after, and beyond the frame. For example, given an image of a man struggling to stay afloat in water, we can reason that the man fell into the water sometime in the past, the intent of that man at the moment is to stay alive, and he will need help in the near future or else he will get washed away.
An example input question contains the following fields in JSON format:
{
"img_fn": "lsmdc_3005_ABRAHAM_LINCOLN_VAMPIRE_HUNTER/3005_ABRAHAM_LINCOLN_VAMPIRE_HUNTER_00.27.43.141-00.27.45.534@0.jpg",
"movie": "3005_ABRAHAM_LINCOLN_VAMPIRE_HUNTER",
"metadata_fn": "lsmdc_3005_ABRAHAM_LINCOLN_VAMPIRE_HUNTER/3005_ABRAHAM_LINCOLN_VAMPIRE_HUNTER_00.27.43.141-00.27.45.534@0.json",
"place": "at a fancy party",
"event": "1 is trying to talk to the pretty woman in front of him"
}
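As a rough illustration of how such a record could be read, here is a minimal Python sketch that parses the example above and checks for the five fields it contains. The names `REQUIRED_FIELDS` and `parse_record` are hypothetical, and treating all five fields as mandatory is an assumption drawn from this single example, not from the VisualComet documentation.

```python
import json

# Field names taken from the single example record above; whether every
# record carries all of them is an assumption.
REQUIRED_FIELDS = ("img_fn", "movie", "metadata_fn", "place", "event")

EXAMPLE = """
{
  "img_fn": "lsmdc_3005_ABRAHAM_LINCOLN_VAMPIRE_HUNTER/3005_ABRAHAM_LINCOLN_VAMPIRE_HUNTER_00.27.43.141-00.27.45.534@0.jpg",
  "movie": "3005_ABRAHAM_LINCOLN_VAMPIRE_HUNTER",
  "metadata_fn": "lsmdc_3005_ABRAHAM_LINCOLN_VAMPIRE_HUNTER/3005_ABRAHAM_LINCOLN_VAMPIRE_HUNTER_00.27.43.141-00.27.45.534@0.json",
  "place": "at a fancy party",
  "event": "1 is trying to talk to the pretty woman in front of him"
}
"""


def parse_record(raw):
    """Parse one input record (hypothetical helper) and check its fields."""
    record = json.loads(raw)
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError("record is missing fields: " + ", ".join(missing))
    return record


record = parse_record(EXAMPLE)
print(record["place"], "|", record["event"])
```

The image and metadata paths are kept as opaque strings here; the sketch does not try to interpret the timestamps embedded in the file names.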
Example sets of images (not necessarily related to the above)
Market Resolution Threshold:
Note: at the time of authoring, human performance is 0.5.
The top leftmost score, BLEU-1, is 0.3500 at the time of authoring. For this market to resolve YES, we would need to see a score of at least 1.3 × 0.3500 = 0.4550 by the end of the year; otherwise it resolves NO.
Score to Beat -> 0.4550
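The threshold arithmetic above can be sketched in a few lines of Python. This is only an illustration of the stated rule (baseline 0.3500, factor 1.3, resolve YES at or above the product); the names `score_to_beat` and `resolves_yes` are hypothetical, and `Decimal` is used so that the boundary case 0.4550 is not affected by binary floating-point rounding.

```python
from decimal import Decimal

BASELINE_BLEU1 = Decimal("0.3500")  # top BLEU-1 score at time of authoring
REQUIRED_FACTOR = Decimal("1.3")    # 30% relative improvement over baseline


def score_to_beat(baseline=BASELINE_BLEU1, factor=REQUIRED_FACTOR):
    """Return the minimum score needed for a YES resolution."""
    return baseline * factor


def resolves_yes(best_score):
    """True if the best leaderboard score meets or exceeds the threshold.

    best_score is passed as a string (e.g. "0.4550") so it converts to
    Decimal exactly as it is displayed on the leaderboard.
    """
    return Decimal(best_score) >= score_to_beat()


print(score_to_beat())
print(resolves_yes("0.4550"), resolves_yes("0.4549"))
```

A score exactly equal to 0.4550 counts as YES under this reading, matching the ">=" in the market text.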
Updated version of this market for 2024: https://manifold.markets/PatrickDelaney/-will-ai-be-able-to-make-significan
@AdamK Greetings, I didn't see this comment earlier. To answer your question, this resolves strictly according to VisualComet. I thought that would have been clear from the inclusion in the "Third-Party Validated, Predictive Markets, AI Theme."
These markets give very little leeway to things like "very clear that it would reach score to beat." One person's subjective "very clear" is another person's "not very clear."
For reference to see how consistent I'm being in how this is done, you can check out all of the other similar markets which have the little dog emoji next to them here:
There was one market where I even had to create a new market because the original market was doomed by the threshold I used. Pretty much all users have given me 5 stars for all of these resolutions.
You can bet on additional third-party validated AI markets here:
Although I don't officially endorse any that are not mine.
Put together a new related market on AI capability to avoid misconceptions:
https://manifold.markets/PatrickDelaney/will-ai-be-able-to-avoid-misconcept
@PatrickDelaney I could see this happening real soon. IBM is already developing an AI which can debate and argue with a human, in the hopes that such a dialectic could lead them to make better decisions:
See Project Debater:
@Meta_C can it play both sides to achieve neutrality or the opponent is simply steamrolled?
@ICRainbow Not sure about playing both sides. As for whether it can outperform humans, so far the answer is "No". That was already a few years ago though, haven't found more recent debates with this:
@Meta_C if those debates are anything like the ones in debate clubs, then that's quite a rotten format that doesn't show much about actual capabilities.
I'd like a system that is able to conduct and take part in a street epistemology (basically a Socratic dialogue) session. That, of course, isn't too indicative either, but at least we can get some insight into how people and machines think.