Third-Party Validated, Predictive Markets: AI Theme

Market Description

AI2-THOR Rearrangement Challenge

This question pertains to the following AI challenge:

  • The goal of this challenge is to build a model/agent that moves objects in a room to restore them to a given initial configuration.

  • Example query:

The task involves moving and modifying (i.e. opening/closing) randomly placed objects within a room to obtain a goal configuration. There are 2 phases:

  1. Walkthrough 👀. The agent walks around the room and observes the objects in their ideal goal state.

  2. Unshuffle 🏋. After the walkthrough phase, we randomly change between 1 and 5 objects in the room. The agent's goal is to identify which objects have changed and reset those objects to their state from the walkthrough phase. Changes to an object's state may include changes to its position, orientation, or openness.
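
As a rough illustration of the unshuffle goal (this is not the AI2-THOR API), the changed objects can be thought of as the difference between two snapshots of object state. The `ObjectState` type, its fields, the tolerance value, and the object names below are all hypothetical, chosen only to mirror the three kinds of change listed above (position, orientation, openness).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectState:
    # Hypothetical snapshot of one object; not an AI2-THOR type.
    position: tuple      # (x, y, z) in metres
    rotation_deg: float  # orientation about the vertical axis
    openness: float      # 0.0 = closed, 1.0 = fully open


def changed_objects(walkthrough, unshuffle, tol=1e-3):
    """Return the sorted names of objects whose state differs between snapshots."""
    changed = []
    for name, goal in walkthrough.items():
        current = unshuffle[name]
        moved = math.dist(goal.position, current.position) > tol
        # Simplification: angle wrap-around (e.g. 359 vs 1 degree) is ignored.
        rotated = abs(goal.rotation_deg - current.rotation_deg) > tol
        openness_changed = abs(goal.openness - current.openness) > tol
        if moved or rotated or openness_changed:
            changed.append(name)
    return sorted(changed)


# Made-up example room: the mug was moved and the drawer was opened.
walkthrough = {
    "mug": ObjectState((1.0, 0.9, 2.0), 0.0, 0.0),
    "drawer": ObjectState((0.5, 0.4, 1.0), 0.0, 0.0),
    "chair": ObjectState((2.0, 0.0, 3.0), 90.0, 0.0),
}
unshuffle = {
    "mug": ObjectState((1.6, 0.9, 2.0), 0.0, 0.0),
    "drawer": ObjectState((0.5, 0.4, 1.0), 0.0, 0.7),
    "chair": ObjectState((2.0, 0.0, 3.0), 90.0, 0.0),
}

print(changed_objects(walkthrough, unshuffle))  # ['drawer', 'mug']
```

In the actual challenge the agent does not get this ground-truth state directly; it has to infer the differences from what it observes, which is what makes the task hard.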

The resolution for this market will be here:

Market Resolution Threshold:

  • If any 2022 AI2-THOR Rearrangement Challenge submission gets a % Fixed Strict (Test) score of >0.4 by the end of 2023, this resolves as YES; otherwise NO.
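
The resolution rule above can be sketched as a small check. This is only an illustration of the stated criterion: the function name, the input shape, and the example leaderboard entries are hypothetical, and only the 0.4 threshold and the end-of-2023 deadline come from the market text.

```python
from datetime import date

THRESHOLD = 0.4                # from the market text: score must be > 0.4
DEADLINE = date(2023, 12, 31)  # "by the end of 2023"


def resolves_yes(submissions):
    """submissions: iterable of (fixed_strict_test_score, submission_date) pairs."""
    return any(
        score > THRESHOLD and submitted <= DEADLINE
        for score, submitted in submissions
    )


# Hypothetical leaderboard entries, for illustration only.
example = [(0.12, date(2022, 6, 1)), (0.31, date(2023, 3, 15))]
print("YES" if resolves_yes(example) else "NO")  # NO
```

Note that the comparison is strict: a score of exactly 0.4 would not trigger a YES resolution under this reading.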

predicted NO

Same question for 2024: ... though I have not set the threshold yet.

predicted NO

The only other contest I could find with a quick search was this one: ... which had a top score of 0.2894 thus far in 2023, so I'm resolving NO. Please let me know if you object.

predicted NO

@PatrickDelaney For this market it doesn't change the outcome, but in general I think you should only use the specific contest variant you were supposed to use for resolution.

predicted NO

@na_pewno I agree. I was just posting that as further evidence for anyone who may come by and point this contest out, but yes, I completely agree, and the purpose of this market was to stay strict to the metric chosen above.

I also think it's interesting (though not relevant to market resolution) to just stay informed on this stuff. AllenAI now claims that there is going to be an "exponential improvement" in embodiment due to something called ProcTHOR (question posted here).

predicted NO

Another one, more of a cross section, based on Google's BIG-bench:

predicted NO

Other relevant markets:

