
Atari environment: standard ALE (https://paperswithcode.com/dataset/arcade-learning-environment)
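For concreteness, here's roughly what interacting with standard ALE looks like through the common Gymnasium/ale-py bindings. The game choice, version tag, and random policy below are just placeholders, not part of the criteria:

```python
import gymnasium as gym
import ale_py

gym.register_envs(ale_py)  # makes the ALE/* ids available (recent Gymnasium versions)

env = gym.make("ALE/Breakout-v5")   # any ALE game; Breakout is just an example
obs, info = env.reset(seed=0)
for _ in range(100):
    action = env.action_space.sample()  # stand-in for the agent's policy
    # Standard ALE outputs: observation, reward, termination flags. No extra
    # causal annotations of any kind.
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```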
"Superhuman performance": I'm using the common industry definition, meaning the agent eventually achieves performance better than a particular human baseline (the human normalized score). Please note that this is not the best score achieved by any human. Yeah this is confusing, but it is common terminology and I'm sticking to it.
Learned causal model:
The agent has a model of the environment
The model is a causal model in the sense that there is an explicit causal diagram associated with it
The causal diagrams are not complete bipartite graphs (i.e. not the trivial diagram where everything in one layer is drawn as a cause of everything in the next)
The causal diagrams are at least sometimes close to minimal: at most 150% of the minimum number of arrows required (see the sketch after this list)
The diagrams are not hardcoded or otherwise substantially provided to the model
The environments are not annotated with additional causal information; the agent gets just the standard ALE outputs.
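To make the diagram criteria concrete, here's a rough sketch of how an evaluator could check them given a learned diagram as an edge list. Everything in it (the edge-list representation, the two node layers, `min_arrows_required`) is hypothetical evaluator-side bookkeeping, not something the agent would ever see:

```python
from itertools import product

def diagram_meets_criteria(edges: set[tuple[str, str]],
                           parents: set[str],
                           children: set[str],
                           min_arrows_required: int) -> bool:
    """Rough evaluator-side check of the diagram criteria above.

    edges: directed arrows (cause, effect) in the learned causal diagram
    parents/children: the two node layers a trivial diagram would fully
                      connect (e.g. variables at step t and step t+1)
    min_arrows_required: evaluator's estimate of the minimal arrow count
    """
    # Not the trivial complete bipartite diagram
    # ("everything at t causes everything at t+1").
    complete_bipartite = edges == set(product(parents, children))

    # Close to minimal: at most 150% of the minimum number of arrows.
    close_to_minimal = len(edges) <= 1.5 * min_arrows_required

    return (not complete_bipartite) and close_to_minimal


# Example: 3 arrows out of the 9 a complete bipartite diagram would have,
# against an estimated minimum of 3 arrows -> passes both checks.
learned = {("x1_t", "x1_t1"), ("x1_t", "x2_t1"), ("x3_t", "x3_t1")}
print(diagram_meets_criteria(learned,
                             parents={"x1_t", "x2_t", "x3_t"},
                             children={"x1_t1", "x2_t1", "x3_t1"},
                             min_arrows_required=3))  # True
```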
Additional context: this is/was the focus of my research. I think doing it without access to ground truth causal diagrams is hard.