
Phrased in more detail: Will most philosophy-literate people judge curated pieces of AI-generated philosophy to be on par with ideas drawn from the Stanford Encyclopedia of Philosophy (SEP)?
In late 2025, I will recruit three people (henceforth termed judges) who hold philosophy degrees or are currently enrolled in a philosophy program. Each judge will provide the next judge with summaries (under one page each) of three ideas drawn from SEP entries which the latter is unfamiliar with. I will also pick, in consultation with the previous judge, three philosophical ideas generated by an AI (one for each judge), curated to maximize the likelihood of success. Each judge will read four ideas (3 SEP, 1 AI) and rank them in order of how impressed they would be if a peer had come up with the idea. If the AI-generated idea ranks 3rd or higher out of 4, the judge will be considered to have passed the AI. If two or more judges pass the AI, then this resolves as Yes.
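As a rough illustration of the resolution rule just described, here is a minimal Python sketch. The function and constant names are hypothetical, and it assumes each judge gives a strict ranking with no ties (the protocol does not say how ties would be handled), where rank 1 is the most impressive idea and rank 4 the least.

```python
from typing import Sequence

# Assumed encoding: rank 1 = most impressive, rank 4 = least impressive.
PASS_RANK = 3       # the AI idea must place 3rd or better out of 4
JUDGES_NEEDED = 2   # at least two judges must pass the AI


def judge_passes(ai_rank: int) -> bool:
    """Return True if one judge's ranking of the AI idea counts as a pass."""
    if not 1 <= ai_rank <= 4:
        raise ValueError("rank must be between 1 and 4")
    return ai_rank <= PASS_RANK


def resolves_yes(ai_ranks: Sequence[int]) -> bool:
    """Return True if enough judges passed the AI for a Yes resolution."""
    passes = sum(judge_passes(rank) for rank in ai_ranks)
    return passes >= JUDGES_NEEDED
```

For example, under these assumptions `resolves_yes([3, 4, 1])` is true (two judges pass the AI), while `resolves_yes([4, 4, 1])` is false (only one does).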
The AI ideas must be original, meaning at least as distinct from existing ideas as sections of the SEP are from one another. If there is reasonable doubt as to whether the AI-generated text meets this bar in 2025, I will run the test anyway. Each summary provided to the judge will also include some discussion of views developed prior to the idea, so the judges will be in a position to judge the creativity of the AI's work conditional on pre-existing work. Any amount of prompting, fine-tuning, etc. is allowed. However, in-context information (e.g. prompts) must not include ideas from the same subfield as the generated idea.
The SEP ideas in question will be selected to have varying interest levels, as judged by the curating judge and me. The ideas must be concisely expressible; for example, Kripke's Hesperus thought experiment or the EDT/CDT distinction could qualify.
To be clear, it does not matter for resolution whether the judges recognize which idea is AI-generated. Nonetheless, I will attempt to summarize each entry in such a way that the source (AI/SEP) is not clear.
I may make minor adjustments to this protocol, but I will attempt to preserve its spirit as much as possible. For instance, SEP may be replaced with a better source if I find it hard to quickly curate self-contained ideas. However, I will keep principles such as judge ignorance and the non-adversarial nature of the test. I may clarify the prompting criterion depending on how methods evolve.