Will an AI produce encyclopedia-worthy philosophy by 2026?

Phrased in more detail: Will most philosophy-literate people judge curated pieces of AI-generated philosophy as on par with ideas drawn from the Stanford Encyclopedia of Philosophy (SEP)?

In late 2025, I will recruit three people (henceforth termed judges) who have philosophy degrees, or are currently enrolled in philosophy. Each judge will provide the next judge with summaries (<1 page each) of three ideas drawn from SEP posts which that next judge is unfamiliar with. I will also pick (in consultation with the previous judge) three philosophical ideas generated by an AI -- curated to maximize likelihood of success. The judges will read all four ideas (3 SEP, 1 AI), and rank them in order of how impressed they would be if a peer had come up with the idea. If the AI-generated ideas rank 3rd out of 4 or higher, the judge will be considered to have passed the AI. If two or more judges pass the AI, then this resolves as Yes.
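The resolution rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the official protocol: the function names are made up here, and it assumes each judge ranks one AI idea among four, with rank 1 being the most impressive.

```python
def judge_passes(ai_rank: int, pass_threshold: int = 3) -> bool:
    """Return True if the AI idea ranks 3rd or higher out of 4.

    Assumption: rank 1 is the most impressive idea, rank 4 the least.
    """
    if not 1 <= ai_rank <= 4:
        raise ValueError("rank must be between 1 and 4")
    return ai_rank <= pass_threshold


def resolves_yes(ai_ranks: list[int], judges_needed: int = 2) -> bool:
    """Return True if at least `judges_needed` judges pass the AI.

    `ai_ranks` holds the rank each judge gave the AI idea (hypothetical input).
    """
    passes = sum(judge_passes(rank) for rank in ai_ranks)
    return passes >= judges_needed
```

For example, if the three judges rank the AI idea 3rd, 4th, and 1st, two judges pass it and the market would resolve Yes under this reading; ranks of 4th, 4th, and 2nd would resolve No.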

The AI ideas must be original, meaning at least as distinct as the difference between sections of the SEP. If there is reasonable doubt as to whether the AI-generated text meets this bar in 2025, I will run the test anyway. Each summary provided to the judge will also provide some discussion of views developed prior to the idea, so the judges will be in a position to judge the creativity of the AI's work conditional on pre-existing work. Any amount of prompting, fine-tuning, etc. is allowed. However, in-context information (e.g. prompts) must not include ideas from the same subfield as the generated idea.

The SEP ideas in question will be selected to have varying interest levels, as judged by the curating judge and me. The ideas must be concisely expressible, e.g. things like Kripke's Hesperus thought experiment, the EDT/CDT distinction etc. could qualify.

To be clear, it does not matter, for resolution, if the judges recognize which idea is AI-generated. Nonetheless, I will attempt to summarize each entry in such a way that the source (AI/SEP) is not clear.

I may make minor adjustments to this protocol, but I will attempt to preserve the spirit as much as possible. For instance, SEP may be replaced with a better source if I find it hard to quickly curate self-contained ideas. But I will keep the principles of judge ignorance, non-adversarial nature etc. I may clarify the prompting criterion depending on how methods evolve.

Thank you for such a fascinating market, and the work you’ll put in to resolve it! Truly one of the examples of what makes Manifold special.

predicts YES

@MatthewRitter 😭 Awww thanks!!!

predicts NO

It seems to me that if you are doing the selection from the SEP yourself, you can easily stack the test in favour of the AI by picking uninspiring SEP posts (or you can introduce bias when writing the summary). Maybe you should ask each judge to pick and summarize the ideas for the next judge.

predicts YES

@Odoacre Sure, that sounds like it would avoid the bias concern. If no one raises a compelling objection in the next day or two, I’ll change the text to have judges write each other summaries.

predicts YES

@JacobPfau @Odoacre Done, I have edited the text to take your suggestion. Thanks.

predicts NO

Are people familiar with the level of sophistication of the SEP?

bought Ṁ10 of NO

How will it be determined whether the AI ideas are original? Will this be done before giving the ideas to the judges? I could easily see an AI coming up with something that has already been said before, but that some of the judges still rate higher than at least one of the original ideas, so the originality criterion makes a big difference to how likely this is.

predicts YES

@JosephNoonan My intent is that the judges will be unfamiliar with all of the ideas -- from above: "SEP posts which the judge is unfamiliar with".

The summary provided to the judges will also have some discussion of relevant positions developed prior to the idea presented. Besides some light initial filtering, it will be on the judges to evaluate how original the AI ideas are relative to prior work. I'll clarify this in the post.

predicts YES

@JosephNoonan added "If there is doubt as to whether the AI generated text meets this bar in 2025, I will run the test anyway. Each summary provided to the judge will also provide some discussion of views developed prior to the idea, so the judges will be in a position to judge the creativity of the AI's work conditional on pre-existing work."

predicts NO

@JacobPfau I am confused. What ideas count as "pre-existing work"?

Suppose there is some idea in the SEP that the judges have not seen before. Suppose the AI outputs the relevant SEP article verbatim. In this case, what will you show the judges as "pre-existing work"?

predicts NO

@Boklam Is it safe to assume you mean:

For ideas from the SEP, you show pre-existing work.

For ideas made up by AI, you show all work that has been done on the subject.

bought Ṁ30 of YES

@Boklam In both cases, SEP and AI, I will include the most similar pieces of pre-existing work I can find. In the SEP case, this limits the timeline to pre-publication of the idea in question; in the AI case, this limits the timeline to pre- whenever the model output the idea (this is probably a trivial constraint).

>> The AI ideas must be original

Does this mean that if the AI only produces ideas that can be found online or in the literature, then you'll resolve NO?

bought Ṁ10 of NO

@Boklam It’s a key question; I read the description as much closer to requiring face-value “originality” or plausibility rather than anything like mind-blowing innovativeness, but IDK.

predicts YES

@Boklam That would be a No resolution. On the other hand, I don’t consider this to require “mind-blowing innovativeness”, e.g. the move from EDT to CDT was not mind-blowing; Kripke's ideas could be called mind-blowing, but I’ll include less novel SEP ideas as well. This question does require some degree of novelty, and I doubt currently available models are capable of this, TBC.

bought Ṁ15 of YES

This is an interesting experiment and one that I think could pass currently with ChatGPT. If you consider that an AI like ChatGPT is simply taking existing content on the web and (remixing?) reusing it, you could basically distil this thought process down to something like:

"Can I use an LLM to generate a piece of text from an existing concept that would fool someone with a philosophy degree who was unfamiliar with the concept".

You may even be able to get ChatGPT to regurgitate something very near the original text without modification.

bought Ṁ5 of YES

@SamuelRichardson This may be doable well before 2026. I have added a couple details to clarify the level of originality needed.

predicts YES

@JacobPfau Hmm, this does change the criteria quite a bit. I think it would be very unusual for it to come up with a truly original idea.

sold Ṁ11 of YES

@SamuelRichardson Selling my YES, which was based on it being able to come up with a plausible explanation of an idea, and buying NO, as I don't think I've seen anything in the current state of LLMs that would provide novel thought.
