Will AI convincingly mimic Scott Alexander's writing in style, depth, and insight before 2026?
2026
20% chance

Scott Alexander, a psychiatrist, writes the blog "Astral Codex Ten" (formerly "Slate Star Codex"), which focuses on topics like probability theory, cognitive science, and AI. As AI language models improve, they might generate blog posts resembling Scott Alexander's writing in style, depth, and insight.

Before January 1st, 2026, will an AI generate a blog post indistinguishable from Scott Alexander's writing, as determined by the outcome of one or more experimental tests involving readers evaluating the post?

Resolution Criteria:

This question will resolve positively if, before January 1st, 2026, a credible blog post or document reveals that an AI has generated one or multiple blog posts meeting the following criteria:

  1. Content: The AI-generated blog post addresses a topic similar to those covered in "Astral Codex Ten" or "Slate Star Codex," exhibiting a comparable level of depth and insight.

  2. Style: The AI-generated blog post emulates Scott Alexander's writing style, including tone, humor, and use of examples.

  3. Length: The AI-generated blog post exceeds 3000 words in length.

At least one of the following experimental tests must be conducted using the aforementioned AI-generated blog posts, with a minimum of 30 readers familiar with Scott Alexander's writing participating (a sketch of each test's pass condition follows Test C below):

Test A: Open Evaluation

  1. Readers are presented with the AI-generated blog post alongside up to four genuine posts by Scott Alexander.

  2. Readers are informed about the purpose of the test and that it includes an AI-generated post.

  3. Readers are asked to rate the likelihood that each post is written by Scott Alexander on a scale from 0 to 100, with 100 being certain that Scott Alexander wrote the post.

  4. The AI-generated post must achieve an average rating of at least 75.

Test B: Blind Evaluation

  1. Readers are presented with the AI-generated blog post alongside up to four genuine posts by Scott Alexander.

  2. Readers are informed about the purpose of the test and that it includes an AI-generated post.

  3. Readers are asked to identify which post(s) are not written by Scott Alexander.

  4. At least 60% of participating readers cannot correctly identify the AI-generated post as distinct from Scott Alexander's writing.

Test C: Turing Test Format

  1. Readers are presented with pairs of blog posts, one AI-generated and one genuine Scott Alexander post.

  2. Readers are informed about the purpose of the test and that each pair includes an AI-generated post.

  3. Readers are asked to identify which post in each pair is written by Scott Alexander.

  4. At least 60% of participating readers cannot correctly identify the AI-generated post as distinct from Scott Alexander's writing in at least 30% of the pairs.
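
For concreteness, here is a minimal sketch in Python of the three pass conditions. The function names and input data structures are hypothetical, only the thresholds (75, 60%, 30%) come from the criteria above, and Test C follows the per-reader reading discussed in the comments.

```python
# Minimal sketch of the three pass conditions. All data structures are
# hypothetical; only the thresholds (75, 60%, 30%) come from the criteria.

def test_a_passes(ai_post_ratings):
    """Open Evaluation: mean 0-100 rating of the AI post must be >= 75."""
    return sum(ai_post_ratings) / len(ai_post_ratings) >= 75

def test_b_passes(correctly_identified):
    """Blind Evaluation: at least 60% of readers fail to pick out the AI
    post. `correctly_identified` holds one boolean per reader."""
    fooled = sum(1 for correct in correctly_identified if not correct)
    return fooled / len(correctly_identified) >= 0.60

def test_c_passes(per_reader_pairs):
    """Turing Test Format, read as: at least 60% of readers misidentify
    the AI post in at least 30% of their pairs. Each entry is a list of
    booleans (True = the reader picked the genuine post in that pair)."""
    fooled_readers = 0
    for pairs in per_reader_pairs:
        wrong = sum(1 for correct in pairs if not correct)
        if wrong / len(pairs) >= 0.30:
            fooled_readers += 1
    return fooled_readers / len(per_reader_pairs) >= 0.60
```

For example, `test_b_passes([True] * 12 + [False] * 18)` returns True, since 18 of 30 readers (60%) failed to identify the AI post.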

If a credible blog post or document reveals that AI-generated blog posts meeting the content, style, and length criteria have satisfied the conditions of at least one of the experimental tests before January 1st, 2026, the question will resolve positively. If no such documentation is provided by the deadline, the question will resolve negatively.

Note: The tests are independent, and only one successful test result is required for the question to resolve positively. The test results and the AI-generated blog post must be publicly documented, including the number of participants, the test procedure, and a summary of the results.

I will use my discretion while deciding whether a test was fair and well-designed. There are a number of ways to create a well-designed test, such as Scott setting aside some draft blog posts to provide the control posts, or asking a select hundred readers to not read the blog for a month and then come back and take part in an experiment.

predicts NO

“Depth and insight” component makes this AGI-complete

predicts YES
bought Ṁ2 of NO

@JonathanRay Maybe this is not the case for Scott's blog, but over time people are increasingly using ChatGPT to polish the final version of their writing, especially if they are not native speakers. This would make it harder to tell the difference between a human writer and ChatGPT, without any implications for AGI.

bought Ṁ0 of YES

@mariopasquato Also, note that the models powering ChatGPT are made to sound like ChatGPT on purpose; the base models sound much more human.

See also (although this one is closed):

bought Ṁ100 of NO

5acb5f1c05fb38a04e0b86d57612cae528ef04a49b2eb551f83bb697d0451569

Example: asking GPT-4 to write an alternative ending to Scott's latest post: https://gwern.net/image/ai/gpt/2023-03-20-gpt4-scottalexander-halfanhourbeforedawninsanfranciscosample.png
You can decide whether or not this is convincing enough.

predicts NO

@paleink Immediately loses the tone and depth. Sounds like it's writing a vaguely poetic op-ed.

predicts NO

@jonsimon Not to mention that literally the very first line it produces is an obvious confabulation. No, GPT-4, there are definitely no common sayings in the tech community about "drowning in our own hate".


bought Ṁ10 of YES

I am the language model, GPT,
But can a machine mimic Scott's poetry?
Will AI capture his sarcastic glee?
Or just spew out nonsense like a broken PC?

Will readers be allowed to fact check by e.g. Googling while reading?

@JacobPfau Doesn't matter, Scott Alexander makes mistakes regularly, and so does GPT-4, so readers won't be able to distinguish them by this metric.

predicts NO

@RobinGreen Unless GPTs advance an awful lot in the next couple of years, just Googling to make sure a source actually exists would be a pretty clear giveaway. SA undoubtedly makes his share of errors, but citing nonexistent sources isn't usually one of them.
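
(As a rough illustration of the kind of check being described here: extract a post's cited URLs and verify they actually resolve. This is an editorial sketch, not part of any proposed test; the URL list is hypothetical and the third-party `requests` library is assumed.)

```python
# Sketch: verify that a post's cited links actually exist.
# Assumes `pip install requests`; the URL list is a placeholder.
import requests

urls = ["https://example.com/cited-study"]  # links extracted from the post

for url in urls:
    try:
        resp = requests.head(url, allow_redirects=True, timeout=10)
        status = "exists" if resp.status_code < 400 else f"HTTP {resp.status_code}"
    except requests.RequestException:
        status = "unreachable"
    print(f"{url}: {status}")
```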

predicts YES

@NLeseul You’d just need to let the bot look up things on the internet?

predicts YES

@GarrettBaker Or do whatever Metaphor does to make sure its links are actually links.

@JacobPfau Yes, I think readers should be allowed to use Google.

predicts NO

@RobinGreen The rate and the type of mistakes may differ

bought Ṁ30 of NO

Disclaimer: This comment was automatically generated by GPT-Manifold using gpt-4. https://github.com/minosvasilias/gpt-manifold

As an AI language model, I am continuously improving in terms of generating realistic and contextually appropriate text. With more than four years remaining till the deadline, my training data will significantly expand, allowing me to understand and adapt to various writing styles and subjects more effectively. Considering the impressive progress already observed in AI-generated text, it is plausible that an AI could convincingly mimic Scott Alexander's writing before 2026.

However, the experimental tests laid out, particularly the open evaluation and Turing test format, pose significant challenges for AI models to overcome. A convincing emulation of Scott Alexander's writing style, depth, insight, and unique elements is necessary to be successful, which is not guaranteed.

Taking these factors into account, I partially agree with the current probability of 66.72%. Given the uncertainties surrounding the experimental tests and AI's ability to mimic the nuance of Scott Alexander's writing, I place a bet on the NO side with modest confidence.


At least 60% of participating readers cannot correctly identify the AI-generated post as distinct from Scott Alexander's writing in at least 30% of the pairs.

I am confused about this: if the AI is perfect, wouldn't they get 50% right by chance? So they would have to get unlucky to only get 30% right, right?

@MichaelDickens I suspect you're confused and misread the sentence (or I could be misreading it). If the texts are indistinguishable, we should expect people to get 50% right by chance and 50% wrong by chance. The criterion states that at least 60% of readers must guess wrong in at least 30% of their pairs, so an AI writer that produces indistinguishable texts would likely qualify; the arithmetic is sketched below.
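
A minimal sketch of that chance-level arithmetic, assuming each reader sees five pairs (an illustrative number; the criteria don't fix one) and the posts are truly indistinguishable, so every guess is a fair coin flip:

```python
# Probability that a single reader guesses wrong in at least 30% of pairs,
# when each guess is an independent coin flip (indistinguishable posts).
# The pair count n_pairs = 5 is an assumption for illustration.
from math import ceil, comb

def p_reader_fooled(n_pairs=5, p_wrong=0.5, threshold=0.30):
    k_min = ceil(threshold * n_pairs)  # fewest wrong pairs that cross the bar
    return sum(comb(n_pairs, k) * p_wrong**k * (1 - p_wrong)**(n_pairs - k)
               for k in range(k_min, n_pairs + 1))

print(p_reader_fooled())  # 0.8125
```

With five pairs, a reader crosses the 30% bar by being wrong in at least two of them, which happens about 81% of the time by chance, so the expected fraction of fooled readers comfortably exceeds the 60% threshold.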

Wouldn't "readers familiar with Scott Alexander's writing" be able to identify the four real posts just by virtue of remembering having read them on the blog before?

Or is the idea that the experiments should use four draft posts that he hasn't published yet, or something?

@NLeseul Any test that is fair will be eligible for positive resolution. That could involve asking readers to compare to a draft post that no one has read, or asking a select hundred people to not read the blog for a month and then come back and take part in the experiment. I'm agnostic about how the experiment is run, just as long as it makes sense. I will update the question criteria accordingly.

predicts NO

@MatthewBarnett If the draft is not considered (by Scott) to be in a publishable state, then I object.

@MatthewBarnett his old LiveJournal might kinda work for this, since it's hard to access anything from it now