Will an AI generate a blog post indistinguishable from Robin Hanson's writing if tested before 2030?
76% chance

Robin Hanson, an economist and professor, writes the blog "Overcoming Bias," which focuses on topics like rationality, prediction markets, and futurism. As AI language models improve, they might generate blog posts resembling Robin Hanson's writing in style, depth, and insight.

If at least one experimental test is conducted before January 1st, 2030, will an AI generate a blog post that readers taking part in the test cannot distinguish from Robin Hanson's writing?

Resolution Criteria:

This question will resolve positively if, before January 1st, 2030, a credible blog post or document reveals that an AI has generated one or more blog posts meeting the following criteria:

  1. Content: The AI-generated blog post addresses a topic similar to those covered in "Overcoming Bias," exhibiting a comparable level of depth and insight.

  2. Style: The AI-generated blog post emulates Robin Hanson's writing style, including tone, analytical thinking, and use of examples.

  3. Length: The AI-generated blog post exceeds 2000 words in length.

At least one of the following experimental tests must be conducted on the AI-generated blog posts described above, with at least 30 participating readers who are familiar with Robin Hanson's writing:

Test A: Open Evaluation

  1. Readers are presented with the AI-generated blog post alongside up to four genuine posts by Robin Hanson.

  2. Readers are informed about the purpose of the test and that it includes an AI-generated post.

  3. Readers are asked to rate the likelihood that each post is written by Robin Hanson on a scale from 0 to 100, with 100 being certain that Robin Hanson wrote the post.

  4. The AI-generated post must achieve an average rating of at least 75.
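The Test A threshold can be sketched as a small check. This is only an illustration under stated assumptions: the function name `passes_test_a` and the input shape (one 0 to 100 rating per reader for the AI-generated post) are hypothetical and not part of the market description.

```python
from statistics import mean

MIN_READERS = 30          # minimum participants, from the criteria above
MIN_AVERAGE_RATING = 75   # Test A threshold for the AI-generated post


def passes_test_a(ai_post_ratings):
    """Return True if the AI-generated post meets the Test A threshold.

    ai_post_ratings: hypothetical input, one rating (0-100) per reader
    for the AI-generated post only.
    """
    if len(ai_post_ratings) < MIN_READERS:
        return False  # too few readers for a qualifying test
    if any(r < 0 or r > 100 for r in ai_post_ratings):
        raise ValueError("ratings must be on the 0-100 scale")
    return mean(ai_post_ratings) >= MIN_AVERAGE_RATING
```

Note that, as the criterion is worded, only the ratings of the AI-generated post enter the threshold; the ratings given to the genuine posts are not compared against it.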

Test B: Blind Evaluation

  1. Readers are presented with the AI-generated blog post alongside up to four genuine posts by Robin Hanson.

  2. Readers are informed about the purpose of the test and that it includes an AI-generated post.

  3. Readers are asked to identify which post(s) are not written by Robin Hanson.

  4. At least 60% of participating readers must fail to identify the AI-generated post as not written by Robin Hanson.
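A rough sketch of the Test B threshold follows. It assumes each reader's answer is recorded as the set of post identifiers they flagged as not written by Robin Hanson, and that a reader counts as having identified the AI-generated post whenever it appears in that set. Both the data layout and that reading of "correctly identify" are assumptions, as are the names `passes_test_b` and `ai_post_id`.

```python
MIN_READERS = 30         # minimum participants, from the criteria above
MIN_FOOLED_PERCENT = 60  # share of readers who must miss the AI post


def passes_test_b(flagged_by_reader, ai_post_id):
    """Return True if enough readers failed to flag the AI-generated post.

    flagged_by_reader: hypothetical input, one set of flagged post ids
    per reader.
    ai_post_id: hypothetical identifier of the AI-generated post.
    """
    total = len(flagged_by_reader)
    if total < MIN_READERS:
        return False  # too few readers for a qualifying test
    fooled = sum(1 for flagged in flagged_by_reader if ai_post_id not in flagged)
    # Integer comparison avoids floating-point edge cases at exactly 60%.
    return fooled * 100 >= MIN_FOOLED_PERCENT * total
```

Under this reading, a reader who flags every post would count as having identified the AI-generated post, so a real test design would probably need to limit or penalize over-flagging.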

Test C: Turing Test Format

  1. Readers are presented with pairs of blog posts, one AI-generated and one genuine Robin Hanson post.

  2. Readers are informed about the purpose of the test and that each pair includes an AI-generated post.

  3. Readers are asked to identify which post in each pair is written by Robin Hanson.

  4. At least 60% of participating readers must fail to pick out the AI-generated post in at least 30% of the pairs.
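The Test C condition combines two thresholds, and its wording admits more than one reading. The sketch below takes one possible reading as an assumption: a reader counts as fooled if they pick the wrong post in at least 30% of the pairs they judged, and the test passes if at least 60% of readers are fooled. The function name and input shape are hypothetical.

```python
MIN_READERS = 30         # minimum participants, from the criteria above
MIN_FOOLED_PERCENT = 60  # share of readers who must be fooled
MIN_WRONG_PERCENT = 30   # share of pairs a fooled reader gets wrong


def passes_test_c(correct_by_reader):
    """Return True under one assumed reading of the Test C threshold.

    correct_by_reader: hypothetical input, one list of booleans per
    reader, where True means the reader picked the genuine post in
    that pair.
    """
    total = len(correct_by_reader)
    if total < MIN_READERS:
        return False  # too few readers for a qualifying test
    fooled = 0
    for picks in correct_by_reader:
        wrong = sum(1 for correct in picks if not correct)
        # Readers who judged no pairs are never counted as fooled.
        if picks and wrong * 100 >= MIN_WRONG_PERCENT * len(picks):
            fooled += 1
    return fooled * 100 >= MIN_FOOLED_PERCENT * total
```

Other readings are possible, for example pooling all pair judgments across readers, and they could give a different outcome on the same data.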

If a credible blog post or document reveals that AI-generated blog posts meeting the content, style, and length criteria have satisfied the conditions of at least one of the experimental tests before January 1st, 2030, the question will resolve positively. If a test was conducted but the results were negative, then this question resolves negatively. If no such documentation is provided by the deadline describing the results of any qualifying experimental test, the question will resolve as N/A.
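The three possible outcomes can be summarized in a short sketch. It assumes each documented test has already been judged fair and qualifying, which in practice is left to the market creator's discretion; the `Report` class and `resolve` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import date

DEADLINE = date(2030, 1, 1)  # results must be documented before this date


@dataclass
class Report:
    """Hypothetical record of one publicly documented, qualifying test."""
    published: date
    passed: bool


def resolve(reports):
    """Map documented test reports to YES, NO, or N/A."""
    in_time = [r for r in reports if r.published < DEADLINE]
    if not in_time:
        return "N/A"  # no qualifying test documented by the deadline
    if any(r.passed for r in in_time):
        return "YES"  # a single successful test is enough
    return "NO"       # tests were run, but none succeeded
```

For example, `resolve([Report(date(2027, 5, 1), False), Report(date(2028, 2, 1), True)])` returns `"YES"`, since one successful test suffices.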

Note: The tests are independent, and only one successful test result is required for the question to resolve positively. The test results and the AI-generated blog post must be publicly documented, including the number of participants, the test procedure, and a summary of the results.

I will use my discretion when deciding whether a test was fair and well-designed. There are a number of ways to create a well-designed test, such as having Robin set aside some draft blog posts to serve as the control posts, or asking a selected hundred readers not to read the blog for a month and then come back to take part in an experiment.

11mo

Hmm, test design A seems mighty risky to NO holders, because we have no idea what the base suspicion/trust rate is. E.g., the raters could give Hanson's essays all 100s and the others all 80s, straight down the line, yet that would be marked as "failing to distinguish AI from Hanson's".

2y

What if a creative prompting exercise generates such an essay? How would you and Robin agree to assign credit between the human prompt and the AI-generated essay?
