The rules, according to my friend: "You pick posts from a few respected blogs from before 2022 (no cherry-picking) and also generate GPT-4 blogs (cherry-picking allowed). Say 4 of each. I bet I can pick out which are which given a decent-length sample."
Resolves YES if he can tell them apart correctly (at the 1.4% significance level), NO if the null hypothesis is not rejected.
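For reference, the 1.4% presumably comes from the chance of a perfect score under random guessing: with 4 GPT-4 posts among 8, there are C(8,4) = 70 ways to pick which 4 are generated, so a blind guess gets them all right with probability 1/70 ≈ 1.4%. A minimal sketch of that arithmetic, assuming the null is a uniformly random choice of 4:

```python
from math import comb

# Null hypothesis: the guesser picks 4 of the 8 posts at random as the GPT-4 ones.
# Probability of identifying all 4 correctly by pure chance:
p_all_correct = 1 / comb(8, 4)
print(f"P(all correct by chance) = 1/{comb(8, 4)} = {p_all_correct:.3f}")  # 1/70 ≈ 0.014

# Distribution of the number of GPT-4 posts correctly identified under random guessing
# (hypergeometric: drawing 4 of 8 posts, of which 4 are GPT-4):
for k in range(5):
    p_k = comb(4, k) * comb(4, 4 - k) / comb(8, 4)
    print(f"P(exactly {k} of 4 GPT posts identified) = {p_k:.3f}")
```

Under that null, only a perfect 8/8 classification rejects at the 1.4% level; even identifying 3 of the 4 GPT posts corresponds to p = 17/70 ≈ 24%.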
(obviously within the last 20 minutes, not within the last two minutes)
don't post the answers w/o a spoiler so other people can do it.
also, how many was he off by?
@jacksonpolack answers (obviously spoils the whole thing): https://pastebin.com/3zVtkjFs
!!Indirect spoilers!!: Even having read them, I'm still a bit surprised you were tricked - the writing styles are clearly different. The GPT-4 ones have a very smooth flow to the words (not in a good way), with lots of connecting clauses, unnecessary topic sentences, and the like, whereas the real ones have more awkward phrasings and points that are at least a little hard to understand, because the author cares more about communicating something than about the words flowing. The non-GPT-4 blogs are also the densest - they have interesting ideas and pack more content into each sentence and paragraph. I didn't pick up on who GPT was supposed to imitate at all. The least generic GPT post was <censored>, but precisely because it was less generic it made several meaningful mistakes.
I wonder how much of that is due to whatever finetuning/RLHF/... OpenAI did on top of the base model, though, especially the writing-style parts.
@DanMan314 oops, good point, I should've used the actual number, which is the 1.4% significance level. I've updated the description to match.
@Odoacre Depends on the guesser's utility function. They could want to hedge even knowing that doing so guarantees they won't get them all right.