Is the “AI Winter Break Hypothesis” on why ChatGPT got “lazier” true?
Resolved N/A on Jun 20

I will resolve based on all evidence available by the end of January 2024. I will also try to replicate the claims myself, if possible, to verify them.

I don't have the energy to resolve this market. Can we N/A it, or can someone else take ownership? @mods

Just checking, by your own criteria you'd have to run an experiment comparing versions of the openai api to resolve this? I'm pretty sure this resolves no, but I don't think anyone's going to do that so we should probably N/A this in that case

I for one am not in a hurry and am down to wait another year for Soli to get the energy, but I'm not a top shareholder here.

Resolving N/A - no mods want to take ownership, resolving would require some complex effort that isn't going to happen

resolve no?

@jacksonpolack i was being a bit lazy - will resolve sooooon

@Soli ping

@traders I will write some code to test this hypothesis over the weekend. I will share my code here for inspection and then resolve the market accordingly.

@Soli would have to check previous version through API right?

@CalebW 💯

tweet

and failed replication

finally, note that the size of the claimed effects is small enough that I don't think anyone would've noticed it while actually chatting.

(super low confidence all)

Imo there's a big leap between these two things which people don't seem to be separating out

  1. GPT produces statistically significantly less output if December is included in the prompt

  2. The reason is that it learned to do less work over the holidays

1 seems totally plausible to me. Given that people are claiming it and claiming to have reproduced it, I'd bet—without having looked into it much—60%-70% that 1 is true.

But 2 strikes me as unlikely, and there are probably many other reasons why 1 could be true. I'd say around 1-3% for 2.

I think the market should be about 1. 2 is more interesting, but we'd need a lot of evidence to conclude anything about causation, which we probably won't get.

@JeremiahKellick agree with you - I added some information on how I plan to resolve at the bottom in the comments

Successful reproduction attempt:

@4rthurRainbow you never know with this stuff until you replicate it yourself - there were many instances where people shared such claims on Twitter only for them to be invalidated later

Can you also make a poll and post it here?

@firstuserhere I voted “🤷🏻‍♂️ - I do not know”

interesting that everyone who bet till now thinks it is not true

@Soli Could you clarify the resolution criteria? Would you resolve YES if a system prompt ending with "2013-12-07" leads to 5% fewer characters than a system prompt ending with "2013-5-07"? What if the difference disappears if you use other months, or "05" instead of "5"?

@Lorenzo Good question - I did not think deeply about it and would appreciate some input from anyone.

In general, I will resolve this as ‘yes’ if there is strong evidence that, when the date is added to the prompt in a standard format and nothing else is changed, December leads the model to return responses with fewer characters. I want to validate the claim that this actually contributed to making ChatGPT “lazier”, so ideally the date format is the same one used by OpenAI.
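One way to make "strong evidence" concrete is a one-sided permutation test on response lengths. The sketch below is only an illustration, not the method used to resolve this market: the function name `permutation_p_value` and its parameters are hypothetical, and it assumes you have already collected character counts for responses generated with a December date and with some other month's date.

```python
import random
from statistics import mean


def permutation_p_value(dec_lengths, other_lengths, n_resamples=10_000, seed=0):
    """One-sided permutation test (hypothetical helper).

    Estimates how often a random relabelling of the pooled samples gives a
    gap (other mean - December mean) at least as large as the observed one.
    A small value suggests December responses are genuinely shorter.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = mean(other_lengths) - mean(dec_lengths)
    pooled = list(dec_lengths) + list(other_lengths)
    n_dec = len(dec_lengths)

    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        # Treat the first n_dec shuffled values as a fake "December" group.
        diff = mean(pooled[n_dec:]) - mean(pooled[:n_dec])
        if diff >= observed:
            hits += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)
```

A permutation test is used here because it makes no assumption about how response lengths are distributed; any threshold for calling the result "strong" would still be a judgment call.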

predicted NO

What if it doesn't work for January, or it also happens for October?

@Lorenzo the claim being made here is that it contributed to ChatGPT becoming “lazy”. It would be enough if December is worse than October and November
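The criterion described in this reply (December worse than both October and November) could be checked with something like the sketch below. The function names and the dictionary layout are assumptions made for illustration; the input is expected to map month names to lists of response character counts that you have collected yourself. It compares means only and says nothing about statistical significance.

```python
from statistics import mean


def december_is_shortest(lengths_by_month, baseline_months=("October", "November")):
    """Return True if December's mean length is below every baseline month's mean.

    `lengths_by_month` is a hypothetical mapping such as
    {"October": [...], "November": [...], "December": [...]}.
    """
    dec_mean = mean(lengths_by_month["December"])
    return all(dec_mean < mean(lengths_by_month[m]) for m in baseline_months)


def relative_shortfall(lengths_by_month, baseline_month):
    """Fraction by which December's mean length falls short of a baseline month's."""
    base_mean = mean(lengths_by_month[baseline_month])
    return (base_mean - mean(lengths_by_month["December"])) / base_mean
```

`relative_shortfall` returns a fraction, so a value of 0.05 would correspond to the "5% fewer characters" example raised earlier in the thread.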