Is the “AI Winter Break Hypothesis” on why ChatGPT got “lazier” true?
Jan 31

I will resolve based on all evidence available by the end of January 2024. I will also try to replicate the results myself to verify the claims, if possible.

@traders I will write some code to test this hypothesis over the weekend. I will share my code here for inspection and then resolve the market accordingly.
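A test along these lines could be structured roughly as below. This is a minimal sketch, not the code referred to above: it assumes the model is reached through some `query(system_prompt, user_prompt) -> str` callable (hypothetical; in practice it would wrap an API call), and the system-prompt template is made up for illustration.

```python
from typing import Callable

# Hypothetical template: the exact system prompt OpenAI uses is an assumption here.
TEMPLATE = "You are a helpful assistant. Current date: {date}"


def collect_lengths(
    query: Callable[[str, str], str],
    dates: list[str],
    task: str,
    n_samples: int,
) -> dict[str, list[int]]:
    """Send the same task n_samples times per date; record each response's length in characters."""
    results: dict[str, list[int]] = {}
    for date in dates:
        system_prompt = TEMPLATE.format(date=date)
        results[date] = [len(query(system_prompt, task)) for _ in range(n_samples)]
    return results


if __name__ == "__main__":
    # Offline stand-in for a real model call, so the sketch runs without an API key.
    def fake_query(system_prompt: str, user_prompt: str) -> str:
        return "x" * (100 + len(system_prompt))

    print(collect_lengths(fake_query, ["2023-05-07", "2023-12-07"], "Write a sort function.", 3))
```

Keeping the task and template fixed means the date string is the only thing that varies between conditions, and many samples per date are needed because sampled responses vary in length on their own.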

@Soli Wouldn't you have to check the previous version through the API?

Finally, note that the claimed effects are small enough that I don't think anyone would have noticed them while actually chatting.

IMO there's a big leap between these two claims, which people don't seem to be separating:

  1. GPT produces statistically significantly less output if December is included in the prompt.

  2. The reason is that it learned to do less work over the holidays.

1 seems totally plausible to me. Given that people are claiming it and claiming to have reproduced it, I'd bet, without having looked into it much, 60-70% that 1 is true.
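Claim 1 is a purely statistical question about response lengths, so it can be checked without saying anything about claim 2. One way to do that, sketched here with a one-sided permutation test on mean length (a swapped-in choice; a t-test or Mann-Whitney U test would also work), assuming you already have per-condition lists of character counts:

```python
import random
from statistics import mean


def permutation_test(december: list[int], other: list[int], n_iter: int = 10_000, seed: int = 0) -> float:
    """One-sided p-value for H1: December responses are shorter on average than `other`."""
    rng = random.Random(seed)
    observed = mean(other) - mean(december)  # positive if December responses are shorter
    pooled = december + other  # new list, so shuffling does not mutate the inputs
    k = len(december)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        # Randomly relabel which responses count as "December" and recompute the gap.
        diff = mean(pooled[k:]) - mean(pooled[:k])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_iter + 1)
```

A small p-value (say below 0.05) would support claim 1, that including December shortens output, but it is silent on claim 2, the holiday explanation.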

But 2 strikes me as unlikely, and there are probably many other reasons why 1 could be true. I'd say around 1-3% for 2.

I think the market should be about 1. 2 is more interesting, but we'd need a lot of evidence to conclude anything about causation, which we probably won't get.

@JeremiahKellick Agreed. I added some information on how I plan to resolve at the bottom of the comments.

Successful reproduction attempt:

@4rthurRainbow You never know with this stuff until you replicate it yourself. There have been many instances where people shared such claims on Twitter, only for them to be invalidated later.

@Soli Could you clarify the resolution criteria? Would you resolve YES if a system prompt ending with "2013-12-07" leads to 5% fewer characters than a system prompt ending with "2013-5-07"? What if the difference disappears if you use other months, or "05" instead of "5"?

@Lorenzo Good question. I have not thought deeply about it and would appreciate input from anyone.

In general, I will resolve this as ‘yes’ if there is strong evidence that, when the date is added to the prompt in a standard format and nothing else is changed, December leads the model to return responses with fewer characters. Ultimately, I want to validate the claim that this actually contributed to making ChatGPT “lazier”, so ideally the date format should be the same one OpenAI uses.
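Lorenzo's "2013-5-07" versus "05" example shows that the date string itself can confound the comparison. A minimal sketch that holds the format fixed and varies only the month, assuming zero-padded ISO 8601 (`YYYY-MM-DD`) as the "standard format" (whether OpenAI uses this exact format is an assumption):

```python
from datetime import date


def month_variants(year: int, day: int) -> dict[int, str]:
    """Map month number -> date string with the same year and day, in zero-padded ISO 8601."""
    # Restrict day to 1..28 so the same day exists in every month and only the month changes.
    if not 1 <= day <= 28:
        raise ValueError("day must be between 1 and 28")
    return {m: date(year, m, day).isoformat() for m in range(1, 13)}


variants = month_variants(2023, 7)
print(variants[5], variants[12])  # 2023-05-07 2023-12-07
```

By contrast, a hand-built string like `f"{year}-{month}-{day:02d}"` would yield "2023-5-07" for May but "2023-12-07" for December, so the two prompts would differ in length and format as well as in month. Generating all twelve months also lets other months be checked, not just December.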

What if it doesn't work for January, or it also happens for October?

@Lorenzo The claim being made here is that it contributed to ChatGPT becoming “lazy”. It would be enough if December produces shorter responses than both October and November.
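Under that criterion the comparison reduces to checking December against both neighbouring months. A minimal sketch, assuming per-month lists of response lengths keyed by month name and a hypothetical `min_reduction` threshold (the market does not specify one, and a significance test would still be needed on top of this):

```python
from statistics import mean


def relative_reduction(december: list[int], baseline: list[int]) -> float:
    """Fraction by which December's mean length falls below the baseline mean (0.05 means 5% shorter)."""
    return 1 - mean(december) / mean(baseline)


def december_is_lazier(lengths: dict[str, list[int]], min_reduction: float = 0.0) -> bool:
    """True if December's mean length is below both October's and November's by more than min_reduction."""
    dec = lengths["December"]
    return all(relative_reduction(dec, lengths[m]) > min_reduction for m in ("October", "November"))
```

For example, December means 5% below both October and November pass with the default threshold but fail with `min_reduction=0.1`.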

