Is the “AI Winter Break Hypothesis” on why ChatGPT got “lazier” true?
Jan 31

I will resolve based on all evidence available by the end of January 2024. I will also try to replicate the claims myself, if possible.


resolve no?

@jacksonpolack i was being a bit lazy - will resolve sooooon

@Soli ping

@traders I will write some code to test this hypothesis over the weekend. I will share my code here for inspection and then resolve the market accordingly.

@Soli would have to check previous version through API right?

@CalebW 💯

bought Ṁ100 of NO


and failed replication

finally, note that the size of the claimed effects is small enough that I don't think anyone would've noticed it while actually chatting.

(super low confidence all)

Imo there's a big leap between these two things which people don't seem to be separating out

  1. GPT produces statistically significantly less output if December is included in the prompt

  2. The reason is that it learned to do less work over the holidays

1 seems totally plausible to me. Given that people are claiming it and claiming to have reproduced it, I'd bet—without having looked into it much—60%-70% that 1 is true.

But 2 strikes me as unlikely, and there are probably many other reasons why 1 could be true. I'd say around 1-3% for 2.

I think the market should be about 1. 2 is more interesting, but we'd need a lot of evidence to conclude anything about causation, which we probably won't get.

@JeremiahKellick agree with you - I added some information on how I plan to resolve at the bottom in the comments

bought Ṁ20 of YES

Successful reproduction attempt:

@4rthurRainbow you never know with this stuff until you replicate it yourself - there were many instances where people shared such claims on Twitter, only for them to be invalidated later

sold Ṁ105 of NO

Can you also make a poll and post it here?

@firstuserhere I voted “🤷🏻‍♂️ - I do not know”

interesting that everyone who bet till now thinks it is not true

bought Ṁ100 of NO

@Soli Could you clarify the resolution criteria? Would you resolve YES if a system prompt ending with "2013-12-07" leads to 5% fewer characters than a system prompt ending with "2013-5-07"? What if the difference disappears if you use other months, or "05" instead of "5"?
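One way to probe the formatting concern above is to generate the same date in both zero-padded and unpadded forms and run each variant as its own condition. This is only a sketch: `date_variants`, `build_system_prompts`, and the base prompt text are hypothetical names and placeholders, not anything OpenAI or the original claim is known to use.

```python
from datetime import date


def date_variants(d):
    """Return the same date written with and without a zero-padded month."""
    return {
        "padded": f"{d.year}-{d.month:02d}-{d.day:02d}",
        "unpadded_month": f"{d.year}-{d.month}-{d.day:02d}",
    }


def build_system_prompts(base_prompt, d):
    """Append each date variant to a placeholder base prompt (assumed wording)."""
    return {
        name: f"{base_prompt} {text}"
        for name, text in date_variants(d).items()
    }


if __name__ == "__main__":
    # The base prompt here is an illustrative placeholder only.
    for month in (5, 12):
        prompts = build_system_prompts("You are a helpful assistant.", date(2013, month, 7))
        for name, prompt in prompts.items():
            print(month, name, repr(prompt))
```

If an effect shows up for "2013-5-07" but not "2013-05-07", that would suggest it depends on tokenization of the date string rather than on the month itself.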

@Lorenzo Good question - I did not think deeply about it and would appreciate some input from anyone.

In general, I will resolve this as ‘yes’ if there is strong evidence that adding a December date to the prompt, in a standard format and with nothing else added, leads the model to return responses with fewer characters. I want to validate the claim that this actually contributed to making ChatGPT “lazier”, so ideally the date format is the same one used by OpenAI.
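A rough sketch of what "strong evidence" could look like in practice: collect response lengths (in characters) under a December date and under a baseline date, then check both the relative difference and a one-sided permutation test. The function names and the choice of a permutation test are my assumptions, not a method anyone in this thread has committed to, and collecting the responses themselves is left out.

```python
import random
from statistics import mean


def relative_difference(december, baseline):
    """Fractional change in mean response length, December versus baseline."""
    return (mean(december) - mean(baseline)) / mean(baseline)


def permutation_p_value(december, baseline, n_resamples=10_000, seed=0):
    """One-sided p-value for 'December responses are shorter'.

    Shuffles the pooled lengths and counts how often a random split gives a
    mean gap at least as negative as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(december) - mean(baseline)
    pooled = list(december) + list(baseline)
    n_dec = len(december)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_dec]) - mean(pooled[n_dec:])
        if diff <= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_resamples + 1)
```

A small p-value together with a relative difference of a few percent would support claim 1 only; it says nothing about why the model behaves that way.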

predicts NO

What if it doesn't work for January, or it also happens for October?

@Lorenzo the claim being made here is that it contributed to ChatGPT becoming “lazy”. It would be enough if December is worse than October and November
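The "December is worse than October and November" criterion can be written down as a simple check over per-month response lengths. This is a minimal sketch with hypothetical function names; the numbers in the usage example are made up purely to show the calculation and are not measurements.

```python
from statistics import mean


def december_gaps(lengths_by_month, others=("October", "November")):
    """Relative gap in mean response length between December and each other month."""
    dec_mean = mean(lengths_by_month["December"])
    return {
        month: (dec_mean - mean(lengths_by_month[month])) / mean(lengths_by_month[month])
        for month in others
    }


def december_is_worst(lengths_by_month, others=("October", "November")):
    """True only if December's mean length is below every comparison month."""
    return all(gap < 0 for gap in december_gaps(lengths_by_month, others).values())


if __name__ == "__main__":
    # Illustrative, invented lengths in characters; not real data.
    sample = {
        "October": [100, 110],
        "November": [105, 95],
        "December": [90, 94],
    }
    print(december_gaps(sample))
    print(december_is_worst(sample))
```

This only compares means, so it would need to be paired with a significance test before being used to resolve anything.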
