
I will resolve based on all evidence available by the end of January 2024. I will also try to replicate it myself to verify the claims, if possible.
Just checking: by your own criteria, you'd have to run an experiment comparing versions of the OpenAI API to resolve this? I'm pretty sure this resolves NO, but I don't think anyone's going to do that, so we should probably N/A this in that case.
@traders I will write some code to test this hypothesis over the weekend. I will share my code here for inspection and then resolve the market accordingly.
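For anyone who wants to try this in the meantime, here is a minimal sketch of such a harness in Python. It assumes a hypothetical `generate(system_prompt, user_prompt)` function wrapping whichever model or API version is being tested (not shown, and not a real OpenAI client call), plus an assumed "Current date: YYYY-MM-DD" line appended to the system prompt. The format OpenAI actually uses is not established here.

```python
import random
from collections import defaultdict
from datetime import date
from typing import Callable, Dict, List

# Hypothetical: (system_prompt, user_prompt) -> reply text.
# In a real run this would wrap an API call to the model under test.
Generate = Callable[[str, str], str]


def build_system_prompt(base: str, d: date) -> str:
    # Assumed format: append the date as ISO 8601 (YYYY-MM-DD), nothing else.
    return f"{base}\nCurrent date: {d.isoformat()}"


def collect_lengths(
    generate: Generate,
    base_prompt: str,
    user_prompts: List[str],
    months: List[int],
    samples_per_prompt: int = 5,
    year: int = 2023,
    day: int = 7,
    seed: int = 0,
) -> Dict[int, List[int]]:
    """Return {month: [reply lengths in characters]} for each month tested."""
    trials = [
        (month, user_prompt)
        for month in months
        for user_prompt in user_prompts
        for _ in range(samples_per_prompt)
    ]
    # Shuffle so months are interleaved in time rather than run in blocks,
    # which keeps API-side drift from lining up with the month variable.
    random.Random(seed).shuffle(trials)

    lengths: Dict[int, List[int]] = defaultdict(list)
    for month, user_prompt in trials:
        system_prompt = build_system_prompt(base_prompt, date(year, month, day))
        reply = generate(system_prompt, user_prompt)
        lengths[month].append(len(reply))
    return dict(lengths)
```

The shuffle is the main design choice: running all November prompts first and all December prompts second would let any drift on the API side masquerade as a month effect.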
Finally, note that the claimed effect is small enough that I don't think anyone would have noticed it while actually chatting.
(Super low confidence on all of this.)
IMO there's a big leap between these two claims, which people don't seem to be separating out:
1. GPT produces statistically significantly less output if December is included in the prompt.
2. The reason is that it learned to do less work over the holidays.
1 seems totally plausible to me. Given that people are claiming it and claiming to have reproduced it, I'd bet, without having looked into it much, 60-70% that 1 is true.
But 2 strikes me as unlikely, and there are probably many other reasons why 1 could be true. I'd say around 1-3% for 2.
I think the market should be about 1. 2 is more interesting, but we'd need a lot of evidence to conclude anything about causation, which we probably won't get.
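To make claim 1 concrete: given reply lengths (in characters) from December-dated prompts and from other months, a one-sided permutation test is one simple, assumption-light way to ask whether December replies are shorter than chance would explain. This is a sketch, not a prescribed method; the function name is hypothetical, and Welch's t-test or a Mann-Whitney U test would be reasonable alternatives.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_test_shorter(
    december: Sequence[float],
    other: Sequence[float],
    n_permutations: int = 10_000,
    seed: int = 0,
) -> float:
    """One-sided p-value for H1: December replies are shorter on average.

    Statistic: mean(other) - mean(december). Under the null hypothesis the
    month labels are exchangeable, so we shuffle them and count how often a
    shuffled split gives a difference at least as large as the observed one.
    """
    observed = mean(other) - mean(december)
    pooled = list(december) + list(other)
    n_dec = len(december)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[n_dec:]) - mean(pooled[:n_dec])
        if diff >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly 0.
    return (hits + 1) / (n_permutations + 1)
```

One caveat: if several samples share the same user prompt, they are not fully independent, so a per-prompt (paired) comparison would be more careful than pooling everything.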
@JeremiahKellick I agree with you. I added some information on how I plan to resolve at the bottom of the comments.
@4rthurRainbow You never know with this stuff until you replicate it yourself. There have been many instances where people shared such claims on Twitter, only for them to be invalidated later.
@Soli Could you clarify the resolution criteria? Would you resolve YES if a system prompt ending with "2013-12-07" leads to 5% fewer characters than a system prompt ending with "2013-5-07"? What if the difference disappears if you use other months, or "05" instead of "5"?
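One way to separate the two questions raised here (month vs. formatting) is to render the same day in every month under each candidate format, then vary only one factor per comparison. The sketch below assumes two illustrative formats, zero-padded ISO 8601 and the unpadded form from the example; neither is confirmed to be the format OpenAI uses, and the function name is hypothetical.

```python
from datetime import date
from typing import Dict, List


def date_variants(year: int, day: int, months: List[int]) -> Dict[int, Dict[str, str]]:
    """Render the same day of each month in a couple of candidate formats."""
    variants: Dict[int, Dict[str, str]] = {}
    for month in months:
        d = date(year, month, day)
        variants[month] = {
            # ISO 8601: zero-padded month and day, same length for every month.
            "iso": d.isoformat(),
            # Unpadded month, as in the "2013-5-07" example.
            "unpadded_month": f"{d.year}-{d.month}-{d.day:02d}",
        }
    return variants


for month, v in date_variants(2013, 7, [5, 12]).items():
    print(month, v)
# 5 {'iso': '2013-05-07', 'unpadded_month': '2013-5-07'}
# 12 {'iso': '2013-12-07', 'unpadded_month': '2013-12-07'}
```

Note that comparing "2013-12-07" with "2013-5-07" changes both the month and the format at once; holding the format fixed (e.g. always ISO) isolates the month.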
@Lorenzo Good question. I haven't thought deeply about it and would appreciate input from anyone.
I will resolve this YES if there is strong evidence that, when the date is added to the prompt in a standard format and nothing else is changed, December leads the model to return responses with fewer characters. Ultimately, I want to validate the claim that this actually contributed to making ChatGPT “lazier”, so ideally the date format should be the same one OpenAI uses.
@Lorenzo The claim being made here is that it contributed to ChatGPT becoming “lazy”. It would be enough if December is worse than both October and November.
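Under that criterion, a descriptive first check is whether the mean December reply length falls below both the October and the November means, expressed as a percentage so it can be compared with figures like the 5% in the question above. The function names and the `min_shortfall_pct` threshold are hypothetical, and this says nothing about significance on its own; it would still need to be paired with a statistical test.

```python
from statistics import mean
from typing import Dict, Iterable, List


def relative_shortfall(
    lengths: Dict[int, List[int]],
    target: int = 12,
    baselines: Iterable[int] = (10, 11),
) -> Dict[int, float]:
    """Percent by which the target month's mean reply length is below each
    baseline month's mean. Positive means the target month is shorter."""
    target_mean = mean(lengths[target])
    result = {}
    for month in baselines:
        base_mean = mean(lengths[month])
        result[month] = 100.0 * (base_mean - target_mean) / base_mean
    return result


def december_worse_than_both(
    lengths: Dict[int, List[int]], min_shortfall_pct: float = 0.0
) -> bool:
    """True if December is shorter than both October and November by more
    than min_shortfall_pct percent."""
    shortfalls = relative_shortfall(lengths, target=12, baselines=(10, 11))
    return all(s > min_shortfall_pct for s in shortfalls.values())
```

For example, with mean lengths of 100 (October), 200 (November) and 95 (December) characters, `relative_shortfall` gives 5.0 for October and 52.5 for November, so December counts as worse than both at the default threshold.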