How fast things change! Claude 3.5 Sonnet is clearly best right now. Will that change again soon?
Resolves to YES if Zvi uses Claude 3.5 Sonnet, or a future Claude 3.5 Opus or Haiku, for the majority of his chat LLM queries in each of July, August, September and October.
Resolves to NO if this fails to hold in any one of those months.
API calls do not count, since there might be reasons to choose a cheaper or open-weights model there.
https://substack.com/app-link/post?publication_id=573100&post_id=146844015&utm_source=substack&utm_medium=email&utm_content=share&utm_campaign=email-share&action=share&triggerShare=true&isFreemail=true&r=1kvzcb&token=eyJ1c2VyX2lkIjo5NTU1MDYzNSwicG9zdF9pZCI6MTQ2ODQ0MDE1LCJpYXQiOjE3MjE2NTE4NDYsImV4cCI6MTcyNDI0Mzg0NiwiaXNzIjoicHViLTU3MzEwMCIsInN1YiI6InBvc3QtcmVhY3Rpb24ifQ.HNOKFvq9-nrjXkqDG4yuoAA4H196T7DjpvTHhF0wtb8
Claude is mentioned a bunch in this latest post:
> Claude’s median estimate is roughly 1,000 people died due to the outage, when given the hypothetical scenario of an update with this bug being pushed and no other info.
Where Claude got it wrong is it expected a 50%+ drop in share price for CrowdStrike. We should be curious why this did not happen. When told it was 11%, Claude came up with many creative potential explanations, and predicted that this small a drop would become an object of future study.