Very clear two-week timeframe. So, do you believe him?
Criteria for "open source": the code does not need to be runnable (it may be entangled with their infrastructure), but it does need to completely include how all recommendations are made. If a model is used, the model must be included (i.e., the code can't just call a remote black-box model that automagically recommends tweets and have that count as "open source all code").
If Twitter releases this before or on March 31, market resolves to YES. Otherwise, market resolves to NO.
I have decided to resolve as N/A. Fundamentally, this is a market predicting whether Elon's commitment was true, and whether the market resolution criteria that I posted were met. I could have written much clearer criteria, or at minimum hedged more if I wanted to leave room to resolve in the "spirit of the commitment" direction. Several of my clarifications ended up not clarifying much, given what actually happened. Sigh.
Deciding criteria:
(1) How much to weigh the exact phrase "all code"? As in, being reasonable, could this market ever have been resolved as YES, or would this criterion require nitpicking anything that's missing?
(2) Did Elon and Twitter fully intend on open sourcing the recommendations algorithm? Did they believe they actually open sourced everything?
(3) Is either side more surprising to an external layperson's understanding of this market? I want to resolve fairly, where "fairly" minimizes surprise based on what actually happened.
(4) Are there missing components that should be considered important and key to how recommendations work?
I'm torn about (1), because there is a very reasonable interpretation that "all code" is not even possible, so the resolution would necessarily need to leave room for intention. However, nowhere in the market did I or anyone else discuss abuse/spam, so while it's reasonable for Twitter to not include this, it's also reasonable for a market participant to vote NO assuming they'll need to keep abuse/spam signals private. However, taken to its limit, that assumes an interpretation that would very likely never be met, despite Twitter's intentions, due to practical considerations. If I were pressed, I'd slightly favor NO here.
For (2), Elon knows it's not everything. He's known for puffery, yet has mentioned twice that it's not everything ("Most of the recommendation algorithm", "This is most of the recommendation code. In the coming weeks, we will open source literally everything that contributes to showing a tweet"). However, we should also look at what actually happened. Hundreds of thousands of lines of code were open sourced, and there are clear signs that a lot of effort went into making this happen. For example, in production, it appears some heuristic thresholds are read from config files, and they ported them to hardcoded variables in the public version. So while it does seem they truly intended to open source what they could, it is evident that they know it's not everything. This also slightly favors NO.
For (3), from reading the discourse on Twitter and tech news, most people indeed do think Twitter "open sourced the recommendation algorithm". However, this is difficult to quantify, the discourse in general is pretty garbage, and I did find several people whose first reaction was that Elon didn't keep the commitment. I believe this leans slightly towards YES, but I wasn't able to get a strong signal.
(4) would have made this the easiest, because if very clear gaps were found, this is a clear NO. I'll note that much of the public discussion has been flat wrong — people have jumped on a "current thing" without having the chance to understand what the code actually is. To review the hundreds of thousands of lines of code would take a lot of time, so we can safely assume that no one yet understands how it works and whether it's complete, especially since it doesn't compile (it's missing some utility classes, such as MostRecentCombinedUserSnapshotSource). One option I considered was to wait longer to see if clear obvious gaps would be found, but that holds up people's M$.
Uncontroversially, abuse/spam is incomplete. I tried to find evidence of other gaps to make the (4) resolution criteria more clear.
The largest gap that I found was in the ML repo. The features are listed in a markdown readme, but are not directly specified in the model, because the data is read from a remote database. They provided a script to generate random data, but the provided schema seems incomplete (note how only a few features are listed). However, in this market's criteria, I did leave room for private data to be used, and we can infer what a lot of the data would be if we had access.
I won't spend weeks on this, so I picked one non-obvious feature of recommendations to see what I could find: how Tweets with links are de-boosted. This has been shared often, but (a) applies to EarlyBird only, not the timeline model, (b) doesn't have any clear weight for posts with links (is it urlParams? if so, it's definitely not clear how that's set or used elsewhere). However, I found what seems to be the primary contribution here, which is fairly easy to read and follow. (Fun fact: super short tweets can basically never go viral due to the length filter. Interesting.)
Could I have resolved this as NO? Very much. The abuse/spam issue still nags me, and I empathize with the conservative interpretation of the market. If this was an empty gesture, however, I find it hard to believe that we'd find so much in the repository. Yes, the ML repo is hard to follow and uses their internal BigQuery database. However, when people misinterpreted the "is_republican" / "is_elon" / etc to be evidence of algorithmic bias, Elon's reaction was one of embarrassment. He wanted everything out there.
Could I have reasonably resolved this as YES? Yes. Twitter came incredibly close (surprising to me!), and I expect that most NO holders did not assume that Twitter would open source so much and meet the deadline. To resolve as YES, I would have wanted to see no hedging from Elon about the completeness, and significantly more completeness in the ML repo. Counterfactually, a YES could have been possible if I had been clearer in the resolution criteria about the possibility of "intent".
I did hedge quite a bit ("sufficiently", "mildly sophisticated layperson"), but also used absolutes ("completely", "everything", "all code"). My expectation was that Twitter would either obviously not meet the commitment, or would. Buried in that is at least an acknowledgement that the commitment could be met as far as practically possible, which the most extreme interpretation of "all code" mostly rules out. After having talked to several other software engineers, I've heard both "they obviously did it, I'm surprised" and "this is obviously missing important parts". Since it's been several days with no smoking gun (modulo the abuse/spam stuff, which I cover above), I decided the safest thing to do here is resolve N/A.
Finally (and you'll just have to trust me here), I consulted a friend who is still working for Twitter. I sent this market to them, and they leaned towards an N/A resolution. There was a clear mandate from Elon to open source the algorithm, and earnest effort was made to meet this goal. Anything left out was either due to critical concern for the platform or inability to separate code from their infrastructure fast enough. And they do acknowledge that due to this, it's not "all code", so YES feels wrong. I have known this person for many years and they don't have a particular stake in this either way.
Apologies if anyone feels this resolution is disappointing. Hopefully my intention for a fair resolution is clear, and I've most definitely learned a lot about how to approach subjective markets like this in the future. If you feel you should have won, reach out to me on Discord (@acon#3721) and I'll try to make it right.
@andrew Regardless of details, I just want to applaud and thank you for how hard you've worked on resolving this market fairly and thoughtfully.
It's most definitely not an April Fool's joke — it's about as earnest as it could be.
I intend to resolve later today after I chat with a Twitter employee who agreed to give me some more information. I've oscillated between YES and NO, but @Conflux's suggestion for N/A (as disappointing as it is) may be the most fair for all involved. If I do decide YES or NO, I will offer to refund people who disagree with my thought and decision process. I've learned a lot about clear market criteria.
Issues indicating this might be an April Fool's joke:
https://github.com/twitter/the-algorithm/issues/690
https://github.com/twitter/the-algorithm-ml/issues/34
An article going over some of the jokes inside:
https://arstechnica.com/tech-policy/2023/03/twitter-posts-the-code-it-claims-determines-which-tweets-people-see-and-why/
Posted algorithm code includes "is_democrat," "is_republican," and "is_elon."
@YonatanCale You can argue whether the released code satisfies the question, but it is definitely not a joke
@NunoBalbona What do you mean by "not a joke"? Do you think this is the code they run? Or are you saying it's not funny?
@YonatanCale I commented on it when it had just come out. I think there are very important parts missing, but I think they do run the released code, yes.
Thanks for everyone’s patience, and for good faith discussion below. First thing: I take responsibility for any ambiguities in resolution criteria, where people earnestly took positions with very different assumptions about the outcome.
I spent a good amount of time in the code yesterday to find clear gaps. For example, it’s well known that including a link in a tweet limits its reach significantly, but I haven’t yet found where that happens.
A lot of the public discussion elsewhere is fairly misinformed. My goal is to minimize the gap between “spirit” and “letter” of his commitment since both interpretations seem reasonable — finding a significant gap in the code would help. Hang tight.
@andrew Are you currently considering an N/A resolution? I think that's often appropriate for cases like this where there's a spirit-letter gap: it is appropriate to cancel a market if it was underspecified to begin with.
Some NO bettors said they were specifically betting on the scenario of most but not all of the code being released, while YES bettors bet on the fact that Elon actually followed through and released the essential parts of the code! I don't think either of these groups deserves to lose their investment.
I know N/A resolutions are sometimes a bit underwhelming, appreciate your attempts to figure out exactly whether the current situation is more YES or NO, and won't be mad if you select either of those resolutions, but I think N/A is the fairest option here. Manifold has it for a reason.
@ShaiNatapov Sure, this is definitely more code than I expected! And it's super interesting. That's just not what the question is about :)
https://twitter.com/elonmusk/status/1642085053504028673 elon implies this is not all of the recommendation code
@parafactual I strongly believe that resolution should stick to the resolution criteria specified (i.e., whether "all code" is included), rather than weaker criteria such as whether all "important" code was shared or whether there was a "genuine effort." I think sticking to the letter of resolution criteria is extremely important for reliable, well-calibrated prediction markets.
Also note that this is the second time that Elon has explicitly tweeted that this is only "most" of the code and that the rest will follow. Not even he claims they released all of it. https://twitter.com/elonmusk/status/1641874582473695246
@JacyAnthis i agree. also, i had a situation like this in mind when i bet on NO. given that i assigned a lot of probability to an outcome like this, expecting it to resolve to NO, i would feel pretty betrayed if it resolved positively
@parafactual a central member of the internal category i had in mind might be "most recommendation code is open-sourced, but people manage to find leaked code that's unambiguously recommendation code, but was not put in the repository (presumably forgotten). therefore, NO". this isn't exactly that, but i think it is a member of the category
@parafactual I'm maybe halfway there. I was very tempted to bet more on NO, especially with the spike up to ~72% price (!), but I held back because of the issues with incorrect resolutions on Manifold.
@parafactual also, upon rereading the market, i'm just confused about the resolution criteria now tbh
@parafactual The one I was grumpy about was this market resolving YES because Jimmy Kimmel was danced off stage (which somehow counted as "fake/joke violence"), but I'm not well-informed. If you search Manifold for "incorrect resolution," you can find a bunch of meta-markets on it (e.g., https://manifold.markets/IsaacKing/will-manifold-fix-the-problems-arou).
@parafactual Oh, here's a current market that I think very clearly should have already resolved YES, but it seems the market creator is arguing that the Meta layoff announcement was an announcement of an announcement instead of an actual announcement. https://manifold.markets/TheSkeward/will-there-be-more-meta-layoffs-by