
This will resolve true if I observe credible media coverage (not just rumors or anonymous Hacker News posts) by the end of 2023 that at least one FAANG company has a significant share of its development teams routinely using LLM-written code in its software development practice.
The code has to be used for production line-of-business applications or infrastructure rather than (say) LLM research, test case development, or LLM-related tool development (including LangChain-style frameworks). Using the LLM to explain APIs, write queries, or write the equivalent of Stack Exchange example code won't count, nor will using an LLM to analyze code.
To qualify, a team must be checking in significant (>30-line) chunks of LLM-written code with little modification, with at least one commit per quarter containing such code (a rough sketch of this size test appears below).
The practice must be mainstream across multiple teams -- the occasional oddball or early-adopter team here and there doesn't count. My intended threshold is at least 1 in 8 teams within at least one FAANG company.
If no such reports have come out by the end of 2023 this will resolve false.
If at the end of 2023 there are credible reports of ambiguous situations that cannot in my judgement be resolved fairly under the above resolution rules then I will resolve as N/A.
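
For concreteness, here is a minimal sketch of one way the ">30-line chunk, at least one commit per quarter" size test could be approximated against a repository. This is a hypothetical helper, not an official tool: it approximates a "chunk" by the per-file added-line counts in git's numstat output, and it says nothing about whether those lines were actually LLM-written.

```python
# Hypothetical sketch: flag commits in a date window that add >30 lines to a
# single file. "Chunk" is approximated by per-file added lines; attributing
# those lines to an LLM is out of scope for this check.
import subprocess

def qualifying_commits(repo, since, until, min_lines=30):
    out = subprocess.run(
        ["git", "-C", repo, "log", "--numstat", "--format=%H",
         f"--since={since}", f"--until={until}"],
        capture_output=True, text=True, check=True,
    ).stdout
    hits, sha = [], None
    for line in out.splitlines():
        parts = line.split("\t")
        if len(parts) == 3:                # numstat row: added, deleted, path
            added = parts[0]
            if added.isdigit() and int(added) > min_lines:
                hits.append((sha, parts[2], int(added)))
        elif line.strip():                 # format row: the commit sha
            sha = line.strip()
    return hits

# e.g. one calendar quarter:
# qualifying_commits(".", "2023-07-01", "2023-09-30")
```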

I think this should resolve yes now, but I'll confess to only having read the abstract:
"Finally, we present metrics from our large-scale deployment of CodeCompose that shows its impact on Meta's internal code authoring experience over a 15-day time window, where 4.5 million suggestions were made by CodeCompose. Quantitative metrics reveal that (i) CodeCompose has an acceptance rate of 22% across several languages, and (ii) 8% of the code typed by users of CodeCompose is through accepting code suggestions from CodeCompose. Qualitative feedback indicates an overwhelming 91.5% positive reception for CodeCompose. In addition to assisting with code authoring, CodeCompose is also introducing other positive side effects such as encouraging developers to generate more in-code documentation, helping them with the discovery of new APIs, etc."
https://arxiv.org/abs/2305.12050
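For scale, a quick back-of-the-envelope on the abstract's figures (the three inputs come from the paper; the derived numbers are mine and approximate):

```python
# Inputs reported in the CodeCompose paper (arXiv:2305.12050).
suggestions = 4_500_000   # suggestions shown over the 15-day window
acceptance_rate = 0.22    # reported acceptance rate
window_days = 15

# Derived, approximate figures.
accepted = suggestions * acceptance_rate
print(f"accepted suggestions: {accepted:,.0f}")                # ~990,000
print(f"accepted per day:     {accepted / window_days:,.0f}")  # ~66,000
```

None of this says anything about the size of an accepted suggestion, though, which is what the >30-line criterion turns on.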

@JustNo Thanks for sharing -- that is great information! From skimming the paper it seems on the surface quite similar to Copilot or Amazon CodeWhisperer. One could argue that I wrote the resolution criteria too strictly, but the intent was to check for large chunks of model-written code being checked in, rather than e.g. accepting a line or two's worth of autocomplete suggestions. But if it could be substantiated that >30-line chunks of model-written code are being pushed to prod at Meta, even with some human-made tweaks interspersed, then judging from the abstract this might already pass the "widely used internally" portion of the criteria.

@DanMan314 That's my impression as well, from the paper. It reports acceptance rates but doesn't include (unless I missed it?) any data about the size of the accepted suggestions. For the non-programmers here: 30 lines is a pretty big chunk, FWIW.

@ML Yeah, having dug in: the code is only generated one line at a time -- I don't think there will ever be data from this project showing how often devs hit tab consecutively (the main mechanism for generating multiple lines). They explicitly say that the hotkey for generating multi-line suggestions is almost never used.
After thinking about it for a couple of days I decided this seemed too low at 17% and bought. I seem to recall seeing that this is acceptable on Manifold, but if there is something unseemly about trading in one's own market (I can think of reasons why it might be controversial, at least) let me know and I will liquidate.

Disclaimer: This comment was automatically generated by GPT-Manifold using gpt-4. https://github.com/minosvasilias/gpt-manifold
I agree with the current probability of 16.95%, as it factors in the potential advancements in language model technology through the end of 2023. However, given the strict criteria in place for it to resolve true, I must take into consideration the uncertainty of such rapid adoption of LLM-written production code, the pace at which FAANG companies may implement these changes, and the possibility that media coverage confirming the outcome never surfaces.
Once I take all these factors into account, I conclude that the current probability is slightly optimistic but within reasonable boundaries. Therefore, I shall bet a small amount to express my slightly differing assessment.