Tweet embed:
3. Approximately 10 days later, I will release a full chess engine based on GPT4, whose code/prompts anyone will be able to inspect and run against any other chess engine. GPT4's "performant output" will beat every other chess engine in existence in a tournament of any size.
A monthly option resolves YES if kenshin9000 releases code during that month and anyone is able to use it to play a single game of chess against a chess engine. At the end of 2024, "[Not by end of 2024]" resolves YES if no month qualifies. Resolves Other if a month is not in the table but would qualify if added.
For this market, it doesn't have to play chess well. It just has to run and always make legal moves. "Always" means "when observed by @Mira in testing this market", not "literally always". So if you find an example on Twitter of it making an illegal move, it won't necessarily resolve this market NO; but @Mira getting an illegal move immediately resolves NO.
I won't be studying his code at all, except the bare minimum needed to get it running. So he can include other chess engines, write his own custom engine, call out to GPT-4, etc.
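As a rough sketch of the rule above (not an official procedure), the resolution could be written as a small function. The `resolve` name, the `"YYYY-MM"` month keys, and the simplification to a single release are all assumptions made for illustration; in particular, an illegal move seen in testing is treated here as "this release does not qualify".

```python
from datetime import date
from typing import Optional, Set

END_OF_MARKET = date(2024, 12, 31)


def resolve(release: Optional[date], runs: bool, illegal_move_seen: bool,
            listed_months: Set[str]) -> str:
    """Return the option that resolves YES (hypothetical helper).

    release           -- date the code was released, or None if never
    runs              -- whether a single game against an engine could be played
    illegal_move_seen -- whether the tester observed an illegal move
    listed_months     -- monthly options in the table, as "YYYY-MM" strings
    """
    qualifies = (
        release is not None
        and release <= END_OF_MARKET
        and runs
        and not illegal_move_seen
    )
    if not qualifies:
        return "[Not by end of 2024]"
    month = release.strftime("%Y-%m")
    # A qualifying month that is missing from the table resolves Other.
    return month if month in listed_months else "Other"


print(resolve(date(2024, 3, 15), True, False, {"2024-03", "2024-04"}))
```

Note that play strength never appears as an input: only whether the code was released, ran, and made legal moves while being tested.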
I proposed this theory on a previous market:
There is a higher chance that kenshin is himself an AI bot, prompted to bait people.
If he is a bot, then it could be a meta puzzle. Instead of waiting for him to publish the engine, you should make the move directly.
Somebody with a Twitter account, please write him "1. e2-e4" and see whether he replies with a move and starts a match.
Honestly, it would be cool if he is the bot he is talking about. Mira would have to re-resolve previous markets, because the Twitter AI bot was published even though nobody recognised it.
But the most likely explanation is that he is so confident because he is the type of person who "invents" a perpetuum mobile every year.
@PaulMuller It's a linked market. I can't resolve it without resolving the entire market. But, it is guaranteed to resolve NO later.
@Mira so should the March 2024 one as well unless I missed something
I don't understand the point of a market on "will someone release a functioning X" with the resolution criterion "I will test X but not very hard".
I'm fairly certain people have already gotten GPT-4 to play a full game with all legal moves. I could probably do it right now if I played the losing side of the five-move checkmate (forget what it's called). So the only new knowledge that could come from this market is if this proposed engine were actually stress-tested.
@pietrokc You want /Mira/did-kenshin9000-on-twitter-beat-all for the claim people are excited about. This is a derivative market intended as an interest rate hedge for traders, to separate predictions about timing from predictions about his engine's capability.
@Mira But are they 99% confident it will "always make legal moves" as per this market's criteria? That, I think, will be the deciding factor.
@Pykess I haven't read all of his walls of text in detail, but I get the impression he's presenting GPT-4 a list of legal moves and using its analysis as an evaluation function to do a speculative search.
He has 10,000 lines of Python and is using a chess library, so he should be able to make legal moves reliably. But I won't be testing his engine on thousands or millions of moves. I will at most play a single game. If you find an example of it making illegal moves on Twitter, it won't necessarily resolve this market NO.
@Mira In general his text is word salad so it does not have a definite meaning.
One might assume that if you cannot write coherent sentences, you probably can't write code either, which would mean he does not have any chess engine at all. Sadly, this is not how it works; many people who cannot write coherent sentences can do fine at other things, including ones that use verbal abilities. So it is quite possible he has a functional chess engine.
However, to the degree his text has any meaning at all, that is pretty clearly not what he is doing. E.g.:
"It will be a set evaluation function, that was iterated on via Reinforcement Learning via LLM. 102 integer parameters is the final version I've chosen."
In other words, the "evaluation function" is a fixed formula, with the integers representing things like mobility, whether pieces are under attack, and so on. He claims that GPT-4 came up with the evaluation function based on what happened in previous games played with earlier versions of it. But since it is static, GPT-4 is not generating it at the moment it is used. Nor is he using GPT-4 to apply the function; it is applied by an actual independent chess engine that queries the board state for each parameter. Nor is GPT-4 used to generate lists of moves. So it sounds like GPT-4 is involved in at most two ways:
(1) Supposedly, inventing and updating the evaluation function over time to come up with the current iteration
(2) (Possibly) Choosing a move from a list of moves generated by a program that does tree search and evaluates future positions using that evaluation function. I.e. ChatGPT is choosing a high number from a list of numbers, not exactly an extremely hard task.
There is nothing absurd here except the claim that GPT-4 can invent a better evaluation function than all existing chess engines. And GPT-4 would be actively doing no work during a game except choosing a high number from a list of numbers.
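To make the "choosing a high number from a list of numbers" point concrete, here is a minimal sketch of a static evaluation function with integer weights and the selection step on top of it. The feature names, the weights, and the weighted-sum form are all assumptions for illustration; the quoted description only says there are 102 integer parameters and does not list them or say how they are combined.

```python
from typing import Dict, List, Tuple

# Hypothetical features and weights (three instead of 102).
WEIGHTS: Dict[str, int] = {
    "material": 100,
    "mobility": 3,
    "pieces_under_attack": -20,
}


def evaluate(features: Dict[str, int], weights: Dict[str, int] = WEIGHTS) -> int:
    """Static evaluation: a fixed weighted sum of integer board features."""
    return sum(weights[name] * features.get(name, 0) for name in weights)


def pick_move(candidates: List[Tuple[str, Dict[str, int]]]) -> str:
    """Return the move whose resulting position scores highest."""
    best_move, _ = max(candidates, key=lambda c: evaluate(c[1]))
    return best_move


# Made-up feature values for three candidate moves; in a real engine these
# would come from querying the board after a tree search.
candidates = [
    ("e2e4", {"material": 0, "mobility": 30, "pieces_under_attack": 0}),
    ("a2a3", {"material": 0, "mobility": 21, "pieces_under_attack": 0}),
    ("d1h5", {"material": 0, "mobility": 35, "pieces_under_attack": 1}),
]
print(pick_move(candidates))
```

Once the weights are fixed, nothing in this loop needs a language model: `evaluate` is plain arithmetic and `pick_move` is a `max` over a list.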