Tweet embed:
3. Approximately 10 days later, I will release a full chess engine based on GPT4, whose code/prompts anyone will be able to inspect and run against any other chess engine. GPT4's "performant output" will beat every other chess engine in existence in a tournament of any size.
Resolves YES if kenshin9000 releases code and anyone is able to use it to play a single game of chess against a chess engine. NO if it's the end of January (PST) and he still hasn't released any code.
For this market, it doesn't have to play chess well. It just has to run and always make legal moves.
I won't be studying his code at all, except for the bare minimum needed to get it running. So he can include other chess engines, write his own custom engine, call out to GPT-4, etc.
This is very unlikely to resolve YES, but still much more likely than "What modalities will GPT 4.5 have? (January 2024)"
Correctly represented by the prices.
Note that kenshin9000_ is now claiming to have reached "just about 3800" Elo with Llama2-70B, and expects ~3900 vs SF16 with GPT4 (but he has also failed to deliver anything yet).
update from https://nitter.net/kenshin9000_/status/1740776282382643374 :
I was going to state that I would do this by January 15 now, but no, I need more time, and will give myself ample time, to January 30. Unless WW3 starts, god forbid, I will post my engine then. I am working on this in my free time, and have other responsibilities, and considering the intense interest, I need to be absolutely certain that the software works consistently
@duck_master note that the key part of this humongous wall of text is:
"I am delaying ALL Chess related results to a single thread to be posted", with no mention of the results having been promised by Dec 25 (while also lying about the code release date, which had been "approximately" Jan 4).
I played the chess engine at ParrotChess - Can you beat a stochastic parrot?
I won on the first try. I haven't played in years, and I wasn't really putting thought into the moves, either. I moved at close to the speed it did.
That makes me skeptical the supposed 1700-1800 engines would actually get that rating if they played humans in tournaments.
@DavidBolin I ran a fairly comprehensive evaluation against Stockfish opponents, and the results indicated a strength close to 1700 Elo on the CCRL scale. That is supposed to be close to human Elo, although how close is actually an open question.
So no, those engines would not get the widely proclaimed 1800, but are likely around 1700. That is still a decent rating, but nothing suggesting the supposed AGI level. You might be at 1900, say, which means easy wins most of the time.
@Zozo001CoN You need a large sample of games to get an accurate Elo estimate. To get an error of about ±20 Elo you probably need up to 1000 games. Because this cannot be automated, nobody is willing to try this.
Nevertheless, my chess engine (~2500) needs as little as 0.03 seconds per move to win.
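For a rough sense of where a figure like "±20 with about 1000 games" comes from, here is a minimal sketch. It assumes independent games, no draws, and a normal approximation for the average score; the function names `elo_diff` and `elo_margin` are made up for illustration. Real matches with draws have lower variance, so the true margin would be somewhat tighter.

```python
import math

def elo_diff(score: float) -> float:
    """Elo difference implied by an average score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_margin(score: float, games: int, z: float = 1.96) -> float:
    """Approximate half-width (in Elo) of the ~95% interval around the estimate.

    Assumption: each game is an independent win/loss trial (no draws),
    so the standard error of the average score is sqrt(s * (1 - s) / n).
    The bounds score +/- z * se must stay strictly inside (0, 1).
    """
    se = math.sqrt(score * (1.0 - score) / games)
    lo = elo_diff(score - z * se)
    hi = elo_diff(score + z * se)
    return (hi - lo) / 2.0

# Margin for an evenly matched pairing at several sample sizes.
for n in (100, 400, 1000):
    print(n, round(elo_margin(0.5, n), 1))
```

Under these assumptions, 1000 games between evenly matched opponents give a margin of a little over 20 Elo, which is in line with the estimate above.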
@sscg13 but ofc this (measuring engine Elo) CAN be automated, if said engine were to be actually released.
The real problem is that it may be too expensive to actually do it.
"That said, an inference time "concept-search" algorithm can improve the input tokens so that "cross-concept" information transfer is done on both the most applicable "developed" "concepts", as well as with the most applicable "anchor points". This is something I have been working on, and it indeed does work, with room for a lot of improvement."
Writing word salad is direct evidence they will not accomplish whatever they say they will accomplish in that connection.
I do hate to be the one always asking for clarifications, but here we go.
What does it mean, specifically, that he releases a functioning engine?
My concern is this: his code can be a minimally modified clone of any GPT-based bots already available. Or simply a Python wrapper that combines GPT prompting with pychess library (perhaps even pulling in Stockfish to boot). Would that count?
With regard to "make legal moves": would any illegal ones make for a "NO" resolution? And what is the judgement if his code filters out illegal moves (which are happening regularly with GPT-bots) by using a pre-existing library?
@Zozo001CoN I won't be studying his code at all, except the bare minimum to get it running. So he can filter moves internally, call out to chess engines, etc. Anything he likes as long as I can get whatever he publishes to play a game of chess.
If it generates an illegal move, this resolves NO.
A "functioning engine" is anything that always generates legal chess moves that I can either input into Stockfish or that communicates via a standard chess protocol.
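To make concrete what "filtering moves internally" could look like, here is a minimal sketch. It is not anything kenshin9000 has published: `first_legal_move` is a hypothetical helper, the candidate list stands in for whatever a language model proposes, and the legal-move set is hard-coded here where a real wrapper would get it from a move-generation library.

```python
from typing import Iterable, Optional, Set

def first_legal_move(candidates: Iterable[str], legal_moves: Set[str]) -> Optional[str]:
    """Return the first candidate (in UCI notation) that is legal, else None.

    Assumption: legal_moves holds lowercase UCI strings such as "e2e4";
    in a real wrapper this set would come from a chess library.
    """
    for move in candidates:
        normalized = move.strip().lower()
        if normalized in legal_moves:
            return normalized
    return None

# Hypothetical model output: the first proposal is illegal, the second is fine.
legal = {"e2e4", "d2d4", "g1f3"}
proposals = ["e2e5", "E2E4 ", "d2d4"]
print(first_legal_move(proposals, legal))  # e2e4
```

A wrapper like this would satisfy the "always generates legal moves" condition regardless of how often the underlying model proposes illegal ones, as long as it has some fallback for the `None` case.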
@Mira , thanks for the clarification.
I would not bet on the question as posed, then. From my POV this leaves too much room for cheating by him. As a fun fact, it seems to me quite easy to make ChatGPT itself write code that would satisfy these rules, given the ready availability of open source chess software with the capability (both real engines and interfaces that enforce legal moves).
EDIT: instead of "seems easy", above, I should have written "is trivial"; see https://sl.bing.net/e75h63f8Ztk
@DavidBolin > That's true but presumably he does not care about this market.
My point had nothing to do with his caring about the market. He likely cares about getting embarrassed about his xeet. The real question is whether he chooses to release something that is non-functional, or something that has borrowed functionality from existing code, or just admits that his grandiose claim was unrealistic and refrains from releasing anything.
@Zozo001CoN He won't be embarrassed if he says nothing and does nothing. That is normally what they do when they predict something and fail to do it.
Also, according to some people, he has been claiming he was going to release the chess thing for a while, without doing it.
@DavidBolin , I see your point, and I agree that wimping out is the default behavior to be expected. But in this case he made the very specific claim to post "Chess results" from his magically all-powerful code by Dec 25 (Dec 11 +2 weeks), then release a chess engine "Approximately 10 days later".
Since this xeet-storm had a lot of replies already, quietly backing out would be considerably embarrassing.
@Zozo001CoN Christopher Bouzy (spoutible.com/cbouzy) (@cbouzy) promised to delete his Twitter account in 2022 if the Democrats did not gain 2 or more Senate seats. They gained 1. He has 300k followers so presumably a lot of people saw him say that -- and he insisted at the time that he was absolutely serious.
He did not delete his account, and never even mentioned it again.
@DavidBolin like I said I see your point, and I agree with regard to the general tendency of xeeters quietly dropping failed claims. But, again, this specific case feels different. In this domain it is just too easy to release something, anything, that appears to deliver the proclaimed results RealSoonNow(TM).
Note also that kenshin9000_ (formerly known as kenshinsamurai9) emanates strong perpetuum mobile inventor vibes. This type of person is more likely to relentlessly carry on than to ever admit being wrong.
@Zozo001CoN I found this from June:
Tic Tac Toe 6 (openai.com)
This is utterly ridiculous. If he plans to do anything similar with chess, it will involve explaining the value of every move at every point -- which means he absolutely will need a separate chess engine to do that for him.
@DavidBolin what I gather from his latest verbiage is that chess engines would be used to optimize a GPT-based evaluation function, which in his mind would lead to a superior engine. This ofc ignores the minor problem of accuracy in predicting future performance from past results.
> he absolutely will need a separate chess engine to do that for him
I mean yes, in actual reality. But he appears to think the "auto-regressive" engine can eventually pull itself up by its own bootstraps a la Munchausen.
@Zozo001CoN Have you noticed that he has not released the games he said he would release by Christmas?
It is becoming more probable he will release absolutely nothing by the end of January, as I suggested.
@DavidBolin thanks for the note - I had not checked yet. Given that we're still close to the holiday, he might still be considered on a grace period for the Dec11 + 2 weeks supposed posting date. We'll see when we do, or not, I guess.