
Resolves YES if kenshin9000 releases code and it decisively beats Stockfish (ver. 16). Resolves NO if the end of January arrives and such superiority has not been demonstrated.
For this market, it has to run without calling chess engines (or their equivalents); game-mechanics support libraries such as python-chess are allowed, but only to the extent that neither move suggestions nor evaluations are taken from anywhere but GPT-4.
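For illustration only, a minimal sketch of that constraint, assuming python-chess is used purely for board state and legality while the move itself comes from GPT-4 (the query function is a hypothetical stub, not anyone's actual implementation):

```python
import chess

def ask_gpt4_for_move(fen: str) -> str:
    """Hypothetical stub: query GPT-4 with the position and return a move in UCI form."""
    raise NotImplementedError("model call goes here; no engine and no external evaluation allowed")

board = chess.Board()
while not board.is_game_over():
    uci = ask_gpt4_for_move(board.fen())
    move = chess.Move.from_uci(uci)        # mechanics: parsing only
    if move not in board.legal_moves:      # mechanics: legality check only
        raise ValueError(f"illegal move from the model: {uci}")
    board.push(move)
print(board.result())
```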
The reference Stockfish opponent shall play under CCRL 40/15 testing conditions, on 1 CPU with 256 MB hash size, but without endgame tablebases:
https://www.computerchess.org.uk/ccrl/4040/about.html
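As a rough sketch of those conditions (not a resolution requirement), the reference engine could be driven over UCI with python-chess along these lines; the binary path is an assumption, and CCRL's scaling of the 40/15 control to machine speed is not modelled:

```python
import chess
import chess.engine

# Assumed binary path for the reference Stockfish 16 build.
engine = chess.engine.SimpleEngine.popen_uci("./stockfish16")
engine.configure({"Threads": 1, "Hash": 256})   # 1 CPU, 256 MB hash; no SyzygyPath => no tablebases

board = chess.Board()
# First time control only: 40 moves in 15 minutes per side.
limit = chess.engine.Limit(white_clock=900, black_clock=900, remaining_moves=40)
result = engine.play(board, limit)
board.push(result.move)
engine.quit()
```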
Superiority is defined as either a 55% score or +2 net wins, whichever threshold is higher, in a set of games with at least 10 decisive (i.e. non-draw) results.
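On the reading that "whichever is higher" means clearing the higher of the two thresholds (which is equivalent to clearing both), a small checker might look like this:

```python
def is_superior(wins: int, losses: int, draws: int) -> bool:
    """Market superiority bar, assuming GPT-4 must clear whichever of the two
    thresholds (55% score, +2 net wins) is higher -- i.e. clear both --
    with at least 10 decisive games played."""
    games = wins + losses + draws
    if wins + losses < 10:                 # need at least 10 decisive results
        return False
    score = wins + 0.5 * draws
    return score / games >= 0.55 and (wins - losses) >= 2

print(is_superior(wins=7, losses=4, draws=1))   # 7.5/12 = 62.5%, +3 net wins -> True
```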
NOTE: the criteria are to be revised if some reasonable amount of real money is to be spent on verification.
Note that kenshin9000_ is now claiming to have reached "just about 3800" Elo with Llama2-70B, and expects ~3900 vs SF16 with GPT-4 (though he has yet to deliver anything).
An interesting tidbit from the latest xeet by kenshin9000_:
"The evaluation function I currently have was updated through ~11000 games "
This really is a minuscule training set, especially for such a contraption as the NLP-assisted weight determination his scheme ostensibly uses. For comparison, the main repository of test games for Stockfish has close to 7 billion (yes, with a B) games.