Did kenshin9000 beat the Stockfish chess engine with GPT-4 by end of January?
7
190Ṁ515
Resolved NO on Jan 30

Resolves YES if kenshin9000 releases code and it decisively beats Stockfish (ver.16). NO if it's end of January and such superiority hasn't been demonstrated.

For this market, the code has to run without calling chess engines (or equivalent). Game-mechanics support libraries such as python-chess are allowed, but only as long as neither move suggestions nor evaluations come from anywhere but GPT-4.

The reference Stockfish opponent shall play under CCRL 40/15 testing conditions, on 1 CPU with 256 MB hash size, but without endgame tablebases:
https://www.computerchess.org.uk/ccrl/4040/about.html
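As a rough illustration of the hardware side of these conditions, the snippet below builds the UCI commands that would pin Stockfish to 1 thread and a 256 MB hash table. `Threads` and `Hash` are standard Stockfish UCI options; the helper name `reference_uci_setup` is hypothetical, and the sketch assumes tablebases stay disabled simply by never setting `SyzygyPath`. The 40/15 time control itself is not modeled here.

```python
def reference_uci_setup(threads: int = 1, hash_mb: int = 256) -> list[str]:
    """Return UCI commands for the reference opponent's hardware settings.

    Hypothetical helper: it only builds the command strings, it does not
    launch an engine. SyzygyPath is deliberately never set, so the engine
    is assumed to play without endgame tablebases.
    """
    return [
        "uci",
        f"setoption name Threads value {threads}",
        f"setoption name Hash value {hash_mb}",
        "isready",
    ]


if __name__ == "__main__":
    # Print the commands in the order they would be sent to the engine.
    for command in reference_uci_setup():
        print(command)
```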

Superiority is defined as either 55% or +2 net wins, whichever is higher, in a set of games with at least 10 decisive (i.e. non-draw) results.
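One possible reading of this criterion, sketched in code: the challenger needs at least 10 decisive games and must clear both bars, which is what "whichever is higher" amounts to. The sketch assumes that "55%" means the overall score with draws counted as half a point; the market text does not say whether draws are included, so treat that as an assumption. The function name `is_superior` is made up for illustration.

```python
def is_superior(wins: int, losses: int, draws: int) -> bool:
    """Check the superiority criterion for the challenger's match result.

    Assumptions (not stated in the market text):
    - "55%" is the score percentage, with a draw worth half a point;
    - both the 55% bar and the +2 net-wins bar must be met.
    """
    decisive = wins + losses
    if decisive < 10:
        return False  # not enough non-draw games to judge

    games = wins + losses + draws
    # Integer form of (wins + draws / 2) / games >= 0.55, avoiding floats.
    meets_score = (2 * wins + draws) * 100 >= 110 * games
    meets_net_wins = (wins - losses) >= 2
    return meets_score and meets_net_wins


if __name__ == "__main__":
    print(is_superior(6, 4, 0))   # 60% score, +2 net wins
    print(is_superior(6, 4, 30))  # +2 net wins, but only 52.5% score
```

Under this reading, a 6-4 result with no draws qualifies, while the same 6-4 padded with 30 draws does not, because the score drops below 55%.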

NOTE: the criteria may be revised so that verification can be done for a reasonable amount of real money.


Note that kenshin9000_ is now claiming to have reached "just about 3800" Elo with Llama2-70B, and expects ~3900 vs SF16 with GPT-4 (but he has so far failed to deliver anything).

See my post in a sister market, with some hilarious background info.

NOTE: I may have to revise the resolution criteria, as kenshin9000_ is now quoting an excessively high cost of running his engine code (>$50/game!). Unfortunately, given the vagueness of his proclamations, this cannot yet be pinned down. I'll be glad to receive suggestions from market participants!

An interesting tidbit from the latest xeet by kenshin9000_:
"The evaluation function I currently have was updated through ~11000 games "

This really is a minuscule training set, especially for a contraption like the supposedly NLP-assisted weight determination his scheme ostensibly uses. For comparison, the main repository of test games for Stockfish has close to 7 billion (yes, with a B) games.
