Out of three Leetcode contest problems, how many problems will GPT4 solve given the exact prompt from the problem?
12
30
resolved Mar 15
24%0
19%1
16%2
40%3

My plan is to open up Leetcode, go to the most recent contest, and then give GPT4 the exact prompts from 3/4 problems and ask it to solve them in Python. I may do some amount of prompt twiddling (e.g. adding "let's think step-by-step" or other tricks people post) but the test will be zero-shot. The problem will count as solved if GPT4's solution passes the Leetcode automated evaluation.

EDIT #1: If I somehow learn that the solutions are in the training set, I'll resolve this N/A or try to find problems for which this is not the case.

EDIT #2: I changed this to be from a recent contest so that there're no concerns about the questions being in the training set.

Nov 16, 10:34pm: Out of three Leetcode contest problems, how many problems will GPT4 solve on its given the exact prompt from the problem? → Out of three Leetcode contest problems, how many problems will GPT4 solve given the exact prompt from the problem?

Get Ṁ500 play money

#NameTotal profit
1Ṁ118
2Ṁ57
3Ṁ34
4Ṁ26
5Ṁ23
Sort by:

Alright I'm doing this now with the second, third, and fourth question from this contest: https://leetcode.com/contest/weekly-contest-336/

@StephenMalina it got "2587. Rearrange Array to Maximize Prefix Score" and failed at "2588. Count the Number of Beautiful Subarrays" and "2589. Minimum Time to Complete All Tasks". For each question, I prefixed the copy-pasted prompt with:

> The following is a programming problem I'd like you to solve in Python:

Finally, here are the answers it gave for each question. Note that I did some slight massaging to fit the code into the required Leetcode class structure but otherwise left them exactly as-is.

2587. Rearrange Array to Maximize Prefix Score
def max_score(nums):

nums.sort(reverse=True) # Sort the array in descending order

prefix_sum = 0

score = 0

for num in nums:

prefix_sum += num

if prefix_sum > 0:

score += 1

else:

break

return score

2588. Count the Number of Beautiful Subarrays
class TrieNode:

def init(self):

self.children = [None, None]

self.count = 0

class Trie:

def init(self):

self.root = TrieNode()

def insert(self, num):

node = self.root

for i in range(31, -1, -1):

bit = (num >> i) & 1

if node.children[bit] is None:

node.children[bit] = TrieNode()

node = node.children[bit]

node.count += 1

def count_zero_xor_subarrays(self, num):

node = self.root

count = 0

for i in range(31, -1, -1):

bit = (num >> i) & 1

if node.children[bit]:

count += node.children[bit].count

node = node.children[bit]

else:

break

return count

def beautiful_subarrays(nums):

trie = Trie()

xor_sum = 0

num_beautiful_subarrays = 0

for num in nums:

xor_sum ^= num

if xor_sum == 0:

num_beautiful_subarrays += 1

num_beautiful_subarrays += trie.count_zero_xor_subarrays(xor_sum)

trie.insert(xor_sum)

return num_beautiful_subarrays

2589. Minimum Time to Complete All Tasks

dp = [0] * n

left, right = 0, len(tasks) - 1

while left < right:

mid = left + (right - left) // 2

left = mid + 1

else:

right = mid

return left if tasks[left][1] < target else -1

for i in range(n):

dp[i] = min(dp[i - 1], dp[prev_task_idx] + duration)

else:

dp[i] = min(dp[i - 1], duration) if i > 0 else duration

return dp[-1]

Stephen Malinabought Ṁ35 of 0

So... I went and did this for GPT3 and it got 1/3. It got: https://leetcode.com/problems/first-missing-positive/, but didn't get: https://leetcode.com/problems/maximal-rectangle/ and https://leetcode.com/problems/regular-expression-matching/. If Leetcode really isn't in the training set, this means I haven't updated nearly hard enough.

Stephen Malinabought Ṁ30 of 1

@StephenMalina this led me to also change the question to be for a recent contest rather than an existing Q.

Stephen Malinabought Ṁ5 of 2

@StephenMalina that said, I still updated my probabilities here to weigh 1 & 2 more highly.

@StephenMalina This made me update on the hardness of "hard" Leetcode problems (I have not used it myself and I assumed the problems would actually be hard.)

@DavidBolin there's a fair bit of variation. I changed it to contest problems, which I think will be have a higher minimum hardness though (although I don't participate in these competitions so this is based on impression not experience).

My current bet is that there's a 60% chance it won't be able to solve any, a 15% chance it'll be able to solve 1, a 15% chance it'll solve 2, and a 10% chance it'll solve 3.