Out of three Leetcode contest problems, how many problems will GPT4 solve given the exact prompt from the problem?

740Ṁ549

resolved Mar 15

ALL

100%19%

24%

16%

40%

My plan is to open up Leetcode, go to the most recent contest, and then give GPT4 the exact prompts from 3/4 problems and ask it to solve them in Python. I may do some amount of prompt twiddling (e.g. adding "let's think step-by-step" or other tricks people post) but the test will be zero-shot. The problem will count as solved if GPT4's solution passes the Leetcode automated evaluation.

EDIT #1: If I somehow learn that the solutions are in the training set, I'll resolve this N/A or try to find problems for which this is not the case.

EDIT #2: I changed this to be from a recent contest so that there're no concerns about the questions being in the training set.

Nov 16, 10:34pm: ~~Out of three Leetcode contest problems, how many problems will GPT4 solve on its given the exact prompt from the problem?~~ → Out of three Leetcode contest problems, how many problems will GPT4 solve given the exact prompt from the problem?

Technical AI Timelines

Get

1,000

to start trading!

🏅 Top traders

#	Name	Total profit
1		Ṁ153
2		Ṁ40
3		Ṁ20
4		Ṁ18
5		Ṁ15

People are also trading

Which of these 10 things will GPT-5 do? (@flowersslop)

Will OpenAI's next major LLM (after GPT-4) solve more than 2 of the first 5 new Project Euler problems?

57% chance

How many parameters does GPT4o have?

Will GPT-5 reach a 1000 rating on Codeforces?

96% chance

Will GPT-5 be able to solve A::B system puzzles consistently

79% chance

Will GPT-5 get the Monty *Fall* problem correct?

91% chance

In yottaFLOPs (10^24), how much compute will GPT-4 be trained with?

8 Comments

11 Holders

45 Trades

Sort by:

Alright I'm doing this now with the second, third, and fourth question from this contest: https://leetcode.com/contest/weekly-contest-336/

@StephenMalina it got "2587. Rearrange Array to Maximize Prefix Score" and failed at "2588. Count the Number of Beautiful Subarrays" and "2589. Minimum Time to Complete All Tasks". For each question, I prefixed the copy-pasted prompt with:

> The following is a programming problem I'd like you to solve in Python:

Finally, here are the answers it gave for each question. Note that I did some slight massaging to fit the code into the required Leetcode class structure but otherwise left them exactly as-is.

2587. Rearrange Array to Maximize Prefix Score
def max_score(nums):

nums.sort(reverse=True) # Sort the array in descending order

prefix_sum = 0

score = 0

for num in nums:

prefix_sum += num

if prefix_sum > 0:

score += 1

else:

break

return score

2588. Count the Number of Beautiful Subarrays
class TrieNode:

def init(self):

self.children = [None, None]

self.count = 0

class Trie:

def init(self):

self.root = TrieNode()

def insert(self, num):

node = self.root

for i in range(31, -1, -1):

bit = (num >> i) & 1

if node.children[bit] is None:

node.children[bit] = TrieNode()

node = node.children[bit]

node.count += 1

def count_zero_xor_subarrays(self, num):

node = self.root

count = 0

for i in range(31, -1, -1):

bit = (num >> i) & 1

if node.children[bit]:

count += node.children[bit].count

node = node.children[bit]

else:

break

return count

def beautiful_subarrays(nums):

trie = Trie()

xor_sum = 0

num_beautiful_subarrays = 0

for num in nums:

xor_sum ^= num

if xor_sum == 0:

num_beautiful_subarrays += 1

num_beautiful_subarrays += trie.count_zero_xor_subarrays(xor_sum)

trie.insert(xor_sum)

return num_beautiful_subarrays

2589. Minimum Time to Complete All Tasks
def min_computer_time(tasks):

tasks.sort(key=lambda x: x[1])

n = len(tasks)

dp = [0] * n

def binary_search(tasks, target):

left, right = 0, len(tasks) - 1

while left < right:

mid = left + (right - left) // 2

if tasks[mid][1] < target:

left = mid + 1

else:

right = mid

return left if tasks[left][1] < target else -1

for i in range(n):

start, end, duration = tasks[i]

prev_task_idx = binary_search(tasks, start)

if prev_task_idx != -1:

dp[i] = min(dp[i - 1], dp[prev_task_idx] + duration)

else:

dp[i] = min(dp[i - 1], duration) if i > 0 else duration

return dp[-1]

So... I went and did this for GPT3 and it got 1/3. It got: https://leetcode.com/problems/first-missing-positive/, but didn't get: https://leetcode.com/problems/maximal-rectangle/ and https://leetcode.com/problems/regular-expression-matching/. If Leetcode really isn't in the training set, this means I haven't updated nearly hard enough.

@StephenMalina this led me to also change the question to be for a recent contest rather than an existing Q.

@StephenMalina that said, I still updated my probabilities here to weigh 1 & 2 more highly.

@StephenMalina This made me update on the hardness of "hard" Leetcode problems (I have not used it myself and I assumed the problems would actually be hard.)

@DavidBolin there's a fair bit of variation. I changed it to contest problems, which I think will be have a higher minimum hardness though (although I don't participate in these competitions so this is based on impression not experience).

My current bet is that there's a 60% chance it won't be able to solve any, a 15% chance it'll be able to solve 1, a 15% chance it'll solve 2, and a 10% chance it'll solve 3.

People are also trading

Which of these 10 things will GPT-5 do? (@flowersslop)

Will OpenAI's next major LLM (after GPT-4) solve more than 2 of the first 5 new Project Euler problems?

57% chance

How many parameters does GPT4o have?

Will GPT-5 reach a 1000 rating on Codeforces?

96% chance

Will GPT-5 be able to solve A::B system puzzles consistently

79% chance

Will GPT-5 get the Monty *Fall* problem correct?

91% chance

In yottaFLOPs (10^24), how much compute will GPT-4 be trained with?

🏅 Top traders

People are also trading

People are also trading

Related questions