My plan is to open up Leetcode, go to the most recent contest, and then give GPT4 the exact prompts from 3/4 problems and ask it to solve them in Python. I may do some amount of prompt twiddling (e.g. adding "let's think step-by-step" or other tricks people post) but the test will be zero-shot. The problem will count as solved if GPT4's solution passes the Leetcode automated evaluation.
EDIT #1: If I somehow learn that the solutions are in the training set, I'll resolve this N/A or try to find problems for which this is not the case.
EDIT #2: I changed this to be from a recent contest so that there're no concerns about the questions being in the training set.
Nov 16, 10:34pm: Out of three Leetcode contest problems, how many problems will GPT4 solve on its given the exact prompt from the problem? → Out of three Leetcode contest problems, how many problems will GPT4 solve given the exact prompt from the problem?
Alright I'm doing this now with the second, third, and fourth question from this contest: https://leetcode.com/contest/weekly-contest-336/
@StephenMalina it got "2587. Rearrange Array to Maximize Prefix Score" and failed at "2588. Count the Number of Beautiful Subarrays" and "2589. Minimum Time to Complete All Tasks". For each question, I prefixed the copy-pasted prompt with:
> The following is a programming problem I'd like you to solve in Python:
Finally, here are the answers it gave for each question. Note that I did some slight massaging to fit the code into the required Leetcode class structure but otherwise left them exactly as-is.
2587. Rearrange Array to Maximize Prefix Score
def max_score(nums):
nums.sort(reverse=True) # Sort the array in descending order
prefix_sum = 0
score = 0
for num in nums:
prefix_sum += num
if prefix_sum > 0:
score += 1
else:
break
return score
2588. Count the Number of Beautiful Subarrays
class TrieNode:
def init(self):
self.children = [None, None]
self.count = 0
class Trie:
def init(self):
self.root = TrieNode()
def insert(self, num):
node = self.root
for i in range(31, -1, -1):
bit = (num >> i) & 1
if node.children[bit] is None:
node.children[bit] = TrieNode()
node = node.children[bit]
node.count += 1
def count_zero_xor_subarrays(self, num):
node = self.root
count = 0
for i in range(31, -1, -1):
bit = (num >> i) & 1
if node.children[bit]:
count += node.children[bit].count
node = node.children[bit]
else:
break
return count
def beautiful_subarrays(nums):
trie = Trie()
xor_sum = 0
num_beautiful_subarrays = 0
for num in nums:
xor_sum ^= num
if xor_sum == 0:
num_beautiful_subarrays += 1
num_beautiful_subarrays += trie.count_zero_xor_subarrays(xor_sum)
trie.insert(xor_sum)
return num_beautiful_subarrays
2589. Minimum Time to Complete All Tasks
def min_computer_time(tasks):
tasks.sort(key=lambda x: x[1])
n = len(tasks)
dp = [0] * n
def binary_search(tasks, target):
left, right = 0, len(tasks) - 1
while left < right:
mid = left + (right - left) // 2
if tasks[mid][1] < target:
left = mid + 1
else:
right = mid
return left if tasks[left][1] < target else -1
for i in range(n):
start, end, duration = tasks[i]
prev_task_idx = binary_search(tasks, start)
if prev_task_idx != -1:
dp[i] = min(dp[i - 1], dp[prev_task_idx] + duration)
else:
dp[i] = min(dp[i - 1], duration) if i > 0 else duration
return dp[-1]
So... I went and did this for GPT3 and it got 1/3. It got: https://leetcode.com/problems/first-missing-positive/, but didn't get: https://leetcode.com/problems/maximal-rectangle/ and https://leetcode.com/problems/regular-expression-matching/. If Leetcode really isn't in the training set, this means I haven't updated nearly hard enough.
@StephenMalina this led me to also change the question to be for a recent contest rather than an existing Q.
@StephenMalina This made me update on the hardness of "hard" Leetcode problems (I have not used it myself and I assumed the problems would actually be hard.)
@DavidBolin there's a fair bit of variation. I changed it to contest problems, which I think will be have a higher minimum hardness though (although I don't participate in these competitions so this is based on impression not experience).