My plan is to open up Leetcode, go to the most recent contest, and then give GPT4 the exact prompts from 3/4 problems and ask it to solve them in Python. I may do some amount of prompt twiddling (e.g. adding "let's think step-by-step" or other tricks people post) but the test will be zero-shot. The problem will count as solved if GPT4's solution passes the Leetcode automated evaluation.

EDIT #1: If I somehow learn that the solutions are in the training set, I'll resolve this N/A or try to find problems for which this is not the case.

EDIT #2: I changed this to be from a recent contest so that there're no concerns about the questions being in the training set.

Nov 16, 10:34pm: ~~Out of three Leetcode contest problems, how many problems will GPT4 solve on its given the exact prompt from the problem?~~ → Out of three Leetcode contest problems, how many problems will GPT4 solve given the exact prompt from the problem?

Alright I'm doing this now with the second, third, and fourth question from this contest: https://leetcode.com/contest/weekly-contest-336/

@StephenMalina it got "2587. Rearrange Array to Maximize Prefix Score" and failed at "2588. Count the Number of Beautiful Subarrays" and "2589. Minimum Time to Complete All Tasks". For each question, I prefixed the copy-pasted prompt with:

> The following is a programming problem I'd like you to solve in Python:

Finally, here are the answers it gave for each question. Note that I did some slight massaging to fit the code into the required Leetcode class structure but otherwise left them exactly as-is.**2587. Rearrange Array to Maximize Prefix Score**

def max_score(nums):

nums.sort(reverse=True) # Sort the array in descending order

prefix_sum = 0

score = 0

for num in nums:

prefix_sum += num

if prefix_sum > 0:

score += 1

else:

break

return score

**2588. Count the Number of Beautiful Subarrays**

class TrieNode:

def **init**(self):

self.children = [None, None]

self.count = 0

class Trie:

def **init**(self):

self.root = TrieNode()

def insert(self, num):

node = self.root

for i in range(31, -1, -1):

bit = (num >> i) & 1

if node.children[bit] is None:

node.children[bit] = TrieNode()

node = node.children[bit]

node.count += 1

def count_zero_xor_subarrays(self, num):

node = self.root

count = 0

for i in range(31, -1, -1):

bit = (num >> i) & 1

if node.children[bit]:

count += node.children[bit].count

node = node.children[bit]

else:

break

return count

def beautiful_subarrays(nums):

trie = Trie()

xor_sum = 0

num_beautiful_subarrays = 0

for num in nums:

xor_sum ^= num

if xor_sum == 0:

num_beautiful_subarrays += 1

num_beautiful_subarrays += trie.count_zero_xor_subarrays(xor_sum)

trie.insert(xor_sum)

return num_beautiful_subarrays**2589. Minimum Time to Complete All Tasks**

def min_computer_time(tasks):

tasks.sort(key=lambda x: x[1])

n = len(tasks)

dp = [0] * n

def binary_search(tasks, target):

left, right = 0, len(tasks) - 1

while left < right:

mid = left + (right - left) // 2

if tasks[mid][1] < target:

left = mid + 1

else:

right = mid

return left if tasks[left][1] < target else -1

for i in range(n):

start, end, duration = tasks[i]

prev_task_idx = binary_search(tasks, start)

if prev_task_idx != -1:

dp[i] = min(dp[i - 1], dp[prev_task_idx] + duration)

else:

dp[i] = min(dp[i - 1], duration) if i > 0 else duration

return dp[-1]

So... I went and did this for GPT3 and it got 1/3. It got: https://leetcode.com/problems/first-missing-positive/, but didn't get: https://leetcode.com/problems/maximal-rectangle/ and https://leetcode.com/problems/regular-expression-matching/. If Leetcode really isn't in the training set, this means I haven't updated nearly hard enough.

@StephenMalina this led me to also change the question to be for a recent contest rather than an existing Q.

@StephenMalina This made me update on the hardness of "hard" Leetcode problems (I have not used it myself and I assumed the problems would actually be hard.)

@DavidBolin there's a fair bit of variation. I changed it to contest problems, which I think will be have a higher minimum hardness though (although I don't participate in these competitions so this is based on impression not experience).