MANIFOLD
Does a few-shot prompt improve GPT-4 debugging performance?
Resolved N/A on Mar 1

If I use a 3-shot prompt giving examples of spotting bugs in my PyTorch code, will GPT-4 improve its debugging ability?

This will be a noisy manual eval: I will evaluate on 10 code snippets and resolve Yes if the few-shot-prompted version (compared to zero-shot, net change) does better on 2+ cases, otherwise No. I'll either go through with this eval or cancel by the end of Feb 2024. If I don't end up finding/creating such a prompt, this market will be cancelled.
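
To make the resolution rule concrete, here is a minimal sketch of the arithmetic, assuming "net change" means the number of snippets where the few-shot answer is better minus the number where it is worse. The function name `resolve_market`, the labels, and the example judgments are all hypothetical and only illustrate the rule; they are not real eval results.

```python
from collections import Counter

VALID_LABELS = {"better", "worse", "same"}

def resolve_market(outcomes, threshold=2):
    """Apply the net-change rule to per-snippet judgments.

    outcomes: one label per code snippet, describing the few-shot answer
    relative to the zero-shot answer ("better", "worse", or "same").
    Returns (resolution, net), where net = #better - #worse.
    """
    unknown = set(outcomes) - VALID_LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    counts = Counter(outcomes)
    net = counts["better"] - counts["worse"]
    return ("YES" if net >= threshold else "NO"), net

# Hypothetical judgments for 10 snippets (illustration only, not real data).
example = ["better", "same", "worse", "better", "same",
           "better", "same", "same", "same", "same"]
print(resolve_market(example))  # 3 better, 1 worse -> net +2
```

Under this reading, three better answers and one worse answer give a net change of +2, which just meets the threshold, while two better and one worse (net +1) would not.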

bought Ṁ10 NO

I'd say it depends on the code snippet itself; it's pretty good at basic programming with few-shot, but for cp it's pretty bad at debugging.

predictedNO

Hmmm, so I realized the question description is ambiguous. I intended the criterion to resolve Yes iff the net improvement is >= +2 (i.e. number of better answers minus number of worse answers >= 2). If this isn't how people read it, I will cancel the market.

predictedNO

@JacobPfau I'll leave this comment standing for a week, and resolve N/A if any existing traders understood the question differently (sorry if so). If no one complains, I'll update the description to clarify and let the market continue.

predictedNO

@JacobPfau Updated to add 'net change'
