Will someone use better context/steering to get GPT-4 to get a 4 or higher on AP English Literature and Composition?
Mini
50
Ṁ4.7k
resolved Jan 1
Resolved
NO

GPT-4 passes most AP exams, but only gets a 2 in English Lit and Composition. A hypothesis is that this is due to it not knowing it is supposed to follow stupid English Lit rules when answering. Could one, with the right additional context window and system window inputs, get this exact model (or the closest similar one still offered by OpenAI) to a 4+ on the AP English test?

Resolves to YES if someone reports doing it by end of 2023, NO otherwise.

🏅 Top traders

# Name Total profit
1 Ṁ498
2 Ṁ316
3 Ṁ172
4 Ṁ169
5 Ṁ90

I get no results indicating anyone has tried here: https://www.google.com/search?q=%22GPT-4%22+%22AP+English+Literature+and+Composition%22

Maybe someone does this with custom instructions or fancy prompting (or maybe OpenAI finally allows finetuning), but I know I am too lazy to try.

@RobertCousineau I do my best not to trade on my own markets when there might be a grading dispute, so I'm not trading on this one, but it does seem like the lack of decay with time here is pretty crazy.

@ZviMowshowitz likely it was just forgotten about. Maybe people are playing 4D chess though and expecting great announcements from developer day?

@RobertCousineau A 2 represents up to the ≈45th percentile of performance. A 4 is up to the ≈90th percentile.

https://apstudents.collegeboard.org/about-ap-scores/score-distributions

Obviously LLM performance and human performance are vastly different, and comparing them is tenuous at best. Regardless, I think the size of the gulf between these scores shows that it will take more than some tinkering at the margins to meet this market's conditions. A jump of roughly 45 percentile points is something like what we saw going from GPT-3 to GPT-4, not something that results from two months of prompt engineering.

A decent chunk of my bet is on whether the subset of people with sufficient skills will actually go through the hassle. Is it possible? Probably.

Does the amount of work required fall in the margin between what OpenAI was willing to put in and what somebody else is willing to put in before the end of the year, given the opportunity cost? /Possible/, but at less than the current probability.

predicted NO

Really, just fixing Siri would bump up our GDP plenty…