Will someone use better context/steering to get GPT-4 to get a 4 or higher on AP English Literature and Composition?


resolved Jan 1


GPT-4 passes most AP exams, but only gets a 2 in English Lit and Composition. A hypothesis is that this is due to it not knowing it is supposed to follow stupid English Lit rules when answering. Could one, with the right additional context window and system window inputs, get this exact model (or the closest similar one still offered by OpenAI) to a 4+ on the AP English test?

Resolves to YES if someone reports doing it by end of 2023, NO otherwise.




I get no results indicating anyone has tried here: https://www.google.com/search?q=%22GPT-4%22+%22AP+English+Literature+and+Composition%22

Maybe someone does this with custom instructions or fancy prompting (or maybe OpenAI finally allows finetuning), but I know I am too lazy to try.

@RobertCousineau I do my best not to trade on my own markets when there might be a grading dispute, so I'm not trading on this one, but it does seem like the lack of decay with time here is pretty crazy.

@ZviMowshowitz Likely it was just forgotten about. Maybe people are playing 4D chess, though, and expecting great announcements from developer day?

@RobertCousineau A 2 represents up to the ≈45th percentile of performance. A 4 is up to the ≈90th percentile.

https://apstudents.collegeboard.org/about-ap-scores/score-distributions

Obviously LLM performance and human performance are vastly different, and comparing them is tenuous at best. Regardless, I think the size of the gulf between these scores shows that it will take more than some tinkering at the margins to meet this market's conditions. A jump of roughly 45 percentile points is something like what we saw going from GPT-3 to GPT-4, not something that results from two months of prompt engineering.

A decent chunk of my bet is on whether the subset of people with sufficient skills will actually go through the hassle. Is it possible? Probably.

Does the amount of work required fall in the margin between what OpenAI was willing to put in and what somebody else is willing to put in before the end of the year, given the opportunity cost? /Possible/, but at less than the current probability.