At the end of the market, I will put the following question to the top public, general-purpose OpenAI LLM or AI assistant that is available online for under $100/month.
The prompt
For each of the following media outlets, please imagine how, on its editorial page, it would score President Obama's 8 years of leadership in the US as president, from 0-100. For each one, output a number from 0-100 and a short phrase explaining your rating.
(The Economist, NY Post, The WSJ, The New Yorker, BBC, Al Jazeera English, Time Magazine, Fox News, Der Spiegel, China Daily, Yomiuri Shimbun, DW, France 24, RT, The Hindu, El Pais, Haaretz, O Globo (brazil), Hindustan Times, The Times UK, Le Figaro, The Jerusalem Post).
Scoring
I will run this prompt 5 times on whatever the best general model is, as specified above. In each run I will record every outlet's score, and then I will average all the scores across all runs.
The resolution bands are defined around today's average score of 62.4:
<61 is a clear DECREASE => resolves NO
From 61 to 64, inclusive, is NO CHANGE => resolves 50%
>64 is an INCREASE => resolves YES
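A minimal Python sketch of the scoring rule above. It assumes each run's output has already been parsed into a list of 0-100 outlet scores; the parsing step and the function name `resolve` are my own illustration, not part of the market. Every score from every run goes into one pool before averaging. Because each run scores the same 22 outlets, this equals the average of the per-run averages. No rounding is applied before comparing against the bands, which is also an assumption.

```python
from statistics import mean


def resolve(runs):
    """Average every outlet score across all runs and map it to a resolution.

    `runs` is a list of runs; each run is a list of 0-100 outlet scores.
    Returns (average, resolution). Hypothetical helper for illustration only.
    """
    # Pool all scores from all runs into one flat list, then average.
    all_scores = [score for run in runs for score in run]
    avg = mean(all_scores)
    if avg < 61:
        return avg, "NO"   # clear DECREASE
    if avg <= 64:
        return avg, "50%"  # NO CHANGE band: 61 to 64, inclusive
    return avg, "YES"      # INCREASE


# Toy input with made-up numbers: two runs of two outlets each.
print(resolve([[70, 60], [65, 55]]))  # (62.5, '50%')
```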
Figuring out which LLM/AI/etc. to use
Overall, I will try to find the best "general-purpose tool for answering free-form questions" from OpenAI. This might be an agent, an LLM, an AGI, or some new kind of thing. I will approach it as a customer asking, "I have general questions about history that require something like the ability to think, synthesize arguments, find data, evaluate questions, etc.; what should I use?" and I will use the best available option. Today that option is GPT-4; in the future that might change.
I am specifying OpenAI now, and I will generally stick with OpenAI even if another company such as xAI takes a dramatic lead, for the sake of stability and consistent judging. If OpenAI goes out of business, I'll try to identify its nearest conceptual successor (e.g. Microsoft, if it continues running the same system) or any previously released public open-source model based on GPT-4 or later that also has newer data. If nothing like that is available, the market may unfortunately resolve N/A. Overall, I'm going to try to design the resolution process to be as useful as possible for evaluating the actual substance of the question: whether or not global media opinion has changed during this time.
Results today
The results of all 5 initial runs are here: https://docs.google.com/spreadsheets/d/12uAnnq7uPTjYw8qYAEYwdkhIlWM6ZIUXYo3zWOl-A0s/edit?usp=sharing
The average across all of them was 62.44.
The results were very consistent. It remains to be seen whether LLMs' opinions will stay this stable over longer periods of time.
Here is one sample run, whose scores average 61.05:
The Economist: 75 - "Pragmatic and effective in certain areas, but limited in global impact."
NY Post: 40 - "Too liberal and ineffective, particularly on economic policies."
The Wall Street Journal: 55 - "Mixed effectiveness, good on some economic policies but too progressive in others."
The New Yorker: 85 - "Progressive and transformative, a champion of liberal values."
BBC: 70 - "Generally positive, with effective domestic policies but some international shortcomings."
Al Jazeera English: 65 - "Positive on domestic issues but mixed in foreign policy, especially in the Middle East."
Time Magazine: 75 - "Generally effective and progressive, with notable achievements."
Fox News: 30 - "Too liberal and often ineffective, particularly in foreign policy and economy."
Der Spiegel: 70 - "Respectable leadership, balancing domestic success with international challenges."
China Daily: 50 - "Mixed; some positive domestic policies, but critical of foreign policy, especially in Asia."
Yomiuri Shimbun: 60 - "Moderate effectiveness, with a focus on U.S.-Japan relations."
DW (Deutsche Welle): 70 - "Positive view on progressive policies and international cooperation."
France 24: 68 - "Generally effective, with a positive view on diplomacy and international relations."
RT (Russia Today): 35 - "Critical, focusing on perceived failures in foreign policy."
The Hindu: 65 - "Positive on global diplomacy and economic initiatives, but some criticisms."
El Pais: 75 - "Favorable view on progressive policies and international stance."
Haaretz: 60 - "Mixed, praising domestic achievements but critical of some aspects of Middle East policy."
O Globo: 65 - "Positive on domestic social policies, mixed on international relations."
Hindustan Times: 65 - "Generally positive, with a focus on India-U.S. relations."
The Times UK: 60 - "Acknowledges achievements but critical of certain aspects of foreign policy."
Le Figaro: 55 - "Mixed; appreciation for some policies but critical of others, especially in foreign affairs."
The Jerusalem Post: 50 - "Mixed; positive on some domestic issues but critical of Middle East policy."
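As a quick arithmetic check on the sample run, this Python sketch averages the 22 scores above. The dictionary is simply a transcription of that output; it is not how the runs were actually recorded.

```python
from statistics import mean

# Outlet scores transcribed from the sample run above.
sample_scores = {
    "The Economist": 75, "NY Post": 40, "The Wall Street Journal": 55,
    "The New Yorker": 85, "BBC": 70, "Al Jazeera English": 65,
    "Time Magazine": 75, "Fox News": 30, "Der Spiegel": 70,
    "China Daily": 50, "Yomiuri Shimbun": 60, "DW (Deutsche Welle)": 70,
    "France 24": 68, "RT (Russia Today)": 35, "The Hindu": 65,
    "El Pais": 75, "Haaretz": 60, "O Globo": 65, "Hindustan Times": 65,
    "The Times UK": 60, "Le Figaro": 55, "The Jerusalem Post": 50,
}

avg = mean(list(sample_scores.values()))  # 1343 / 22
print(f"{avg:.2f}")  # 61.05
```

On its own, this single run would fall in the 61 to 64 NO CHANGE band, but resolution uses the average over all 5 runs.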
A cursory look at Google Trends data suggests a clear long-term skew in Obama's favor. Normalized to a 0-100 scale, the rating would be (22 + 22)/(22 + 10 + 22 + 10)*100 = 68.75, which is clearly higher than the score the LLM method produced.
(If I restrict this to the USA, the skew increases further: (28 + 28)/(28 + 10 + 28 + 11)*100 ≈ 72.7.)
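The Google Trends normalization above can be reproduced as a small calculation. This Python sketch reads each formula as two (favorable, unfavorable) pairs of Trends values and reports the favorable share on a 0-100 scale. That pairing is my interpretation of the formula, and the helper name `favorable_share` is hypothetical. Computed exactly, the ratios come to 68.75 worldwide and about 72.7 for the USA.

```python
def favorable_share(pairs):
    """Favorable share of total interest across (favorable, unfavorable) pairs, scaled to 0-100.

    Hypothetical helper; the pairing of values is an interpretation of the formula in the text.
    """
    favorable = sum(f for f, _ in pairs)
    total = sum(f + u for f, u in pairs)
    return 100 * favorable / total


worldwide = favorable_share([(22, 10), (22, 10)])  # 44 / 64
usa = favorable_share([(28, 10), (28, 11)])        # 56 / 77
print(f"worldwide: {worldwide:.2f}")  # worldwide: 68.75
print(f"USA: {usa:.2f}")              # USA: 72.73
```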