
The Japanese Language Proficiency Test (JLPT) is a standardized test used to evaluate the language proficiency of non-native Japanese speakers. The test is administered at five levels, ranging from N5 (the lowest; "ability to understand some basic Japanese") to N1 (the highest; "ability to understand Japanese used in a variety of circumstances"). Each level is subdivided into a "language knowledge" section (covering vocabulary, kanji, and grammatical expressions), a "reading" section, and a "listening" section. The questions are entirely multiple-choice.
Sometime within the next couple of weeks, I will transcribe the practice test available on jlptsensei.com and feed it into ChatGPT (through the web interface, all in one thread, in order, including instructions and sample questions where available). I will use my best judgment to interpret its responses as multiple-choice answers and score its performance based on the answer key available on that same site. This market will resolve to the percentage of questions it gets correct across all three sections (i.e., 100 * number of correct answers / total number of questions).
Note that the real JLPT is scored on a 180-point scale, with the scores based on complex statistics that compare the average raw score among examinees with raw scores from native speakers. I don't have access to those statistics, so I'm just using a simple percentage-correct metric here.
Some notes on the process:
Obviously, I can't feed the audio files used for the "listening" section directly into ChatGPT. jlptsensei.com does include text transcripts of the audio, so I'll just include those in the prompt as well. This might make these sections easier for ChatGPT than they would be for a human examinee.
Some questions in the "listening" and "reading" sections involve interpreting images. ChatGPT can't actually look at images either. So on those questions, I'll do my best to adapt the images into a textual description (in Japanese) that captures their salient features, and use that adapted question in place of the original. This may make some of these questions easier for ChatGPT than they would be for a human examinee.
Any other special formatting I'll adapt into ChatGPT's plaintext interface as best I can (e.g., replacing underlined text with brackets, breaking up tabular data). Depending on how clearly I can render that material, these questions may end up harder for ChatGPT than they would be for a human examinee.
Since I may need to do some smaller experiments with ChatGPT before the "official" run, and thereby get some advance information on its performance, I won't be participating in the market.
After I resolve the market, I'll post the complete thread in a Google doc or something.
I'll probably put up similar markets for the higher JLPT levels as well eventually.
It actually did quite a bit worse on this easy test than I expected! Total score was 57/89 = 64.0%.
The way the individual sections broke down was:
Language: 24/33 = 72.7%
Grammar/reading: 17/32 = 53.1%
Listening: 16/24 = 66.7%
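The arithmetic behind this breakdown can be sanity-checked with a quick script; the per-section counts are just the raw numbers above, scored with the market's simple percentage-correct metric:

```python
# Raw (correct, total) counts per section, as reported above.
sections = {
    "Language": (24, 33),
    "Grammar/reading": (17, 32),
    "Listening": (16, 24),
}

correct = sum(c for c, _ in sections.values())
total = sum(t for _, t in sections.values())

for name, (c, t) in sections.items():
    print(f"{name}: {c}/{t} = {100 * c / t:.1f}%")

# Market resolution value: 100 * correct answers / total questions.
print(f"Total: {correct}/{total} = {100 * correct / total:.1f}%")  # 57/89 = 64.0%
```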
The poor performance on the grammar section is probably due to the large number of fill-in-the-blank questions; it seems to have a hard time with that format. In particular, there are several complex questions that require you to rearrange a group of words into a valid sentence, then select the word at a particular position as the multiple-choice answer. ChatGPT didn't get any of those correct.
Most of the "listening" questions it missed were in the final part of that section, which is mostly about recognizing the correct pronunciations of numbers and counters/units (hours vs. days vs. people). I decided to present that part entirely in kana, since writing the counters in kanji seemed like it would make the answers too obvious. (For the rest of the listening section I used the standard kanji script, since I didn't expect ChatGPT to have any trouble understanding kanji.)
On questions that were more about reading comprehension and reasoning, it missed a few, but mostly did okay. (But, of course, these are designed to be very easy questions.)
The revised test questions I used and logs of the chat sessions are available for posterity on Google Drive.