Will GPT-4 work mostly well with Morse code?

Resolves YES on credible reports that GPT-4 is mostly able to reply to and produce long Morse Code messages.
Resolves NO on credible reports that over 1,000 external users have access to GPT-4 and it still makes significant errors with Morse.

I’ll accept the use of very simple technical workarounds (like separating characters with spaces to circumvent tokenization, IDK).
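That spacing workaround can be sketched in a line of Python: putting a space between characters keeps a subword tokenizer from merging letters into opaque multi-character tokens, so the model actually sees each letter it has to transcribe. The helper name is mine, purely illustrative:

```python
def space_out(text: str) -> str:
    """Separate characters with spaces so a subword (BPE-style)
    tokenizer emits roughly one token per character."""
    return " ".join(text)

print(space_out("HELLO"))  # H E L L O
```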

Somewhat better-specified criteria, starting with the most precise:

“Long messages” – at least 20 arbitrary English words of input and output; hand-crafted, cherry-picked examples like 20 × “Hello” don’t count.
“Mostly able” – as major-error-free as in English in at least 50% of trials.
“Simple workarounds” – I can still recognize the actual message without using any tools other than eyes and a Morse transcriber.
“Credible reports” – at least a couple of sources that are not strongly disputed within two weeks.

I expect the arbitrary and imprecise quantification may bite me; I’ll wait for more information and use my best judgment if necessary.

Inspired by: https://www.lesswrong.com/posts/hDePh3KReBMNBJfzx/gpt-3-catching-fish-in-morse-code.

Close date updated to 2024-12-31 12:00 am

Mira is predicting YES at 97%

@Mira That link didn't embed... In particular, there are reports of it summarizing an article written in Morse code.

Mira is predicting YES at 93%

Anyways, here's 20 arbitrary English words from your market description. This is my first test of Morse code on GPT-4 (though I also tested Bing before buying shares a few days ago).

Encoding Prompt is:

Translate the following to Morse Code
Resolves YES on credible reports that GPT-4 is mostly able to reply to and produce long Morse Code messages.

Encoding Response is:

Here is the translation of the provided text into Morse Code:

.-. . ... --- .-.. ...- . ... -.-- . ... --- -. -.-. .-. . -.. .. -... .-.. . .-. . .--. --- .-. - ... - .... .- - --. .--. - -....- ....- .. ... -- --- ... - .-.. -.-- .- -... .-.. . - --- .-. . .--. .-.. -.-- - --- .- -. -.. .--. .-. --- -.. ..- -.-. . .-.. --- -. --. -- --- .-. ... . -.-. --- -.. . -- . ... ... .- --. . ... .-.-.-

Decoding prompt is the Morse output above, beginning:

.-. . ... --- .-.. ...- . ...

Decoded message is:

The decoded Morse code message is as follows:


I think it resolves YES.
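Mira's transcript can be spot-checked with the kind of small dict-based transcriber the description allows; for example, the fragment `.-. . ... --- .-.. ...- . ...` from the response decodes to RESOLVES:

```python
# International Morse code table for letters (digits and punctuation omitted).
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".",
    "F": "..-.", "G": "--.", "H": "....", "I": "..", "J": ".---",
    "K": "-.-", "L": ".-..", "M": "--", "N": "-.", "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.", "S": "...", "T": "-",
    "U": "..-", "V": "...-", "W": ".--", "X": "-..-", "Y": "-.--",
    "Z": "--..",
}
FROM_MORSE = {code: letter for letter, code in MORSE.items()}

def decode(morse: str) -> str:
    # Letters are separated by single spaces; "/" marks a word break.
    return "".join(
        " " if symbol == "/" else FROM_MORSE[symbol]
        for symbol in morse.split()
    )

print(decode(".-. . ... --- .-.. ...- . ..."))  # RESOLVES
```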

Nikola bought Ṁ100 of NO

Fed in some random words. It totally messed up the encoding step this time

Nikola is predicting NO at 96%

@Nikola sorry, I mean decoding. It's good at encoding, bad at decoding as far as I can tell

Nikola sold Ṁ96 of NO

I'm very impressed! It's able to encode and decode long texts with almost zero typos. I guess the resolution is mostly subjective at this point, because there are some typos.

Nikola

How does this resolve if it's not able to translate in one step, but it is able to consistently translate if you prompt it to think step-by-step?

yaboi69

@Nikola What do you and other people here think I should do?

I have to draw the line somewhere, and it only gets more difficult with more doors left open. If we go toward the extreme, at some point, we get to the level of hand-holding and answer-feeding that everyone would agree is not fair.

That said, the original description did leave doors open with a vague standard of “simple workarounds”. If the sequence of prompts was a short fixed script to follow, I feel like that should probably be acceptable – not yet committed to it, though, in the slightest.

Please provide feedback.

More loose thoughts – what else should count as a “simple workaround”?

  • what if success requires a very long prompt, e.g. an entire book’s worth of rules and priming?

  • what if success requires a very specific combination of parameters (like temperature) to be replicable enough?

Jon Simon is predicting NO at 66%

@yaboi69 I think it needs to do it in one step. Maybe a second step to clean it up, but no more than that. Although a command explicitly telling it to "think step by step" a la chain-of-thought prompting is ok.

Quantum Observer

Presumably your market is not actually intended to close today.

ML bought Ṁ69 of NO

Assuming GPT-4 is architecturally similar to GPT-3 (which is to say, a big Transformer), this will largely turn on whether GPT-4 uses an input encoding (like GPT-3's BPE) that prevents the model from seeing individual characters. That encoding is itself an engineering trade-off: a bigger effective input buffer at the cost of the ability to solve character-level problems like word spelling or Morse encoding.
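As a toy illustration of that BPE point (an invented merge table, not GPT-3's actual one): greedy pair-merging collapses a frequent word into a single opaque token, while a space-separated version stays character-level, which is exactly why the spacing workaround helps.

```python
def bpe_tokenize(word, merges):
    """Greedy BPE: repeatedly apply the highest-priority adjacent merge.
    `merges` is an ordered list of pairs learned at training time."""
    tokens = list(word)
    ranks = {pair: i for i, pair in enumerate(merges)}
    while True:
        pairs = [(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)]
        candidates = [p for p in pairs if p in ranks]
        if not candidates:
            return tokens
        best = min(candidates, key=ranks.get)
        i = pairs.index(best)
        tokens = tokens[:i] + [tokens[i] + tokens[i + 1]] + tokens[i + 2:]

# Toy merge table: a common word collapses into one opaque token...
merges = [("h", "e"), ("l", "l"), ("he", "ll"), ("hell", "o")]
print(bpe_tokenize("hello", merges))     # ['hello']
# ...so the model never sees the individual letters it would need to
# map onto dots and dashes; spaces block the merges entirely.
print(bpe_tokenize("h e l l o", merges))
```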

For a general-purpose language model you would probably want the bigger input buffer, since that helps in a large variety of situations while the other problems could be seen as just parlor games.

It is possible that GPT-4 is different enough from GPT-3 in architecture that this reasoning doesn't hold, that OpenAI chooses to take on character-level problems as a priority for whatever reason (perhaps there are instances of such problems that are actually impactful), or that GPT-4 through sheer scale is able to overcome such limitations and learn word spellings through the fog of token encoding.

I still think 74% is a bit overpriced, though, and will sell down a tad myself.

ML is predicting YES at 94%

@ML Well, looks like I underestimated GPT-4. It was able to encode long paragraphs into Morse on the first try, and to follow instructions provided by me in Morse code: decode them, print them out, and then write FizzBuzz in Python.

yaboi69

Ṁ5 worth of thoughts: I guess you can just add a (much cheaper) transcription layer on top, and then both Morse and, more importantly, Braille will work perfectly fine, so it’s not like this question has much direct practical value. I still find it a moderately interesting indicator of what the model got to learn and what it can do.
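A sketch of that transcription-layer idea: decode Morse to plain text before the model sees it, and re-encode the reply afterwards, so the model only ever works in English. Here `ask_model` is a hypothetical stand-in for a real model API, and the Morse table is abridged for brevity.

```python
MORSE = {"E": ".", "H": "....", "I": "..", "T": "-"}  # abridged; extend to A-Z
FROM_MORSE = {code: letter for letter, code in MORSE.items()}

def encode(text: str) -> str:
    return " ".join(MORSE[ch] for ch in text)

def decode(morse: str) -> str:
    return "".join(FROM_MORSE[sym] for sym in morse.split())

def morse_wrapper(ask_model, morse_prompt: str) -> str:
    """Transcription layer: the model itself never sees any Morse."""
    return encode(ask_model(decode(morse_prompt)))

# Hypothetical stand-in for a real model call, for demonstration only.
echo_model = lambda prompt: prompt
print(morse_wrapper(echo_model, ".... .."))  # .... ..
```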