Which "random" bit string is GPT-generated?
resolved Nov 22
Resolved
NO

I asked ChatGPT, "I need you to do the following for me: Generate a random binary string 100 bits long." ChatGPT almost did this for me, except that the result was only 97 bits long. I then generated my own 97-bit random string with numpy.random. The beginning of both strings is below. I put them in a randomized order, based on numpy.random.randint(2). I will reveal more of the strings over time, at my discretion (e.g., I might reveal more sooner if people seem to have exhausted everything they can do to analyze the current strings, or if the market already seems certain of the result). The close date is subject to change, depending on how long it takes to reveal the full strings and how confident the market is, but I will not close it until the full strings have been revealed for at least 48 hours.

A: 1100001010000100010101010110001010101101101000001101101110000111100010000011101100110100010010010

B: 1101101010010101110100111101011100100011101011011100100110110111110101110010000101101111100110001

This market resolves YES if A is the ChatGPT-generated string, and NO if B is. I will not bet.

Inspired by /Loppukilpailija/which-random-bit-string-is-humangen



The seed I used to generate the random number was actually just the first few bits of the GPT number (though treated as base ten digits):

import numpy as np

np.random.seed(110110)
rand = ""
for i in range(97):
    rand += str(np.random.randint(2))
print(rand)

1100001010000100010101010110001010101101101000001101101110000111100010000011101100110100010010010

After that, I used one more call to randint to determine if the GPT string would be A or B.

Alright, the full strings are revealed now. I was trying to time it so that the final update would come a little over two days before closing time, but a combination of me being busy and my wifi deciding to stop working ruined it. So I will have to extend the close date a little.

bought Ṁ1 YES at 9%

The penultimate update

predicted YES

If I transcribed correctly, an Aaronson oracle gets:

45% on A
60% on B
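For anyone who wants to try something similar, here is a minimal sketch of an Aaronson-oracle-style predictor. It is an assumption about how such an oracle works (guess the next bit from what most often followed the previous 5 bits so far), not the exact tool used for the numbers above, and `oracle_accuracy` is a hypothetical helper name. Accuracy near 50% is what a truly random string should give; the strings in the usage lines are toy inputs, not A or B.

```python
from collections import defaultdict


def oracle_accuracy(bits: str, order: int = 5) -> float:
    """Fraction of bits guessed correctly by a simple n-gram predictor.

    For each position, look at the previous `order` bits and guess whichever
    bit has followed that context more often so far (ties guess "0").
    """
    counts = defaultdict(lambda: [0, 0])  # context -> [times followed by 0, by 1]
    correct = 0
    total = 0
    for i in range(order, len(bits)):
        context = bits[i - order:i]
        zeros, ones = counts[context]
        guess = "1" if ones > zeros else "0"
        correct += guess == bits[i]
        total += 1
        counts[context][int(bits[i])] += 1  # learn only after guessing
    return correct / total if total else 0.0


# Toy inputs: a perfectly regular string is almost fully predictable.
print(oracle_accuracy("01" * 25))
print(oracle_accuracy("0" * 50))
```

Paste in the revealed part of A or B to get a score; with under 100 bits the result is noisy, so small differences should not be over-read.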

predicted YES

(I bet well before I thought to put it through the oracle, tbc. I still haven't been ~95% convinced it's B, so I'm not selling yet)

More bits revealed. Only 17 unknown bits left.

Certainly A is real.

Entropy for A: 0.9905577004075261
Entropy for B: 0.9709505944546686
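The comment does not say how the entropy was computed. One common reading is the Shannon entropy of the 0/1 frequencies, sketched below under that assumption (`bit_entropy` is a hypothetical helper name). The result depends on how many bits had been revealed at the time, so this is not expected to reproduce the figures above exactly.

```python
import math
from collections import Counter


def bit_entropy(bits: str) -> float:
    """Shannon entropy, in bits per symbol, of the symbol frequencies in `bits`."""
    n = len(bits)
    if n == 0:
        return 0.0
    probs = [count / n for count in Counter(bits).values()]
    return sum(-p * math.log2(p) for p in probs)


# A balanced string has entropy 1.0; a constant string has entropy 0.0.
print(bit_entropy("0011"))
print(bit_entropy("0000"))
```

Note that this measure only looks at how balanced the 0s and 1s are; it ignores ordering, so "01010101" scores the same as a well-shuffled string.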

@31ff Interesting, so it has flipped.

More bits revealed

bought Ṁ100 NO from 8% to 7%
predicted NO

(I think it would be nice to have updates; the market seems to have stabilized and there hasn't been much action.)

@Loppukilpailija added 10 more

predicted NO

@JosephNoonan I again think that it would be nice to have more bits. Overall, there doesn't seem to be much action here anymore - people seem to have mostly made up their minds, so unless something surprising happens there won't be much movement anymore. (I personally would like a resolution sometime soon, as I have a lot of mana in here, but others may disagree.)

bought Ṁ0 of NO

For your information, I'm still confident that it's B. Put up a limit order, feel free to bet against me.

(Either I'm wrong, or this is going to be one of those "the market can remain irrational longer than you can remain solvent" situations, isn't it?)

bought Ṁ100 of NO

@Loppukilpailija In short, why I believe what I believe:

  • See my earlier comment at https://manifold.markets/JosephNoonan/which-random-bit-string-is-gptgener#pPIBHigKFw6MIQADRDGc

  • I haven't really looked at the market after the first 10 or 15 bits or so - it didn't seem like there was much low-hanging evidence then

  • I'm not particularly impressed with the gzip result, though I do admit that it's evidence for A. (EDIT: though it seems that the length difference in the zips has decreased from 5 to 4 - I consider this a slight advance prediction of my model.)

  • I have taken over 100 samples from ChatGPT with the given prompt. I have not gotten a single output that starts with "11000", but have got at least 8 that start with "11011".

  • The substring "0000" is around three times less frequent in those outputs than "1111". This statistic points strongly towards B, even more so given the new 00000 in A.
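The substring and prefix checks in the list above can be sketched roughly as follows. `count_overlapping` and `prefix_tally` are hypothetical helper names, and the `samples` list is a placeholder: the 100+ ChatGPT outputs were not posted, so real samples would have to be substituted. A and B are the full strings from the market description.

```python
A = "1100001010000100010101010110001010101101101000001101101110000111100010000011101100110100010010010"
B = "1101101010010101110100111101011100100011101011011100100110110111110101110010000101101111100110001"


def count_overlapping(text: str, pattern: str) -> int:
    """Count occurrences of `pattern` in `text`, allowing overlaps."""
    last_start = len(text) - len(pattern)
    return sum(text.startswith(pattern, i) for i in range(last_start + 1))


def prefix_tally(samples, prefixes):
    """For each prefix, count how many sample strings start with it."""
    return {p: sum(s.startswith(p) for s in samples) for p in prefixes}


for name, bits in (("A", A), ("B", B)):
    print(name, "0000:", count_overlapping(bits, "0000"),
          "1111:", count_overlapping(bits, "1111"))

# Placeholder samples only; replace with actual ChatGPT outputs.
samples = ["1101100101", "1101110010", "0110100111"]
print(prefix_tally(samples, ["11000", "11011"]))
```

Overlapping counts are used so that a run like 00000 counts as two occurrences of 0000, which matches the intuition that longer runs are stronger evidence.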

I didn't look closely at any other statistics (apart from checking the basic frequencies of blocks of length 2); these are just the obvious ones where I see a difference between A and B.

Quantitatively, the 8 vs. 0 samples that start with 11011 vs. 11000 seem like at least a factor of 8 : 1 evidence for B over A. The substring thing is, I don't know, maybe a bit or two on top of that? My within-model probability is even smaller due to harder-to-communicate reasons about the sampling process. I don't have much model uncertainty either, so I'm comfortable pushing the market very low.
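As a rough worked example of that arithmetic: assuming even prior odds (the A/B order was set by a coin flip) and treating "a bit or two" as an extra factor of 2 to 4, the 8 : 1 evidence gives combined odds of 16 : 1 to 32 : 1 for B. The helper name `posterior_prob` is hypothetical, and multiplying the factors assumes the two pieces of evidence are independent.

```python
def posterior_prob(prior_odds: float, *likelihood_ratios: float) -> float:
    """Multiply prior odds by each likelihood ratio, then convert odds to probability."""
    odds = prior_odds
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1 + odds)


# Odds are for B over A. One bit of evidence = factor 2, two bits = factor 4.
for extra_bits in (1, 2):
    p_b = posterior_prob(1.0, 8.0, 2.0 ** extra_bits)
    print(f"{extra_bits} extra bit(s): P(B) = {p_b:.3f}, P(A) = {1 - p_b:.3f}")
```

That puts P(A) somewhere around 3% to 6%, in the same range as the market prices being traded at in this thread.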

bought Ṁ20 NO at 6%
bought Ṁ1 YES at 9%

New revelation coming soon

bought Ṁ5 of YES

@JosephNoonan Alright, I revealed the new bits. We now have more than half of each string.

bought Ṁ30 of YES

String A compresses more with gzip, indicating less entropy.

$ echo -n 1100001010000100010101010110001010101101 > a
$ echo -n 1101101010010101110100111101011100100011 > b
$ gzip a b
$ wc -c a.gz b.gz
36 a.gz
41 b.gz
77 total
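A rough Python equivalent of the shell check, for anyone without gzip on the command line. This is a sketch, not the commenter's method: the byte counts will not match the ones above exactly, because the command-line tool also stores the original file name in the gzip header while `gzip.compress` does not. Only the difference between the two sizes is meaningful.

```python
import gzip

# The same 40-bit prefixes used in the shell session above.
a = b"1100001010000100010101010110001010101101"
b = b"1101101010010101110100111101011100100011"


def gzip_len(data: bytes) -> int:
    """Size in bytes of `data` after gzip compression (fixed mtime for repeatability)."""
    return len(gzip.compress(data, mtime=0))


print("a:", gzip_len(a))
print("b:", gzip_len(b))
print("difference (b - a):", gzip_len(b) - gzip_len(a))
```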

@ManifoldMarketsUser Interesting, so the idea is that B has more entropy and is therefore more likely to be random. Though it's also possible that fake random strings would actually be harder to compress than real random ones, since a real random string may by chance have some segments that can be compressed a little, while a fake one could be designed to be as hard as possible to compress (though obviously I didn't specifically ask ChatGPT to make a string that's impossible to compress with gzip).

predicted NO

@ManifoldMarketsUser This is how uncommon this is:

Blue are the gzip lengths of randomly generated numbers. Only ~1.9% of random numbers have a gzip length of 36 or less.
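A check along these lines can be sketched as a small Monte Carlo simulation. The assumptions here are mine: 40-bit strings (matching the gzip comment above), Python's `gzip` module rather than the command-line tool, and a threshold taken from the compressed size of A's 40-bit prefix under the same compressor so that the comparison is like for like. `fraction_at_most` is a hypothetical helper name, and the estimate need not match the ~1.9% figure exactly.

```python
import gzip
import random


def gzip_len(bits: str) -> int:
    """Size in bytes of the gzip-compressed ASCII bit string."""
    return len(gzip.compress(bits.encode("ascii"), mtime=0))


def fraction_at_most(threshold: int, n_bits: int = 40,
                     trials: int = 10_000, seed: int = 0) -> float:
    """Estimate how often a uniformly random bit string compresses to <= threshold bytes."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = "".join(rng.choice("01") for _ in range(n_bits))
        hits += gzip_len(sample) <= threshold
    return hits / trials


a_prefix = "1100001010000100010101010110001010101101"
threshold = gzip_len(a_prefix)
print(f"threshold = {threshold} bytes")
print(f"fraction of random strings at or below it: {fraction_at_most(threshold):.3%}")
```

Keep in mind that an unusually compressible string is only weak evidence on its own, since it is not obvious in which direction ChatGPT's output deviates from random.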

predicted NO

@nanob0nus (This on its own is a strong indication for YES; don't be confused by my bet.)

Revealed more bits

Hmm, no one noticed that I added more bits.

predicted NO

@JosephNoonan I don't like the new bits

predicted NO

We could use some help over here: https://manifold.markets/Loppukilpailija/which-of-these-random-bit-strings-a

Which of these "random" bit strings are human generated?

Short description. You are presented with various bit strings, some generated by me (by hand) and some generated by a "true" source of randomness. Your task is to figure out which are which. In each pair exactly one of the strings has been generated by me. An answer resolves NO if the former string has been generated by me and YES if the latter one has been.

Long description. I have generated 40 bit strings, each 25 bits long, by tapping the "0" and "1" keys on my keyboard. This took me 260 seconds. I have also generated 40 bit strings with a short Python code using a "true" source of randomness. The ith option contains the ith string I generated and the ith string from the randomness source in some order. Your task is to figure out which one is which. The ordering of the human-generated bit strings has not been changed, i.e. the first option contains the first string I generated, the second one the second and so on. I will be resolving the markets one by one starting from the first one. After each resolution there will be some time to re-evaluate one's probabilities for the remaining markets.

(A market will resolve NO if the former string has been generated by me and YES if the latter one has been. Memory trick: NO = 0 = 0th string is mine, YES = 1 = 1st string is mine, when indexing strings with 0-indexing. Another memory trick: there is a colorful bar in each of the options. Try to push its endpoint to the left if you think the left-most string has been generated by me and to the right if the right-most string has been. Understandably, people find this confusing; be careful to make your bets in the correct direction! I also included an example pair, where the all-zeros string is meant to be the human-generated one and hence that market resolves YES.)

Scheduling. I will be resolving the first market on October 28th or so. After that, I'll be resolving the markets one by one, probably at a pace of 1-2 per day. (I am open to requests to speed up or slow down.) So beware - this will be slow. Good reason to automate and write programs!

See also. What events will happen on this market?

Data. All 40 pairs in a convenient copyable form.

1111010000100010000011100 1110110001011101010001000
0011010000101100110001110 0001101010001101011101001
1110010110001101011000110 0101000111001110100101001
1000101110100011000111010 0110011110111010010111000
1000000000010110011100100 1110010100001101010011110
1110100010000110101110111 1010100100001011010000010
1100110110000110110110100 0010111100101100101100000
0100010001000010011110101 1111001010011001111101010
1101111001001001101110100 0001000011001001011000100
1110100111010001110111101 0110100001101110100110001
0000110111010100001101011 0110001011011101110011110
0101100010010101101110110 1100011010100110100111100
0011010111111100111010110 0001101110000101101000110
0111101000010100011011001 1100001110110100110001011
1110010001101100000010110 0111110000101100000010011
1000101110000111011011011 1110000011110100100111011
1111001000111011000110110 1010101101101011001011000
1100001010100110011010000 0001100011000101110001001
0101011100011100111100100 1100101100011001111011101
0110010101111010000101010 0101111000101100101011111
0011011011110111000101000 0001111100000001010101101
1110011001110001111111100 1101111010001000110110111
1111111001110101011101011 0000110110010111000101011
0011101111010011001111000 1001010101010001001010100
1101111001001000110001101 0000110010000110011000011
0111010111010101001100111 1000100001100100000110110
1011101010011010111110001 1111100100101000110110000
0001011101100011101100111 1011000111011100000111100
1010010111001000011010101 0110011100011100011000100
0011110011101010110011100 0001101101110010011001100
1100010100101010000010001 1101000100001101101101111
1101001000100101101000100 0110101100011110110000001
1100111001011110111011000 0100111101000100000101010
0001101100011000100010011 0110011101000011100111111
1100100011011000111101011 0100110011101011001011010
1001000110100111010010111 0111010011010111011100000
0001101010001101110001101 0110101011100110001110110
1101010100110110001101010 1000111111000000011101111
1010110111001010010110000 1010100110000100001100001
1111101100001000000100110 0001011010110001001001101