Short description
In the image below, there are two sets of points, green and red. One of them is "random", while the other has been produced by me by clicking points into the square. Market resolves to YES if the green one has been produced by me, and NO otherwise.
All 100 points have now been revealed
Details
One of the point sets has been produced by sampling from a uniform distribution over a square. The side length of this square is strictly less than that of the bounding box in the image above. Call this the Hidden Square. I won't tell you the side length of the Hidden Square.
The other point set has been produced by sampling from me. Some details of the data generation process:
The points were generated by clicking locations in a UI, using a mousepad.
I generated 100 points in roughly 90 seconds.
I have practiced.
If a point landed outside of the Hidden Square, I discarded that point and generated a new one.
I have applied one or several entropy-increasing transformations to the final point set.
To elaborate on the last point, here is a hypothetical example of such a transformation: "Given a point, round its x-coordinate to the nearest multiple of 10, and then add a uniformly random integer between 0-9 to it." Intuitively, this makes the set of points "more random", i.e. harder to distinguish from the "actually random set".
I won't tell you which transformations I've used.
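As a concrete illustration, the hypothetical rounding transformation above could look like this (a sketch only; the function name and details are my own, not OP's actual code):

```python
import random

def round_and_jitter_x(point, rng=random):
    """Hypothetical transform from the description: round the x-coordinate
    to the nearest multiple of 10, then add a uniformly random integer
    between 0 and 9. The y-coordinate is left untouched."""
    x, y = point
    x = round(x / 10) * 10 + rng.randint(0, 9)
    return (x, y)
```

Applied to every point, this re-randomizes the last digit of x while keeping each point close to where it started.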
EDIT: Here are the transformations I used:
Reflect the points along the two perpendicular bisectors of the sides and one of the diagonals (giving 8 possible destinations for the point), in a uniformly random way.
Reveal the points in a random order, in a uniformly random way.
The side length of the hidden square is 400 - 6*2 = 388, with the same center as the bounding box.
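A sketch of what these two transformations could look like in code (my reconstruction from the description, not OP's actual generator; the center (300, 300) comes from the bounding box corners given below):

```python
import random

CX = CY = 300  # center of the bounding box with corners (100,100) and (500,500)

def eight_images(x, y):
    """All images of (x, y) under the symmetries generated by reflections
    in the vertical bisector, the horizontal bisector and one diagonal of
    the square: up to 8 distinct points."""
    images = set()
    for p, q in [(x - CX, y - CY), (y - CY, x - CX)]:  # diagonal reflection = swap
        for sp in (p, -p):                              # vertical bisector
            for sq in (q, -q):                          # horizontal bisector
                images.add((CX + sp, CY + sq))
    return images

def transform(points, rng=random):
    """OP's stated transform: send each point to one of its 8 images
    uniformly at random, then reveal the points in a random order."""
    out = [rng.choice(sorted(eight_images(x, y))) for x, y in points]
    rng.shuffle(out)
    return out
```

Note that every image of a point lies at exactly the same distance from the center, which is what the later comments about "radial-only spread" refer to.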
Other
Here is a market on how many points it will take for traders to figure out the answer: https://manifold.markets/Loppukilpailija/how-many-points-are-needed-to-disti-74e061c28bb6
The data
The data in a more convenient form. The bounding box in the image has corners at (100, 100) and (500, 500).
GREEN:
226 280
253 454
150 465
405 217
456 203
370 268
373 260
454 170
166 474
274 247
272 377
446 459
281 434
116 385
464 250
299 396
446 271
271 473
354 281
149 304
424 303
180 220
130 454
344 304
409 154
231 256
462 195
311 459
299 272
465 374
434 220
157 268
227 230
446 182
337 416
244 478
280 284
203 240
249 482
279 356
154 463
435 288
275 357
305 252
357 200
219 166
379 390
451 438
453 436
303 270
488 165
112 440
414 233
386 269
314 415
306 324
308 447
372 202
386 391
404 247
452 277
364 204
176 399
281 462
156 227
290 410
327 245
310 140
378 220
445 455
124 300
491 417
365 441
116 277
311 205
494 166
290 375
276 314
357 151
322 392
452 345
370 162
327 437
336 337
308 275
221 430
423 242
324 238
300 305
160 209
203 127
444 333
349 486
293 239
345 370
211 228
154 240
221 108
349 440
242 179
RED:
285 229
142 422
250 224
468 169
410 215
301 178
232 247
371 480
417 294
410 196
128 122
213 302
400 399
444 373
163 213
155 454
340 319
477 386
149 369
302 442
204 470
396 152
123 306
118 454
106 110
362 399
127 209
448 346
188 286
442 135
263 124
452 351
183 429
285 150
209 405
283 236
453 132
332 374
296 176
227 202
251 480
325 322
350 242
270 341
147 368
469 266
281 167
418 165
449 416
223 158
265 171
239 433
462 398
110 241
345 318
210 167
291 477
295 125
381 236
487 362
359 117
381 283
289 205
349 393
183 244
479 447
200 480
224 144
433 427
133 346
374 206
127 413
246 399
457 155
247 251
299 292
163 142
367 344
276 349
364 302
167 340
423 133
424 427
204 245
150 385
170 258
133 361
357 200
361 307
176 175
304 148
223 163
366 169
437 437
239 123
355 465
211 283
341 469
235 486
287 488
Anyone up for some more randomized fun?
https://manifold.markets/BenjaminShindel/which-set-of-words-is-random?r=QmVuamFtaW5TaGluZGVs
I am doing a post mortem on my chi-square test. The main result I considered was obtained using 80 points. The source code is here: https://pastebin.com/n56c80fA
Results:
P-value green: 0.0116
P-value red: 0.2757
P-value green: 0.0909
P-value red: 0.8942
Now, I used the generator published by @Loppukilpailija to generate 80 points many times, reconstructing the histogram of p-values for middle digit and last digit.
So, I am still confused. I would have to be extremely unlucky for the test to fail as it did. The source code for the histograms, including the random generator I copied in, can be found here: https://pastebin.com/zR5wpkBd
Just to be clear: The histograms of the single digits are not exactly uniform; there are mild differences (I can paste the histograms; the calculation is included in the code). But 80 points repeatedly produce almost evenly distributed p-values, so the difference is probably not detectable with 80 points.
P.S. Yes, the code is horribly inefficient; I was too lazy to polish it.
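For readers who don't want to open the pastebin, the core of such a test can be sketched as a hand-rolled chi-square statistic on last digits (my own minimal version, not the pastebin code; with 10 bins there are 9 degrees of freedom and the 5% critical value is about 16.92):

```python
from collections import Counter

def chi2_stat_last_digit(values):
    """Chi-square statistic for the hypothesis that the last digits of
    `values` are uniform over 0-9. Compare against the df = 9 critical
    value of ~16.92 at the 5% level."""
    n = len(values)
    expected = n / 10
    counts = Counter(v % 10 for v in values)
    return sum((counts.get(d, 0) - expected) ** 2 / expected
               for d in range(10))
```

Run on the x- or y-coordinates of either point set above; a statistic well above 16.92 corresponds to a p-value below 0.05.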
Fantastic market @Loppukilpailija. Was a lot of fun.
I got eaten up by this market because I was underconfident and thought that others had more knowledge than me. Lucky to still come out with a profit. Was generally confused and still am.
I struggled between the arguments that easy human-generated errors like average distance, angles, and quadrant flips all pointed to RED, while GREEN was far too clumpy and RED looked very random on many metrics.
@PC I think the lesson of this market was not that anyone should have been more confident.
To make money here, you should have been less confident. We just didn’t have enough evidence to support either YES or NO with high confidence.
For example, my private estimate was 51% for YES. What I should have done is buy NO when the market was above 90% YES, not because I thought the outcome would be NO, but because I estimated a 49% chance of NO and buying NO was cheap.
Anyone who spent a very large fraction of their bankroll here was overconfident, even if they bet correctly on NO, because about half the time they'd have lost.
I am glad that this market bet so strongly against the actual outcome. This way one can feel the overconfidence and the associated loss. If the market had resolved YES instead, it would have been harder to recognise the mistake, given that one has won.
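The arithmetic behind that argument, in a simplified model where a NO share costs (1 - market probability) and pays out 1 on a NO resolution (my own illustration, ignoring AMM slippage and fees; not Manifold's exact pricing):

```python
def no_bet_ev(market_p_yes, my_p_yes):
    """Expected profit per NO share: it pays 1 with probability
    (1 - my_p_yes) and costs (1 - market_p_yes) up front."""
    return (1 - my_p_yes) - (1 - market_p_yes)

# Private estimate 51% YES, market at 90% YES: a NO share costs 0.10
# and wins with probability 0.49, giving an expected profit of 0.39.
ev = no_bet_ev(0.90, 0.51)
```

The point is that a positive-EV NO bet is consistent with believing YES is slightly more likely.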
All of my points! Lost!
Very nice markets, big thanks to OP
Thank you @Loppukilpailija for this great game! I learned something, and following the comments was a lot of fun.
when you over-rely on p-values.
I don't think p-values are the specific issue? Just over-relying on methods of analysis you don't necessarily understand? "likelihood that the null hypothesis distribution would produce a sample with a property greater than or equal to this" is a reasonable thing to look at here, right?
@jacksonpolack I'm not sure it is, it makes you very vulnerable to accidental p-hacking when throwing properties at the wall and trying to get one to stick, with no preregistration and pretty much zero clues as to what properties have a high chance of actually being correlated
That's just 'running multiple tests and believing in the result if only one is positive', which can happen with or without p-values
How it went for me: I was very enthusiastic about the market from the start, but I had the uneasy feeling "I hope OP didn't go overboard with the secret transformation stuff; they may just render a ton of work completely worthless". I tried lots and lots of statistics and became convinced green was the way to go. Then at about 80 points it all collapsed, and OP revealed that yes, everything I was trying was in fact bullshit. I was left in a state of total doubt, sold all my green and bought 20 red, thinking that I no longer had any idea what was going on but the market was probably overconfident. I made some profit, but it's clear I'm not good enough at stats to have solved it. Not sure if anybody got this one right in a rigorous, principled way, but if you did, wow, you have my respect.
@Tasty_Y Maybe someone who was not betting a lot or engaging with the comments knew something that we didn't, but I don't think myself, Phil, or Capy really knew what was going on. Even if I ended up being right, my reasoning was all wrong.
@Shump I know you just don’t believe me, but I think the knowledge that one set had a transform that added radial-only spread was critical information that could be used.
@capybara Great job! Can you share more? I couldn’t figure it out myself. Got scared that Shump was selling
@capybara I tried to convince ppl in the comments below. If it doesn’t make sense, I can try again.
@Shump I actually wonder why you sold. I imagine it was because you did the analysis of distance from the middle. I think your reasoning was sound.
@PC Each point gets sent to one of 8 positions, but all 8 positions are roughly the same distance from (300,300) for each point. So by spread I mean spread around whatever radial distance they started at.
@PC For example, distance to neighbor. Points near (300,300) effectively won’t move. Points far from (300,300) will move a lot. Areas with few points but away from (300,300) are more likely to get a point move into it than an area with few points near (300,300). For example.
@PC I left a comment in a thread below but my reasoning is that basically, if we apply the transformations again to each dataset, if the difference in metrics is real then you would expect the difference to remain, but if it's a fluke you will see regression to the mean. All of the indicators for red failed on this test, so I was left with no argument for red, and still several for green. The distance from the middle is one thing but some other metrics that indicate green also seem to be resistant to the transformation.
@capybara That doesn't sound right: the OP's transform only erases information; it can't introduce any noticeable abnormalities. It's not going to make things more random than random, or more evenly spread than random in any way. A uniformly random set of points to which the transform was applied would be indistinguishable from another random one.
@PC Yes. All of my indicators relied on order. Reshuffle the dataset, and they become insignificant. Some green indicators like discrepancy also fail the analogous test for the other transformations, but not all.
@PC so it might have been luck. But the distribution of potential values for RED is more extreme, and it happened that the transformation created one that was extreme
Sample 2 points uniformly from a 389x389 square and run the following test: are both points within a 10x10 test square? The probability is uniform over the larger square. Now sample 2 points and apply the transform. The probability of both points being within the test square is now higher near the center compared to far from the center.
@capybara Finally, sample 2 points and run transform twice. Nothing changes from the situation where you applied it just once.
@capybara 1 point is simpler than 2; will your idea work with 1 point, with the 10x10 box positioned however I want?
@Tasty_Y Because the quantity that changes is point density; it changes differently according to distance from the centre.
@capybara This has already been said, but the uniform random distribution, with the transformations applied to it, is still the uniform random distribution. Point density doesn't change. That's because for every point, there are 8 points that can be mapped to it after transformation, each with 1/8 chance. If you're getting anything different you might have a bug.
@capybara Do I understand the test you are proposing right?
Generate 2 points. Notice how often both end up in a 10x10 square, call this P1.
Generate 2 points and apply the transform, see how often both end up in the same 10x10 square, call it P2.
And the claim is that P1!=P2? Is that right?
@Shump Yes, but 100 points sampled from the uniform distribution is its own thing.
Example: expected neighbor distance differs for 50 vs 100 point samples. So clearly there are properties of 100 point samples that are not properties of the uniform distribution.
@Tasty_Y Yes. Choose a test location that's away from the centre for maximum effect, or just average over all possible locations of the 10x10 box.
@capybara I went ahead, did the test as described and P1 and P2 became equal with enough trials, as was expected. I think you are very much mistaken, feel free to write some simulations for yourself and see.
@capybara it should be equal if you perform this experiment many times, because you are not changing the distribution.
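This is easy to check numerically. A sketch (seeded for reproducibility; `reflect8` is my reconstruction of OP's reflection transform): sample many uniform points, transform them, and compare the fraction landing in a fixed off-center test box with and without the transform.

```python
import random

def frac_in_box(points, x0, x1, y0, y1):
    """Fraction of points inside the axis-aligned box [x0,x1) x [y0,y1)."""
    return sum(x0 <= x < x1 and y0 <= y < y1 for x, y in points) / len(points)

def reflect8(x, y, rng, cx=300.0, cy=300.0):
    """One uniformly random image under the 8 dihedral symmetries."""
    dx, dy = x - cx, y - cy
    if rng.random() < 0.5:
        dx, dy = dy, dx            # diagonal reflection
    if rng.random() < 0.5:
        dx = -dx                   # vertical bisector
    if rng.random() < 0.5:
        dy = -dy                   # horizontal bisector
    return cx + dx, cy + dy

rng = random.Random(0)
n = 200_000
pts = [(rng.uniform(100, 500), rng.uniform(100, 500)) for _ in range(n)]
tpts = [reflect8(x, y, rng) for x, y in pts]

# An off-center 60x60 test box; the analytic fraction is 3600/160000 = 0.0225.
box = (420, 480, 120, 180)
p1, p2 = frac_in_box(pts, *box), frac_in_box(tpts, *box)
```

Both fractions come out close to the analytic 0.0225: a uniform sample stays uniform under the transform, so single-box occupancy probabilities don't change.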
@Shump I think this might be the point. I believe that the average is the same across many tests, but the variance increases.
I tried a few metrics and couldn't find any evidence to support the idea about radial spread and the box test, so that looks to be false intuition!
Having said that, I would like to present an alternative perspective. What is useful to do when you know the transforms have been applied is to realise that from the single given rand_transform(human) we can compute all possible transformed point sets. No single one is better, and given that we can, we should consider all of them instead of just the one we are given. Below is the scipy.stats.qmc.discrepancy for 10k random samples (blue), 10k transforms of red (orange) and 10k transforms of green (light green). The original red and green discrepancy values are the dashed red and green lines.
You should do this, as it is free information about whether the single given human-generated sample has had an unlikely transform.
With this, I think a lot of metrics that are 9/1 in favour of green being human will regress to, say, 6/5. In the end, this idea can't be used to favour NO, but it can certainly be used as an argument for why the odds should be less than, say, 70% YES.
So I think there was a signal to which I was ascribing a very wrong interpretation, and @Shump's argument about regression to the mean is probably a better line of intuition.
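That resampling idea can be sketched end-to-end. Here I use mean nearest-neighbour distance as a stand-in metric (the discrepancy plot above isn't reproducible here); `retransform` is my reconstruction of the 8-fold reflection, and an observed metric value is compared against its distribution over fresh re-transforms:

```python
import math
import random

def mean_nn_distance(points):
    """Mean distance from each point to its nearest neighbour."""
    total = 0.0
    for i, (x, y) in enumerate(points):
        total += min(math.hypot(x - u, y - v)
                     for j, (u, v) in enumerate(points) if j != i)
    return total / len(points)

def retransform(points, rng, cx=300.0, cy=300.0):
    """Send each point to a uniformly random one of its 8 dihedral images
    about the center (reveal order is irrelevant for this metric)."""
    out = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        if rng.random() < 0.5:
            dx, dy = dy, dx        # diagonal reflection
        if rng.random() < 0.5:
            dx = -dx               # vertical bisector
        if rng.random() < 0.5:
            dy = -dy               # horizontal bisector
        out.append((cx + dx, cy + dy))
    return out

def metric_distribution(points, n_resamples=1000, seed=0):
    """Metric values over many fresh re-transforms of `points`. If the
    observed metric sits in the bulk of this distribution, it may just be
    transform luck rather than a real signal."""
    rng = random.Random(seed)
    return [mean_nn_distance(retransform(points, rng))
            for _ in range(n_resamples)]
```

Feed it the GREEN and RED lists above and compare each set's mean_nn_distance against its own metric_distribution.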