Suppose the following, or something very similar, was done:
Sampling
K1 (with K1 >= 4) control horoscope generators are sampled by picking K1 more-or-less random people who are good writers, don't believe horoscopes work, and feel they could write convincing false horoscopes; each is challenged to write pseudo-horoscopes for every day of a week. (They don't have to write one for each star sign; it's enough to write 5 to 7 horoscopes each. So we would end up with somewhere between K1×5 and K1×7 control horoscopes.)
K2 (with K2 >= 4, probably K2 ~= K1) astrologists, horoscope websites, or similar sources are sampled, and each is asked to provide one week's worth of horoscopes for every star sign.
N (with N >= 12×2×max(K1, K2)×2 or so, to cover 12 star signs × 2 intervention conditions × max(K1, K2) horoscope generators × a bit extra for within-group variance) people of varying star signs and attitudes toward horoscopes are recruited to test the validity of horoscopes. They receive no information about the details of the study other than that they will be given horoscopes and asked how accurate they were.
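As a minimal sketch, the sample-size floor above is just a product of the design cells; the values of K1 and K2 below are illustrative choices (the text only requires both to be at least 4):

```python
# Illustrative sample-size floor; K1 = K2 = 4 is an assumed example value,
# not something the study design fixes.
K1, K2 = 4, 4
n_signs = 12        # star signs
n_conditions = 2    # whole week up front vs. one horoscope per day
slack = 2           # "a bit extra" to leave within-group variance

N_min = n_signs * n_conditions * max(K1, K2) * slack
print(N_min)  # 12 * 2 * 4 * 2 = 192
```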
Intervention
Each person is assigned a condition and a set of horoscopes: either they receive their entire week of horoscopes at the beginning of the intervention, or they receive one horoscope to read each day.
Measures
At the end of the intervention, participants are asked to rate in some way how helpful they found the horoscopes (perhaps just on an arbitrary 1-10 scale).
Realistically, if I ran this study myself, I would probably also collect much more data (e.g. attitudes toward and experiences with horoscopes prior to the intervention, opinions on how good each horoscope was along the way, qualitative data on which ways a horoscope was or was not a hit) in order to characterize the exact details of how well horoscopes work.
Resolution criterion
The average helpfulness rating of each horoscope generator is computed. Then Cohen's d between the real horoscopes' helpfulness and the fake horoscopes' helpfulness is computed. If d > 1, this market resolves YES; otherwise it resolves NO.
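The resolution check could be sketched as follows, using Cohen's d with a pooled standard deviation; the per-generator mean ratings below are made-up illustrative numbers, not data:

```python
from statistics import mean, stdev

def cohens_d(real, fake):
    """Cohen's d between two groups, using the pooled standard deviation."""
    n1, n2 = len(real), len(fake)
    s1, s2 = stdev(real), stdev(fake)
    pooled = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (mean(real) - mean(fake)) / pooled

# Hypothetical per-generator average helpfulness ratings on the 1-10 scale:
real = [6.1, 5.8, 6.5, 6.0]  # the K2 astrologer/website generators
fake = [5.9, 6.2, 5.7, 6.1]  # the K1 skeptic control generators

print("resolves YES" if cohens_d(real, fake) > 1 else "resolves NO")
```

Note that averaging per generator first (as the criterion specifies) means d is computed over K1 + K2 data points rather than N ratings, which makes d > 1 a demanding bar.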
Notes
I unironically ended up considering running this study. However, I am way overbooked right now, so I really shouldn't; I'd encourage someone else to run it. I've been a lifelong skeptic, so for most of my life I wouldn't have believed this could work. However, recently someone challenged me to steelman horoscopes, and after thinking for a bit I came up with a reason this could actually work out (even if the outcome criterion were made more rigorous than "maybe just on an arbitrary 1-10 scale").
The study must be reasonably preregistered so we know it's not just publication bias.