How will my vibe-coded rewrite of a SoTA genomics program perform?
Jan 8
- More than 3% better: 13%
- Within 3%, better or worse: 6%
- More than 3% worse, less than 10% worse: 9%
- More than 10% worse: 59%
- Re-write is too non-functional to compute accuracy: 13%

When I test my vibe-coded Rust rewrite of Beagle, an industry-standard SoTA genotype imputation program, how will the accuracy change?

Accuracy will be measured by R^2 (estimated dosage vs. actual dosage) on the 1KG+HGDP samples, using an 80/20 train/test split in which the test set is downsampled to microarray markers.
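
For concreteness, here is a minimal sketch of one common way to compute a dosage R^2: the squared Pearson correlation between estimated and true dosages. This is an assumption, not necessarily the exact evaluation used for this market; the question does not say whether R^2 is computed per marker and averaged or pooled across markers, and the function name `dosage_r2` is hypothetical.

```rust
/// Squared Pearson correlation between estimated and true dosages.
/// Assumption: both slices hold one dosage value (0.0 to 2.0) per
/// sample-marker pair, in the same order.
/// Returns None when the inputs are unusable: length mismatch, fewer than
/// two values, or zero variance in either slice (e.g. a monomorphic marker).
fn dosage_r2(estimated: &[f64], truth: &[f64]) -> Option<f64> {
    if estimated.len() != truth.len() || estimated.len() < 2 {
        return None;
    }
    let n = estimated.len() as f64;
    let mean_e = estimated.iter().sum::<f64>() / n;
    let mean_t = truth.iter().sum::<f64>() / n;

    // Accumulate the covariance and both variances in a single pass.
    let (mut cov, mut var_e, mut var_t) = (0.0_f64, 0.0_f64, 0.0_f64);
    for (e, t) in estimated.iter().zip(truth.iter()) {
        let de = *e - mean_e;
        let dt = *t - mean_t;
        cov += de * dt;
        var_e += de * de;
        var_t += dt * dt;
    }
    if var_e == 0.0 || var_t == 0.0 {
        return None;
    }
    Some(cov * cov / (var_e * var_t))
}
```

Returning `None` rather than 0.0 for degenerate inputs keeps "could not be computed" distinct from "computed and poor".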

The R^2 of Beagle is already pretty good, usually a bit above 0.8 for microarray-style data. The "%" in the question refers to the absolute difference in R^2 between standard Beagle and the rewritten version, expressed in percentage points rather than as a relative change. (If Beagle has R^2 = 0.80 and the rewrite has R^2 = 0.78, the rewrite counts as 2% worse.)
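
The mapping from a pair of R^2 values to the answer options can be sketched as follows. The `Outcome` enum and `classify` function are hypothetical names, and the handling of the exact boundaries is an assumption: the options do not say where a difference of exactly 3 or exactly 10 points lands, so this sketch puts exactly 3 in "within 3%" and exactly 10 in "more than 10% worse".

```rust
#[derive(Debug, PartialEq)]
enum Outcome {
    MoreThan3Better,
    Within3,
    Worse3To10,
    MoreThan10Worse,
    NonFunctional,
}

/// Classify the rewrite by the absolute R^2 difference in percentage points.
/// `rewrite_r2` is None when the rewrite cannot produce a usable R^2.
fn classify(beagle_r2: f64, rewrite_r2: Option<f64>) -> Outcome {
    let r2 = match rewrite_r2 {
        Some(value) => value,
        None => return Outcome::NonFunctional,
    };
    // Absolute difference, not relative: 0.78 vs. 0.80 is about -2 points.
    let diff_points = (r2 - beagle_r2) * 100.0;
    if diff_points > 3.0 {
        Outcome::MoreThan3Better
    } else if diff_points >= -3.0 {
        Outcome::Within3
    } else if diff_points > -10.0 {
        Outcome::Worse3To10
    } else {
        // Assumption: a drop of exactly 10 points is grouped here.
        Outcome::MoreThan10Worse
    }
}
```

With the example above, `classify(0.80, Some(0.78))` falls in `Outcome::Within3`. Values that sit exactly on a boundary are sensitive to floating-point rounding, so a real resolution would need an agreed tie rule.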

Vibe-coded definition:
- 100% of the LoC written by AI (I can still edit config files like .gitignore or Cargo.toml).

- I will ask broad, open-ended questions to AIs (like "find logic mistakes" or "figure out why X happens"), usually to enable communication between Claude and Gemini. I can also ask them to complete tasks, like "add speed benchmarks" or "add difficult integration tests that measure XYZ."

- I'm not planning on reading all of the code, though I'll skim the diffs.

- So far, I've been mostly using Gemini 3 Pro for review and Opus 4.5 for implementation, but I am free to use any AI tool (e.g. Codex).

Feel free to ask questions to gain more information.
