Will an LLM/Elicit be able to do proper causal modeling (e.g., identifying papers that didn't control for covariates) in 2024?

Especially for the melatonin => longevity paper that Mike Lustgarten tweeted.

