I took an undergrad organic chemistry class shortly after ChatGPT came out, and ChatGPT was extremely bad at the material. Now I am taking a graduate class in organic chemistry. I will be impressed if I cannot come up with a range of simple problems at which it fails. I do not expect it to be good at drawing or reading diagrams/structures, so I will be super-impressed if it can, but I do not require it.
I will only use publicly available LLMs (possibly premium tiers, but not the very expensive ones). Resolves no later than the start of June.
@skibidist Is it fine if it fails sometimes but succeeds sometimes? Basically, if it consistently succeeds at pass@10 but doesn't consistently succeed at pass@1?
@Bayesian I gave Gemini 2.5 Pro Exp a few screenshots of textbook problems with very simple structures. It is actually good at recognizing simple structures and reaction mechanisms. The answers to the problems ranged from impressive to passable to wrong. It also tried to produce diagrams in response, which kind of sucked, but weren't wrong. So quite a promising first step.
I will dig deep into it and report back in a few weeks. I will also look for the LLM and prompting approach that works best and judge based on that.
That should be fine, particularly for problems that require several reasoning steps.