This market asks whether OpenAI's RAG results really reached 98% accuracy, as claimed in the tweet. I currently have no further information, but the figure apparently comes from a talk.
Top level comment with relevant OpenAI Dev Day talk:
https://www.youtube.com/watch?v=ahnGLM-RC1Y
Also including the rest of the Dev Day talks: https://community.openai.com/t/openai-dev-day-2023-breakout-sessions/505213
@e_gle I sold for a loss after watching this. They demonstrated accuracy with a real (private) customer.
Since no one else has actually explored this question, I did some additional digging. RAG is not a benchmark. https://twitter.com/mayowaoshin/status/1721837978840895843
OpenAI's talk shows a roadmap to a 2% hallucination rate (98% accuracy) using Retrieval-Augmented Generation with a number of additional techniques layered on top. The test data was not shared, so we do not know whether this was achieved in a real-world setting or only on some small, specific dataset.
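For anyone unfamiliar with the term: RAG just means retrieving relevant documents and putting them into the model's context before generating an answer. Here's a minimal sketch of that loop, purely my own toy illustration and not OpenAI's pipeline; the `embed`, `retrieve`, and `build_prompt` names are made up, and a real system would use dense embeddings plus the extra layers (reranking, query rewriting, etc.) the talk alluded to:

```python
# Toy RAG loop: retrieve the most relevant docs, then ground the prompt in them.
# (Illustrative sketch only -- not OpenAI's actual pipeline from the talk.)
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Bag-of-words stand-in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Grounding the model in retrieved text is what is supposed to cut
    # hallucinations; an accuracy figure like 98% would be measured over
    # answers generated from prompts like this one.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "OpenAI Dev Day 2023 included breakout sessions on retrieval.",
    "RAG retrieves documents and feeds them to the model as context.",
]
print(build_prompt("What is RAG?", docs))  # pass this prompt to any LLM
```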
I think the way you have phrased this question makes the market impossible to resolve. "OpenAI's results for RAG... reach 98%" is not a well-defined question.
"Did OpenAI display a roadmap to a 2% hallucination rate using RAG techniques?" That question probably resolves Yes.
"Did OpenAI publicly release an LLM that achieves 98% accuracy on real-world datasets?" That resolves No.
@e_gle Well, of course we don't know what they tested it on, which is why it's speculation. If the claim is correct, I think it's reasonable to assume they didn't just overfit a dataset or use a contaminated test set and call it 98% accuracy. It's on that assumption that we can speculate, but of course, as jack says below, the number is meaningless without context.
P.S. Can you link to the talk you're referring to?
I'm interpreting the question narrowly as "Did they really get the claimed results, i.e., 98% on whatever RAG benchmark they talked about?", based on the previous clarifications. I don't think other interpretations make sense for this question.
@jack Yeah, given the limited amount of publicly available information, that's what makes the most sense to ask.
To be clear, I think this question is largely pointless. I'm very confident the slide is as accurate as any other slide of its nature, and that the tweet's suggestion that this has anything to do with the OpenAI/Altman drama is not accurate.
@jack That's what I'm leaning towards as well. The slide's validity is what the market was asking about, and it seems likely accurate, but I also agree this is a pointless question now.
@SneakySly Yep! I think the only way this resolves NO is if the 98% performance claim was simply made up, fraudulent, or wrong due to an innocent mistake, and I see no reason to expect that.
@firstuserhere The tweet makes a pair of claims: the performance graph, and the claim that OpenAI got spooked as a result. Your title implies the question is only about the former, but the description implies it's about the whole tweet. Which is it?
@firstuserhere A reminder on the above question. These are completely different questions depending on whether I go by the title or the tweet.