
https://huggingface.co/spaces/tomg-group-umd/Binoculars
Over a wide range of document types, Binoculars detects over 90% of generated samples from ChatGPT (and other LLMs) at a false positive rate of 0.01%, despite not being trained on any ChatGPT data.
Is there a correlation between Binoculars score and sequence length? Such correlations may create a bias towards incorrect results for certain lengths. In Figure 12, we show the joint distribution of token sequence length and Binoculars score. Sequence length offers little information about class membership.
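For context, the Binoculars score in the paper is the text's log-perplexity under an "observer" model divided by the cross-perplexity between a "performer" model's next-token predictions and the observer. Below is a minimal sketch of that formula, not the authors' released implementation: the gpt2/gpt2-medium checkpoints and the `binoculars_score` helper are illustrative stand-ins (the paper's default pair is Falcon-7B / Falcon-7B-Instruct), and no specific decision threshold is assumed.

```python
# Sketch of a Binoculars-style score: log-perplexity under an observer model
# divided by the cross-perplexity between a performer's predictions and the
# observer. Model choices here are lightweight stand-ins, not the paper's.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

OBSERVER = "gpt2"          # stand-in for the observer model
PERFORMER = "gpt2-medium"  # stand-in for the performer model (same tokenizer/vocab)

tok = AutoTokenizer.from_pretrained(OBSERVER)
observer = AutoModelForCausalLM.from_pretrained(OBSERVER).eval()
performer = AutoModelForCausalLM.from_pretrained(PERFORMER).eval()

@torch.no_grad()
def binoculars_score(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    obs_logits = observer(ids).logits[0, :-1]    # observer's next-token logits
    perf_logits = performer(ids).logits[0, :-1]  # performer's next-token logits
    targets = ids[0, 1:]

    # Log-perplexity of the text under the observer (mean negative log-likelihood).
    log_ppl = F.cross_entropy(obs_logits, targets)

    # Cross-perplexity: the observer's expected surprise under the performer's
    # next-token distribution, averaged over positions.
    x_ppl = -(F.softmax(perf_logits, dim=-1)
              * F.log_softmax(obs_logits, dim=-1)).sum(dim=-1).mean()

    return (log_ppl / x_ppl).item()

print(binoculars_score("Capybaras are the largest living rodents in the world."))
```

Lower scores point towards machine-generated text; the paper picks its thresholds on reference data rather than using a fixed universal cutoff.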
I ran my own test here and here and it was very effective. Will there be a way for the general public to evade it? The output quality must be comparable to GPT-3.5/Gemini Pro; it could be a fine-tuned model, something you put GPT-3.5/Gemini Pro text into, etc. This applies to the current version of Binoculars, not just future improved versions.