Will an LLM be able to match the ground truth >85% of the time when performing PII detection by 2024 end?
Dec 31

PII - personally identifiable information

Examples: people's names; identifying numbers and codes (SSN, phone number, passport, etc.); places and locations; names of organizations; and other attributes that can be used to identify a person.

GPT-4 outperforms Presidio, Microsoft's custom-built tool for PII detection. GPT-4 matches the ground truth ~77.4% of the time and misses a single PII element ~13% of the time.

I assume this includes both false positives and false negatives? What's the denominator?
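
To make the denominator question concrete, here's a rough sketch of two ways "matches ground truth" could be scored. I don't know which one (if either) the cited evaluation used; the function names, the (label, text) entity format, and the toy data are all made up for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical representation: each PII element is a (label, text) pair.
Entity = Tuple[str, str]


def exact_match_rate(gold: List[Set[Entity]], pred: List[Set[Entity]]) -> float:
    """Denominator = documents. A document counts only if the predicted
    set equals the gold set, so any false positive or false negative fails it."""
    matches = sum(1 for g, p in zip(gold, pred) if g == p)
    return matches / len(gold)


def micro_precision_recall(
    gold: List[Set[Entity]], pred: List[Set[Entity]]
) -> Tuple[float, float]:
    """Denominator = entities. False positives lower precision,
    false negatives lower recall."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (invented): four documents.
gold = [
    {("NAME", "Ann Lee"), ("PHONE", "555-0100")},
    {("ORG", "Acme")},
    {("NAME", "Bo Chen")},
    set(),
]
pred = [
    {("NAME", "Ann Lee")},                 # misses one element (false negative)
    {("ORG", "Acme")},                     # exact match
    {("NAME", "Bo Chen"), ("ORG", "Bo")},  # one extra element (false positive)
    set(),                                 # exact match, nothing to find
]

print(exact_match_rate(gold, pred))        # per-document score
print(micro_precision_recall(gold, pred))  # per-entity scores
```

On this toy data the per-document score is 0.5 while per-entity precision and recall are both 0.75, so the same model output can land on either side of a threshold depending on the denominator.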

predicts YES

A complete side question: what are the legalities or complicating factors in using a GPT model on PII? It has to be trained on dummy PII, right? How much dummy PII is needed to reach the 85% level you're referring to?

bought Ṁ110 of YES

@PatrickDelaney I think Microsoft tested against their in-house system, which does detect PII on real data.

