Data Shapley values (https://arxiv.org/abs/1904.02868) have been introduced as a way to measure how valuable each training sample is to a machine learning model. This makes it possible to sift through the training set and identify the most and least valuable samples according to a suitable metric.
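For concreteness, here is a minimal Monte Carlo sketch of the idea: sample random permutations of the training set and average each point's marginal contribution to validation performance. This is a toy version only; the paper's TMC-Shapley adds truncation for efficiency, and the dataset, model, and names such as `num_permutations` are my illustrative choices, not the authors' code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def monte_carlo_shapley(X_train, y_train, X_val, y_val, num_permutations=20):
    """Estimate data Shapley values by averaging marginal contributions
    over random permutations of the training set."""
    n = len(X_train)
    shapley = np.zeros(n)
    for _ in range(num_permutations):
        perm = np.random.permutation(n)
        prev_score = 0.0  # value of the empty coalition (no model yet)
        for k in range(1, n + 1):
            idx = perm[:k]
            if len(np.unique(y_train[idx])) < 2:
                score = prev_score  # can't fit a classifier on a single class
            else:
                model = LogisticRegression(max_iter=1000)
                model.fit(X_train[idx], y_train[idx])
                score = model.score(X_val, y_val)
            # Marginal contribution of the k-th point added in this permutation.
            shapley[perm[k - 1]] += score - prev_score
            prev_score = score
    return shapley / num_permutations

X, y = make_classification(n_samples=60, n_features=5, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
values = monte_carlo_shapley(X_tr, y_tr, X_val, y_val)
print("lowest-value samples:", np.argsort(values)[:5])
```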
In the context of data poisoning, where some samples in the training data have been intentionally manipulated by an adversary to reduce the performance of a model trained on them, will data Shapley values prove pivotal to an efficient strategy to counter poisoning?
Resolution will be based on my judgment of the publicly available literature, plus discussion with colleagues. Resolves at the end of 2025.
iiuc, data poisoning attacks don't need to make a model learn any less overall, just make it learn an incorrect output for certain inputs. If this is done sneakily enough, it won't even show up in standard model evaluation metrics. A well-poisoned sample might even increase overall model performance (by some objective metric), so it seems hard to counter poisoning this way.
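A toy version of this point (my own construction, purely illustrative, with arbitrary trigger and poison rates): flip labels only on training points carrying a rare "trigger" feature value, so aggregate test accuracy barely moves while triggered inputs are steered toward the attacker's class.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

trigger = rng.random(len(X_tr)) < 0.02  # poison ~2% of the training set
X_tr[trigger, 0] = 5.0                  # plant a rare, distinctive feature value
y_bad = y_tr.copy()
y_bad[trigger] = 1                      # attacker forces class 1 on trigger points

clean = LogisticRegression(max_iter=1000).fit(X_tr[~trigger], y_tr[~trigger])
dirty = LogisticRegression(max_iter=1000).fit(X_tr, y_bad)

print("clean test acc:   ", clean.score(X_te, y_te))
print("poisoned test acc:", dirty.score(X_te, y_te))  # often barely different
X_trig = X_te.copy()
X_trig[:, 0] = 5.0  # activate the trigger at test time
print("fraction of triggered inputs sent to class 1:", dirty.predict(X_trig).mean())
```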
@mariopasquato A sample that says something true about the world, but isn't repeated thousands of times in the data.
@VitorBosshard I see. Assuming that poisoned samples can be detected upon closer inspection, but it's too costly to inspect all training samples, this would still be valuable: Shapley values could rank the samples so that the inspection budget goes to the most suspicious ones first.
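A sketch of that triage idea (reusing `values`, the estimates from the Monte Carlo example above; `k` is an arbitrary inspection budget):

```python
import numpy as np

def triage(values: np.ndarray, k: int = 10) -> np.ndarray:
    """Indices of the k lowest-valued training samples, to be audited first."""
    return np.argsort(values)[:k]

for i in triage(values):
    print(f"inspect training sample {i} (estimated value {values[i]:+.4f})")
```

Of course, per the earlier comment, well-crafted poison need not have a low Shapley value at all, so this triage only helps against poison that actually hurts the validation metric.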