Will there be a reliable technology to sign videos such that it reliably resists AI fakes?
2028 · 21% chance

It does not matter whether the private key is held in software or in hardware chips.
The video image itself should be authenticated, whether that is done through watermarks, steganography, or audio channels.
(It must not rely on metadata; the signature lives on the media itself.)



It should not be necessary to sign every bit of data, only to authenticate features well enough to be practical: faces, signs, text, and color.
Basically, the standard is that it would prevent any reasonable misrepresentation by which a human could be deceived about the facts of a situation.
(Think security footage in a court of law, for example.)

I'm giving this until 1 July 2028.

It should probably use cryptographic principles, at least in part, or meet some other standard that gives confidence it is not subject to an AI size/intelligence measuring contest, at least by the prevailing opinion of the time.

I will refrain from betting on this market.


We already have foolproof digital signature algorithms (RSA, ECDSA, ML-DSA, etc.), which can be applied to any data format, including but not limited to video. They can detect any form of tampering with 100% reliability. In courts of law, cryptographic hashes (not even signatures) are used to fingerprint digital evidence, and have been for many years. What stops these existing solutions from resolving this market YES?
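
(For concreteness, a minimal sketch of that existing machinery, using the Python `cryptography` package; the file path and in-memory key are placeholders, not a real evidence workflow.)

```python
# Sign a video file with ECDSA and verify it. One changed bit anywhere
# in the file makes verification fail.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

private_key = ec.generate_private_key(ec.SECP256R1())
public_key = private_key.public_key()

data = open("video.mp4", "rb").read()            # placeholder path

# Sign: hashes the data with SHA-256 internally, then signs the digest.
signature = private_key.sign(data, ec.ECDSA(hashes.SHA256()))

# Verify: raises InvalidSignature if the data differs by even one bit.
try:
    public_key.verify(signature, data, ec.ECDSA(hashes.SHA256()))
    print("authentic: bit-for-bit identical to what was signed")
except InvalidSignature:
    print("tampered or re-encoded")
```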

@Retr0id No, that signs a video file. Until you can embed that signature in video steganography, an overlay, or an audio channel, such that it survives as the file moves around the internet, the mission isn't complete.

Then what you want is provably impossible :)

Then produce the proof.

@PaulDwyer Sure. You have a file. You sign it. You embed the signature into the original file. Any modification to the file invalidates the signature, so anyone wanting to verify the signature needs to know how to separate the signature from the original file.

Thus, if you're embedding signatures steganographically, that requires the ability to strip the watermarked signature before you can verify it. If the watermark cannot be stripped, then nobody can verify the signature. If the watermark can be stripped then your system is broken.
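
(To make the structure of that argument concrete, a sketch in which all three helpers are hypothetical stand-ins, not a real watermarking library:)

```python
# Structural sketch of any "verify a watermarked video" flow.
def extract_signature(watermarked: bytes) -> bytes: ...  # must be public
def strip_watermark(watermarked: bytes) -> bytes: ...    # must also be public
def check_signature(public_key, data: bytes, sig: bytes) -> bool: ...

def verify_watermarked(watermarked: bytes, public_key) -> bool:
    signature = extract_signature(watermarked)
    original = strip_watermark(watermarked)
    return check_signature(public_key, original, signature)

# If strip_watermark() is public, a forger can run it too: strip the mark,
# alter the video, and embed a fresh mark of their own. If it's secret or
# impossible, honest verifiers can't verify anything either.
```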

If you wanted to construct a signature scheme that is resilient to benign modifications of the input file, what you'd need is a "fuzzy hash" https://en.wikipedia.org/wiki/Fuzzy_hashing

Unfortunately, fuzzy hashes are explicitly not cryptographically secure.

"Fuzzy hashing algorithms specifically use algorithms in which two similar inputs will generate two similar hash values. This property is the exact opposite of the avalanche effect desired in cryptographic hash functions."

You can already decode messages embedded in video data, via video steganography techniques.

There is nothing that requires you to sign the full domain of the video, either. You could simply collate data from the video image and sign it on keyframes.

If you google my name you might find news articles talking about my image steganography research ;)

Fuzzy hashing is still secure with respect to the mapped, abstracted data, is it not? Whether or not minor changes are made, that's not what's being secured. If you are signing arbitrary small color groups, for example, you may be able to forge minor details of a given color blob, but not construct someone else's face over another face.

For those purposes it would still be secure. That is why 100% data mapping wasn't part of the problem as described; artifacts and perturbations are acceptable so long as the fake cannot construct meaningful features.

I think you're in the mindset of file = sign, rather than sub-features of the dataset being signed into other domains of the image, or into frames or keyframes, or just signing them into embedded data on the audio.

Re: "it should not be necessary to sign every bit of data..." the standard way of digitally signing a file is to take a hash of the file and then encrypt the hash + certificate with a private key (with the certificate being metadata and a public key). Thus, modifying the file is detectable, and all of the file has been digitally signed, but the asymmetric encryption doesn't have to run on the full file. This is OK and if this is part of the solution it could resolve yes, right?

Why it is done this way is because asymmetric encryption of large files is computationally expensive, but hashing them is not.
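
(A sketch of that hash-then-sign split, again using the Python `cryptography` package; the file path is a placeholder. The expensive asymmetric operation only ever touches the 32-byte digest, no matter how large the video is.)

```python
# Hash the (possibly huge) file incrementally, then sign only the digest.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec, utils

private_key = ec.generate_private_key(ec.SECP256R1())

hasher = hashes.Hash(hashes.SHA256())
with open("video.mp4", "rb") as f:               # placeholder path
    for chunk in iter(lambda: f.read(1 << 20), b""):
        hasher.update(chunk)                     # cheap, streaming
digest = hasher.finalize()                       # 32 bytes

# Prehashed tells the library the digest is already computed, so the
# asymmetric operation runs on 32 bytes rather than gigabytes.
signature = private_key.sign(digest, ec.ECDSA(utils.Prehashed(hashes.SHA256())))
```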

@equinoxhq
Essentially we are exploiting some combination of P != NP such that the implementation prevents any significant misrepresentation of the data.
Recompressing a video can change the raw data, but it doesn't misrepresent the features within it.

However, the domain in which the information is signed should not be metadata, so that the video can be verified from the video data itself. (Video files get remade at the raw-data level as they are pushed around the internet from platform to platform: resizing, recompressing, switching codecs, and what not.)

The expectation isn't that there is a fully robust, fully signed video. The expectation is essentially that there is a product that works for practical purposes on online platforms, with which an end user can verify with high confidence that a video comes from a certain source, without having the original file.

I'll add some clarification to the description in a little bit.

@equinoxhq Sorry about the confusion around the description.

"Whether it is done threw watermarks or Steganography, or audio channels."
Was intended to describe that the signature exists on the media. I have done a poor job of being concise.

@PaulDwyer

The expectation isn't that there is a fully robust, fully signed video. The expectation is essentially that there is a product that works for practical purposes on online platforms, with which an end user can verify with high confidence that a video comes from a certain source, without having the original file.

Hm. I'm not sure how you are thinking that will work. Like, if you allow modification of the file while still calling it "not a fake", you're opening a giant can of worms, and I don't see how to close that can up again well enough for people to have confidence in the conclusion. Anything I can think of (admittedly I've only been thinking for 5 minutes) could be completely defeated, as in: the deepfake-generating program could take an original file that has been verified and modify it to appear to show whatever, while preserving the bits that indicate it's authentic. Watermarks and steganography, while allowing the file to be modified, wouldn't work. You could just modify the file, and then put the watermark on or the steganography in the new file. It is super not hard to put a visual watermark or a bit of steganography into whatever file you want; this would be no barrier to a deepfaker.

An idea I had, to allow for transcoding and such, would be: sign the full original. Then, any modifications are made by a trusted program which, in its signature of the modified file, includes a link to the place where the original is stored. You'd still have to keep and sign the original true copy the way you sign a domain name, but this could allow you to muck about with multiple descendant copies.
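
(A minimal sketch of what such a signed descendant-copy record might look like. The field names, URL, and operations list are invented for illustration, though the idea is close in spirit to what real provenance standards like C2PA do.)

```python
# Hypothetical provenance record a trusted transcoder might sign.
import hashlib, json
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

transcoder_key = ec.generate_private_key(ec.SECP256R1())
original_bytes, derived_bytes = b"...", b"..."   # stand-ins for real files

record = {
    "original_url": "https://example.com/originals/clip123",  # placeholder
    "original_hash": hashlib.sha256(original_bytes).hexdigest(),
    "derived_hash": hashlib.sha256(derived_bytes).hexdigest(),
    "operations": ["transcode:h264->av1", "resize:1080p->720p"],
}
payload = json.dumps(record, sort_keys=True).encode()
signature = transcoder_key.sign(payload, ec.ECDSA(hashes.SHA256()))
```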


Ah, here's an idea I didn't have when I started typing: Are you thinking maybe the camera does some analysis of the video, maybe a text transcript and description of what is shown, and that transcript and description are stored as text metadata, and that is signed and can't be modified? Then, you couldn't trust the visuals or audio in the video, but you could compare them to the trustable summary.

@equinoxhq
The encoded message in the steganography, audio, or watermark contains the signature, verified by a public key in some external database, like DNS or something.

Since the watermark, steganography, or audio is not a static message but a signature dependent on the features that you are signing, the deepfake would have to edit the data such that the features stay the same.

If the features are selected properly, the domain in which the AI can modify data should be constrained enough that it cannot do anything interesting.

The transcript metadata idea is new to me and interesting, but for the purposes of this market will be out of scope.
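
(A sketch of the scheme as described above. `extract_features` and `lookup_public_key` are hypothetical, and `extract_features` is where all the unsolved work would live.)

```python
# Sketch of feature-level signing: the signature binds to extracted
# features rather than raw bits. Both helpers are hypothetical.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

def extract_features(video_bytes: bytes) -> bytes:
    """Hypothetical: a canonical byte encoding of faces, signs, text, color blobs."""
    raise NotImplementedError

def lookup_public_key(source: str):
    """Hypothetical: fetch the source's public key from a DNS-like registry."""
    raise NotImplementedError

def sign_video(video_bytes: bytes, private_key) -> bytes:
    return private_key.sign(extract_features(video_bytes),
                            ec.ECDSA(hashes.SHA256()))

def verify_video(video_bytes: bytes, signature: bytes, source: str) -> bool:
    # The verifier must reproduce extract_features() output exactly,
    # even after the platform has recompressed or resized the video.
    try:
        lookup_public_key(source).verify(
            signature, extract_features(video_bytes),
            ec.ECDSA(hashes.SHA256()))
        return True
    except InvalidSignature:
        return False
```

The catch, as noted elsewhere in the thread, is that `extract_features` has to reproduce its output bit-for-bit after recompression, or the signature check fails.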

@PaulDwyer Actually, there may be conditions under which the transcript approach is functionally the same. I will reserve judgement as to whether the spirit is essentially the same. Essentially, the only trusted party should be the source, not any intermediate platform. If the platform holds metadata that describes the features well enough, and that metadata is signed within the media, then the conditions sound complete.

I'm not trying to constrain the technological solution. If the elements are verifiable back to the source, that should be enough.

@PaulDwyer I will note that the "have the transcoding software sign its descendant copy" solution has problems. Normally you want anything that can sign anything to be pretty well protected (because in order to create a signature, it's got to have a private key, which you don't want getting out). I could see such transcoding software running in a protected environment in a Youtube server farm or something, and that being fairly trustworthy, but you couldn't have a situation where you modify a file using Shotcut or some other widely available software program and that's trustworthy.

On the topic of security camera video... it is not, in fact, trustworthy, and there is nothing you can borrow from what is currently done with (most) security cameras that would be useful for preventing deepfakes. (Caveat: there might be some extra-expensive camera that provides some assurance. But most security cameras are, well, not great.)

Security camera footage will hold up in a court of law because the people involved (judge, jury) don't know the ways trusting it can go wrong, and the defense doesn't have access to an expert who can pick it apart (there aren't that many). But it takes an expert witness to analyze a digital video clip and confirm they haven't seen any evidence of tampering; otherwise there totally could be tampering. You can't just assume video files haven't been tampered with because they came from a security camera. I've actually sat in on sessions where such expert witnesses talk investigators through what they check for (at this conference: https://www.htcia.org/2023-international-conference-and-expo/), and other sessions where they talk us through how things like efforts to compress the video introduce artifacts that can cause problems when trying to use video to prove a conclusion.

One I remember: they were trying to prove that someone had sped through a red light before a car accident, but the camera had compressed the video such that the number of frames per second was constantly varying, so the distance travelled per frame, which the prosecution was using to prove the speed the car was going, was meaningless. And sometimes features of the background would jump around, because if not too many pixels had changed, the camera would discard the changes to save space. This wasn't even the result of anyone tampering with anything; it was just how the security camera worked, because the people who chose the codec when designing the camera prioritized keeping as much footage in as little hard drive space as possible (so they could market their device as keeping X length of video at Y price) over making sure the video was good quality and accurately corresponded to reality.

Since the watermark, steganography, or audio is not a static message but a signature dependent on the features that you are signing, the deepfake would have to edit the data such that the features stay the same.

This is the key idea, and hashing is the standard solution for "making your signature dependent on the features that you are signing". Hashing the whole thing makes the signature dependent on every detail of the video. If you wanted, you could hash some subset of features, I guess, but any features you digitally sign can't change: bit for bit, they have to stay the same, whereas transcoding often changes all the bits. Change one bit in the data that has been hashed as part of the signature, and the hash will change completely, and your signature-checking algorithm will say "no match". There isn't a cryptographer-approved technology that lets a video verify as authentic after it has been modified subtly, even if it looks mostly the same to the human eye.

Also important: if you want this to happen by 2028, and also to get the nod from security professionals in the way that our standard cryptographic algorithms are seen as secure, the solution can't involve inventing anything new; it has to re-use things we currently know how to do. Any new cryptographic technique or algorithm takes 5-10 years of testing at minimum before cryptographers will call it secure. So, if you want something that professionals will tell the public to trust, you can't hope we invent something between now and 2028; you've got to hope someone puts together pieces of technology that exist today to achieve the goal you want.

I wouldn't encode a message in steganography. Just use a certificate, signing whatever you want to trust, in the standard way. Steganography (in either video or audio) or encoding a message in a watermark adds complication without adding trustworthiness.

I've had a lot of thoughts about this, hope some of them are useful 😆

Isn't this very similar to the DRM problem? It's definitely possible to make it harder to copy stuff but impossible to completely prevent. Hard to imagine any solution that wouldn't be fooled by filming a screen showing a fake with a "trusted" camera.

@Thomas42 I actually think "prevent people from copying this, or playing an unauthorized copy, while allowing them to read it and play an authorized copy" is a different and harder problem than "sign this digitally so we know which camera made it".

Possible solution to filming a screen: Trusted cameras have binocular vision, and can tell the difference between filming a screen and filming a 3d object, by analyzing parallax? I only actually have monocular vision, but I understand that for most people, 3d movies and 2d movies are quite different.

Alternatively, Lidar or infrared: the same sort of technology that Apple is using to make sure you can't just hold up a photo of someone's face to unlock their phone? Or multiple microphones, so that if sounds are coming from a speaker rather than a natural environment, the difference is noticeable? Gyroscopes on the camera, so that if the image moves but the gyroscope indicates the camera hasn't moved, a red flag can be raised?

Generalized approach: More sensors on the trusted cameras, to capture more of the environmental factors and make it harder for fakers to trick the camera into thinking fakes are real. Hackers could of course just hook AI generated inputs into all the sensors, given a sufficiently multimodal and trained model... I guess what matters is how "reliable" is needed. My guess is the problem isn't "people will be fooled by hyperrealistic fakes undetectable to humans", but rather more like the current situation "people will be fooled by fakes that are easily detectable with a little effort, which tell them things they'd like to be true". So probably this arms race tops out at "we've made it hard enough to generate undetectable fakes that people just don't bother, and use detectable fakes instead, which works well enough for their purposes. Nation-states go ahead and use undetectable fakes when needed of course, but standard cameras aren't meant to protect against someone trying as hard as possible with a lot of resources being put behind the effort".
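
(A toy version of the gyroscope cross-check above; the thresholds and the per-frame image-motion estimate are made up for illustration, and a real system would need actual optical flow and sensor fusion.)

```python
# Toy gyroscope-vs-image-motion cross-check.
def motion_red_flag(gyro_rotation_deg: float, image_motion_deg: float) -> bool:
    """Flag frames where the picture moved but the camera body didn't."""
    CAMERA_STILL = 0.05  # deg/frame; below this, the gyro says "not moving"
    IMAGE_MOVED = 1.0    # deg/frame of apparent motion in the image
    return gyro_rotation_deg < CAMERA_STILL and image_motion_deg > IMAGE_MOVED
```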

2-3 years ago I thought I had 5 years to solve this problem before trivially assembled AI fakes would fool humans. Hah-hah-hah.
