Detecting "real" face vs picture through webcam #206
-
Let's say I wanted to build an authentication mechanism that detects faces through a webcam and matches them against reference pictures in a database (like a poor man's version of FaceID). I was wondering if there is a way to detect whether someone is holding a printout of a picture in front of the camera instead of showing a real face? I know telling the difference from a single still is probably hard, but by analysing the differences between consecutive frames from the webcam maybe we can tell a "static" face (like a printout or a mask) apart from a live, moving face? We could probably use information from other parts of the body as well. I don't know nearly enough about CV to know how one would go about this or how hard it would be. Any ideas?
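Something naive like this frame-differencing check is roughly what I have in mind (just a sketch; the thresholds and canvas plumbing would need real tuning):

```ts
// average absolute per-pixel difference (red channel only) between consecutive
// video frames; a perfectly static printout should score near zero over time
function motionScore(ctx: CanvasRenderingContext2D, video: HTMLVideoElement, prev: Uint8ClampedArray | null): { score: number; frame: Uint8ClampedArray } {
  ctx.drawImage(video, 0, 0, ctx.canvas.width, ctx.canvas.height);
  const { data } = ctx.getImageData(0, 0, ctx.canvas.width, ctx.canvas.height);
  let sum = 0;
  if (prev) for (let i = 0; i < data.length; i += 4) sum += Math.abs(data[i] - prev[i]);
  return { score: prev ? sum / (data.length / 4) : 0, frame: new Uint8ClampedArray(data) };
}
```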
-
the problem seems to be hard, because it requires combining computer vision with time-series analysis, and the final model would still be highly vulnerable to other ways of exploiting this authentication method (e.g. playing a video of the user's face on a phone screen). the silly idea that comes to mind is: try recreating android's 3x3 slide-grid unlock screen in a way that the "slides" are performed by the user's head movement or facial expression - see the sketch below.
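a sketch of that challenge-response idea (the detectPose callback and the gesture names here are placeholders, not any specific library's API):

```ts
// ask for a random sequence of head poses and require the user to perform
// them in order within a time limit - hard to satisfy with a static printout
const POSES = ['facing left', 'facing center', 'facing right'] as const;
type Pose = typeof POSES[number];

function makeChallenge(length = 4): Pose[] {
  return Array.from({ length }, () => POSES[Math.floor(Math.random() * POSES.length)]);
}

// detectPose is assumed to return the current head pose from the video feed
async function runChallenge(detectPose: () => Promise<Pose>, timeoutMs = 10000): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  for (const expected of makeChallenge()) {
    let matched = false;
    while (Date.now() < deadline && !matched) matched = (await detectPose()) === expected;
    if (!matched) return false; // timed out before completing the sequence
  }
  return true;
}
```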
-
Real solutions require that the camera input is not only RGB, but also has a depth map and an IR channel. Btw, there was an open challenge in 2019 on this topic, so searching for the papers that were enrolled in it is quite informative. Anyhow, let's think outside of the box...
I like the topic, I think I'll do a demo with a simple heuristic ML model combined with blink detection...
-
Yes.
I'm thinking a first pass until no "blink" is detected and a second pass until a "blink" is detected - so both conditions have to be met over time. Also adding additional checks such as 'facing center' and 'looking center', plus validation of overall face size, etc.
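a rough sketch of that two-pass flow using Human's gesture strings ('facing center', 'blink left eye', etc.); the waitFor helper and the timeouts are hypothetical, not the demo's actual code:

```ts
import { Human } from '@vladmandic/human';

const human = new Human({ modelBasePath: 'https://vladmandic.github.io/human/models' });

// hypothetical helper: keep detecting until a predicate on gesture strings passes or we time out
async function waitFor(video: HTMLVideoElement, predicate: (gestures: string[]) => boolean, timeoutMs = 5000): Promise<boolean> {
  const start = performance.now();
  while (performance.now() - start < timeoutMs) {
    const result = await human.detect(video);
    if (predicate(result.gesture.map((g) => g.gesture as string))) return true;
  }
  return false;
}

// pass 1: face centered with eyes open (no blink); pass 2: a blink must then occur
async function blinkLiveness(video: HTMLVideoElement): Promise<boolean> {
  const eyesOpen = await waitFor(video, (g) => g.includes('facing center') && !g.some((s) => s.startsWith('blink')));
  if (!eyesOpen) return false;
  return waitFor(video, (g) => g.some((s) => s.startsWith('blink')));
}
```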
-
take a look at... i'm doing a quick train of an additional model to check liveness of input, and then i'll test that and add descriptor database matching to the demo.
-
Poor-mans FaceID demo is now online :) demo/facerecognition

Both antispoof and liveness are optional modules; the models are tiny and designed to serve as a quick check when used together with other indicators (the liveness model is just 23 ops):

- Anti-spoofing Module: checks if input is realistic (e.g. not a computer generated face)
- Liveness Module: checks if input has obvious artifacts due to recording (e.g. playing back a phone recording of a face)

Note: there is a lot of room for improvement, this is just a quick prototype.

Configuration:
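(a sketch of the relevant options - Human exposes face.antispoof and face.liveness config keys, but exact defaults and surrounding options may vary by version:)

```ts
import { Human } from '@vladmandic/human';

// enable the optional antispoof and liveness modules alongside face detection
const human = new Human({
  face: {
    enabled: true,
    antispoof: { enabled: true }, // produces the `real` score
    liveness: { enabled: true },  // produces the `live` score
  },
});
```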
-
@MaKleSoft it looks for discontinuities between the foreground object (e.g. the phone) and the background (what is behind the phone)
-
well, that's the magic of generic CNN networks - and what are weights? whatever it was trained on. so if it is like you say, that it's nearly impossible to avoid obvious reflections, then it's already trained on that (even if that was not my intention) :) the biggest issue is that my training dataset for it was really small - 1000 frames from one video of me sitting in front of a webcam and one video of the same played back through a phone - i need to get more sample videos and retrain it. this was a quick test and i'm surprised how well it works as-is.
-
Quick question: how did you arrive at the minimum box size of 224 (I assume that's in pixels)? https://github.com/vladmandic/human/blob/main/demo/facerecognition/index.ts#L33 Seems like an oddly specific number 😄.
-
every classifier model is trained on a specific resolution (yes, if the face is larger it will be sharper after resizing, but that is due to camera sensor precision; this is especially true in poor-light conditions). btw, detector models can be variable resolution, but that only means they perform a lot of matrix ops and resizing, and at the end all internal work is again at a fixed size. however, classifier models work at a fixed input size. i could retrain the model at a higher resolution, which would result in higher accuracy, but it's a question of performance and size - double the resolution and performance will tank 2x while size will grow 4x (those are just average numbers, real numbers depend on the model's internal architecture). you can see signatures of all models used by human at https://github.com/vladmandic/human/tree/main/models
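for illustration, this is the kind of size guard implied above (a hedged sketch, not the demo's actual code; the box layout is assumed):

```ts
// only trust classifier scores when the detected face crop is at least as large
// as the resolution the classifier was trained on (224x224 is a typical input size)
const MIN_FACE_SIZE = 224; // pixels

function faceBigEnough(box: [number, number, number, number]): boolean {
  const [, , width, height] = box; // assumed layout: [x, y, width, height]
  return Math.min(width, height) >= MIN_FACE_SIZE;
}
```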
-
Please excuse my last message (deleted now). I wrote it quickly because I was in a rush to go somewhere; now I've read this whole thread and it clarifies much of my concern. Your gestures implementation looks very interesting for photo detection. The part of that last message that still stands is that it doesn't seem to have caught obvious foreground/background inconsistencies, as shown in the screenshot below.
-
@jacobg i've written the liveness and antispoof modules more as a test and they are very simple - there is a lot that can be improved there. per-face results include both scores:

```js
{
  boxScore: 0.9,
  faceScore: 1,
  age: 25.4,
  gender: 'male',
  genderScore: 0.96,
  real: 0.91, // antispoof score
  live: 0.89, // liveness score
}
```
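as a usage sketch, a caller could gate authentication on both scores (the 0.7 cutoff below is an arbitrary illustration, not a tuned value):

```ts
// `face` is a single entry from result.face after human.detect()
function passesChecks(face: { real?: number; live?: number }): boolean {
  const threshold = 0.7; // illustrative only - tune against real spoof attempts
  return (face.real ?? 0) >= threshold && (face.live ?? 0) >= threshold;
}
```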
-
@vladmandic Have you tested any other liveness detection models, such as...?
-
@jacobg I haven't spent much time on this as it's mostly a side-thing. Looking at the two links you've shared: the first one would be easy to implement (contributions are welcome!), and the second one is not worth it - looking at the model architecture, it's extremely simple.
-
@vladmandic Thanks for taking a look.
-
Are you finding success with the eye-blink gesture detection? (Lines 65 to 68 in dc20a4d.) For me, it's not detecting blinking at all. From reading more about facemesh, they describe it as a "rigid transformation of a canonical face model". My understanding is that it generally tries to approximate the entire face, with much of the detail speculated based on some canonical idea of what a face is supposed to be. For example, if you put your hand over an eye, it will still show an "eye". There are other face models out there that don't do that, and would instead acknowledge the eye disappearing. Similarly, when I blink my eyes, the eye contours don't change at all. Is that how you understand it?
-
i have no issues with blink detection. btw, you might want to check... but yes, you are correct in general about how facemesh works - it uses a predefined 468-point mesh which is first pre-transformed using a hard-coded uvmap (human/src/face/facemeshcoords.ts, line 73 in dc20a4d).
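for reference, eye openness can be estimated directly from the mesh keypoints; a sketch using common MediaPipe-style indices (159/145 for the upper/lower lid and 33/133 for the corners of one eye - verify these against the uvmap before relying on them):

```ts
type Point = [number, number, number]; // facemesh keypoints are [x, y, z]

// ratio of lid distance to eye width; drops sharply during a blink
function eyeOpenness(mesh: Point[]): number {
  const vertical = Math.hypot(mesh[159][0] - mesh[145][0], mesh[159][1] - mesh[145][1]);
  const horizontal = Math.hypot(mesh[33][0] - mesh[133][0], mesh[33][1] - mesh[133][1]);
  return vertical / horizontal;
}
```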