Detecting "real" face vs picture through webcam #206
-
Let's say I wanted to build an authentication mechanism that detects faces through a webcam and matches them against reference pictures in a database (like a poor man's version of FaceID). I was wondering if there is a way to detect whether someone is holding a printout of a picture in front of the camera instead of showing a real face? I know telling the difference from a single still is probably hard, but by analysing the differences between consecutive frames from the webcam maybe we can tell a "static" face (like a printout or a mask) apart from a live, moving face? We could probably use information from other parts of the body as well. I don't know nearly enough about CV to know how one would go about this or how hard it would be. Any ideas?
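Something naive like this frame-differencing check is roughly what I have in mind (just a sketch; the thresholds and canvas plumbing would need real tuning):

```ts
// average absolute per-pixel difference (red channel only) between consecutive
// video frames; a perfectly static printout should score near zero over time
function motionScore(ctx: CanvasRenderingContext2D, video: HTMLVideoElement, prev: Uint8ClampedArray | null): { score: number; frame: Uint8ClampedArray } {
  ctx.drawImage(video, 0, 0, ctx.canvas.width, ctx.canvas.height);
  const { data } = ctx.getImageData(0, 0, ctx.canvas.width, ctx.canvas.height);
  let sum = 0;
  if (prev) for (let i = 0; i < data.length; i += 4) sum += Math.abs(data[i] - prev[i]);
  return { score: prev ? sum / (data.length / 4) : 0, frame: new Uint8ClampedArray(data) };
}
```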
-
the problem seems to be hard, because it requires combining computer vision with time-series analysis, and the final model would still be highly vulnerable to other ways of exploiting this authentication method (e.g. playing a video of the user's face on a phone screen). the silly idea that comes to mind is: try recreating android's 3x3 slide-grid unlock screen in a way that the "slides" are performed by the user's head movement or facial expression - see the sketch below.
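a sketch of that challenge-response idea (the detectPose callback and the gesture names here are placeholders, not any specific library's API):

```ts
// ask for a random sequence of head poses and require the user to perform
// them in order within a time limit - hard to satisfy with a static printout
const POSES = ['facing left', 'facing center', 'facing right'] as const;
type Pose = typeof POSES[number];

function makeChallenge(length = 4): Pose[] {
  return Array.from({ length }, () => POSES[Math.floor(Math.random() * POSES.length)]);
}

// detectPose is assumed to return the current head pose from the video feed
async function runChallenge(detectPose: () => Promise<Pose>, timeoutMs = 10000): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  for (const expected of makeChallenge()) {
    let matched = false;
    while (Date.now() < deadline && !matched) matched = (await detectPose()) === expected;
    if (!matched) return false; // timed out before completing the sequence
  }
  return true;
}
```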
-
Real solutions require that the camera input is not only RGB, but also has a depth map and an IR channel. Btw, there was an open challenge in 2019 on this topic, so searching for the papers that were enrolled in it is quite informative. Anyhow, let's think outside of the box...
I like the topic, I think I'll do a demo with a simple heuristic ML model combined with blink detection...
-
Yes.
I'm thinking a first pass until no "blink" is detected and a second pass until a "blink" is detected - so both conditions have to be met over time. Also adding additional checks such as 'facing center' and 'looking center', plus validation of overall face size, etc.
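a rough sketch of that two-pass flow using Human's gesture strings ('facing center', 'blink left eye', etc.); the waitFor helper and the timeouts are hypothetical, not the demo's actual code:

```ts
import { Human } from '@vladmandic/human';

const human = new Human({ modelBasePath: 'https://vladmandic.github.io/human/models' });

// hypothetical helper: keep detecting until a predicate on gesture strings passes or we time out
async function waitFor(video: HTMLVideoElement, predicate: (gestures: string[]) => boolean, timeoutMs = 5000): Promise<boolean> {
  const start = performance.now();
  while (performance.now() - start < timeoutMs) {
    const result = await human.detect(video);
    if (predicate(result.gesture.map((g) => g.gesture as string))) return true;
  }
  return false;
}

// pass 1: face centered with eyes open (no blink); pass 2: a blink must then occur
async function blinkLiveness(video: HTMLVideoElement): Promise<boolean> {
  const eyesOpen = await waitFor(video, (g) => g.includes('facing center') && !g.some((s) => s.startsWith('blink')));
  if (!eyesOpen) return false;
  return waitFor(video, (g) => g.some((s) => s.startsWith('blink')));
}
```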
-
take a look at... i'm doing a quick train of an additional model to check liveness of input, and then i'll test that and add descriptor database matching to the demo.
-
Poor-mans FaceID demo is now online :) demo/facerecognition

Both antispoof and liveness are optional modules; the models are tiny and designed to serve as a quick check when used together with other indicators (the liveness model is just 23 ops):

- Anti-spoofing Module: checks if input is realistic (e.g. not a computer generated face)
- Liveness Module: checks if input has obvious artifacts due to recording (e.g. playing back a phone recording of a face)

Note: there is a lot of room for improvement, this is just a quick prototype.

Configuration:
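(a sketch of the relevant options - Human exposes face.antispoof and face.liveness config keys, but exact defaults and surrounding options may vary by version:)

```ts
import { Human } from '@vladmandic/human';

// enable the optional antispoof and liveness modules alongside face detection
const human = new Human({
  face: {
    enabled: true,
    antispoof: { enabled: true }, // produces the `real` score
    liveness: { enabled: true },  // produces the `live` score
  },
});
```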
-
@MaKleSoft it looks for discontinuities between the foreground object (e.g. the phone) and the background (what is behind the phone)
-
well, that's the magic of generic CNN networks - and what are weights? whatever it was trained on. so if it is like you say, that it's nearly impossible to avoid obvious reflections, then it's already trained on that (even if that was not my intention) :) the biggest issue is that my training dataset for it was really small - 1000 frames from one video of me sitting in front of a webcam and one video of the same played back through a phone - i need to get more sample videos and retrain it. this was a quick test and i'm surprised how well it works as-is.
-
Quick question: how did you arrive at the minimum box size of 224 (I assume that's in pixels)? https://github.com/vladmandic/human/blob/main/demo/facerecognition/index.ts#L33 Seems like an oddly specific number 😄.
-
every classifier model is trained on a specific resolution (yes, if the face is larger it will be sharper after resizing, but that is due to camera sensor precision; this is especially true in poor-light conditions). btw, detector models can be variable resolution, but that only means they perform a lot of matrix ops and resizing, and at the end all internal work is again at a fixed size. however, classifier models work at a fixed input size. i could retrain the model at a higher resolution, which would result in higher accuracy, but it's a question of performance and size - double the resolution and performance will tank 2x while size will grow 4x (those are just average numbers, real numbers depend on the model's internal architecture). you can see signatures of all models used by human at https://github.com/vladmandic/human/tree/main/models
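for illustration, this is the kind of size guard implied above (a hedged sketch, not the demo's actual code; the box layout is assumed):

```ts
// only trust classifier scores when the detected face crop is at least as large
// as the resolution the classifier was trained on (224x224 is a typical input size)
const MIN_FACE_SIZE = 224; // pixels

function faceBigEnough(box: [number, number, number, number]): boolean {
  const [, , width, height] = box; // assumed layout: [x, y, width, height]
  return Math.min(width, height) >= MIN_FACE_SIZE;
}
```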
-
Please excuse my last message (deleted now). I wrote it quickly because I was in a rush to go somewhere; now I've read this whole thread and it clarifies much of my concern. Your gestures implementation looks very interesting for photo detection. The part of that last message that still stands is that it doesn't seem to have caught obvious foreground/background inconsistencies, as shown in the screenshot below.
-
@jacobg i've written the liveness and antispoof modules more as a test and they are very simple - there is a lot that can be improved there. per-face results include both scores:

```js
{
  boxScore: 0.9,
  faceScore: 1,
  age: 25.4,
  gender: 'male',
  genderScore: 0.96,
  real: 0.91, // antispoof score
  live: 0.89, // liveness score
}
```
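as a usage sketch, a caller could gate authentication on both scores (the 0.7 cutoff below is an arbitrary illustration, not a tuned value):

```ts
// `face` is a single entry from result.face after human.detect()
function passesChecks(face: { real?: number; live?: number }): boolean {
  const threshold = 0.7; // illustrative only - tune against real spoof attempts
  return (face.real ?? 0) >= threshold && (face.live ?? 0) >= threshold;
}
```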
-
@vladmandic Have you tested any other liveness detection models, such as...?
-
@jacobg I haven't spent much time on this as it's mostly a side-thing. Looking at the two links you've shared: the first one would be easy to implement (contributions are welcome!), and the second one is not worth it - looking at the model architecture, it's extremely simple.
-
@vladmandic Thanks for taking a look.
-
Are you finding success with the eye-blink gesture detection? (Lines 65 to 68 in dc20a4d.) For me, it's not detecting blinking at all. From reading more about facemesh, they describe it as a "rigid transformation of a canonical face model". My understanding is that it generally tries to approximate the entire face, with much of the detail speculated based on some canonical idea of what a face is supposed to be. For example, if you put your hand over an eye, it will still show an "eye". There are other face models out there that don't do that, and would instead acknowledge the eye disappearing. Similarly, when I blink my eyes, the eye contours don't change at all. Is that how you understand it?
-
i have no issues with blink detection. btw, you might want to check... but yes, you are correct in general about how facemesh works - it uses a predefined 468-point mesh which is first pre-transformed using a hard-coded uvmap (human/src/face/facemeshcoords.ts, line 73 in dc20a4d).
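for reference, eye openness can be estimated directly from the mesh keypoints; a sketch using common MediaPipe-style indices (159/145 for the upper/lower lid and 33/133 for the corners of one eye - verify these against the uvmap before relying on them):

```ts
type Point = [number, number, number]; // facemesh keypoints are [x, y, z]

// ratio of lid distance to eye width; drops sharply during a blink
function eyeOpenness(mesh: Point[]): number {
  const vertical = Math.hypot(mesh[159][0] - mesh[145][0], mesh[159][1] - mesh[145][1]);
  const horizontal = Math.hypot(mesh[33][0] - mesh[133][0], mesh[33][1] - mesh[133][1]);
  return vertical / horizontal;
}
```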