The goal of FaceDetection is to provide efficient and high-speed face detection solutions, including cutting-edge and classic models.
We use the WIDER FACE dataset for training and testing the models. The official website gives a detailed introduction to the data.
- WIDER Face data source: Loads `wider_face` type dataset with directory structures like this:

      dataset/wider_face/
      ├── wider_face_split
      │   ├── wider_face_train_bbx_gt.txt
      │   ├── wider_face_val_bbx_gt.txt
      ├── WIDER_train
      │   ├── images
      │   │   ├── 0--Parade
      │   │   │   ├── 0_Parade_marchingband_1_100.jpg
      │   │   │   ├── 0_Parade_marchingband_1_381.jpg
      │   │   │   │   ...
      │   │   ├── 10--People_Marching
      │   │   │   ...
      ├── WIDER_val
      │   ├── images
      │   │   ├── 0--Parade
      │   │   │   ├── 0_Parade_marchingband_1_1004.jpg
      │   │   │   ├── 0_Parade_marchingband_1_1045.jpg
      │   │   │   │   ...
      │   │   ├── 10--People_Marching
      │   │   │   ...

- Download dataset manually: To download the WIDER FACE dataset, run the following commands:
cd dataset/wider_face && ./download.sh
- Download dataset automatically: If a training session is started but the dataset is not set up properly (e.g., not found in `dataset/wider_face`), PaddleDetection can automatically download it from the WIDER FACE dataset source. The decompressed dataset will be cached in `~/.cache/paddle/dataset/` and discovered automatically in subsequent runs.
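
Before starting a training session, it can help to confirm that the manual layout shown above is in place. The following is a minimal sketch, not part of PaddleDetection: the helper name `missing_wider_face_entries` and the list `EXPECTED_ENTRIES` are hypothetical, and the list contains only paths that appear in the directory tree above.

```python
from pathlib import Path

# Paths copied from the directory tree in this document.
# This list and the helper below are illustrative, not a PaddleDetection API.
EXPECTED_ENTRIES = [
    "wider_face_split/wider_face_train_bbx_gt.txt",
    "wider_face_split/wider_face_val_bbx_gt.txt",
    "WIDER_train/images",
    "WIDER_val/images",
]


def missing_wider_face_entries(dataset_dir="dataset/wider_face"):
    """Return the expected entries that do not exist under dataset_dir."""
    root = Path(dataset_dir)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]
```

Calling `missing_wider_face_entries()` from the repository root returns an empty list when every expected file and folder is present, and otherwise lists what is missing.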
- Data-anchor-sampling: Randomly transform the scale of the image to a certain range of scales, greatly enhancing the scale change of the face. The specific operation is to compute $v=\sqrt{width * height}$ from the width and height of a randomly selected face, and determine which interval of `[16,32,64,128]` the value of `v` falls in. Assuming `v=45`, so that `32<v<64`, one value of `[16,32,64]` is selected with uniform probability. If `64` is selected, the target face size is then sampled from the interval `[64 / 2, min(v * 2, 64 * 2)]`.
- Other methods: Including `RandomDistort`, `ExpandImage`, `RandomInterpImage`, `RandomFlipImage`, etc. Please refer to DATA.md for details.
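
The scale selection in data-anchor-sampling, described above, can be sketched as follows. This is an illustrative reading of the description, not the PaddleDetection implementation: the function names are hypothetical, and the handling of faces with `v` below 16 or above 128 (which the text does not cover) is an assumption.

```python
import math
import random

ANCHOR_SCALES = [16, 32, 64, 128]


def candidate_scales(v):
    """Anchor scales up to and including the upper bound of v's interval.

    Assumption: if v is at least 128, all scales are candidates.
    """
    for i, scale in enumerate(ANCHOR_SCALES):
        if v < scale:
            return ANCHOR_SCALES[: i + 1]
    return list(ANCHOR_SCALES)


def sample_resize_ratio(face_width, face_height, rng=random):
    """Return the factor by which the image would be resized."""
    v = math.sqrt(face_width * face_height)
    # Pick one candidate scale with uniform probability.
    chosen = rng.choice(candidate_scales(v))
    # Sample the target face size from [chosen / 2, min(v * 2, chosen * 2)].
    target = rng.uniform(chosen / 2, min(v * 2, chosen * 2))
    return target / v
```

For a 45x45 face, `candidate_scales(45.0)` yields `[16, 32, 64]`, matching the example in the text, and the returned ratio rescales that face to a size inside the interval of the chosen scale.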
Supported architectures are shown in the table below. Please refer to Algorithm Description for details of the algorithms.
Architecture | Original | Lite [1] | NAS [2] |
---|---|---|---|
BlazeFace | ✓ | ✓ | ✓ |
FaceBoxes | ✓ | ✓ | x |
[1] `Lite` edition reduces the number of network layers and channels.
[2] `NAS` edition uses the `Neural Architecture Search` algorithm to optimize the network structure.
Todo List:
- HamBox
- Pyramidbox
Architecture | Type | Size | Img/gpu | Lr schd | Easy Set | Medium Set | Hard Set | Download |
---|---|---|---|---|---|---|---|---|
BlazeFace | Original | 640 | 8 | 32w | 0.915 | 0.892 | 0.797 | model |
BlazeFace | Lite | 640 | 8 | 32w | 0.909 | 0.885 | 0.781 | model |
BlazeFace | NAS | 640 | 8 | 32w | 0.837 | 0.807 | 0.658 | model |
FaceBoxes | Original | 640 | 8 | 32w | 0.875 | 0.848 | 0.568 | model |
FaceBoxes | Lite | 640 | 8 | 32w | 0.898 | 0.872 | 0.752 | model |
NOTES:
- Get mAP in `Easy/Medium/Hard Set` by multi-scale evaluation in `tools/face_eval.py`. For details, refer to Evaluation.
- BlazeFace-Lite training and testing use the `blazeface.yml` config file with `lite_edition: true` set.
- `32w` in the `Lr schd` column denotes 320000 training iterations.
Architecture | Type | Size | DistROC | ContROC |
---|---|---|---|---|
BlazeFace | Original | 640 | 0.992 | 0.762 |
BlazeFace | Lite | 640 | 0.990 | 0.756 |
BlazeFace | NAS | 640 | 0.981 | 0.741 |
FaceBoxes | Original | 640 | 0.985 | 0.731 |
FaceBoxes | Lite | 640 | 0.987 | 0.741 |
NOTES:
- Get mAP by multi-scale evaluation on the FDDB dataset. For details, refer to Evaluation.
Architecture | Type | Size | P4(trt32) (ms) | CPU (ms) | Qualcomm SnapDragon 855(armv8) (ms) | Model size (MB) |
---|---|---|---|---|---|---|
BlazeFace | Original | 128 | 1.387 | 23.461 | 6.036 | 0.777 |
BlazeFace | Lite | 128 | 1.323 | 12.802 | 6.193 | 0.68 |
BlazeFace | NAS | 128 | 1.03 | 6.714 | 2.7152 | 0.234 |
FaceBoxes | Original | 128 | 3.144 | 14.972 | 19.2196 | 3.6 |
FaceBoxes | Lite | 128 | 2.295 | 11.276 | 8.5278 | 2 |
BlazeFace | Original | 320 | 3.01 | 132.408 | 70.6916 | 0.777 |
BlazeFace | Lite | 320 | 2.535 | 69.964 | 69.9438 | 0.68 |
BlazeFace | NAS | 320 | 2.392 | 36.962 | 39.8086 | 0.234 |
FaceBoxes | Original | 320 | 7.556 | 84.531 | 52.1022 | 3.6 |
FaceBoxes | Lite | 320 | 18.605 | 78.862 | 59.8996 | 2 |
BlazeFace | Original | 640 | 8.885 | 519.364 | 149.896 | 0.777 |
BlazeFace | Lite | 640 | 6.988 | 284.13 | 149.902 | 0.68 |
BlazeFace | NAS | 640 | 7.448 | 142.91 | 69.8266 | 0.234 |
FaceBoxes | Original | 640 | 78.201 | 394.043 | 169.877 | 3.6 |
FaceBoxes | Lite | 640 | 59.47 | 313.683 | 139.918 | 2 |
NOTES:
- CPU: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
- P4(trt32) and CPU tests are based on PaddlePaddle version 1.6.1
- ARM test environment:
  - Qualcomm SnapDragon 855(armv8)
  - Single thread
  - Paddle-Lite version 2.0.0
For `Training` and `Inference`, please refer to GETTING_STARTED.md
- NOTES:
  - `BlazeFace` and `FaceBoxes` are trained on 4 GPUs with `batch_size=8` per GPU (total batch size of 32) for 320000 iterations. (If your GPU count is not 4, please refer to the rule of training parameters in the table of calculation rules.)
  - Currently we do not support evaluation in training.
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python tools/face_eval.py -c configs/face_detection/blazeface.yml
- Optional arguments
  - `-d` or `--dataset_dir`: Dataset path, same as `dataset_dir` in the configs. For example: `-d dataset/wider_face`.
  - `-f` or `--output_eval`: Evaluation file directory, default is `output/pred`.
  - `-e` or `--eval_mode`: Evaluation mode, either `widerface` or `fddb`, default is `widerface`.
  - `--multi_scale`: If this flag is added to the command, `multi_scale` evaluation is used. Default is `False`, which selects `single-scale` evaluation.
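
To make the defaults above concrete, here is a minimal `argparse` sketch that declares options with the same names and defaults. It is an illustration only, not the actual contents of `tools/face_eval.py`: the `build_parser` function is hypothetical, and the `-c` config option from the command above is omitted.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the optional arguments listed above."""
    parser = argparse.ArgumentParser(description="Face evaluation options (sketch)")
    parser.add_argument("-d", "--dataset_dir", default=None,
                        help="Dataset path, same as dataset_dir in the configs.")
    parser.add_argument("-f", "--output_eval", default="output/pred",
                        help="Evaluation file directory.")
    parser.add_argument("-e", "--eval_mode", default="widerface",
                        choices=["widerface", "fddb"],
                        help="Evaluation mode.")
    # store_true makes the option a flag: absent means single-scale evaluation.
    parser.add_argument("--multi_scale", action="store_true",
                        help="Use multi-scale evaluation.")
    return parser
```

With this sketch, `build_parser().parse_args(["-e", "fddb", "--multi_scale"])` selects FDDB evaluation with multi-scale enabled, while an empty argument list falls back to single-scale evaluation on WIDER FACE.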
After the evaluation is completed, the test results are written in txt format to `output/pred`, and mAP is then calculated according to the selected dataset. If you set `--eval_mode=widerface`, the model is evaluated on WIDER FACE. If you set `--eval_mode=fddb`, it is evaluated on FDDB.
- Download the official evaluation script to evaluate the AP metrics:
wget http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/support/eval_script/eval_tools.zip
unzip eval_tools.zip && rm -f eval_tools.zip
- Modify the result path and the name of the curve to be drawn in `eval_tools/wider_eval.m`:
% Modify the folder name where the result is stored.
pred_dir = './pred';
% Modify the name of the curve to be drawn
legend_name = 'Fluid-BlazeFace';
- `wider_eval.m` is the main execution program of the evaluation module. The run command is as follows:
matlab -nodesktop -nosplash -nojvm -r "run wider_eval.m;quit;"
For details on the FDDB dataset, refer to FDDB's official website.
- Download the official dataset and evaluation script to evaluate the ROC metrics:
#external link to the Faces in the Wild data set
wget http://tamaraberg.com/faceDataset/originalPics.tar.gz
#The annotations are split into ten folds. See README for details.
wget http://vis-www.cs.umass.edu/fddb/FDDB-folds.tgz
#information on directory structure and file formats
wget http://vis-www.cs.umass.edu/fddb/README.txt
- Install OpenCV: The evaluation code requires the OpenCV library. If the utility `pkg-config` is not available for your operating system, edit the Makefile to manually specify the OpenCV flags as follows:
INCS = -I/usr/local/include/opencv
LIBS = -L/usr/local/lib -lcxcore -lcv -lhighgui -lcvaux -lml
- Compile FDDB evaluation code: execute `make` in the evaluation folder.
- Generate the full image path list and ground truth in FDDB-folds. The run commands are as follows:
cat `ls | grep -v "ellipse"` > filePath.txt
cat *ellipse* > fddb_annotFile.txt
- Evaluation: Finally, the evaluation command is:
./evaluate -a ./FDDB/FDDB-folds/fddb_annotFile.txt \
-d DETECTION_RESULT.txt -f 0 \
-i ./FDDB -l ./FDDB/FDDB-folds/filePath.txt \
-r ./OUTPUT_DIR -z .jpg
NOTES: For an explanation of the arguments, run `./evaluate --help`.
Introduction:
BlazeFace is a face detection model published by Google Research.
It is lightweight yet performs well, and is tailored for mobile GPU inference. It runs at a speed
of 200-1000+ FPS on flagship devices.
Particularity:
- Anchor scheme stops at 8×8(input 128x128), 6 anchors per pixel at that resolution.
- 5 single, and 6 double BlazeBlocks: 5×5 depthwise convs, same accuracy with fewer layers.
- Replace the non-maximum suppression algorithm with a blending strategy that estimates the regression parameters of a bounding box as a weighted mean between the overlapping predictions.
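
The blending strategy in the last point can be sketched as follows: instead of keeping only the highest-scoring box of each overlapping group, the group is merged into a score-weighted mean. This is a minimal illustration under stated assumptions, not the implementation used in this repository: the function names, the `(x1, y1, x2, y2)` box format, and the `iou_threshold` value of 0.3 are all hypothetical, and scores are assumed to be positive.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def blend_detections(boxes, scores, iou_threshold=0.3):
    """Merge each group of overlapping boxes into a score-weighted mean box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    blended = []
    while order:
        top = order[0]
        # The group is the top-scoring box plus every remaining box overlapping it.
        group = [top] + [i for i in order[1:]
                         if iou(boxes[top], boxes[i]) > iou_threshold]
        total = sum(scores[i] for i in group)
        box = tuple(sum(scores[i] * boxes[i][k] for i in group) / total
                    for k in range(4))
        blended.append((box, scores[top]))
        order = [i for i in order if i not in group]
    return blended
```

Each returned entry pairs a blended box with the score of the highest-scoring member of its group. Because every overlapping prediction contributes to the final coordinates, this kind of averaging tends to give steadier boxes than discarding all but one.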
Edition information:
- Original: Reproduction of the original paper.
- Lite: Replaces 5x5 conv with 3x3 conv, with fewer network layers and conv channels.
- NAS: Uses the `Neural Architecture Search` algorithm to optimize the network structure, with fewer network layers and conv channels than `Lite`.
Introduction:
FaceBoxes, described in the paper "A CPU Real-time Face Detector
with High Accuracy", is a face detector proposed by Shifeng Zhang et al. that performs well in
both speed and accuracy. The paper was published at IJCB (2017).
Particularity:
- Anchor scheme stops at 20x20, 10x10, 5x5 for a network input size of 640x640, with 3, 1, 1 anchors per pixel at each resolution. The corresponding densities are 1, 2, 4 (20x20), 4 (10x10) and 4 (5x5).
- 2 convs with CReLU, 2 poolings, 3 inceptions and 2 convs with ReLU.
- Use density prior box to improve detection accuracy.
Edition information:
- Original: Reproduction of the original paper.
- Lite: 2 convs with CReLU, 1 pooling, 2 convs with ReLU, 3 inceptions and 2 convs with ReLU. Anchor scheme stops at 80x80 and 40x40, with 3, 1 anchors per pixel at each resolution. The corresponding densities are 1, 2, 4 (80x80) and 4 (40x40), using fewer conv channels than the original.
Contributions are highly welcomed and we would really appreciate your feedback!!