Detection and Model Regression Overview and Roadmap #256

Open
headupinclouds opened this issue Jan 12, 2017 · 0 comments

headupinclouds commented Jan 12, 2017

Status: Currently an OpenGL ES 2.0 compatible shader pipeline is used w/ platform-specific optimized GPU->CPU input via the extended + hunterized ogles_gpgpu module [link] to facilitate real-time face detection followed by face landmark and eye model regression stages. This supports a multi-resolution Aggregated Channel Feature (ACF) texture/pyramid input for fast multi-scale gradient boosting object detection in the drishti::acf::Detector module, which is based on Piotr's MATLAB implementation [link]. As reported in [link], this object detection approach is still reasonably close to state of the art for fairly unconstrained face detection (with the exception of large CNN approaches that are not so amenable to real-time mobile device processing) -- good enough in any case. Face appearance is distinctive even at fairly low resolution, and the ACF features effectively add 4x4 binning, so the input textures from the GPU are fairly low resolution, which helps minimize the GPU->CPU overhead. The pyramid levels are also packed into a single texture to minimize wasted space [link]. In iOS land, texture retrieval via iOS texture caches is very efficient, and multiple textures can be retrieved in parallel, so transfer overhead is fairly minimal [TODO: provide sample times on iPhones]. The OpenGL GraphicBuffer extensions are somewhat efficient, but do not seem to support parallel use, so there seems to be more GPU->CPU overhead for Android devices [TODO: sample times on Android].
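
To make the 4x4 binning concrete, here is a minimal CPU-side sketch of aggregating one feature channel over 4x4 blocks. It is only an illustration: the function name `binChannel`, the row-major float layout, and the choice of averaging (rather than summing) are assumptions, not the drishti or ogles_gpgpu implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: average a row-major single-channel image over
// bin x bin blocks. Pixels beyond the last full block are ignored.
inline std::vector<float> binChannel(const std::vector<float>& src,
                                     std::size_t width,
                                     std::size_t height,
                                     std::size_t bin = 4)
{
    assert(bin > 0 && src.size() == width * height);
    const std::size_t outWidth = width / bin;
    const std::size_t outHeight = height / bin;
    std::vector<float> dst(outWidth * outHeight, 0.f);

    // Accumulate every source pixel into the block that contains it.
    for (std::size_t y = 0; y < outHeight * bin; ++y)
    {
        for (std::size_t x = 0; x < outWidth * bin; ++x)
        {
            dst[(y / bin) * outWidth + (x / bin)] += src[y * width + x];
        }
    }

    // Normalize the block sums to block means.
    const float scale = 1.f / static_cast<float>(bin * bin);
    for (auto& value : dst)
    {
        value *= scale;
    }
    return dst;
}
```

With `bin = 4` the output has 1/16 of the input pixels per channel, which is where the reduction in GPU->CPU transfer size comes from.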

The current drishti::hci::FaceFinder module introduces a 1 frame latency so that the GPU can be kept busy computing the required ACF pyramid textures, while a designated CPU thread retrieves the previous frame's pyramid and runs the reasonably lightweight multi-scale detection search. This processing uses sparse pixel lookups in combination with gradient boosting w/ level 2 trees for weak learners, and it is expected that such processing will be very slow on mobile GPUs. Most likely this will remain on the CPU to keep processing friendly for mobile devices. It might be worth adding some RenderScript/OpenCL/Metal unit tests to confirm this. There is a paper that investigates some clever approaches for running tree-based gradient boosting on modern desktop GPUs w/ CUDA or OpenCL [LINK], but it is probably a stretch to envision this running on current mobile device GPUs.
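
The sparse lookup pattern can be sketched as follows. This is a hedged illustration, not the drishti::acf::Detector code: the `Depth2Tree` layout, the function names, and the constant early-rejection threshold are all assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical weak learner: 3 internal nodes (root, left, right), 4 leaves.
struct Depth2Tree
{
    std::array<std::uint32_t, 3> featureIndex; // offsets into the window's features
    std::array<float, 3> threshold;            // split threshold per internal node
    std::array<float, 4> leafValue;            // score contributed by each leaf
};

// Each tree touches only two features: the root's and one child's.
inline float evaluate(const Depth2Tree& tree, const std::vector<float>& features)
{
    assert(tree.featureIndex[0] < features.size());
    const std::size_t child =
        (features[tree.featureIndex[0]] < tree.threshold[0]) ? 1 : 2;

    assert(tree.featureIndex[child] < features.size());
    const std::size_t side =
        (features[tree.featureIndex[child]] < tree.threshold[child]) ? 0 : 1;

    return tree.leafValue[(child - 1) * 2 + side];
}

// Sum the weak learner outputs, rejecting the window as soon as the
// running score drops to rejectThreshold or below (assumed constant).
inline bool classify(const std::vector<Depth2Tree>& trees,
                     const std::vector<float>& features,
                     float rejectThreshold,
                     float& score)
{
    score = 0.f;
    for (const auto& tree : trees)
    {
        score += evaluate(tree, features);
        if (score <= rejectThreshold)
        {
            return false;
        }
    }
    return true;
}
```

The data-dependent branching and scattered reads in `evaluate`, plus the early exit in `classify`, are the kind of access pattern that tends to map poorly onto GPU shader execution, which is the motivation for keeping this stage on the CPU.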

Implementing ACF pyramids in pure OpenGL ES 2.0 shaders involves some compromise, mostly because output textures are restricted to 32 bits per pixel (4x8-bit channels). For the ACF features (histograms, etc), this seems sufficient. Since there is no direct mapping from the ACF C++ code to OpenGL shaders, some (perhaps significant) numerical differences are expected. The current implementation provides a decent approximation, but the deviation is a little larger than I would like, although not too far off [#219]. Most likely some additional focused shader tuning will be required to make this 'close enough'. Packing float output in 32-bit output is one option, although I don't think it will be required for the detection application. Alternatively, I believe higher precision could be achieved with platform-specific GPU processing (OpenCL/RenderScript/Metal) [TODO: confirm], at the cost of higher transfer overhead -- again, I don't think this is worth it.
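
For reference, the float packing option amounts to spreading one normalized value across the four 8-bit channels of an RGBA pixel. The sketch below shows the idea on the CPU with 32-bit fixed point; the names `packUnitFloat` and `unpackUnitFloat` are hypothetical, and a real shader would have to do the equivalent arithmetic in GLSL within the device's float precision limits.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Rgba8 = std::array<std::uint8_t, 4>;

// Pack a value in [0, 1) into four bytes, most significant byte first.
inline Rgba8 packUnitFloat(float value)
{
    assert(value >= 0.f && value < 1.f);
    // Scale to 32-bit fixed point; double keeps the product exact.
    const auto fixed =
        static_cast<std::uint32_t>(static_cast<double>(value) * 4294967296.0);
    return {static_cast<std::uint8_t>(fixed >> 24),
            static_cast<std::uint8_t>((fixed >> 16) & 0xFFu),
            static_cast<std::uint8_t>((fixed >> 8) & 0xFFu),
            static_cast<std::uint8_t>(fixed & 0xFFu)};
}

// Inverse of packUnitFloat: reassemble the fixed-point value and rescale.
inline float unpackUnitFloat(const Rgba8& rgba)
{
    const std::uint32_t fixed = (static_cast<std::uint32_t>(rgba[0]) << 24) |
                                (static_cast<std::uint32_t>(rgba[1]) << 16) |
                                (static_cast<std::uint32_t>(rgba[2]) << 8) |
                                static_cast<std::uint32_t>(rgba[3]);
    return static_cast<float>(static_cast<double>(fixed) / 4294967296.0);
}
```

The trade-off is that a packed pixel carries one high-precision value instead of four 8-bit channels, so the same feature set needs four times the texture area and transfer size.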

The low-resolution ACF detection results (after NMS filtering) are scaled to higher resolution, and a face ROI is cropped from a grayscale texture to support fast landmark regression (again on the CPU) using a PCA variation of the Kazemi landmark estimation (dlib, dest, etc). Ideally this regression should leverage the richer/denser features in the ACF output, possibly using only single pixel tests, rather than the current normalized pixel differences. I expect this will be more accurate, faster, and will reduce the need for additional higher resolution raw/grayscale GPU->CPU texture downloads. Although the ACF features give up the classic pose invariance provided by pixel differences, the face pose is relatively constrained, and ACF features should provide a net win. PCA shape space regression, half-precision floating point storage, and generic compression are all used to get the models down to a reasonable size (w/ a target of 2 or 3 MB). A WIP branch is currently exploring customized compression schemes that provide very large savings, based on redundancy in the concatenated leaf node values. Initial looks at time-sequence audio codecs, both lossless FLAC and lossy Ogg Vorbis, look very encouraging [TODO: sample sizes]. The landmark regression provides a rough initial estimate for the inner face features: eyes + eyebrows + nose. Those results are then used to initialize a final eye model fitting step: iris, pupil, eyelid contour, eyelid crease. That step itself proceeds in a coarse-to-fine model fitting sequence, with an initial global model being fit at low resolution, which is used to provide the initial position and occlusion mask for a final iris ellipse estimate.
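
As a rough illustration of why PCA shape space regression shrinks the models: if leaf nodes store a few PCA coefficients instead of a full per-landmark update, the full shape is recovered by back-projection through a shared basis. The sketch below assumes a plain linear model (mean plus weighted components); `ShapeSpace` and `backProject` are hypothetical names, not the drishti API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical linear shape model over 2N stacked landmark coordinates.
struct ShapeSpace
{
    std::vector<float> mean;                    // length 2N
    std::vector<std::vector<float>> components; // K basis vectors, each length 2N
};

// shape = mean + sum_k coefficients[k] * components[k]
inline std::vector<float> backProject(const ShapeSpace& space,
                                      const std::vector<float>& coefficients)
{
    assert(coefficients.size() == space.components.size());
    std::vector<float> shape = space.mean;
    for (std::size_t k = 0; k < coefficients.size(); ++k)
    {
        const auto& component = space.components[k];
        assert(component.size() == shape.size());
        for (std::size_t i = 0; i < shape.size(); ++i)
        {
            shape[i] += coefficients[k] * component[i];
        }
    }
    return shape;
}
```

With K much smaller than 2N, each leaf stores K values rather than 2N, and those values can additionally be stored in half precision and compressed.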

The landmark based head pose [LINK: eos] in combination with the eye/iris models will be used to feed a final CNN gaze estimate [LINKS] when sufficient training data is available [TODO/LINK: data collection process and online "research" datasets].

Miscellaneous:

  • SIMD (NEON/SSE) and GPU implementations of the ACF pyramids are available, but some of the low-level routines would need to be implemented for a pure C++ version [ACF w/o SIMD #111] -- in practice this is probably not needed.
  • Is Waldboost [waldboost + ACF #243] necessary for commercial use?
@headupinclouds changed the title from "Detection and Model Regression Roadmap" to "Detection and Model Regression Overview and Roadmap" on Jan 12, 2017
@headupinclouds added this to the 0.5 milestone on Mar 6, 2017
@headupinclouds removed this from the 0.5 milestone on May 11, 2017