Perhaps consider an alternative proposal? #1

Open
erikmchut opened this issue Aug 18, 2020 · 5 comments

Comments

@erikmchut

In general, a 'face-mesh' is only one component of a Face AR system. In addition to a mesh and UV coordinates, you need an occluder (face shape), a pixel segmenter (hair/eyebrows/lips, etc.), eye-gaze detection, iris detection, emotion detection, face gesture detection (smile/frown/wink, etc.), blend shapes (animoji, etc.), background detection, and shoulder/neck segmentation. In fact, what you need changes every six months as the technology and fidelity improve.

User-space access to the underlying camera feed on the GPU is the only real requirement for generating popular effects or creating new ones, and WebRTC already provides it today. Consider each of the following software libraries that already power Face AR on the web without proprietary APIs added to a specific browser (a wiring sketch follows the list below). Each of these does only on-device processing, keeping user data secure, and uses the existing, well-established permission mechanisms for camera access.

FaceMesh by MediaPipe:
https://google.github.io/mediapipe/solutions/face_mesh.html

8th Wall Face Effects:
https://www.8thwall.com/8thwall/face-effects-aframe

Banuba WebAR for Chrome:
https://www.banuba.com/technology/webar

Zappar WebAR (faces):
https://zap.works/webar/

Jeeliz FaceFilter:
https://github.com/jeeliz/jeelizFaceFilter
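
As a rough sketch of the wiring these libraries rely on, here is how a camera feed can be fed into one of them (MediaPipe FaceMesh in this case) using only standard, permission-gated WebRTC camera access. The method names follow the published @mediapipe/face_mesh JS package but are shown for illustration, not as a normative reference.

```ts
// Rough sketch: on-device face tracking over a getUserMedia feed.
// API names follow the @mediapipe/face_mesh package; treat them as illustrative.
import { FaceMesh } from "@mediapipe/face_mesh";

async function startFaceTracking(video: HTMLVideoElement): Promise<void> {
  // Standard, permission-gated camera access via WebRTC/getUserMedia.
  video.srcObject = await navigator.mediaDevices.getUserMedia({ video: true });
  await video.play();

  const faceMesh = new FaceMesh({
    locateFile: (f) => `https://cdn.jsdelivr.net/npm/@mediapipe/face_mesh/${f}`,
  });
  faceMesh.setOptions({ maxNumFaces: 1 });
  faceMesh.onResults((results: any) => {
    // All processing stays on-device; landmarks never leave the page.
    console.log(results.multiFaceLandmarks?.[0]?.length, "landmarks");
  });

  // Feed each video frame to the tracker.
  const pump = async () => {
    await faceMesh.send({ image: video });
    requestAnimationFrame(pump);
  };
  requestAnimationFrame(pump);
}
```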

Consider implementing the face-mesh code with existing WebRTC streams, or as an open-source library built on top of the powerful and well-considered new W3C API proposal for WebRTC Insertable Streams. Ultimately, developers want to integrate and/or improve technology, and the existing proposal would limit them to one browser-specific implementation.

@alcooper91
Owner

Thanks for the feedback.

I'm not (with this proposal) trying to enable a full Face AR system; however, I think exposing a FaceMesh to the page unlocks a lot of really powerful opportunities and could potentially allow for polyfilling some of the other aspects you call out (though that's not a main goal). For example, a well-defined mesh (and since the goal here is cross-browser compatibility, the mesh would need to be well defined) would make it easy to select the vertices that correspond to specific parts of the face (e.g. the eyebrows/lips/nose), as the sketch below illustrates.
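
A minimal illustration of that point, assuming a fixed mesh topology shared across browsers: a page could hard-code which vertex indices belong to a facial region. The index values below are made up for the sketch and are not part of any spec.

```ts
// Illustration only: region selection over a well-defined, fixed-topology mesh.
// The indices are hypothetical, not taken from any specification.
const LEFT_EYEBROW_INDICES = [46, 52, 53, 63, 65, 66, 70, 105, 107];

function selectRegion(vertices: Float32Array, indices: number[]): Float32Array {
  // vertices is a packed x/y/z array; pull out just the requested points.
  const out = new Float32Array(indices.length * 3);
  indices.forEach((v, i) => out.set(vertices.subarray(v * 3, v * 3 + 3), i * 3));
  return out;
}
```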

To my knowledge, the current WebRTC way of doing this and getting the camera feed requires the page to render a MediaStreamTrack to a canvas, run any processing (such as one of the libraries you mention) over the canvas, and then capture a new MediaStreamTrack to send the data onward. As far as I know, there are no existing WebRTC streams that avoid this round-trip, and WebRTC Insertable Streams are also insufficient because they only provide the encoded frame data. The ProcessingMediaStreamTrack that I propose integrating with would be a new WebRTC stream that essentially replaces the need for the page to write to and read from the canvas, instead giving the page access to the UnencodedVideoFrame that it can then transform/modify. The full details of that track are outside the scope of my proposal; once I can link to that proposal I will (which I hope will clear up some confusion). My proposal is to further extend that ProcessingMediaStreamTrack by allowing additional metadata (e.g. FaceMesh data) to be pre-computed and passed along with the UnencodedVideoFrame, removing the need for some of the processing done by these libraries.
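
For concreteness, here is a minimal sketch of the canvas round-trip described above, using only standard web APIs (getUserMedia, a 2D canvas, and captureStream). The per-frame processing step is left as a placeholder.

```ts
// Sketch of the canvas round-trip: camera track -> canvas -> processing -> new track.
async function canvasRoundTrip(): Promise<MediaStream> {
  const camera = await navigator.mediaDevices.getUserMedia({ video: true });

  const video = document.createElement("video");
  video.srcObject = camera;
  await video.play();

  const canvas = document.createElement("canvas");
  canvas.width = 640;
  canvas.height = 480;
  const ctx = canvas.getContext("2d")!;

  // Copy each camera frame onto the canvas, run the page's own processing,
  // then hand the result onward as a brand-new track.
  function drawFrame() {
    ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
    // ... run a face-tracking library over the canvas pixels here ...
    requestAnimationFrame(drawFrame);
  }
  requestAnimationFrame(drawFrame);

  // captureStream() yields a new MediaStream backed by the canvas.
  return canvas.captureStream();
}
```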

The goal for all of this would be for the browser to do local, on-device processing to keep user data secure, and, because the data is only exposed via a WebRTC stream, it would follow the same well-established permission mechanisms for camera access. By allowing the browser to pre-compute metadata (such as the FaceMesh), pages wouldn't have to load fairly large models to compute it (saving both their and the user's bandwidth), could potentially eliminate/minimize some texture copies (improving performance), and could potentially leverage multithreading, hardware, or other acceleration (further improving performance). In its simplest mobile implementation, you could imagine a ProcessingMediaStreamTrack that wraps an ARCore/ARKit camera stream and directly exposes both that camera frame and the FaceMesh data computed by ARCore/ARKit, meaning that all the page has to do is whatever specific texture compositing it wants to do with the FaceMesh.
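
To make the shape of that idea concrete, here is a hypothetical sketch only: ProcessingMediaStreamTrack, UnencodedVideoFrame, and the faceMesh metadata field below are placeholders for the API described in the proposal, not a shipped or specified interface.

```ts
// Hypothetical shape only; all interfaces below are placeholders for the
// proposed ProcessingMediaStreamTrack integration, not an existing API.
interface FaceMeshMetadata {
  vertices: Float32Array; // packed x/y/z positions for the well-defined mesh
  uvs: Float32Array;      // texture coordinates matching the vertex order
}

interface UnencodedVideoFrame {
  texture: WebGLTexture;  // raw camera frame, already on the GPU
  metadata?: { faceMesh?: FaceMeshMetadata };
}

// The page receives frames plus pre-computed metadata and only has to do the
// texture compositing it cares about, e.g. drawing a mask over the mesh.
function onFrame(frame: UnencodedVideoFrame): void {
  const mesh = frame.metadata?.faceMesh;
  if (mesh) {
    drawMaskOverMesh(frame.texture, mesh.vertices, mesh.uvs); // app-defined
  }
}

declare function drawMaskOverMesh(
  tex: WebGLTexture, vertices: Float32Array, uvs: Float32Array): void;
```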

I want to finish by stating that my goal is for this to become a W3C spec and not simply one browser-specific implementation.

@nbutko

nbutko commented Aug 18, 2020

The strength of the Insertable Streams framework is that it provides a foundation for developers to write their own processors. A stated key use case of Insertable Streams is Funny Hats. What is missing from the current proposal to fulfill its first stated goal? What would the Insertable Streams group need to add, in terms of functionality, to support FaceMesh or other developer JavaScript that runs more efficiently than today?

@alcooper91
Owner

alcooper91 commented Aug 19, 2020

From my research, in practice (as currently implemented) Insertable Streams are primarily being used to provide encryption. The biggest issue is that Insertable Streams expose the encoded data, which (to my knowledge) is not data that you could run any of the above libraries over, nor do compositing on; though I will admit I have not tried it myself. You can think of the ProcessingMediaStreamTrack as a type of Insertable Stream that provides access to the raw camera data. Again, work on an explainer for that is ongoing, and I will link to it when it is available.

Edit: This portion of the InsertableStreams explainer clarifies this even more: https://github.com/w3c/webrtc-insertable-streams/blob/master/explainer.md#use-cases
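
As a minimal sketch of the distinction being drawn here: today's Insertable Streams hand the page encoded frames on the sender, which suits transforms like end-to-end encryption but gives no decoded pixels to feed a face tracker or composite on. This assumes the Chrome-style sender.createEncodedStreams() entry point available at the time.

```ts
// Sketch only: transform over encoded frames via Insertable Streams.
// Assumes the Chrome-style sender.createEncodedStreams() entry point.
function attachEncodedTransform(sender: RTCRtpSender): void {
  const { readable, writable } = (sender as any).createEncodedStreams();

  const transform = new TransformStream({
    transform(frame: any, controller: TransformStreamDefaultController) {
      // frame.data is the encoded bitstream (e.g. a VP8/VP9 payload), so there
      // is no decoded image here to composite on or feed to a mesh model.
      const bytes = new Uint8Array(frame.data);
      // ... e.g. encrypt bytes in place ...
      frame.data = bytes.buffer;
      controller.enqueue(frame);
    },
  });

  readable.pipeThrough(transform).pipeTo(writable);
}
```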

@nbutko

nbutko commented Aug 20, 2020

So then, in your new explainer, FaceMesh JavaScript could run much more efficiently than it does today by leveraging the ability to run inside a newly proposed insertable stream type?

@alcooper91
Owner

It could; however, we believe that allowing the browser to pre-process/provide some of this metadata would be more performant than running the code in JavaScript and would reduce the amount of data that the page/user has to load.
