Implement a helper app to crowdsource AICore testing #6138

nicolas-raoul · 2025-01-17T04:30:17Z

Context

To implement AICore-based features (example), we need to test prompts on AICore.

Problem

AICore cannot be tested on the emulator, and unfortunately no way to crowdsource this testing seems to exist (context).

Currently only Paul has a smartphone advanced enough to run AICore. Paul has agreed to test for us, but we must make it as easy as possible for him.

Solution

Develop an app:

where Paul (and potentially other people who own these devices) can effortlessly run our prompts and send us the responses.
Ideally it would be easy for us to manage the prompts, and to see the responses categorized by model/OS version/AICore version.
Ideally the device owners would not need to do anything, just install it. Maybe run it once in a while if auto-running presents challenges.

The app would live in its own repository under the https://github.com/commons-app organization.
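As a rough illustration of "responses categorized by model/OS version/AICore version", here is a minimal Python sketch of what a reported result and its grouping might look like. Nothing here is decided: the record type, field names, and sample values are all hypothetical, and the real helper app would most likely be written in Kotlin or Java.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptResult:
    """One response reported by a tester's device (all field names are hypothetical)."""
    prompt_id: str
    response: str
    device_model: str
    os_version: str
    aicore_version: str


def group_by_environment(results):
    """Group responses by (device model, OS version, AICore version)."""
    grouped = defaultdict(list)
    for result in results:
        key = (result.device_model, result.os_version, result.aicore_version)
        grouped[key].append(result)
    return dict(grouped)


# Illustrative placeholder values only, not real test results.
sample_results = [
    PromptResult("p1", "response A", "Pixel 8 Pro", "Android 15", "aicore-x"),
    PromptResult("p1", "response B", "Galaxy S24", "Android 14", "aicore-y"),
    PromptResult("p2", "response C", "Pixel 8 Pro", "Android 15", "aicore-x"),
]
grouped_results = group_by_environment(sample_results)
```

Grouping on the full tuple keeps responses from different AICore versions on the same phone model apart, which is presumably what matters when comparing prompt behavior.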

parneet-guraya · 2025-01-18T00:00:43Z

Very nice idea, I like it a lot. I might also suggest that we use Meta's Llama models locally. Below is a demo of the app running on my device. A plus is that we load the model through our own app, so it will work on any phone (as long as the phone can handle it).

Device: OnePlus 9RT 5G

This is running completely offline, and the model is ~1 GB and is downloaded on initial install, so there is no impact on APK size.

Record_2025-01-18-04-31-02.mp4

So we can have both options: use Gemini on devices that support it, and fall back to open-source Llama elsewhere. Also, judging from the numbers on the site, their lightweight models are comparable to Gemini.

However, these models are chat-based. Some features would require the response in a specific format, so parsing would be an issue unless someone trains the model to do so. Other things, like summaries and explanations, would work out of the box.

To implement AICore-based features (#5422), we need to test prompts on AICore.

Now, take this feature for instance:

First step: Parse the caption into expressions could be asked to an LLM if one is available locally on the device. That means only on Pixel 8 Pro (or above) and Samsung Galaxy S24 (or above) for now but hopefully more makers will follow.

It could work, but I'm not sure how we can make the model return its response in a particular format, such as JSON or a list with one entity per item. We can write a prompt that sets the boundaries with example input and output, but the model sometimes breaks the rules.

For example:

Prompt (I'm not a great prompt engineer):
'Papilio machaon on Asteraceae flower in Croatia' from the above text extract each word that has a meaning separately as if this query goes through a recommendation engine and provide each entity separately and in json format.

Output: (completely offline)

Record_2025-01-18-04-56-00.mp4
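Since a chat-style model may wrap the requested JSON in extra text, one defensive option is to scan the response for the first span that parses as JSON. The Python sketch below illustrates that idea only: the function name is hypothetical, and the sample response is made up for illustration, not actual output from Llama or Gemini.

```python
import json


def extract_json(response_text):
    """Return the first JSON object or array found in a chat-style response, or None."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(response_text):
        if char not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at index and ignores trailing text.
            value, _end = decoder.raw_decode(response_text, index)
        except json.JSONDecodeError:
            continue
        return value
    return None


# Hypothetical response text, written by hand for this example.
sample_response = (
    'Sure! Here are the entities: '
    '{"entities": ["Papilio machaon", "Asteraceae", "Croatia"]} '
    'Let me know if you need anything else.'
)
parsed = extract_json(sample_response)
```

This does not make the model follow the format. It only tolerates chatter around otherwise valid JSON, so a caller still needs a fallback when `extract_json` returns `None`.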

On devices where no such technology is available, the app can fall back to some more traditional tokenization, maybe even split on each space character, or just skip entirely.

Right, I tried something similar a while back in a project. Basically, I took the caption currently showing in a video and tried to tokenize each word so the user could search the internet without going back and forth. This required splitting the words, but traditional splitting on whitespace characters sometimes didn't work. Then I found a solution using a very lightweight NLP model, which worked great for my use case. But yes, it didn't give me meaningful entities, just NLP-based tokenization. Also, there are different binaries for different languages, but each file is < 500 KB.

All models: https://opennlp.apache.org/models.html
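For comparison, the simplest fallback mentioned above, splitting on whitespace, could look like the following Python sketch. The function name is hypothetical and this is plain string handling, not OpenNLP. It mainly shows the limitation: a two-word name such as "Papilio machaon" ends up as two separate tokens.

```python
import string


def fallback_tokens(caption):
    """Split a caption on whitespace and strip surrounding punctuation from each token."""
    tokens = (word.strip(string.punctuation) for word in caption.split())
    # Drop tokens that were punctuation only and are now empty.
    return [token for token in tokens if token]


caption_tokens = fallback_tokens("Papilio machaon on Asteraceae flower in Croatia")
```

Words such as "on" and "in" are kept as well, so a real fallback would probably need at least a stop-word filter on top of this.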

Lastly, if I understood this correctly: in order to implement and maintain on-device AI-based features, we need some way to test prompts so that each feature works as intended. But there is a device limitation, so we want a platform where one could upload the required prompt, and devices that can process it would pick it up (automatically or manually) and then post the result. It could be like a social media app, with prompts as posts and answers as comments with device details.

And this would be like sort of internal tool for developers right?

Also, are we responsible for writing backend (API) for this too?

Thanks :-)

nicolas-raoul · 2025-01-18T14:03:06Z

the model is ~1 GB and is downloaded on initial install, so there is no impact on APK size

Llama being more open source is a clear advantage, but we really should avoid the bandwidth and storage cost of downloading a model. The advantage of AICore is that it is already present on devices, so there is no need to download anything.

whym · 2025-01-19T10:48:03Z

I think we need to define a broader policy, similar to https://phabricator.wikimedia.org/T336905 (Wikimedia Cloud Services AI policy discussion).
I would personally want to see options, including one for those who prefer ~~privacy~~ transparency and control over ease of access, as long as that is not prohibitively difficult to implement. (That means explicit choice by the user, if not automatic fallback.)

nicolas-raoul · 2025-01-19T14:25:05Z

@whym This particular issue is about the helper app, which will only be used by a few developers to improve prompts, so I don't think it is controversial. It could even be closed source or run on a third-party hosted service (similar to how we use proprietary GitHub). However, the full GSoC is exploratory in nature, so having these kinds of discussions is part of the process. I therefore created #6143 to discuss such topics. :-)

nicolas-raoul · 2025-01-19T14:36:08Z

@parneet-guraya Thanks for the OpenNLP link, that can be an alternative!

And this would be like sort of internal tool for developers right?

Yes, so it can be very rough.

Also, are we responsible for writing backend (API) for this too?

Yes, ideally something simple can be found, maybe writing to a Google Form or something similar? Or a more robust backend if the GSoC candidate is used to writing such backends quickly. Hopefully on some cloud's free tier, or on our Wikimedia-hosted server (similar to https://commons-android-app.toolforge.org/tool-commons-android-app/wikidataedits.py?user=Syced https://github.com/commons-app/commonsmisc though not sure we have a database with write access).

nicolas-raoul added the gsoc Google Summer of Code label Jan 17, 2025

nicolas-raoul self-assigned this Jan 17, 2025
