
[API] Statistics API end point like those displayed on /stats #1080

Open
ForsakenRei opened this issue Sep 27, 2024 · 7 comments

Comments

@ForsakenRei
Contributor

ForsakenRei commented Sep 27, 2024

Suggestion

For a homelab environment where people usually have a dashboard, maybe extend the current API endpoints to add things like total archives, tags, etc., as displayed on https://yourinstance.com/stats? Maybe add them to the /info endpoint, since it already has total_pages_read there.

Additional Context

The dashboard I'm using now is https://github.com/gethomepage/homepage, which supports custom APIs, so I can display the statistics there. I have a Python script that grabs the numbers from the /stats page and then exposes my homemade API, but I feel it's not the most elegant way to do it lol.

@Difegue
Owner

Difegue commented Sep 29, 2024

I presume you know of api/database/stats for the tag cloud data? This is what the "different tags existing" stat relies on.

For the others, I've added total_archives to /api/info since it was straightforward; the content folder size is a bit more work to API-fy, since it currently hard-formats the number to GBs and it'd probably be better to return the raw number.

Let me know if this is enough for your needs.
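For illustration, here's a rough sketch of pulling those numbers straight from the API instead of scraping the /stats page (untested; host, port, authentication and the exact response fields are assumptions that depend on your instance):

```perl
#!/usr/bin/env perl
# Rough sketch: read dashboard numbers straight from the LANraragi API
# instead of scraping the /stats page. Host, port and authentication are
# placeholders; adjust for your instance.
use strict;
use warnings;
use Mojo::UserAgent;

my $base = 'http://127.0.0.1:3000';
my $ua   = Mojo::UserAgent->new;
# Add an Authorization header here if your instance requires an API key.

# /api/info already exposes total_pages_read, and now total_archives too.
my $info = $ua->get("$base/api/info")->result->json;
printf "Archives: %s, pages read: %s\n",
    $info->{total_archives} // '?', $info->{total_pages_read} // '?';

# /api/database/stats returns the tag cloud; counting its entries gives the
# "different tags existing" number shown on /stats.
my $tags = $ua->get("$base/api/database/stats")->result->json;
printf "Distinct tags: %d\n", scalar @{$tags};
```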

@ForsakenRei
Contributor Author

Thanks! And yeah, I'm aware of the tag cloud data, though I didn't find the total count (or maybe I missed it somewhere?), so, lazy as I am, I just use Python to scrape the statistics page lol. I guess if we already have a count for the statistics page, it shouldn't be too complicated to add another endpoint...?

As for the total size, I didn't realize it's hard-coded to GB, so there's no need to rush; just whenever you have free time.

@nonamethanks
Contributor

nonamethanks commented Oct 18, 2024

The correct way to do this would be to use Prometheus metrics, which are the standard for monitoring applications nowadays. That way it can be pulled into Grafana or any other system out there without worrying about supporting different custom apps.

@ForsakenRei
Contributor Author

The correct way to do this would be to use Prometheus metrics, which are the standard for monitoring applications nowadays. That way it can be pulled into Grafana or any other system out there without worrying about supporting different custom apps.

Yes, my idea is to have some general API endpoints so people can decide what to do with them. I have Prometheus, and Homepage supports custom APIs, so there's no need for LANraragi to do extra development for dashboard support.

@psilabs-dev
Contributor

I'll be playing around with Prometheus for a bit.

@psilabs-dev
Contributor

I've done some testing using the Net::Prometheus library, which is a common Perl lib for implementing an exporter, and have some thoughts. There's also Prometheus::Tiny, and an exporter specific to Mojolicious that relies on Net::Prometheus.

To give a summary, the point of using Prometheus is to collect metrics and store them in a time-series database, from which the latest metrics/stats can be computed and server observability can be achieved via something like Grafana or a homelab dashboard. The Prometheus server periodically scrapes the endpoint of interest via an HTTP call, collecting metrics data that is cached somewhere in the server.
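As a toy illustration of that model (a sketch with Net::Prometheus, not LRR code): metrics live in an in-memory registry, and a /metrics route simply renders that registry in the text exposition format whenever Prometheus scrapes it.

```perl
# Toy sketch of the Net::Prometheus model, not LANraragi code: metrics sit
# in an in-memory registry, and render() produces the text exposition
# format that a /metrics route would return to the Prometheus scraper.
use strict;
use warnings;
use Net::Prometheus;

my $prom    = Net::Prometheus->new;
my $counter = $prom->new_counter(
    name => 'example_events_total',
    help => 'Events seen by this process',
);

$counter->inc for 1 .. 3;    # increment as events happen
print $prom->render;         # what a GET /metrics handler would serve
```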

Also of note, Prometheus offers no 100% accuracy guarantee: https://prometheus.io/docs/introduction/overview/.

Because LRR is a server that spawns multiple processes, an implementation is not straightforward, at least with an existing Perl library. This is because metrics are not shared between processes, and any Prometheus server making the API request will get a different metrics cache depending on which PID answers.

From the maintainer of Net::Prometheus:

For something multi-proc you'd want to have some kind of messaging system inbetween, whereby workers that cause concepts of counters to be incremented can send a message to the "main" exporter process to tell it which counters to increase and by how much, and that keeps the counts.

The bottom line is, we would need a single source of truth for metrics. This could be a dedicated process in LRR as mentioned, Redis as a metrics cache, a separate web server that calls LRR APIs and transforms them into metrics (e.g. https://github.com/nginx/nginx-prometheus-exporter), or something else.
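To make the "separate web server" option concrete, here's a hedged sketch (not an actual implementation; the metric names, host/port and the reliance on /api/info are my own choices) of a small bridge that calls the LRR API on each scrape and republishes the numbers as gauges:

```perl
#!/usr/bin/env perl
# Sketch of the "separate exporter" option: a standalone web app that calls
# the LANraragi API on each Prometheus scrape and republishes the numbers
# as gauges. Not LANraragi code; URL, auth and metric names are made up.
use strict;
use warnings;
use Mojolicious::Lite;
use Mojo::UserAgent;
use Net::Prometheus;

my $lrr  = $ENV{LRR_URL} // 'http://127.0.0.1:3000';
my $ua   = Mojo::UserAgent->new;
my $prom = Net::Prometheus->new;

my $archives = $prom->new_gauge(
    name => 'lanraragi_total_archives',
    help => 'Total archives reported by /api/info',
);
my $pages_read = $prom->new_gauge(
    name => 'lanraragi_total_pages_read',
    help => 'Total pages read reported by /api/info',
);

get '/metrics' => sub {
    my $c = shift;
    # One API call per scrape; the aggregation itself stays inside LRR.
    my $info = $ua->get("$lrr/api/info")->result->json;
    $archives->set( $info->{total_archives}     // 0 );
    $pages_read->set( $info->{total_pages_read} // 0 );
    $c->render( text => $prom->render, format => 'txt' );
};

app->start;
```

Something like `perl lrr_exporter.pl daemon -l 'http://*:9100'` would run it on its own port, with a Prometheus scrape job pointed at /metrics.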

Apart from implementing the exporter itself, we'd also need to decide on which metrics to collect. A strong case can be made for collecting metrics, but I'm not totally convinced about using Prometheus for statistics collection. Metrics collection is something done passively, so as not to impact normal server operations, which is why metrics are usually either updated only when something else is called (number of requests) or gathered via cheap reads (CPU usage). Using Prometheus to collect aggregation metrics, e.g. by attaching to the stats API, involves some amount of voluntary, non-scaling computation, making the exporter a noticeable process on the host.

Now, I could be (and have been) wrong, and I'm not familiar with the current Mojolicious architecture, so hopefully someone can correct me where needed. Anyway, these are the things I can think of on this at the moment.

@Difegue
Owner

Difegue commented Jan 21, 2025

If you need a dedicated process for metrics collection, I would suggest using Shinobu for this, as it's fairly static, unlike Mojo itself or the Minion workers, which will constantly prefork new processes to meet demand.
You would probably need to use Redis as the metrics cache/storage in some fashion.
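A rough sketch of that Redis-as-cache idea (the hash key, field and metric names below are made up, and it glosses over where each piece would actually live in LRR):

```perl
# Rough sketch of Redis as the shared metrics store, not actual LRR code.
# Any worker process bumps counters in a Redis hash; one stable process
# (e.g. Shinobu, or whatever serves /metrics) turns that hash into
# Prometheus output on scrape. Key, field and metric names are made up.
use strict;
use warnings;
use Redis;
use Net::Prometheus;

my $redis = Redis->new( server => '127.0.0.1:6379' );

# In any worker, wherever the event happens:
$redis->hincrby( 'LRR_METRICS', 'archives_served_total', 1 );

# In the single exporter process, on each scrape:
sub render_metrics {
    my $prom    = Net::Prometheus->new;
    my %metrics = $redis->hgetall('LRR_METRICS');
    for my $name ( sort keys %metrics ) {
        # Exposed as gauges since the values come from an external store.
        $prom->new_gauge(
            name => "lanraragi_$name",
            help => 'Value mirrored from the Redis metrics hash',
        )->set( $metrics{$name} );
    }
    return $prom->render;
}

print render_metrics();
```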

I should also mention that currently, running the server in debug mode enables https://github.com/mojolicious/mojo-status, which is reachable at [SERVER_URL]/debug, or [SERVER_URL]/debug.json for a JSON equivalent.
I have to say I don't find it very useful, but the plugin's code is useful to see how you can grab metrics off the main Mojolicious object, IMO.

(Screenshot of the mojo-status /debug page)
