get_machines is too slow and does not use the book keeping data #2101

peregrineshahin · 2024-07-03T06:29:27Z

https://tests.stockfishchess.org/tests/machines
This route should use the latest optimizations of workers book-keeping if exists
It's also too slow under fleet load, ignoring the 10s cached http resposne

    def get_machines(self):
        active_runs = self.runs.find({"finished": False}, {"tasks": 1, "args": 1})
        machines = (
            task["worker_info"]
            | {
                "last_updated": (
                    task["last_updated"] if task.get("last_updated") else None
                ),
                "run": run,
                "task_id": task_id,
            }
            for run in active_runs
            if any(task["active"] for task in reversed(run["tasks"]))
            for task_id, task in enumerate(run["tasks"])
            if task["active"]
        )
        return machines

The text was updated successfully, but these errors were encountered:

vdbergh · 2024-07-03T07:21:35Z

Well there is an intrinsic complexity here: a list of all active tasks needs to be made.

On the primary instance the db access could be eliminated by using the cache and the set of unfinished runs (which is dynamically kept up to date), but I believe get_machines runs on a secondary instance.

What we could do (in addition perhaps) is to create a scheduled task to regenerate the machines list every minute and then serve this list.

peregrineshahin · 2024-07-04T05:18:17Z

there is an intrinsic complexity here: a list of all active tasks needs to be made.

how is that? the goal is to get the active machines, the "added" complexity that you need to loop through the tasks is an implementation detail..

vdbergh · 2024-07-04T05:31:24Z

Actually you are right. The main instance has a data structure wtt_map which is a dictionary which maps short worker names to tasks.

wtt_map_schema = {
    short_worker_name: (runs_schema, task_id),
}

So that could be used to get the machines list quickly on the main instance.

vdbergh · 2024-07-04T05:37:48Z

Either the machine list api should be moved to a primary instance, or else an internal api should be created where a secondary instance asks the primary instance for the list (this may involve serialization/deserialization of json). An example of such an api is in #1993.

vdbergh · 2024-07-04T05:39:59Z

And as I said, an additional enhancement could be to recreate the list every minute in a scheduled task (rather than creating it on demand).

vdbergh · 2024-07-04T05:45:20Z

Note that the wtt_map itself may contain some inactive tasks (avoiding this clutters the code too much IMHO). These inactive tasks are evicted in a scheduled task that runs every minute.

peregrineshahin · 2024-07-04T15:02:52Z

I think we moved this API to the secondary instance because of performance issues.. while I expected we have such a map in the flow that should provide us such information I'm not quite sure about the 1 min reschedule time build, do you mean that the numbers showing in the home page cards currently only renew every 1 min? This sounds too much TBH.. I'm not generally a fan of rescheduling api's.. neither does the nature of book-keeping approaches requires it.. so this caught me by surprise.

vdbergh · 2024-07-04T15:09:56Z

Well I do not know where the bottleneck is. It could also be in mako.

Currently the table is computed on demand but there is a delay (of the order of 30-60s) before the db is fully updated.

I like to schedule expensive computing tasks at regular intervals rather than on demand (unless accuracy is critical). In that way one get predictable performance independently of load.

Note that the new scheduler in Fishtest makes creating such tasks very easy.

peregrineshahin · 2024-07-04T15:22:25Z

Wel my point is if we look at book-keeping as a technique it is neither on demand or scheduled.. it none of both.. so my question was .. suppose each time a worker gets active you add it to the dict.. and a worker gets non active you remove it from the list.. I'm not sure I understand why would such a map require scheduling?

vdbergh · 2024-07-04T15:40:49Z

I do not know how much time it takes to generate the html from the data structure. If it takes 1s and 60 people want to view the machines lists then there is no time to do anything else...

EDIT: Simplified of course...

peregrineshahin · 2024-07-04T15:49:58Z

Sorry but I'm not following.. my question is related to the wttp_map, my question is why do we have it every 1 min.. I'm not understanding conceptually why such map doesn't exist always up to date just like how run results gets incrementally updated.. is it because we don't spot the worker while they leave?

vdbergh · 2024-07-04T15:57:10Z

The wtt_map is always up to date (except that it contains some inactive tasks which are periodically cleaned). But you still need to generate the html from it. That's what I am talking about (during a visit of Vondele's fleet this will typically be a table with 10000 rows). Of course you could fiddle with paging but then sorting by username becomes unpleasant (I think).

PS.

The reason why there are some inactive tasks is that I did not want to clutter the set_inactive_task() function with code to update the wtt_map. Besides such code being messy one would also need to be careful to avoid deadlock.

vdbergh · 2024-07-04T16:45:40Z

Perhaps I should clarify that the wtt_map was not created to support the machine list, but to very efficiently detect duplicate workers (even when Vondele's fleet is visiting).

For the machine list perhaps another data structure would be more appropriate: a dictionary with keys the run_id's of unfinished runs and with values the sets of corresponding active tasks. I.e. with schema

{
    run_id: {task_id}
}

This would be very easy to keep up to date.

peregrineshahin · 2024-07-06T08:04:47Z

I'm not sure where to look, perhaps it would be easier for you to code the new function please.

For the machine list perhaps another data structure would be more appropriate: a dictionary with keys the run_id's of unfinished runs and with values the sets of corresponding active tasks. I.e. with schema

{
run_id: {task_id}
}
This would be very easy to keep up to date.

vdbergh · 2024-07-06T08:21:33Z

The thing is that I am not 100% sure that generating the data is the bottleneck. I wonder if it is not the generation of a table with > 15000 rows from that data (a table that also needs to be transferred to the client).

If the latter would be the case then moving the machines api to the primary instance would be bad since there would less time to handle the worker apis.

vdbergh · 2024-07-06T08:35:48Z

I guess we could try. But it would require a fleet visit to see the effect.

dubslow · 2024-07-06T08:37:13Z

Good thing the fleet is visiting right this very moment :) (the server is even holding fairly steady at 100+k cores for two hours now!)

vdbergh · 2024-07-06T08:44:34Z

Well it takes more than 15min to implement this...

vdbergh · 2024-07-06T09:51:18Z

Good thing the fleet is visiting right this very moment :) (the server is even holding fairly steady at 100+k cores for two hours now!)

Yes the server seems to be doing fine. Local api.log times remain stable (a few ms/api call except pgn_upload which takes ~ 20ms).

vondele · 2024-07-06T09:55:42Z

Server is indeed doing fine under this load now. Current api locally:

# logging from [06/Jul/2024:09:52:14 +0000] to [06/Jul/2024:09:53:10 +0000] ########
# duration (seconds)            : 56
# calls in total                : 10000
# calls per second              : 178
# calls not reaching the backend: 1
# calls handled by the backend  : 9999
#                         route    calls      total    average    minimum    maximum maxcount
                      /api/beat     6810    744.814      0.109      0.001     30.001       17
               /api/update_task     3047    201.978      0.066      0.001     30.001        4
                         /tests       12      7.307      0.609      0.015      0.997        0
                  /tests/tasks/       12      6.094      0.508      0.292      1.007        0
                     /tests/run        2      3.938      1.969      1.180      2.758        0
                   /tests/view/       14      0.846      0.060      0.010      0.159        0
              /api/request_spsa       50      0.820      0.016      0.003      0.114        0
                  /api/get_elo/        7      0.478      0.068      0.017      0.184        0
                   /api/actions       11      0.199      0.018      0.003      0.037        0
                       /api/nn/       28      0.158      0.006      0.004      0.007        0
                /tests/finished        1      0.092      0.092      0.092      0.092        0
                 /tests/approve        1      0.054      0.054      0.054      0.054        0
               /api/active_runs        2      0.025      0.013      0.006      0.019        0
           /api/request_version        1      0.005      0.005      0.005      0.005        0
              /api/request_task        1      0.003      0.003      0.003      0.003        0

a bit earlier the 3 pgn_upload pserves were 100% loaded, but not hanging (eventually we should just reduce the amount of STC pgns we upload, IMO, e.g. only for the first N tasks or so).
We still have these small bursts of api calls that take 30s or so, even though the average timing is very low. The primary pserve is around 50% (30-70%) loaded.

vdbergh · 2024-07-06T09:57:19Z

Hmm the worker seems to log only successful api calls (which take a few ms), not those that time out (~30s I guess).

vondele · 2024-07-06T09:58:31Z

Note that these are rare (17 out of 6810) for the table above.

vdbergh · 2024-07-06T10:09:00Z

It would be nice to understand this though.

I wonder if the dead tasks scavenging has something to do with it. If this scheduled task (which runs every minute) has to suddenly handle 1000s of dead tasks (which involves acquiring active_run_locks) then perhaps this may temporarily clog things up.

As a safety measure I am thinking of only scavenging a limited number of dead tasks per turn. E.g. 500. Then clearing 17000 dead tasks would take 34 minutes. This is not entirely free though because it means runs may take longer to finish. But it seems like a worthy trade off.

vdbergh · 2024-07-06T10:43:19Z

See #2109

peregrineshahin · 2024-07-06T14:42:52Z

pagination needed in both cases

The thing is that I am not 100% sure that generating the data is the bottleneck. I wonder if it is not the generation of a table with > 15000 rows from that data (a table that also needs to be transferred to the client).

If the latter would be the case then moving the machines api to the primary instance would be bad since there would less time to handle the worker apis.

vdbergh · 2024-07-06T14:48:04Z

pagination needed in both cases

Yes but one should still be able to sort the whole table (not only the current page).

peregrineshahin · 2024-07-06T15:16:04Z

Actually vondele suggested this one which I forgot about:

or a smarter way to show the table. E.g. summary first per user, only after a click expand that user?
(effectively with 10k workers, just scrolling or clicking through pages is not very practical)

vdbergh · 2024-07-06T15:25:17Z

Sounds good. Sorting by username is indeed the only thing I do.

vondele · 2024-07-06T15:38:18Z

I actually do sort per core count, compiler version, python version etc...
So maybe the grouping should be 'per sorted item' and on a click expand that? In this case we have all functionality.

peregrineshahin · 2024-07-06T16:12:49Z

I actually do sort per core count, compiler version, python version etc... So maybe the grouping should be 'per sorted item' and on a click expand that? In this case we have all functionality.

Yeah doable.. will work on it

peregrineshahin changed the title ~~get_machines does not use the book keeping data~~ get_machines is too slow and does not use the book keeping data Aug 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

get_machines is too slow and does not use the book keeping data #2101

get_machines is too slow and does not use the book keeping data #2101

peregrineshahin commented Jul 3, 2024

vdbergh commented Jul 3, 2024

peregrineshahin commented Jul 4, 2024 •

edited

Loading

vdbergh commented Jul 4, 2024

vdbergh commented Jul 4, 2024 •

edited

Loading

vdbergh commented Jul 4, 2024

vdbergh commented Jul 4, 2024

peregrineshahin commented Jul 4, 2024

vdbergh commented Jul 4, 2024

peregrineshahin commented Jul 4, 2024

vdbergh commented Jul 4, 2024 •

edited

Loading

peregrineshahin commented Jul 4, 2024

vdbergh commented Jul 4, 2024 •

edited

Loading

vdbergh commented Jul 4, 2024

peregrineshahin commented Jul 6, 2024

vdbergh commented Jul 6, 2024

vdbergh commented Jul 6, 2024

dubslow commented Jul 6, 2024

vdbergh commented Jul 6, 2024

vdbergh commented Jul 6, 2024 •

edited

Loading

vondele commented Jul 6, 2024

vdbergh commented Jul 6, 2024

vondele commented Jul 6, 2024

vdbergh commented Jul 6, 2024 •

edited

Loading

vdbergh commented Jul 6, 2024

peregrineshahin commented Jul 6, 2024

vdbergh commented Jul 6, 2024

peregrineshahin commented Jul 6, 2024

vdbergh commented Jul 6, 2024

vondele commented Jul 6, 2024

peregrineshahin commented Jul 6, 2024

get_machines is too slow and does not use the book keeping data #2101

get_machines is too slow and does not use the book keeping data #2101

Comments

peregrineshahin commented Jul 3, 2024

vdbergh commented Jul 3, 2024

peregrineshahin commented Jul 4, 2024 • edited Loading

vdbergh commented Jul 4, 2024

vdbergh commented Jul 4, 2024 • edited Loading

vdbergh commented Jul 4, 2024

vdbergh commented Jul 4, 2024

peregrineshahin commented Jul 4, 2024

vdbergh commented Jul 4, 2024

peregrineshahin commented Jul 4, 2024

vdbergh commented Jul 4, 2024 • edited Loading

peregrineshahin commented Jul 4, 2024

vdbergh commented Jul 4, 2024 • edited Loading

vdbergh commented Jul 4, 2024

peregrineshahin commented Jul 6, 2024

vdbergh commented Jul 6, 2024

vdbergh commented Jul 6, 2024

dubslow commented Jul 6, 2024

vdbergh commented Jul 6, 2024

vdbergh commented Jul 6, 2024 • edited Loading

vondele commented Jul 6, 2024

vdbergh commented Jul 6, 2024

vondele commented Jul 6, 2024

vdbergh commented Jul 6, 2024 • edited Loading

vdbergh commented Jul 6, 2024

peregrineshahin commented Jul 6, 2024

vdbergh commented Jul 6, 2024

peregrineshahin commented Jul 6, 2024

vdbergh commented Jul 6, 2024

vondele commented Jul 6, 2024

peregrineshahin commented Jul 6, 2024

peregrineshahin commented Jul 4, 2024 •

edited

Loading

vdbergh commented Jul 4, 2024 •

edited

Loading

vdbergh commented Jul 4, 2024 •

edited

Loading

vdbergh commented Jul 4, 2024 •

edited

Loading

vdbergh commented Jul 6, 2024 •

edited

Loading

vdbergh commented Jul 6, 2024 •

edited

Loading