New Celery queue to debug CPU/memory issues #2693
Labels
- area/general: Related to whole service, not a specific part/integration.
- complexity/single-task: Regular task, should be done within days.
- kind/internal: Doesn't affect users directly, may be e.g. infrastructure, DB related.
Related to #2522
Context
Based on the discussion at the arch meeting regarding the CPU and memory issues: the issue is tied to the production deployment (higher load) and the short-running workers (tasks are run concurrently).
As for the memory issues, the best guess is failing clean-up, so let's set up a new queue that runs in the same way as the short-running one, but with only a subset of tasks, to try to pinpoint the specific handlers that are causing the issues.
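A minimal sketch of how such a queue could be wired up, assuming the existing queue is called short-running (as referred to above) and using "debug-short-running" and "task.process_message" as illustrative queue/task names (not taken from the repo):

```python
# celery_config.py (sketch): route one suspect task to a dedicated queue
# while everything else keeps going to the existing short-running queue.
# The queue name "debug-short-running" and the task name
# "task.process_message" are illustrative assumptions.
task_default_queue = "short-running"

task_routes = {
    "task.process_message": {"queue": "debug-short-running"},
}
```

A worker consuming only the new queue can then be started with Celery's `-Q`/`--queues` option and deployed alongside the existing short-running workers with the same resources, so their memory behaviour can be compared directly.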
TODO
- Create a new queue
- Pick a task (e.g., process_message, since there's a high amount of those) or a subset of tasks (e.g., less frequent tasks that could be filtered further) that will run in that queue
- (optionally) Unify the way tasks are split between the queues (see the sketch after this list); currently we declare the queue both in the decorator:
packit-service/packit_service/worker/tasks.py (line 413 in 2c46677)
and also in the global Celery config:
packit-service/packit_service/celery_config.py (lines 18 to 23 in 2c46677)
- (optionally) Improve the docs on what tasks are supposed to run where; currently, by default, everything runs in the short-running queue unless specified otherwise, and some tasks stand out, e.g., the VM Image build being triggered from the short-running queue
- Based on the time spent on the previous points, either stalk the OpenShift/Celery metrics or create a follow-up card
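For the (optional) unification point above, one possible direction, sketched here under the assumption that routing moves entirely into the global config: drop the `queue=` argument from the task decorators and declare queue assignment only in `task_routes`, so it lives in a single place. All task and queue names below are illustrative.

```python
from celery import Celery

celery_app = Celery("packit-service")

# worker/tasks.py (sketch): the decorator no longer picks a queue,
# it only registers the task under a stable name.
@celery_app.task(name="task.process_message")
def process_message(event: dict):
    ...

# celery_config.py (sketch): the single source of truth for queue assignment;
# anything not listed falls back to the default queue.
celery_app.conf.task_default_queue = "short-running"
celery_app.conf.task_routes = {
    "task.process_message": {"queue": "debug-short-running"},
}
```

The opposite direction (decorator only, no `task_routes`) would work just as well; the main thing is to pick one. For the metrics point, `celery_app.control.inspect().stats()` (or `celery inspect stats` on the CLI) reports per-worker rusage, which may be useful when watching memory alongside the OpenShift metrics.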