Operations Playbook: Celery Backlog

Trigger Conditions

  • queue depth grows continuously
  • async outcomes delayed beyond SLA
  • repeated retries/failures in worker logs
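The first condition can be checked numerically rather than by eye. The sketch below assumes queue depth has already been sampled as (timestamp, depth) pairs; with Celery's Redis transport the default queue is typically a Redis list named `celery`, so `LLEN celery` is one way to collect such samples. The function names and the `min_slope` threshold are hypothetical and should be tuned per queue.

```python
def depth_slope(samples):
    """Least-squares slope of queue depth, in tasks per second.

    samples: list of (unix_timestamp_seconds, queue_depth) pairs.
    """
    times = [t for t, _ in samples]
    depths = [d for _, d in samples]
    t_mean = sum(times) / len(times)
    d_mean = sum(depths) / len(depths)
    numerator = sum((t - t_mean) * (d - d_mean) for t, d in samples)
    denominator = sum((t - t_mean) ** 2 for t in times)
    if denominator == 0:
        raise ValueError("samples must span more than one timestamp")
    return numerator / denominator


def is_backlog_growing(samples, min_slope=0.1):
    """True when at least three samples show depth rising faster than min_slope.

    min_slope (tasks/second) is an assumed threshold, not a Celery setting.
    """
    return len(samples) >= 3 and depth_slope(samples) > min_slope


if __name__ == "__main__":
    # Four readings taken one minute apart (illustrative numbers only).
    readings = [(0, 100), (60, 160), (120, 220), (180, 280)]
    print("slope (tasks/s):", depth_slope(readings))
    print("growing:", is_backlog_growing(readings))
```

Fitting a slope over several samples is less sensitive to a single noisy reading than comparing two consecutive depths.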

Response Workflow

  1. Confirm the workers are alive.
  2. Ping the workers and inspect their registered tasks.
  3. Validate Redis health and connectivity.
  4. Identify which task classes are failing and whether the failures share a pattern.
  5. Restart or scale the workers if needed.
  6. Confirm the backlog is draining.

    docker compose logs -f celery redis
    docker compose run --rm celery celery -A app inspect ping
    docker compose run --rm celery celery -A app inspect registered
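For step 4, worker log output can be tallied by task name and exception type to show which task classes dominate the failures. This is a minimal sketch that assumes Celery's default log wording (`Task <name>[<id>] raised unexpected: ...` and `Task <name>[<id>] retry: ...`); if the deployment customizes log formats, the regular expressions need adjusting. The task names and log lines in the example are invented for illustration.

```python
import re
from collections import Counter

# Assumed default Celery worker wording for failed and retried tasks.
TASK_EVENT = re.compile(
    r"Task (?P<name>[\w.]+)\[(?P<id>[0-9a-f-]+)\] "
    r"(?P<kind>raised unexpected|retry): (?P<detail>.*)"
)
# First identifier followed by "(" in the detail, e.g. ConnectionError('...').
EXC_NAME = re.compile(r"([A-Za-z_][\w.]*)\(")


def tally_failures(lines):
    """Count (task name, event kind, exception name) across log lines."""
    counts = Counter()
    for line in lines:
        event = TASK_EVENT.search(line)
        if event is None:
            continue  # not a failure or retry line
        exc = EXC_NAME.search(event.group("detail"))
        exc_name = exc.group(1) if exc else "unknown"
        counts[(event.group("name"), event.group("kind"), exc_name)] += 1
    return counts


if __name__ == "__main__":
    # Invented sample lines in the shape `docker compose logs` would print.
    sample = [
        "celery-1  | [2024-01-01 12:00:00,000: ERROR/ForkPoolWorker-1] "
        "Task app.tasks.send_email[3f1c2d4e-0000-4000-8000-000000000001] "
        "raised unexpected: ConnectionError('redis down')",
        "celery-1  | [2024-01-01 12:00:05,000: INFO/MainProcess] "
        "Task app.tasks.resize_image[3f1c2d4e-0000-4000-8000-000000000002] "
        "retry: Retry in 60s: TimeoutError('upstream slow')",
    ]
    for key, count in tally_failures(sample).most_common():
        print(count, key)
```

The same function can read real output by passing it the lines of `docker compose logs celery` instead of the sample list.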

Recovery Validation

  • backlog trend stabilizes or declines
  • failed/retried task rates improve
  • user-visible async outcomes recover
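To put a number on "stabilizes or declines", the drain rate between two depth readings gives a rough time-to-empty. The sketch below assumes the same (timestamp, depth) sampling as any queue-depth monitor would provide; `drain_eta_seconds` is a hypothetical helper, and the estimate only holds if the arrival and processing rates stay roughly constant.

```python
def drain_eta_seconds(samples):
    """Estimate seconds until the queue is empty, or None if it is not draining.

    samples: list of (unix_timestamp_seconds, queue_depth) pairs, oldest first.
    Only the first and last samples are used.
    """
    (t_start, d_start), (t_end, d_end) = samples[0], samples[-1]
    elapsed = t_end - t_start
    if elapsed <= 0:
        raise ValueError("samples must be ordered and span a positive interval")
    drain_rate = (d_start - d_end) / elapsed  # tasks removed per second
    if drain_rate <= 0:
        return None  # flat or still growing
    return d_end / drain_rate


if __name__ == "__main__":
    # Illustrative readings five minutes apart.
    eta = drain_eta_seconds([(0, 600), (300, 300)])
    print("not draining" if eta is None else f"about {eta:.0f}s to empty")
```

A `None` result after a restart or scale-up suggests the intervention has not yet taken effect and the workflow should be revisited.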

Follow-up

Add targeted tests for recurring failure signatures and track root causes in post-incident notes.