Operations Playbook: Celery Backlog
Trigger Conditions
- queue depth grows continuously
- async outcomes delayed beyond SLA
- repeated retries/failures in worker logs
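The first trigger, continuous queue growth, can be checked mechanically. The following is a minimal sketch, not part of the playbook's tooling: `sample_depths` and `grows_continuously` are hypothetical helper names, and the depth reader is passed in as a callable so the logic does not depend on any particular broker client.

```python
import time
from typing import Callable, List


def sample_depths(read_depth: Callable[[], int], count: int, interval_s: float) -> List[int]:
    """Collect `count` queue-depth readings, `interval_s` seconds apart."""
    depths = []
    for i in range(count):
        depths.append(read_depth())
        if i < count - 1:
            # No need to wait after the final reading.
            time.sleep(interval_s)
    return depths


def grows_continuously(depths: List[int]) -> bool:
    """True when every reading is strictly larger than the one before it."""
    return len(depths) >= 2 and all(b > a for a, b in zip(depths, depths[1:]))
```

With a Redis broker and the redis-py client, `read_depth` could be something like `lambda: client.llen("celery")`. This assumes the tasks use Celery's default queue name, `celery`; substitute the real queue name if the deployment routes tasks elsewhere.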
Response Workflow
- Confirm worker liveness.
- Ping workers and inspect their registered tasks.
- Validate Redis health/connectivity.
- Identify patterns in which task classes are failing.
- Restart or scale workers if needed.
- Validate backlog drain trend.
docker compose logs -f celery redis
docker compose run --rm celery celery -A app inspect ping
docker compose run --rm celery celery -A app inspect registered
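For the step of identifying failing task class patterns, tallying failures and retries per task name from the worker logs can help. The sketch below assumes the worker emits Celery's default messages of the form `Task <name>[<id>] raised unexpected: ...` and `Task <name>[<id>] retry: ...`; if the deployment uses a custom log format, the regular expression needs adjusting. The function name `tally_task_problems` is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed default Celery worker messages, e.g.
#   Task app.tasks.send_email[<uuid>] raised unexpected: ValueError('boom')
#   Task app.tasks.send_email[<uuid>] retry: Retry in 10s: TimeoutError()
TASK_EVENT = re.compile(
    r"Task (?P<name>[^\[\s]+)\[(?P<id>[^\]]+)\] "
    r"(?P<event>raised (?:un)?expected|retry):"
)


def tally_task_problems(lines: Iterable[str]) -> Counter:
    """Count failure and retry log lines per (task name, event) pair."""
    counts: Counter = Counter()
    for line in lines:
        match = TASK_EVENT.search(line)
        if match:
            counts[(match["name"], match["event"])] += 1
    return counts
```

The function accepts any iterable of lines, so it can be fed `sys.stdin` when piping in the output of `docker compose logs --no-color celery`. Calling `.most_common()` on the result lists the noisiest task and event pairs first.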
Recovery Validation
- backlog trend stabilizes or declines
- failed/retried task rates improve
- user-visible async outcomes recover
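Whether the backlog trend is stabilizing or declining can be judged from a handful of evenly spaced queue-depth samples. As a rough sketch, a least-squares slope over those samples gives the drain rate and a crude time-to-empty estimate. The helper names and the choice of a linear fit are assumptions for illustration, not something the playbook prescribes.

```python
from typing import Optional, Sequence


def depth_slope(depths: Sequence[float], interval_s: float) -> float:
    """Least-squares slope of queue depth, in tasks per second.

    Negative means the backlog is draining; near zero means it has stabilized.
    """
    n = len(depths)
    if n < 2:
        raise ValueError("need at least two samples")
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    times = [i * interval_s for i in range(n)]
    mean_t = sum(times) / n
    mean_d = sum(depths) / n
    num = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, depths))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def seconds_to_drain(depths: Sequence[float], interval_s: float) -> Optional[float]:
    """Estimated seconds until the queue is empty, or None if it is not draining."""
    slope = depth_slope(depths, interval_s)
    if slope >= 0:
        return None
    return depths[-1] / -slope
```

For example, readings of 100, 80, 60 and 40 taken 10 seconds apart give a slope of -2 tasks per second, so the remaining 40 tasks would drain in roughly 20 seconds if the rate holds.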
Follow-up
Add targeted tests for recurring failure signatures and track root causes in post-incident notes.