Odoo Cron Job Pile-Up and Transaction Contention Runbook
A production-safe runbook to stabilize Odoo when scheduled actions pile up, cron lag grows, and PostgreSQL lock contention starts impacting user traffic.
When Odoo scheduled actions (cron jobs) start piling up, the failure pattern is sneaky: users first see random slowness, then write operations block behind long-running cron transactions.
This runbook gives a deterministic sequence: identify backlog and lock sources, drain safely, recover interactive throughput, then harden recurrence controls.
Incident signals that justify immediate response
- Scheduled actions in Settings → Technical → Scheduled Actions show repeated Next Execution Date drift or missed runs.
- User-facing transactions (confirm order, post invoice, stock operations) become slow or intermittently time out.
- PostgreSQL shows growing lock waits and long transactions tied to Odoo sessions.
- Odoo logs show repeated cron starts without successful completion or repeated retries.
- Queue depth in business processes grows while CPU is not fully saturated (contention, not pure compute exhaustion).
Step 0 — Stabilize the system before changing anything
- Freeze deploys/module upgrades.
- Pause non-essential heavy scheduled actions (mass mailing, large recompute jobs, sync jobs).
- Keep one operator on commands and one person writing timeline/decisions.
- Do not restart all Odoo workers yet; gather lock/transaction evidence first.
Step 1 — Confirm cron pile-up and identify blast radius
1.1 Inspect cron backlog directly from PostgreSQL
psql "$ODOO_DB_URI" -c "
select
count(*) as total_crons,
count(*) filter (where active) as active_crons,
count(*) filter (where active and nextcall < now()) as overdue_active_crons,
min(nextcall) filter (where active and nextcall < now()) as oldest_overdue_nextcall
from ir_cron;
"
psql "$ODOO_DB_URI" -c "
select
id,
name,
active,
nextcall,
interval_number,
interval_type,
numbercall,  -- note: numbercall and doall were removed in Odoo 17+; drop both columns on those versions
doall
from ir_cron
where active
order by nextcall asc
limit 30;
"
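To reason about which jobs to pause first, it helps to rank overdue crons by how far behind they are. The sketch below is a minimal, hypothetical helper (not part of Odoo) that takes rows mirroring the ir_cron columns above, as if parsed from `psql -Atc` output, and sorts them worst-first by overdue lag:

```python
from datetime import datetime, timedelta

def overdue_lag(crons, now):
    """Return (name, lag) pairs for active crons whose nextcall is in the
    past, sorted worst-first. Each cron is a dict mirroring ir_cron columns."""
    lagged = [
        (c["name"], now - c["nextcall"])
        for c in crons
        if c["active"] and c["nextcall"] < now
    ]
    return sorted(lagged, key=lambda pair: pair[1], reverse=True)

# Hypothetical rows standing in for real psql output
now = datetime(2024, 5, 1, 12, 0)
crons = [
    {"name": "Mail: Email Queue Manager", "active": True, "nextcall": now - timedelta(hours=3)},
    {"name": "Base: Auto-vacuum", "active": True, "nextcall": now + timedelta(hours=1)},
    {"name": "Stock: Run Scheduler", "active": True, "nextcall": now - timedelta(minutes=20)},
]
for name, lag in overdue_lag(crons, now):
    print(f"{name}: overdue by {lag}")
```

The biggest lag is usually, but not always, the best candidate to pause; cross-check against the lock queries in 1.2 before acting.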
1.2 Check lock pressure and waiting queries
psql "$ODOO_DB_URI" -c "
select
a.pid,
a.usename,
a.application_name,
a.state,
now() - a.query_start as query_age,
a.wait_event_type,
a.wait_event,
left(a.query, 180) as query
from pg_stat_activity a
where a.datname = current_database()
and (a.wait_event_type = 'Lock' or (a.state = 'active' and now() - a.query_start > interval '30 seconds'))
order by a.query_start asc
limit 30;
"
psql "$ODOO_DB_URI" -c "
select
blocked.pid as blocked_pid,
now() - blocked.query_start as blocked_for,
blocker.pid as blocker_pid,
now() - blocker.query_start as blocker_age,
left(blocked.query, 120) as blocked_query,
left(blocker.query, 120) as blocker_query
from pg_locks bl
join pg_stat_activity blocked on blocked.pid = bl.pid
join pg_locks kl
on kl.locktype = bl.locktype
and kl.database is not distinct from bl.database
and kl.relation is not distinct from bl.relation
and kl.page is not distinct from bl.page
and kl.tuple is not distinct from bl.tuple
and kl.virtualxid is not distinct from bl.virtualxid
and kl.transactionid is not distinct from bl.transactionid
and kl.classid is not distinct from bl.classid
and kl.objid is not distinct from bl.objid
and kl.objsubid is not distinct from bl.objsubid
and kl.pid != bl.pid
join pg_stat_activity blocker on blocker.pid = kl.pid
where not bl.granted
order by blocked.query_start asc
limit 20;
"
If overdue crons are rising and blocked query age is climbing, you are in cron-induced contention collapse.
Step 2 — Safe remediation sequence (cancel before terminate)
2.1 Pause only the highest-impact cron jobs first
Disable clearly non-critical crons temporarily so interactive traffic can recover:
# Replace IDs with incident-confirmed heavy jobs from Step 1 output
psql "$ODOO_DB_URI" -c "update ir_cron set active = false where id in (<cron_id_1>, <cron_id_2>);"
Record every disabled cron ID for rollback.
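One way to make the rollback in Step 4 mechanical rather than memory-dependent is to log every disabled batch to a file as you go. This is an illustrative sketch (the file path and JSON shape are assumptions, not an Odoo convention):

```python
import json
import time
from pathlib import Path

def record_disabled_crons(ids, path):
    """Append the cron IDs disabled during the incident to a rollback file,
    so Step 4 can re-enable exactly what was paused and nothing else."""
    path = Path(path)
    log = json.loads(path.read_text()) if path.exists() else []
    log.append({"disabled_at": time.time(), "cron_ids": sorted(ids)})
    path.write_text(json.dumps(log, indent=2))
    return log

def rollback_sql(log):
    """Emit one re-enable statement covering every recorded batch."""
    ids = sorted({i for entry in log for i in entry["cron_ids"]})
    return "update ir_cron set active = true where id in (%s);" % ", ".join(map(str, ids))
```

During the incident, call `record_disabled_crons` right after each `update ir_cron set active = false ...`; at exit, `rollback_sql` gives you the exact statement to run.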
2.2 Cancel long-running blocker sessions
# Safer first action
psql "$ODOO_DB_URI" -c "select pg_cancel_backend(<blocker_pid>);"
# Escalate only if cancellation fails within your incident timeout
psql "$ODOO_DB_URI" -c "select pg_terminate_backend(<blocker_pid>);"
Avoid bulk termination. Preserve known healthy app, admin, and replication sessions.
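The cancel-before-terminate policy above can be sketched as a small escalation routine. The `cancel`, `terminate`, and `still_running` callables are injected stand-ins for `pg_cancel_backend`, `pg_terminate_backend`, and a `pg_stat_activity` check (names and structure are assumptions for illustration, not a real Odoo or psycopg API):

```python
import time

def cancel_then_terminate(pid, cancel, terminate, still_running,
                          grace_s=10, poll_s=1.0):
    """Escalation policy from 2.2: ask the backend to cancel its query first,
    and only terminate the whole session if it is still running after the
    incident grace period."""
    cancel(pid)
    deadline = time.monotonic() + grace_s
    while time.monotonic() < deadline:
        if not still_running(pid):
            return "cancelled"   # cancellation worked; session survives
        time.sleep(poll_s)
    terminate(pid)               # last resort: drop the session
    return "terminated"
```

Keeping the grace period explicit (rather than terminating on a hunch) is what makes this safe to hand to any operator mid-incident.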
2.3 Put temporary guardrails on lock wait behavior
-- Use with app-owner approval. Prefer role-level controls over global panic changes.
ALTER ROLE odoo SET lock_timeout = '8s';
ALTER ROLE odoo SET statement_timeout = '90s';
Note that ALTER ROLE settings apply only to sessions opened after the change, so long-lived Odoo worker connections must reconnect (or be recycled) to pick them up. Validate effective values from a fresh session connected as the odoo role:
psql "$ODOO_DB_URI" -c "show lock_timeout; show statement_timeout;"
Step 3 — Drain backlog in controlled lanes
- Keep critical user-facing transactions prioritized.
- Re-enable paused cron jobs in small batches (one class of workload at a time).
- Watch lock waits and overdue cron count between each re-enable.
Verification loop during drain:
watch -n 15 "psql \"$ODOO_DB_URI\" -Atc \"select count(*) from ir_cron where active and nextcall < now();\""
watch -n 15 "psql \"$ODOO_DB_URI\" -Atc \"select count(*) from pg_stat_activity where datname=current_database() and wait_event_type='Lock';\""
If either metric climbs after enabling a cron group, roll back that group immediately and investigate that job's SQL path.
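The drain loop above can be expressed as a staged re-enable with automatic rollback. This is a hedged sketch: `enable`/`disable` stand in for the `update ir_cron set active ...` statements, `metrics` for the two watch counters (overdue crons, lock waiters), and `is_degraded` for your incident threshold; none of these are real Odoo APIs.

```python
def staged_reenable(groups, enable, disable, metrics, is_degraded):
    """Re-enable cron groups one at a time (Step 3). After each group, sample
    lock/backlog metrics; if they degrade versus the last accepted baseline,
    roll that group back and stop for investigation."""
    baseline = metrics()
    enabled, rolled_back = [], None
    for name, ids in groups:
        enable(ids)
        sample = metrics()
        if is_degraded(baseline, sample):
            disable(ids)          # immediate rollback of the offending group
            rolled_back = name
            break
        enabled.append(name)
        baseline = sample         # accept the new steady state
    return enabled, rolled_back
```

Returning the name of the rolled-back group gives the incident timeline a precise pointer to the job class whose SQL path needs investigation.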
Step 4 — Rollback and incident exit criteria
Rollback actions:
- Re-enable any cron jobs you disabled once system is stable:
psql "$ODOO_DB_URI" -c "update ir_cron set active = true where id in (<cron_id_1>, <cron_id_2>);"
- If temporary role timeouts break legitimate long business processes, revert to baseline:
ALTER ROLE odoo RESET lock_timeout;
ALTER ROLE odoo RESET statement_timeout;
Close the incident only when all of these hold for at least 15 minutes:
- Overdue active cron count is trending down or stable near baseline.
- wait_event_type='Lock' sessions stay low and do not trend upward.
- No fresh timeout spikes in Odoo logs.
- Core write flows succeed (SO confirm, invoice post, stock transfer validate).
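The "hold for 15 minutes" exit check can be automated over periodic samples. A minimal sketch, assuming you poll the two watch counters plus a write-flow smoke test roughly every 4 minutes (the sample shape and the lock-wait threshold of 5 are illustrative assumptions, not fixed Odoo values):

```python
def exit_criteria_hold(samples, window=4):
    """Check the Step 4 exit condition over the last `window` samples
    (e.g. 4 samples at ~4-minute intervals covers a 15-minute hold).
    Each sample records overdue cron count, lock-wait session count,
    and whether the core write flows succeeded."""
    if len(samples) < window:
        return False              # not enough history to claim stability
    recent = samples[-window:]
    overdue = [s["overdue_crons"] for s in recent]
    return (
        overdue[-1] <= overdue[0]                        # trending down/flat
        and all(s["lock_waits"] <= 5 for s in recent)    # waiters stay low
        and all(s["writes_ok"] for s in recent)          # SO/invoice/stock ok
    )
```

Requiring the full window (not a single green sample) is what prevents closing the incident during a temporary lull between cron waves.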
Hardening checklist (post-incident)
- Audit every high-frequency cron for runtime and lock footprint.
- Split heavy crons by shard/domain/time window instead of one giant transaction.
- Add per-cron runtime SLOs and alert on nextcall lag.
- Ensure critical and non-critical crons run in separate windows/concurrency lanes.
- Add/verify indexes used by cron-driven update domains.
- Keep pg_stat_statements enabled and review top total-time queries weekly.
- Rehearse staged cron backlog recovery in staging with production-like data volume.
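For the "split heavy crons by shard" item, the usual pattern is to assign records deterministically to one of N cron shards and commit in small batches so each run holds locks briefly. A hypothetical sketch (modulo-on-id sharding is one common choice, not an Odoo built-in):

```python
def shard_batches(record_ids, shard_count, shard_index, batch_size=500):
    """Deterministically pick this shard's records by id modulo, then yield
    commit-sized batches so each cron run's transaction stays short."""
    mine = [rid for rid in record_ids if rid % shard_count == shard_index]
    for start in range(0, len(mine), batch_size):
        yield mine[start:start + batch_size]
```

Each of the N cron copies processes only its `shard_index`, and the batch size bounds both runtime and lock footprint per transaction.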
Practical references
- Odoo scheduled actions (knowledge + operations context): https://www.odoo.com/knowledge/article/790
- Odoo deployment and worker/process guidance: https://www.odoo.com/documentation/17.0/administration/on_premise/deploy.html
- PostgreSQL lock monitoring via pg_stat_activity and stats views: https://www.postgresql.org/docs/current/monitoring-stats.html
- PostgreSQL explicit locking behavior: https://www.postgresql.org/docs/current/explicit-locking.html
- PostgreSQL client timeouts (lock_timeout, statement_timeout): https://www.postgresql.org/docs/current/runtime-config-client.html
Principle: during cron pile-up incidents, protect interactive transactions first, then reintroduce automation in measured lanes with lock telemetry guiding each step.