WORK IN PROGRESS WORK IN PROGRESS WORK IN PROGRESS WORK IN PROGRESS WORK IN PROGRESS WORK IN PROGRESS WORK IN PROGRESS WORK IN PROGRESS

Sat Mar 21 2026 00:00:00 GMT+0000 (Coordinated Universal Time)

Odoo PostgreSQL XID Wraparound Emergency Runbook

A production-safe runbook to detect, triage, and remediate PostgreSQL transaction ID wraparound risk in Odoo workloads before forced downtime.

If PostgreSQL transaction ID (XID) age grows too high, your Odoo database can enter anti-wraparound emergency behavior and eventually refuse writes to protect data integrity.

This runbook is for live incidents: confirm risk level quickly, remove blockers, vacuum oldest relations in safe order, verify recovery, and harden so it does not recur.

Incident signals that justify immediate action

  • PostgreSQL logs mention wraparound risk or anti-wraparound vacuum pressure.
  • Autovacuum is running constantly but XID age keeps increasing.
  • Odoo write paths degrade (checkout, invoicing, stock operations), while long transactions accumulate.
  • age(datfrozenxid) is approaching autovacuum_freeze_max_age for the Odoo database.

Step 0 — Stabilize application pressure first

  1. Pause non-critical write-heavy lanes (bulk imports, mass recomputes, historical backfills).
  2. Pause non-essential Odoo cron jobs; keep only revenue/accounting-critical jobs active.
  3. Freeze deploys and module upgrades until database age is stabilized.
  4. Assign one operator for DB actions and one for incident timeline/decision logging.

Do not restart PostgreSQL blindly; first confirm XID age and blockers.

Step 1 — Confirm wraparound risk and blast radius

1.1 Check database-level XID age against freeze threshold

psql "$ODOO_DB_URI" -c "
select
  datname,
  age(datfrozenxid) as xid_age,
  current_setting('autovacuum_freeze_max_age')::bigint as freeze_max_age,
  round(
    100.0 * age(datfrozenxid)
    / current_setting('autovacuum_freeze_max_age')::bigint,
    2
  ) as pct_of_freeze_max_age
from pg_database
order by xid_age desc;
"

Prioritize action when the Odoo database is high and still climbing (for many teams, this is already urgent above 80%).

1.2 Find oldest tables (where freezing effort should start)

psql "$ODOO_DB_URI" -c "
select
  n.nspname as schema,
  c.relname as table,
  age(c.relfrozenxid) as rel_xid_age,
  pg_size_pretty(pg_total_relation_size(c.oid)) as total_size
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where c.relkind = 'r'
  and n.nspname not in ('pg_catalog', 'information_schema')
order by age(c.relfrozenxid) desc
limit 30;
"

1.3 Check autovacuum blockers (usually long transactions)

psql "$ODOO_DB_URI" -c "
select
  pid,
  usename,
  application_name,
  client_addr,
  state,
  now() - xact_start as xact_age,
  wait_event_type,
  wait_event,
  left(query, 160) as query
from pg_stat_activity
where xact_start is not null
order by xact_start asc
limit 30;
"

Any very old idle in transaction session can prevent cleanup progress and must be handled deliberately.

Step 2 — Safe remediation sequence

2.1 Remove blockers with cancel-first, terminate-second discipline

Start by canceling oldest non-critical blocking sessions:

select pg_cancel_backend(<pid>);

If the session remains stuck or immediately re-blocks cleanup, terminate it:

select pg_terminate_backend(<pid>);

Order of operations:

  1. Non-critical BI/reporting sessions.
  2. Stale app workers (idle in transaction) not serving active user requests.
  3. Last resort: long-running business transactions after application owner sign-off.

2.2 Run targeted manual vacuum freeze on highest-risk tables

Vacuum oldest tables first from Step 1.2 output:

VACUUM (VERBOSE, FREEZE, ANALYZE) public.<table_name>;

For large tables, do them one by one and reassess between runs. Avoid launching many manual vacuums in parallel during user-traffic peaks.

2.3 Confirm autovacuum workers are not starved

psql "$ODOO_DB_URI" -c "
select
  pid,
  datname,
  relid::regclass as relation,
  phase,
  heap_blks_total,
  heap_blks_scanned,
  heap_blks_vacuumed
from pg_stat_progress_vacuum;
"

If no progress appears while age is critical, revisit blocker sessions and lock contention before tuning knobs.

2.4 Optional temporary tuning during emergency window

Use only under change control and revert after incident:

ALTER SYSTEM SET autovacuum_vacuum_cost_limit = '4000';
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '2ms';
SELECT pg_reload_conf();

These can speed cleanup but may increase foreground I/O pressure. Track Odoo latency while applying.

Step 3 — Verification loop before normalizing traffic

Run every 5–10 minutes during response:

watch -n 30 "psql \"$ODOO_DB_URI\" -Atc \"
select datname,
       age(datfrozenxid),
       round(100.0*age(datfrozenxid)/current_setting('autovacuum_freeze_max_age')::bigint,2)
from pg_database
order by age(datfrozenxid) desc;
\""

Also verify:

  • Oldest relfrozenxid table ages trend downward.
  • Odoo create/update flows recover (sales order confirm, invoice post, stock move validate).
  • New long-lived idle in transaction sessions are not reappearing.

Rollback and safety checks

If emergency tuning increases user-facing latency too much:

  1. Pause the last tuning change first.
  2. Revert ALTER SYSTEM overrides one variable at a time.
  3. Keep targeted VACUUM (FREEZE) on highest-age tables running in controlled sequence.
  4. Re-check age trend before re-enabling all paused job lanes.

Rollback example:

ALTER SYSTEM RESET autovacuum_vacuum_cost_limit;
ALTER SYSTEM RESET autovacuum_vacuum_cost_delay;
SELECT pg_reload_conf();

Hardening checklist (post-incident)

  • Alert on database age(datfrozenxid) percentage of autovacuum_freeze_max_age.
  • Alert on oldest-table age(relfrozenxid) trend for top Odoo tables.
  • Kill or auto-timeout stale app transactions (idle in transaction) with policy guardrails.
  • Set and enforce idle_in_transaction_session_timeout for app roles where safe.
  • Review cron/import patterns that create long transactions and split into smaller commits.
  • Keep regular vacuum observability dashboards (pg_stat_all_tables, pg_stat_progress_vacuum).
  • Include XID-age checks in pre-maintenance and pre-upgrade readiness checklists.

Practical references

Operational rule: in wraparound-risk incidents, prioritize age reduction and blocker removal over broad restarts or schema-level changes. Controlled vacuum progress is the recovery path.

Back to blog