Odoo PostgreSQL Dead Tuple and Autovacuum Bloat Incident Runbook
A production-safe runbook to detect dead tuple buildup, unblock autovacuum, and recover Odoo query latency before bloat escalates into an outage.
When Odoo gets slower over hours or days (without obvious CPU spikes), dead tuple accumulation and stalled autovacuum are common causes. This runbook gives a deterministic response: confirm bloat pressure, remove blockers safely, recover table health, and prevent recurrence.
Incident signals
Treat this as an active reliability incident when multiple signals persist:
- Odoo list views/searches degrade across multiple modules (sale.order, stock.move, account.move).
- Database size grows quickly while live row counts do not.
- n_dead_tup keeps rising on high-write tables.
- Long-running transactions or idle in transaction sessions are present.
- Autovacuum workers are active but not reducing dead tuples over repeated checks.
Step 0 — Stabilize write pressure first
Before touching vacuum settings or running manual vacuum, reduce incoming churn:
- Pause non-critical imports/batch writes/large cron jobs.
- Keep checkout, invoicing, and warehouse-critical flows online if possible.
- Assign one operator to DB commands and one to incident notes/timeline.
This prevents chasing a moving target while dead tuples keep accumulating.
Step 1 — Baseline the blast radius
# Top dead-tuple tables in current database
psql "$ODOO_DB_URI" -c "
select
schemaname,
relname,
n_live_tup,
n_dead_tup,
round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 2) as dead_pct,
last_autovacuum,
vacuum_count,
autovacuum_count
from pg_stat_user_tables
order by n_dead_tup desc
limit 20;
"
# Database-wide tuple churn signal
psql "$ODOO_DB_URI" -c "
select datname,
numbackends,
xact_commit,
xact_rollback,
tup_inserted,
tup_updated,
tup_deleted,
deadlocks
from pg_stat_database
where datname = current_database();
"
Capture this output in incident notes so you can verify trend reversal later.
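To compare baseline samples consistently, the dead_pct expression from the query above can be mirrored in a small helper. A minimal sketch — the 20% flag threshold and the sample rows are illustrative assumptions, not PostgreSQL defaults:

```python
def dead_pct(n_live: int, n_dead: int) -> float:
    """Dead-tuple percentage, mirroring the SQL expression above."""
    total = n_live + n_dead
    return round(100.0 * n_dead / total, 2) if total else 0.0

def flag_offenders(rows, threshold=20.0):
    """rows: iterable of (relname, n_live_tup, n_dead_tup).
    Returns tables at or above the (assumed) dead_pct threshold, worst first."""
    flagged = [(rel, dead_pct(live, dead)) for rel, live, dead in rows]
    return sorted((r for r in flagged if r[1] >= threshold),
                  key=lambda r: r[1], reverse=True)

# Illustrative sample, not real incident data
sample = [("stock_move", 80_000, 40_000),
          ("account_move_line", 500_000, 25_000),
          ("mail_message", 0, 0)]
print(flag_offenders(sample))  # → [('stock_move', 33.33)]
```

Re-running the helper on each sampling round gives a consistent offender list to track through the rest of the incident.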
Step 2 — Check whether autovacuum is blocked
# Active vacuum progress (max_dead_tuples/num_dead_tuples apply to PostgreSQL 16 and earlier; PostgreSQL 17 renamed these columns)
psql "$ODOO_DB_URI" -c "
select
pid,
relid::regclass as relation,
phase,
heap_blks_total,
heap_blks_scanned,
heap_blks_vacuumed,
index_vacuum_count,
max_dead_tuples,
num_dead_tuples
from pg_stat_progress_vacuum;
"
# Long transactions that can block cleanup horizons
psql "$ODOO_DB_URI" -c "
select
pid,
usename,
application_name,
state,
now() - xact_start as txn_age,
now() - query_start as query_age,
wait_event_type,
wait_event,
left(query, 160) as query
from pg_stat_activity
where datname = current_database()
and xact_start is not null
order by xact_start asc
limit 20;
"
Triage checklist
- Are the oldest transactions older than your normal request/worker profile?
- Is state = 'idle in transaction' present for many minutes?
- Is autovacuum repeatedly scanning the same hot table with little progress?
- Are lock waits (wait_event_type = 'Lock') preventing vacuum completion?
If yes, clear blockers before forcing aggressive manual cleanup.
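One way to make "scanning the same table with little progress" concrete is to diff two pg_stat_progress_vacuum samples taken a few minutes apart. A minimal sketch, assuming you record heap_blks_scanned per worker pid at each check (the function name and min_blocks cutoff are illustrative):

```python
def stalled_vacuums(earlier: dict, later: dict, min_blocks: int = 1) -> list:
    """Return pids of vacuum workers that advanced heap_blks_scanned by
    fewer than min_blocks between two samples taken minutes apart.
    A stalled worker usually points at a lock wait or an old xmin horizon."""
    return [pid for pid, scanned in later.items()
            if pid in earlier and scanned - earlier[pid] < min_blocks]
```

Pair any stalled pid with the pg_stat_activity query above to see what that worker is waiting on.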
Step 3 — Remove blockers in safest order
Use escalation, not bulk termination:
- Let short-running, near-complete transactions finish.
- Cancel obvious runaway statements.
- Terminate only clearly stuck sessions that keep cleanup blocked.
# Safer first action
psql "$ODOO_DB_URI" -c "select pg_cancel_backend(<pid>);"
# Escalate only when cancellation fails and impact is widening
psql "$ODOO_DB_URI" -c "select pg_terminate_backend(<pid>);"
Avoid terminating replication, backup, or migration sessions unless confirmed as incident root cause.
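The escalation order above can be encoded as a small decision helper. A minimal sketch — the 10- and 30-minute cutoffs are illustrative assumptions, not PostgreSQL defaults, and critical sessions (replication, backup, migration) are always left alone:

```python
def escalation_action(state: str, txn_age_min: float,
                      is_critical_session: bool = False) -> str:
    """Suggest the least-destructive next action for one session."""
    if is_critical_session:
        return "leave alone"            # replication/backup/migration
    if txn_age_min < 10:
        return "wait"                   # likely to finish on its own
    if state == "active":
        return "pg_cancel_backend"      # cancel the running statement first
    if state == "idle in transaction" and txn_age_min >= 30:
        return "pg_terminate_backend"   # no active query, so cancel cannot help
    return "monitor"
```

Note that pg_cancel_backend only interrupts a running statement; a session sitting idle in transaction has nothing to cancel, which is why it skips straight to termination once clearly stuck.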
Step 4 — Controlled manual vacuum on hottest tables
After blocker cleanup and write-pressure reduction, run targeted vacuum on top offender tables.
# Example: run one table at a time, reassess between each
psql "$ODOO_DB_URI" -c "vacuum (verbose, analyze) public.stock_move;"
psql "$ODOO_DB_URI" -c "vacuum (verbose, analyze) public.account_move_line;"
Guidance:
- Prefer VACUUM (ANALYZE) first; it runs online and is safer during incidents.
- Do not run VACUUM FULL during peak traffic unless outage severity justifies its ACCESS EXCLUSIVE lock.
- If index bloat is dominant and urgent, use REINDEX INDEX CONCURRENTLY for targeted indexes (slower but less blocking).
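The guidance above can be expressed as a small chooser. A minimal sketch — the 50% dead_pct cutoff for considering VACUUM FULL is an illustrative assumption; the three actions are the ones named in the bullets:

```python
def recovery_action(dead_pct: float, index_bloat_dominant: bool,
                    peak_traffic: bool) -> str:
    """Pick the least-blocking recovery action for one table."""
    if index_bloat_dominant:
        return "REINDEX INDEX CONCURRENTLY"   # targeted, less blocking
    if dead_pct >= 50 and not peak_traffic:
        return "VACUUM FULL"                  # ACCESS EXCLUSIVE lock; off-peak only
    return "VACUUM (VERBOSE, ANALYZE)"        # online, incident-safe default
```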
Step 5 — Verification and rollback guardrails
Verify recovery trend
# Re-check dead tuples on the same top tables every 5–10 minutes
psql "$ODOO_DB_URI" -c "
select relname, n_live_tup, n_dead_tup, last_autovacuum
from pg_stat_user_tables
where relname in ('stock_move', 'account_move_line', 'mail_message')
order by n_dead_tup desc;
"
# Check Odoo error/latency symptoms improving
odoocli logs tail --service odoo --since 15m --grep "timeout|canceling statement|could not serialize|deadlock"
Incident can be closed when:
- Dead-tuple counts trend downward across consecutive samples.
- User-facing Odoo flows recover to normal latency.
- No sustained lock/timeout burst appears in logs.
- Autovacuum activity resumes normal cadence on hot tables.
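The "trend downward across consecutive samples" criterion can be checked mechanically. A minimal sketch, assuming you append each 5–10 minute n_dead_tup reading for a table to a list (requiring two consecutive drops is an illustrative closing bar, not a PostgreSQL rule):

```python
def trending_down(samples: list, window: int = 3) -> bool:
    """True when the last `window` n_dead_tup readings strictly decrease,
    i.e. at least window - 1 consecutive drops."""
    if len(samples) < window:
        return False
    tail = samples[-window:]
    return all(later < earlier for earlier, later in zip(tail, tail[1:]))
```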
Rollback/safety actions if recovery worsens
- Stop manual vacuum cycle and reassess blockers/lock graph.
- Re-enable paused workloads gradually (one lane at a time) only after trend improves.
- If vacuum attempts cause unacceptable latency, defer heavy operations and schedule off-peak maintenance window.
Hardening and prevention checklist
- Alert on high n_dead_tup for critical Odoo tables.
- Alert on oldest transaction age and idle in transaction session count.
- Enforce idle_in_transaction_session_timeout for leak-prone clients.
- Tune autovacuum per hot table (scale factor and threshold) instead of global-only tuning.
- Review large batch jobs to commit in smaller chunks.
- Run periodic bloat review (tables + indexes) and planned maintenance before incidents.
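Per-table tuning matters because the default trigger scales with table size: PostgreSQL launches autovacuum on a table when n_dead_tup exceeds autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × reltuples (defaults 50 and 0.2). A minimal sketch of that formula, useful for sizing per-table overrides:

```python
def autovacuum_fires(n_dead_tup: int, reltuples: int,
                     threshold: int = 50, scale_factor: float = 0.2) -> bool:
    """PostgreSQL's autovacuum trigger condition for one table
    (defaults: autovacuum_vacuum_threshold=50,
    autovacuum_vacuum_scale_factor=0.2)."""
    return n_dead_tup > threshold + scale_factor * reltuples

# On a 5M-row table, defaults wait for >1,000,050 dead tuples; lowering
# scale_factor to 0.01 for that table drops the bar to ~50,050.
```

The per-table override uses storage parameters, e.g. ALTER TABLE stock_move SET (autovacuum_vacuum_scale_factor = 0.01, autovacuum_vacuum_threshold = 1000); the values here are illustrative and should be sized from your own churn data.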
The operating principle: clear blockers first, then vacuum surgically, then tune with evidence. This avoids turning bloat recovery into a lock-amplified outage.