Odoo PostgreSQL Checkpoint I/O Spike Mitigation Runbook
A practical incident runbook for stabilizing Odoo when PostgreSQL checkpoint bursts cause latency spikes, write stalls, and cascading worker slowdowns.
When Odoo latency suddenly jumps during traffic bursts, and PostgreSQL disk writes spike in the same window, checkpoint pressure is a common root cause.
This runbook gives a safe response order: confirm checkpoint-driven pressure, reduce write burst intensity, recover throughput, then harden settings so the pattern does not return.
Incident signals that justify immediate action
- Odoo endpoints with write-heavy flows (checkout, invoicing, stock moves) become slow at the same time.
- PostgreSQL logs show frequent checkpoint messages (`checkpoints are occurring too frequently`).
- Disk I/O utilization (`%util`) stays high with growing request latency.
- Odoo worker timeout/slow-request rates increase even when CPU is not fully saturated.
- `pg_stat_bgwriter` checkpoint counters climb faster than the normal baseline.
Step 0 — Stabilize before changing PostgreSQL settings
- Freeze non-critical write amplification lanes (bulk imports, recompute jobs, historical backfills).
- Pause deploys/module upgrades and incident-unrelated cron lanes.
- Keep one operator running commands and one operator tracking timeline/actions.
- Do not restart PostgreSQL first; capture evidence before any tuning or reload.
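Evidence capture can be scripted so nothing is lost under pressure. A minimal sketch (the `capture` helper and `/tmp` path are assumptions; `ODOO_DB_URI` is the connection string used throughout this runbook):

```shell
#!/bin/sh
# Snapshot checkpoint-related state BEFORE any tuning or restart.
EVIDENCE_DIR="/tmp/pg-incident-$(date +%Y%m%dT%H%M%S)"
mkdir -p "$EVIDENCE_DIR"

capture() {
  # capture <label> <command...>: append one command's output with a timestamped header
  label="$1"; shift
  {
    echo "== $label @ $(date -u +%FT%TZ) =="
    "$@" 2>&1
  } >> "$EVIDENCE_DIR/$label.txt"
}

capture bgwriter psql "$ODOO_DB_URI" -c "select * from pg_stat_bgwriter;"
capture settings psql "$ODOO_DB_URI" -c "select name, setting from pg_settings where name like 'checkpoint%' or name like '%wal_size';"
capture iostat  iostat -xz 1 3
echo "Evidence saved under $EVIDENCE_DIR"
```

Failures (a missing binary, an unreachable database) are captured into the evidence files rather than aborting the script, so a partial snapshot still survives.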
Step 1 — Confirm checkpoint pressure and quantify severity
1.1 Inspect checkpoint cadence and write pressure
Note: the queries below use the pre-17 layout; on PostgreSQL 17 and later, the checkpoint counters moved from `pg_stat_bgwriter` to `pg_stat_checkpointer`.
psql "$ODOO_DB_URI" -c "
select
checkpoints_timed,
checkpoints_req,
checkpoint_write_time,
checkpoint_sync_time,
buffers_checkpoint,
buffers_backend,
maxwritten_clean
from pg_stat_bgwriter;
"
Interpretation during incident:
- Fast growth in `checkpoints_req` suggests WAL pressure is forcing early checkpoints.
- High `checkpoint_sync_time` suggests fsync flush pressure (storage can't absorb the checkpoint burst smoothly).
- High `buffers_backend` means backend processes are doing writes themselves (background write smoothing is insufficient).
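To turn "fast growth" into a number, compare two samples taken a known interval apart. A sketch (the `forced_per_min` helper is a hypothetical name; input is `psql -At` output in `timed|req` form):

```shell
# Quantify severity as forced checkpoints per minute from two samples.
sample() {
  psql "$ODOO_DB_URI" -Atc "select checkpoints_timed, checkpoints_req from pg_stat_bgwriter;"
}

forced_per_min() {
  # forced_per_min <before> <after> <seconds-between-samples>
  echo "$1 $2 $3" | awk -F'[| ]' '{ printf "%.1f\n", ($4 - $2) * 60 / $5 }'
}

# During an incident:
#   s1=$(sample); sleep 60; s2=$(sample)
#   forced_per_min "$s1" "$s2" 60
forced_per_min "100|5" "100|8" 60   # prints 3.0
```

Any sustained non-zero forced-checkpoint rate during the latency window strengthens the checkpoint-pressure hypothesis.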
1.2 Check if WAL growth is repeatedly hitting limits
psql "$ODOO_DB_URI" -c "
select
name,
setting,
unit
from pg_settings
where name in (
'checkpoint_timeout',
'max_wal_size',
'min_wal_size',
'checkpoint_completion_target'
)
order by name;
"
# Optional host-level pressure check (Linux)
iostat -xz 1 10
vmstat 1 10
1.3 Validate query and lock side-effects in Odoo traffic
psql "$ODOO_DB_URI" -c "
select
count(*) filter (where state = 'active') as active,
count(*) filter (where state = 'idle in transaction') as idle_in_txn,
count(*) filter (
where state = 'active'
and now() - query_start > interval '30 seconds'
) as active_over_30s
from pg_stat_activity
where datname = current_database();
"
If long-running active statements rise only during checkpoint bursts, the incident is likely an I/O pacing failure, not a pure SQL-plan regression.
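To also rule out lock queues hiding behind the slow writers, a sketch using `pg_blocking_pids()` (available on PostgreSQL 9.6+):

```sql
select pid,
       pg_blocking_pids(pid) as blocked_by,
       wait_event_type,
       now() - query_start as waiting_for,
       left(query, 80) as query
from pg_stat_activity
where cardinality(pg_blocking_pids(pid)) > 0
  and datname = current_database();
```

An empty result during the spike points back at I/O pacing; a deep blocking chain points at a lock problem that checkpoint tuning will not fix.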
Step 2 — Apply production-safe remediation order
2.1 Reduce write burst amplitude first
- Temporarily pause high-write non-critical jobs (mass stock valuation recomputes, large import batches, bulk mail state updates).
- Keep revenue-critical and accounting-critical writes alive where possible.
- If queue workers exist, reduce concurrency instead of hard stop.
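One way to pause non-critical Odoo cron lanes directly is through the `ir_cron` table. A sketch only: the name patterns below are hypothetical placeholders for your own non-critical jobs, and it assumes the stored `cron_name` column present in recent Odoo versions:

```sql
-- Disable non-critical scheduled jobs; RETURNING records exactly what was touched.
-- The ILIKE patterns are hypothetical examples — substitute your own job names.
update ir_cron
set active = false
where active
  and cron_name ilike any (array['%recompute%', '%import%', '%mass mail%'])
returning id, cron_name;
```

Keep the `RETURNING` output in the incident timeline so you can re-enable exactly those jobs, and only those jobs, after verification.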
2.2 Smooth checkpoint behavior (safe-first)
If your current values are aggressive for current write volume, prefer gradual tuning:
-- Persist carefully, then reload/restart based on your platform policy.
ALTER SYSTEM SET checkpoint_completion_target = '0.9';
ALTER SYSTEM SET max_wal_size = '8GB';
ALTER SYSTEM SET min_wal_size = '2GB';
Apply config changes:
SELECT pg_reload_conf();
Notes:
- `checkpoint_completion_target = 0.9` spreads checkpoint I/O over more of the interval.
- Increasing `max_wal_size` reduces forced checkpoints during short write spikes.
- Validate disk headroom before increasing WAL retention limits.
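After `pg_reload_conf()`, confirm the new values actually took effect. All three parameters above are reload-safe, so `pending_restart` should read false for each:

```sql
select name, setting, unit, pending_restart
from pg_settings
where name in ('checkpoint_completion_target', 'max_wal_size', 'min_wal_size')
order by name;
```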
2.3 Protect user traffic from indefinite waits during incident
-- Use only if not already enforced by policy; prefer role-level for Odoo user.
ALTER ROLE odoo SET statement_timeout = '60s';
ALTER ROLE odoo SET lock_timeout = '10s';
This limits the blast radius of stalled write paths while storage pressure is being stabilized. Note that role-level settings apply only to new sessions: existing Odoo connections keep their old values until they reconnect.
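To confirm the role-level overrides are in place, query the catalog that stores per-role settings:

```sql
select r.rolname, s.setconfig
from pg_db_role_setting s
join pg_roles r on r.oid = s.setrole
where r.rolname = 'odoo';
```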
Step 3 — Verification loop before reopening full load
Run a short loop while gradually re-enabling paused lanes:
watch -n 10 "psql \"$ODOO_DB_URI\" -Atc \"
select
now(),
checkpoints_timed,
checkpoints_req,
checkpoint_write_time,
checkpoint_sync_time
from pg_stat_bgwriter;
\""
And confirm user-path health at the same time:
- Login, quotation confirm, invoice post, and stock transfer flows succeed.
- Odoo timeout/error rate returns near baseline.
- Disk `%util` and await metrics trend down from the incident peak.
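The user-path check can be semi-automated with a latency probe. A hypothetical sketch: `ODOO_URL`, the login page as target, and the 2-second threshold are all assumptions to adapt:

```shell
# Probe end-to-end latency of a cheap Odoo page while lanes are re-enabled.
ODOO_URL="${ODOO_URL:-http://localhost:8069/web/login}"

classify() {
  # classify <seconds> <threshold-seconds>: OK below threshold, SLOW otherwise
  awk -v t="$1" -v lim="$2" 'BEGIN { if (t + 0 < lim + 0) print "OK"; else print "SLOW" }'
}

probe() {
  t=$(curl -o /dev/null -sS -w '%{time_total}' "$ODOO_URL") || t=999
  echo "$(date -u +%T) ${t}s $(classify "$t" 2.0)"
}

# Run alongside the watch loop above: while true; do probe; sleep 10; done
```

A run of `SLOW` results right after re-enabling a write lane is the signal to re-freeze that lane (see the rollback section).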
Rollback and safety checks
If latency worsens after tuning changes:
- Re-freeze the last write lane you re-enabled.
- Revert the most recent parameter change first (single-variable rollback).
- Re-check `pg_stat_bgwriter` counters and disk pressure.
- Escalate to the storage/IOPS capacity path if checkpoint smoothing alone does not recover service.
Example rollback:
ALTER SYSTEM RESET checkpoint_completion_target;
ALTER SYSTEM RESET max_wal_size;
ALTER SYSTEM RESET min_wal_size;
SELECT pg_reload_conf();
Only reset values that were changed during the incident; preserve known-good baseline overrides. If the role-level timeouts from Step 2.3 were not already standing policy, revert them as well (`ALTER ROLE odoo RESET statement_timeout;` and `ALTER ROLE odoo RESET lock_timeout;`).
Hardening checklist (post-incident)
- Alert on checkpoint frequency and `checkpoints_req` acceleration.
- Track the `checkpoint_write_time` / `checkpoint_sync_time` trend, not just one-off values.
- Keep large imports/backfills behind throttled job lanes and off peak interactive windows.
- Validate WAL/disk headroom before seasonal traffic events.
- Enable and review `pg_stat_statements` to separate checkpoint issues from SQL-plan regressions.
- Run a staging load drill with burst writes and verify checkpoint metrics stay controlled.
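A minimal `postgresql.conf` fragment covering the logging and `pg_stat_statements` items above. The numeric values are starting points, not prescriptions, and loading the extension library requires a restart:

```conf
# Checkpoint visibility
log_checkpoints = on
checkpoint_warning = 30s

# Query-level attribution (restart required to load the library)
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 5000
pg_stat_statements.track = top
# Then, once per database: CREATE EXTENSION pg_stat_statements;
```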
Practical references
- PostgreSQL WAL and checkpoint settings: https://www.postgresql.org/docs/current/runtime-config-wal.html
- PostgreSQL monitoring statistics (`pg_stat_bgwriter`, `pg_stat_activity`): https://www.postgresql.org/docs/current/monitoring-stats.html
- PostgreSQL logging and checkpoint warnings: https://www.postgresql.org/docs/current/runtime-config-logging.html
- Odoo deployment guidance: https://www.odoo.com/documentation/17.0/administration/on_premise/deploy.html
Operational rule: do not treat checkpoint incidents as “just increase IOPS” by default. First reduce write burst pressure, then smooth checkpoint pacing, then scale capacity with measured evidence.