Odoo PostgreSQL Connection Saturation Incident Runbook
A practical CLI-first recovery workflow for when Odoo starts hitting max PostgreSQL connections and user traffic begins to fail.
When an Odoo incident starts with `too many clients already`, panic actions usually make it worse.
This runbook gives operators a deterministic sequence: confirm saturation, stop the leak, recover service, then harden.
Incident signals worth immediate action
- Odoo logs show `FATAL: sorry, too many clients already`.
- Login, save, or checkout flows fail intermittently while CPU looks normal.
- Queue workers stop progressing even though jobs remain pending.
- Monitoring shows rapid connection growth without matching request throughput.
Step 0 — Stabilize before tuning anything
- Freeze non-critical batch traffic (imports, low-priority cron jobs, heavy reports).
- Keep one operator running commands and one operator tracking timeline/decisions.
- Do not raise `max_connections` as a first response unless memory headroom is verified.
Increasing connection limits under pressure can shift the failure from connection errors to RAM exhaustion.
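The freeze in the first bullet can be applied at the database level by deactivating non-critical scheduled actions. A minimal sketch that only builds the SQL, assuming Odoo's standard `ir_cron` table with an `active` flag and a `cron_name` column (the column name varies by Odoo version, and the allowlist contents here are hypothetical examples):

```python
# Sketch: build parameterized SQL to pause all cron jobs except a critical allowlist.
# Assumes Odoo's ir_cron table (active boolean, cron_name varchar) -- verify the
# name column on your Odoo version before running. Allowlist entries are examples.
CRITICAL_CRONS = ("Mail: Email Queue Manager",)  # jobs that must keep running

def pause_noncritical_crons_sql(critical=CRITICAL_CRONS):
    placeholders = ", ".join("%s" for _ in critical)
    return (
        "update ir_cron set active = false "
        f"where active and cron_name not in ({placeholders});",
        list(critical),
    )

sql, params = pause_noncritical_crons_sql()
# Execute via psycopg2/psql during the incident; re-activate the same rows afterwards.
print(sql)
```

Keep a record of which rows you deactivated so the post-incident cleanup is a simple inverse update.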
Step 1 — Confirm saturation and where sessions come from
# Current usage vs configured max_connections
psql "$ODOO_DB_URI" -c "
with limits as (
select setting::int as max_connections
from pg_settings
where name = 'max_connections'
), usage as (
select
count(*) as current_connections,
count(*) filter (where state = 'active') as active_connections,
count(*) filter (where state = 'idle') as idle_connections,
count(*) filter (where state = 'idle in transaction') as idle_in_txn
from pg_stat_activity
where datname = current_database()
)
select
usage.current_connections,
limits.max_connections,
round(100.0 * usage.current_connections / nullif(limits.max_connections, 0), 1) as pct_used,
usage.active_connections,
usage.idle_connections,
usage.idle_in_txn
from usage cross join limits;
"
# Attribute sessions by application/user/client
psql "$ODOO_DB_URI" -c "
select
coalesce(application_name, '(unset)') as app,
usename,
client_addr,
state,
count(*) as sessions
from pg_stat_activity
where datname = current_database()
group by 1,2,3,4
order by sessions desc
limit 30;
"
Step 2 — Remove the highest-risk leak safely
Prioritize long `idle in transaction` sessions first: they hold resources, block vacuum, and can amplify lock incidents.
psql "$ODOO_DB_URI" -c "
select pid, application_name, usename, client_addr,
now() - xact_start as txn_age,
now() - state_change as idle_for,
left(query, 140) as last_query
from pg_stat_activity
where datname = current_database()
and state = 'idle in transaction'
order by xact_start asc
limit 20;
"
Cancel before terminate:
# Safer first move
psql "$ODOO_DB_URI" -c "select pg_cancel_backend(<pid>);"
# Use terminate only if cancel does not clear within your incident timeout
psql "$ODOO_DB_URI" -c "select pg_terminate_backend(<pid>);"
Never bulk-kill every session. Keep Odoo app workers and replication/admin sessions alive unless explicitly identified as the leak source.
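The "cancel before terminate, never bulk-kill" rules above can be encoded so the operator reviews an explicit kill list instead of improvising under pressure. A sketch, assuming session rows shaped like the Step 2 query output; the protected role names are examples, not a complete list for your environment:

```python
from datetime import timedelta

# Example admin/replication roles to never touch -- adjust for your environment.
PROTECTED_USERS = {"postgres", "replicator"}

def pids_to_cancel(sessions, min_idle=timedelta(minutes=5)):
    """Pick idle-in-transaction backends that are safe to pg_cancel_backend().
    `sessions` mirrors the Step 2 query: dicts with pid/usename/state/idle_for."""
    return [
        s["pid"]
        for s in sessions
        if s["state"] == "idle in transaction"
        and s["usename"] not in PROTECTED_USERS
        and s["idle_for"] >= min_idle
    ]

sessions = [
    {"pid": 101, "usename": "odoo", "state": "idle in transaction",
     "idle_for": timedelta(minutes=42)},
    {"pid": 102, "usename": "replicator", "state": "idle in transaction",
     "idle_for": timedelta(hours=1)},
    {"pid": 103, "usename": "odoo", "state": "active",
     "idle_for": timedelta(seconds=2)},
]
print(pids_to_cancel(sessions))  # → [101]
```

Note the replication session (pid 102) is excluded even though it is the oldest offender; that is deliberate.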
Step 3 — Recover throughput in controlled increments
- Re-check `pct_used` and confirm it is trending down.
- Re-enable paused traffic one lane at a time (queue workers, then cron/reporting).
- Watch for immediate re-growth; that indicates the leak is still active.
# Fast repeated check during recovery
watch -n 10 "psql \"$ODOO_DB_URI\" -Atc \"select count(*) from pg_stat_activity where datname = current_database();\""
Step 4 — Exit criteria before closing incident
- Connection usage stable below emergency threshold (for example <75%) for at least 15 minutes.
- No new `too many clients already` entries in logs.
- Critical user transactions (login, quote, invoice posting) complete successfully.
- Queue depth is decreasing, not flatlining.
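The four exit criteria can be checked mechanically before anyone declares the incident closed. A sketch, with the 75%/15-minute thresholds taken from the bullets above and the input shapes assumed for illustration:

```python
def may_close_incident(pct_samples, minutes_stable, new_fatal_log_lines,
                       critical_flows_ok, queue_depths):
    """Exit check mirroring Step 4: stable <75% for >=15 min, no new
    'too many clients already' log lines, critical flows pass, queue draining."""
    stable = minutes_stable >= 15 and all(p < 75 for p in pct_samples)
    queue_draining = len(queue_depths) >= 2 and queue_depths[-1] < queue_depths[0]
    return (stable and new_fatal_log_lines == 0
            and critical_flows_ok and queue_draining)

print(may_close_incident([62, 58, 55], 20, 0, True, [400, 250, 120]))  # → True
print(may_close_incident([62, 58, 55], 20, 0, True, [400, 400, 400]))  # → False
```

The second call fails only on the flatlining queue, which is exactly the trap the last bullet warns about.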
Hardening actions after recovery
- Put an alert on % of `max_connections` and on `idle in transaction` session count.
- Review Odoo worker and connection-pool settings in code/config (not ad-hoc shell edits).
- Add timeouts for sessions leaked by clients (`idle_in_transaction_session_timeout` where appropriate).
- Add a weekly replay drill: intentionally saturate staging and rehearse this runbook.
The principle: connection incidents are usually leak problems, not capacity problems. Fix the leak path first, then resize with evidence.