
March 21, 2026

Odoo PgBouncer Pool Exhaustion Incident Runbook

A deterministic response guide for Odoo outages caused by PgBouncer pool/client exhaustion, with safe triage, recovery order, and hardening steps.

If Odoo starts failing with "no more connections allowed" errors, or request latency explodes while PostgreSQL still has headroom, PgBouncer is often the real bottleneck.

This runbook gives you a production-safe sequence: confirm pool pressure, relieve it without random restarts, then harden the system so the incident does not recur.

Incident signals that justify immediate response

  • Odoo users see intermittent 500/timeout errors while app hosts are otherwise healthy.
  • PgBouncer logs include messages like "no more connections allowed (max_client_conn)".
  • Odoo queue/cron throughput collapses even though PostgreSQL CPU is not pegged.
  • Connection errors appear at app tier before PostgreSQL reaches max_connections.
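
To confirm the first signal quickly, counting exhaustion events in the PgBouncer log is usually enough. A minimal sketch; the log file path is an assumption and varies by install (on systemd hosts, journalctl -u pgbouncer is a common alternative source):

```shell
# Count "no more connections allowed" events in a PgBouncer log file.
# The path you pass is environment-specific -- adjust to your setup.
count_pool_exhaustion() {
  grep -c "no more connections allowed" "$1" || true
}

# Example (hypothetical path):
# count_pool_exhaustion /var/log/pgbouncer/pgbouncer.log
```

A steadily climbing count during the incident window is the clearest single indicator that clients, not the database, are being refused.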

Step 0 — Stabilize and prevent self-inflicted damage

  1. Freeze non-critical traffic first (heavy reports, bulk imports, low-priority cron lanes).
  2. Keep checkout/invoicing/auth paths prioritized.
  3. Do not restart PostgreSQL as a first move.
  4. Do not blindly increase every pool limit at once.

Why: random restarts drop in-flight transactions and hide root cause. You need a controlled pressure release.
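
One concrete way to freeze non-critical background load is to pause selected Odoo cron jobs in the database. A sketch under assumptions: ir_cron is Odoo's scheduled-job table, but column names vary across Odoo versions, so list the jobs first and choose ids deliberately; re-enable the same ids after recovery:

```shell
# Generate the SQL to pause chosen Odoo cron jobs by id.
# ir_cron is Odoo's scheduled-job table (verify column names on your version).
pause_cron_sql() {
  printf 'update ir_cron set active = false where id in (%s);' "$1"
}

# List candidates first, then feed the chosen ids (hypothetical values):
# psql "$ODOO_DB_URI" -c "select id, cron_name from ir_cron where active;"
# psql "$ODOO_DB_URI" -c "$(pause_cron_sql '41,42')"
```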

Step 1 — Confirm whether the choke point is PgBouncer or PostgreSQL

First, inspect PgBouncer internals through the admin console.

# Replace values as needed
export PGBOUNCER_HOST=127.0.0.1
export PGBOUNCER_PORT=6432
export PGBOUNCER_ADMIN_USER=pgbouncer

psql "host=$PGBOUNCER_HOST port=$PGBOUNCER_PORT dbname=pgbouncer user=$PGBOUNCER_ADMIN_USER" -c "SHOW POOLS;"
psql "host=$PGBOUNCER_HOST port=$PGBOUNCER_PORT dbname=pgbouncer user=$PGBOUNCER_ADMIN_USER" -c "SHOW STATS;"
psql "host=$PGBOUNCER_HOST port=$PGBOUNCER_PORT dbname=pgbouncer user=$PGBOUNCER_ADMIN_USER" -c "SHOW CLIENTS;"
psql "host=$PGBOUNCER_HOST port=$PGBOUNCER_PORT dbname=pgbouncer user=$PGBOUNCER_ADMIN_USER" -c "SHOW SERVERS;"

Then confirm PostgreSQL actual pressure:

psql "$ODOO_DB_URI" -c "
with limits as (
  select setting::int as max_connections
  from pg_settings where name='max_connections'
), usage as (
  select count(*) as current_connections,
         count(*) filter (where state='active') as active_connections,
         count(*) filter (where state='idle in transaction') as idle_in_txn
  from pg_stat_activity
  where datname=current_database()
)
select usage.*, limits.max_connections,
       round(100.0*usage.current_connections/nullif(limits.max_connections,0),1) as pct_used
from usage cross join limits;
"

Interpretation:

  • If PgBouncer clients are saturated but PostgreSQL is well below max, PgBouncer limits are your immediate bottleneck.
  • If both are saturated, you have coupled exhaustion and need strict traffic shedding plus leak cleanup.
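
To turn the SHOW POOLS output into a single client-pressure number, the unaligned form (psql -Atc, pipe-separated) can be summed with awk. A sketch, assuming cl_active and cl_waiting are fields 3 and 4, which holds for recent PgBouncer versions but should be verified against your build (the admin "pgbouncer" pseudo-pool row is included in the sum; filter it out if you want application pools only):

```shell
# Sum active + waiting clients across all pools from "SHOW POOLS" rows.
# Assumes pipe-separated input with cl_active/cl_waiting in fields 3 and 4.
pool_clients() {
  awk -F'|' '{ total += $3 + $4 } END { print total + 0 }'
}

# Usage with the admin console connection from Step 1:
# psql "host=$PGBOUNCER_HOST port=$PGBOUNCER_PORT dbname=pgbouncer \
#   user=$PGBOUNCER_ADMIN_USER" -Atc "SHOW POOLS;" | pool_clients
```

Compare the result against max_client_conn (visible via SHOW CONFIG) to see how close you are to the refusal threshold.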

Step 2 — Identify who is consuming client slots

psql "$ODOO_DB_URI" -c "
select
  coalesce(application_name,'(unset)') as app,
  usename,
  client_addr,
  state,
  count(*) as sessions
from pg_stat_activity
where datname=current_database()
group by 1,2,3,4
order by sessions desc
limit 30;
"

Check for:

  • unexpected client sources (ad-hoc scripts, BI jobs, stale workers),
  • excessive idle in transaction,
  • noisy background jobs that can be paused without user-facing impact.

Step 3 — Relieve pressure in a safe order

Use this order to avoid unnecessary transaction loss.

  1. Pause non-critical Odoo background throughput.

  2. Cancel long-running low-priority SQL first:

psql "$ODOO_DB_URI" -c "
select pid, now()-query_start as run_for, left(query,120) as sample
from pg_stat_activity
where datname=current_database()
  and state='active'
  and now()-query_start > interval '2 minutes'
order by query_start asc
limit 20;
"

# Safer first action
psql "$ODOO_DB_URI" -c "select pg_cancel_backend(<pid>);"

  3. Terminate only confirmed offenders that do not clear:

psql "$ODOO_DB_URI" -c "select pg_terminate_backend(<pid>);"

  4. If PgBouncer config was tuned during the incident, reload PgBouncer (not Postgres):

psql "host=$PGBOUNCER_HOST port=$PGBOUNCER_PORT dbname=pgbouncer user=$PGBOUNCER_ADMIN_USER" -c "RELOAD;"

Never bulk-kill all sessions. Preserve critical app traffic and replication/admin channels.
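
When many sessions need cancelling at once, a guarded batch keeps that rule enforceable. A sketch that generates the SQL with an age threshold and a protect-list; the application_name values below are placeholders, match them to your actual Odoo and replication session names before running:

```shell
# Build SQL that cancels (not terminates) active queries older than a
# threshold, skipping our own backend and protected application names.
cancel_sql() {
  printf "select pg_cancel_backend(pid)
from pg_stat_activity
where datname = current_database()
  and state = 'active'
  and now() - query_start > interval '%s'
  and pid <> pg_backend_pid()
  and coalesce(application_name, '') not in (%s);" "$1" "$2"
}

# Usage (hypothetical protected names):
# psql "$ODOO_DB_URI" -c "$(cancel_sql '5 minutes' "'odoo-web','walreceiver'")"
```

Generating the statement first also lets you review exactly what will be cancelled before executing it.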

Step 4 — Recover in controlled increments

Bring capacity back lane by lane:

  1. Web traffic (confirm it is stable before the next lane)
  2. Critical queues
  3. Remaining cron/reporting workloads

During recovery, watch both layers every 10s:

watch -n 10 '
psql "host='$PGBOUNCER_HOST' port='$PGBOUNCER_PORT' dbname=pgbouncer user='$PGBOUNCER_ADMIN_USER'" -Atc "SHOW POOLS;" | head -n 20
psql "$ODOO_DB_URI" -Atc "select count(*) from pg_stat_activity where datname=current_database();"
'

If pressure immediately re-appears, re-freeze low-priority load and continue leak/source isolation.

Step 5 — Exit criteria before incident closure

  • No new PgBouncer "no more connections allowed" events in logs for at least 15 minutes.
  • Odoo critical transactions succeed (login, quote/sale, invoice post, payment capture).
  • Queue depth is decreasing, not flat.
  • PostgreSQL and PgBouncer connection metrics are stable and below emergency thresholds.
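
The first exit criterion can be scripted as a log-silence check. A sketch that reads log lines on stdin, so it works with whatever log source you have (journalctl -u pgbouncer with a time window is one assumed option):

```shell
# Read log lines on stdin; print "clear" if no exhaustion events appear.
exhaustion_clear() {
  local n
  n=$(grep -c "no more connections allowed" || true)
  [ "$n" -eq 0 ] && echo "clear" || echo "still failing"
}

# Example source (systemd hosts; adjust the unit name to your install):
# journalctl -u pgbouncer --since "15 min ago" | exhaustion_clear
```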

Hardening checklist (after service is stable)

  • Tune PgBouncer deliberately: max_client_conn, default_pool_size, reserve_pool_size, pool_mode.
  • Keep pool_mode=transaction for most Odoo workloads unless a validated reason exists otherwise.
  • Add alerting on:
    • PgBouncer client saturation
    • wait queue growth / pool starvation
    • PostgreSQL idle in transaction age and count
  • Enforce timeout guardrails (statement_timeout, lock_timeout, idle_in_transaction_session_timeout) with app-aware defaults.
  • Review Odoo worker/cron concurrency versus actual DB capacity; avoid unconstrained worker growth.
  • Add a staging game day that intentionally saturates PgBouncer and rehearses this runbook.
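
The first hardening item can be captured in a reviewed pgbouncer.ini fragment. The numbers below are illustrative starting points only, not recommendations: size them against measured PostgreSQL capacity and your Odoo worker count.

```ini
; Illustrative fragment -- tune against measured capacity, not defaults.
[pgbouncer]
pool_mode = transaction        ; suits most Odoo workloads (see note above)
max_client_conn = 500          ; total client connections PgBouncer accepts
default_pool_size = 20         ; server connections per (database, user) pool
reserve_pool_size = 5          ; extra slots released under pressure
reserve_pool_timeout = 3       ; seconds a client waits before reserve is used
```

Keep max_client_conn comfortably above peak Odoo worker concurrency, while default_pool_size times the number of pools stays below PostgreSQL's max_connections.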

Fast diagnostic query pack

# PostgreSQL oldest idle-in-tx
psql "$ODOO_DB_URI" -c "
select pid, usename, application_name, now()-xact_start as txn_age, left(query,120)
from pg_stat_activity
where datname=current_database()
  and state='idle in transaction'
order by xact_start asc
limit 20;
"

# PostgreSQL blockers/waiters
psql "$ODOO_DB_URI" -c "
select a.pid as waiting_pid,
       a.query as waiting_query,
       b.pid as blocking_pid,
       b.query as blocking_query
from pg_locks wa
join pg_stat_activity a on a.pid=wa.pid
join pg_locks wb on wb.locktype=wa.locktype
  and wb.database is not distinct from wa.database
  and wb.relation is not distinct from wa.relation
  and wb.page is not distinct from wa.page
  and wb.tuple is not distinct from wa.tuple
  and wb.virtualxid is not distinct from wa.virtualxid
  and wb.transactionid is not distinct from wa.transactionid
  and wb.classid is not distinct from wa.classid
  and wb.objid is not distinct from wa.objid
  and wb.objsubid is not distinct from wa.objsubid
  and wb.pid <> wa.pid
join pg_stat_activity b on b.pid=wb.pid
where not wa.granted and wb.granted;
"

The operational rule: treat PgBouncer incidents as queueing failures with upstream causes, not as a restart problem. Stabilize load, remove offenders safely, then tune with measurements.
