Odoo Outbound Mail Queue Incident Runbook (When Customer Emails Stop)
A practical recovery flow for diagnosing stuck or failed Odoo outbound emails without blindly retrying every message.
When invoice emails, portal invites, or reset links stop arriving, operators often discover the problem only after customers complain. This runbook gives a deterministic response path: measure queue state, isolate failure mode, recover safely, and prevent recurrence.
Incident signals
Treat outbound mail as an incident when one or more signals persist for 10+ minutes:
- mail_mail rows in outgoing or exception state keep increasing.
- Oldest unsent email timestamp keeps getting older.
- SMTP logs show authentication, timeout, or policy failures.
- Business-critical flows (invoice delivery, portal access, approval notifications) are delayed.
Step 1 — Capture baseline before intervention
# Queue shape by state + oldest/newest message age
psql "$ODOO_DB_URI" -c "
select state,
count(*) as jobs,
min(create_date) as oldest_created,
max(create_date) as newest_created
from mail_mail
where state in ('outgoing','exception')
group by state
order by jobs desc;
"
# Most recent failure signatures
psql "$ODOO_DB_URI" -c "
select id,
email_from,
email_to as recipients, -- email_to is a text column in mail_mail, not an array
substring(failure_reason from 1 for 180) as failure_reason,
create_date
from mail_mail
where state = 'exception'
order by create_date desc
limit 25;
"
Store this output in incident notes. You need a before/after reference to confirm real recovery.
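The before/after comparison can be done with a tiny helper, assuming you parse the per-state counts from psql into a dict; snapshot_delta is a hypothetical name, not part of any Odoo tooling:

```python
def snapshot_delta(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Per-state change in queue depth between the baseline snapshot
    and a later check (negative values mean the queue is draining)."""
    states = set(before) | set(after)
    return {s: after.get(s, 0) - before.get(s, 0) for s in states}
```

A strongly negative delta on both states is the "after" evidence your incident notes should show.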
Step 2 — Confirm whether failure is transport, credentials, or policy
Check Odoo logs and SMTP endpoint health before retrying.
# Odoo-side send errors (log access depends on your deployment;
# adjust the path below, or use journalctl / docker logs instead)
tail -n 2000 /var/log/odoo/odoo.log | grep -Ei "mail|smtp|email|exception"
# Basic SMTP connectivity test (adjust host/port)
nc -vz "$SMTP_HOST" "$SMTP_PORT"
Typical failure classes:
- Auth failure (535, invalid credentials, expired app password)
- Transport/network (timeouts, DNS/TLS handshake errors)
- Provider policy (rate limits, sender/domain policy rejection)
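Triage goes faster when failure_reason strings and log lines are bucketed automatically. A sketch of such a classifier; the patterns are illustrative examples, not an exhaustive provider list, and classify_failure is a hypothetical helper:

```python
import re

# Example signatures per failure class; extend with your provider's actual messages.
FAILURE_PATTERNS = [
    ("auth",      re.compile(r"\b535\b|invalid credentials|authentication failed|app password", re.I)),
    ("transport", re.compile(r"timed? ?out|connection refused|name resolution|tls|handshake", re.I)),
    ("policy",    re.compile(r"rate limit|too many|spf|dkim|dmarc|sender (?:rejected|denied)", re.I)),
]

def classify_failure(reason: str) -> str:
    """Map a failure_reason / SMTP log line to one of the failure classes above."""
    for label, pattern in FAILURE_PATTERNS:
        if pattern.search(reason or ""):
            return label
    return "unknown"
```

Running this over the Step 1 exception sample tells you whether you face one root cause or several, which changes the recovery plan.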
Step 3 — Recover in safe order (do not replay everything blindly)
- Fix root cause first (credentials/network/policy).
- Retry a small canary batch.
- Verify successful delivery trend.
- Retry remaining backlog in controlled chunks.
# Retry only recent exceptions first (example: last 100)
psql "$ODOO_DB_URI" -c "
update mail_mail
set state = 'outgoing',
failure_reason = null
where id in (
select id
from mail_mail
where state = 'exception'
order by create_date desc
limit 100
);
"
Avoid bulk-resending everything at once if some messages trigger duplicate side effects in downstream systems (ticketing, notification bridges, webhooks).
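The "controlled chunks" retry above can be driven by a small paced loop. This is a sketch, not Odoo API: retry_backlog, the execute callback, and the pacing defaults are hypothetical stand-ins for your own DB access layer:

```python
import time

def retry_backlog(ids: list[int], batch_size: int = 100, pause_s: float = 30.0,
                  execute=print) -> int:
    """Requeue exception mails in small, paced batches.

    `execute` is a stand-in for your DB call (e.g. a psql or psycopg2 wrapper);
    the pause lets the SMTP relay drain between batches. Returns batch count."""
    batches = 0
    for start in range(0, len(ids), batch_size):
        chunk = ids[start:start + batch_size]
        id_list = ",".join(str(i) for i in chunk)
        execute(
            "update mail_mail set state = 'outgoing', failure_reason = null "
            f"where id in ({id_list});"
        )
        batches += 1
        if start + batch_size < len(ids):
            time.sleep(pause_s)
    return batches
```

Pacing keeps a large backlog from tripping provider rate limits and gives you a checkpoint between batches to stop if delivery regresses.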
Step 4 — Verify queue drains and user-facing flows recover
# Re-check queue state trend every few minutes
psql "$ODOO_DB_URI" -c "
select state, count(*)
from mail_mail
where state in ('outgoing','exception')
group by state;
"
# Confirm mail worker/cron keeps running
# (cron_name exists on recent Odoo versions; on older ones the name
#  lives on the linked server action in ir_act_server)
psql "$ODOO_DB_URI" -c "
select cron_name, nextcall, active
from ir_cron
where cron_name ilike '%mail%'
order by nextcall asc;
"
Recovery is real when outgoing and exception counts decline across consecutive checks and new business events generate successfully delivered messages.
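"Decline across consecutive checks" can be made precise with a small predicate over the depth history; queue_is_draining is a hypothetical helper, assuming you append outgoing+exception totals after each re-check:

```python
def queue_is_draining(depth_history: list[int], checks: int = 3) -> bool:
    """True when total queue depth (outgoing + exception) strictly
    declined across the last `checks` consecutive measurements."""
    if len(depth_history) < checks + 1:
        return False
    tail = depth_history[-(checks + 1):]
    return all(b < a for a, b in zip(tail, tail[1:]))
```

Requiring strict decline over several checks filters out a queue that merely paused growing, which is not the same as recovering.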
Exit criteria
- Queue depth (outgoing + exception) is trending down and stabilizes near normal.
- No repeating SMTP/auth failure signature in fresh logs.
- At least one test message from each critical flow is confirmed delivered.
- Any temporary mitigations (throttles, reroutes) are documented and either removed or tracked.
Hardening after incident
- Add alerting on oldest unsent mail_mail.create_date and on exception rate.
- Keep the SMTP credential rotation procedure in a versioned runbook.
- Define a safe retry policy (batch size + pacing) for large backlogs.
- Rehearse mail outage recovery in staging using synthetic failures.
The key rule: optimize for trustworthy recovery, not fastest possible resend volume.