Do You Need to Tune Postgres Vacuum?

If you’ve been using Postgres for a while, you’ve probably heard someone mention "vacuuming" the database or talk about “bloat.” Both sound like tedious chores — but they’re just part of life in a healthy database. In modern Postgres versions, autovacuum usually handles these issues for you behind the scenes. But as your database footprint grows, you might start wondering: Are the default settings enough? Do I need to vacuum Postgres manually? Why is my database suddenly taking up way more disk space than it should?
Let’s dive into why we vacuum, how autovacuum works and when you actually need to step in and tune it.
Why does Postgres need to be vacuumed?
Postgres uses multiversion concurrency control (MVCC) to handle simultaneous transactions. When you update or delete a row, Postgres doesn't actually erase it from the disk immediately. Instead, it marks that row as "deleted" using a hidden transaction ID field, resulting in a “dead tuple.”
These deleted rows waiting for cleanup are called bloat. The Postgres vacuum process comes along every so often to find these dead tuples and makes their space available for reuse. If you don't vacuum, your tables and indexes just keep growing, consuming disk space and degrading performance, even if your actual data volume stays the same.
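You can watch this happen directly. Here’s a small sketch — the table and column names are made up for illustration — showing that an UPDATE leaves the old row version behind as a dead tuple:

```sql
-- Illustrative sketch: an UPDATE leaves the old row version behind as a dead tuple.
CREATE TABLE demo (id int PRIMARY KEY, val text);
INSERT INTO demo VALUES (1, 'original');
UPDATE demo SET val = 'updated' WHERE id = 1;

-- The statistics collector now reports a dead tuple for the table
-- (stats are updated asynchronously, so allow a moment):
SELECT n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'demo';
```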
Beyond just space, vacuuming handles another critical task: preventing transaction ID (XID) wraparound. Postgres uses a 32-bit counter for transactions. If you run through 2 billion transactions without vacuuming to "freeze" old rows, you could be facing a wraparound that will shut Postgres down. Vacuuming marks old tuples as "frozen" so their IDs can be safely reused.
How does autovacuum work?
Postgres has had autovacuum since version 8.1 (released in 2005), and it has been enabled by default since 8.3 (2008). You can turn it off table by table, though in general this is not recommended:
ALTER TABLE very_large_table SET (autovacuum_enabled = false);

Autovacuum uses a specific formula to decide when a table is "dirty" enough to need a cleanup. By default, it triggers a vacuum when:
Dead Tuples > (Threshold + Scale Factor * Total Tuples)
By default, autovacuum is triggered based on activity thresholds — specifically a base of 50 rows plus a 0.2 (20%) scale factor — meaning that a 1,000-row table is vacuumed after roughly 250 updates or deletes.
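You can compute this trigger point for your own tables. Here is a sketch using pg_stat_user_tables that reads the globally configured threshold and scale factor — note it ignores any per-table storage-parameter overrides and uses n_live_tup as a stand-in for the reltuples count the real formula consults:

```sql
-- Approximate each table's autovacuum trigger point from the global settings.
SELECT relname,
       n_dead_tup,
       current_setting('autovacuum_vacuum_threshold')::numeric
         + current_setting('autovacuum_vacuum_scale_factor')::numeric * n_live_tup
         AS vacuum_trigger_point
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
```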
Postgres also uses a "safety valve" to prevent transaction ID wraparound: It tracks pg_class.relfrozenxid for every table, and once the age of the table's oldest unfrozen transaction ID exceeds autovacuum_freeze_max_age (200 million transactions by default), a mandatory vacuum is forced regardless of how many rows have actually changed.
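To see which tables are closest to a forced anti-wraparound vacuum, you can compare each table's relfrozenxid age against that setting:

```sql
-- Find the tables nearest to a forced anti-wraparound vacuum.
SELECT c.relname,
       age(c.relfrozenxid) AS xid_age,
       current_setting('autovacuum_freeze_max_age')::bigint AS forced_vacuum_at
FROM pg_class c
WHERE c.relkind = 'r'
ORDER BY xid_age DESC
LIMIT 10;
```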
How to know if you need to tune Postgres autovacuum
You should consider tuning if you see your tables hitting high bloat levels. While bloat below 50% is often acceptable, higher levels may start to impact performance. We generally tell folks not to panic here — some bloat is expected and part of normal operations.
To get a precise view of your bloat, you can use the pgstattuple extension, but it can be resource intensive.
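For example, to inspect a single table's dead-tuple stats with pgstattuple (your_table_name is a placeholder; the function scans the whole table, so run it off-peak on large tables):

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Exact (not estimated) bloat figures for one table.
SELECT table_len,
       dead_tuple_count,
       dead_tuple_percent,
       free_space
FROM pgstattuple('your_table_name');
```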
Here is the bloat query we use:
WITH preprettied_table_bloat AS (
SELECT current_database(), nspname AS schemaname, tblname, idxname, bs*(relpages)::bigint AS real_size,
bs*(relpages-est_pages)::bigint AS extra_size,
100 * (relpages-est_pages)::float / relpages AS extra_pct,
fillfactor,
CASE WHEN relpages > est_pages_ff
THEN bs*(relpages-est_pages_ff)
ELSE 0
END AS bloat_size,
100 * (relpages-est_pages_ff)::float / relpages AS bloat_pct,
is_na
-- , 100-(pst).avg_leaf_density AS pst_avg_bloat, est_pages, index_tuple_hdr_bm, maxalign, pagehdr, nulldatawidth, nulldatahdrwidth, reltuples, relpages -- (DEBUG INFO)
FROM (
SELECT coalesce(1 +
ceil(reltuples/floor((bs-pageopqdata-pagehdr)/(4+nulldatahdrwidth)::float)), 0 -- ItemIdData size + computed avg size of a tuple (nulldatahdrwidth)
) AS est_pages,
coalesce(1 +
ceil(reltuples/floor((bs-pageopqdata-pagehdr)*fillfactor/(100*(4+nulldatahdrwidth)::float))), 0
) AS est_pages_ff,
bs, nspname, tblname, idxname, relpages, fillfactor, is_na
-- , pgstatindex(idxoid) AS pst, index_tuple_hdr_bm, maxalign, pagehdr, nulldatawidth, nulldatahdrwidth, reltuples -- (DEBUG INFO)
FROM (
SELECT maxalign, bs, nspname, tblname, idxname, reltuples, relpages, idxoid, fillfactor,
( index_tuple_hdr_bm +
maxalign - CASE -- Add padding to the index tuple header to align on MAXALIGN
WHEN index_tuple_hdr_bm%maxalign = 0 THEN maxalign
ELSE index_tuple_hdr_bm%maxalign
END
+ nulldatawidth + maxalign - CASE -- Add padding to the data to align on MAXALIGN
WHEN nulldatawidth = 0 THEN 0
WHEN nulldatawidth::integer%maxalign = 0 THEN maxalign
ELSE nulldatawidth::integer%maxalign
END
)::numeric AS nulldatahdrwidth, pagehdr, pageopqdata, is_na
-- , index_tuple_hdr_bm, nulldatawidth -- (DEBUG INFO)
FROM (
SELECT n.nspname, i.tblname, i.idxname, i.reltuples, i.relpages,
i.idxoid, i.fillfactor, current_setting('block_size')::numeric AS bs,
CASE -- MAXALIGN: 4 on 32bits, 8 on 64bits (and mingw32 ?)
WHEN version() ~ 'mingw32' OR version() ~ '64-bit|x86_64|ppc64|ia64|amd64' THEN 8
ELSE 4
END AS maxalign,
/* per page header, fixed size: 20 for 7.X, 24 for others */
24 AS pagehdr,
/* per page btree opaque data */
16 AS pageopqdata,
/* per tuple header: add IndexAttributeBitMapData if some cols are null-able */
CASE WHEN max(coalesce(s.null_frac,0)) = 0
THEN 8 -- IndexTupleData size
ELSE 8 + (( 32 + 8 - 1 ) / 8) -- IndexTupleData size + IndexAttributeBitMapData size ( max num filed per index + 8 - 1 /8)
END AS index_tuple_hdr_bm,
/* data len: we remove null values save space using it fractionnal part from stats */
sum( (1-coalesce(s.null_frac, 0)) * coalesce(s.avg_width, 1024)) AS nulldatawidth,
max( CASE WHEN i.atttypid = 'pg_catalog.name'::regtype THEN 1 ELSE 0 END ) > 0 AS is_na
FROM (
SELECT ct.relname AS tblname, ct.relnamespace, ic.idxname, ic.attpos, ic.indkey, ic.indkey[ic.attpos], ic.reltuples, ic.relpages, ic.tbloid, ic.idxoid, ic.fillfactor,
coalesce(a1.attnum, a2.attnum) AS attnum, coalesce(a1.attname, a2.attname) AS attname, coalesce(a1.atttypid, a2.atttypid) AS atttypid,
CASE WHEN a1.attnum IS NULL
THEN ic.idxname
ELSE ct.relname
END AS attrelname
FROM (
SELECT idxname, reltuples, relpages, tbloid, idxoid, fillfactor, indkey,
pg_catalog.generate_series(1,indnatts) AS attpos
FROM (
SELECT ci.relname AS idxname, ci.reltuples, ci.relpages, i.indrelid AS tbloid,
i.indexrelid AS idxoid,
coalesce(substring(
array_to_string(ci.reloptions, ' ')
from 'fillfactor=([0-9]+)')::smallint, 90) AS fillfactor,
i.indnatts,
pg_catalog.string_to_array(pg_catalog.textin(
pg_catalog.int2vectorout(i.indkey)),' ')::int[] AS indkey
FROM pg_catalog.pg_index i
JOIN pg_catalog.pg_class ci ON ci.oid = i.indexrelid
WHERE ci.relam=(SELECT oid FROM pg_am WHERE amname = 'btree')
AND ci.relpages > 0
) AS idx_data
) AS ic
JOIN pg_catalog.pg_class ct ON ct.oid = ic.tbloid
LEFT JOIN pg_catalog.pg_attribute a1 ON
ic.indkey[ic.attpos] <> 0
AND a1.attrelid = ic.tbloid
AND a1.attnum = ic.indkey[ic.attpos]
LEFT JOIN pg_catalog.pg_attribute a2 ON
ic.indkey[ic.attpos] = 0
AND a2.attrelid = ic.idxoid
AND a2.attnum = ic.attpos
) i
JOIN pg_catalog.pg_namespace n ON n.oid = i.relnamespace
JOIN pg_catalog.pg_stats s ON s.schemaname = n.nspname
AND s.tablename = i.attrelname
AND s.attname = i.attname
GROUP BY 1,2,3,4,5,6,7,8,9,10,11
) AS rows_data_stats
) AS rows_hdr_pdg_stats
) AS relation_stats
),
preprettied_index_bloat AS (
SELECT current_database(), nspname AS schemaname, tblname, idxname, bs*(relpages)::bigint AS real_size,
bs*(relpages-est_pages)::bigint AS extra_size,
100 * (relpages-est_pages)::float / relpages AS extra_pct,
fillfactor,
CASE WHEN relpages > est_pages_ff
THEN bs*(relpages-est_pages_ff)
ELSE 0
END AS bloat_size,
100 * (relpages-est_pages_ff)::float / relpages AS bloat_pct,
is_na
-- , 100-(pst).avg_leaf_density AS pst_avg_bloat, est_pages, index_tuple_hdr_bm, maxalign, pagehdr, nulldatawidth, nulldatahdrwidth, reltuples, relpages -- (DEBUG INFO)
FROM (
SELECT coalesce(1 +
ceil(reltuples/floor((bs-pageopqdata-pagehdr)/(4+nulldatahdrwidth)::float)), 0 -- ItemIdData size + computed avg size of a tuple (nulldatahdrwidth)
) AS est_pages,
coalesce(1 +
ceil(reltuples/floor((bs-pageopqdata-pagehdr)*fillfactor/(100*(4+nulldatahdrwidth)::float))), 0
) AS est_pages_ff,
bs, nspname, tblname, idxname, relpages, fillfactor, is_na
-- , pgstatindex(idxoid) AS pst, index_tuple_hdr_bm, maxalign, pagehdr, nulldatawidth, nulldatahdrwidth, reltuples -- (DEBUG INFO)
FROM (
SELECT maxalign, bs, nspname, tblname, idxname, reltuples, relpages, idxoid, fillfactor,
( index_tuple_hdr_bm +
maxalign - CASE -- Add padding to the index tuple header to align on MAXALIGN
WHEN index_tuple_hdr_bm%maxalign = 0 THEN maxalign
ELSE index_tuple_hdr_bm%maxalign
END
+ nulldatawidth + maxalign - CASE -- Add padding to the data to align on MAXALIGN
WHEN nulldatawidth = 0 THEN 0
WHEN nulldatawidth::integer%maxalign = 0 THEN maxalign
ELSE nulldatawidth::integer%maxalign
END
)::numeric AS nulldatahdrwidth, pagehdr, pageopqdata, is_na
-- , index_tuple_hdr_bm, nulldatawidth -- (DEBUG INFO)
FROM (
SELECT n.nspname, i.tblname, i.idxname, i.reltuples, i.relpages,
i.idxoid, i.fillfactor, current_setting('block_size')::numeric AS bs,
CASE -- MAXALIGN: 4 on 32bits, 8 on 64bits (and mingw32 ?)
WHEN version() ~ 'mingw32' OR version() ~ '64-bit|x86_64|ppc64|ia64|amd64' THEN 8
ELSE 4
END AS maxalign,
/* per page header, fixed size: 20 for 7.X, 24 for others */
24 AS pagehdr,
/* per page btree opaque data */
16 AS pageopqdata,
/* per tuple header: add IndexAttributeBitMapData if some cols are null-able */
CASE WHEN max(coalesce(s.null_frac,0)) = 0
THEN 8 -- IndexTupleData size
ELSE 8 + (( 32 + 8 - 1 ) / 8) -- IndexTupleData size + IndexAttributeBitMapData size ( max num filed per index + 8 - 1 /8)
END AS index_tuple_hdr_bm,
/* data len: we remove null values save space using it fractionnal part from stats */
sum( (1-coalesce(s.null_frac, 0)) * coalesce(s.avg_width, 1024)) AS nulldatawidth,
max( CASE WHEN i.atttypid = 'pg_catalog.name'::regtype THEN 1 ELSE 0 END ) > 0 AS is_na
FROM (
SELECT ct.relname AS tblname, ct.relnamespace, ic.idxname, ic.attpos, ic.indkey, ic.indkey[ic.attpos], ic.reltuples, ic.relpages, ic.tbloid, ic.idxoid, ic.fillfactor,
coalesce(a1.attnum, a2.attnum) AS attnum, coalesce(a1.attname, a2.attname) AS attname, coalesce(a1.atttypid, a2.atttypid) AS atttypid,
CASE WHEN a1.attnum IS NULL
THEN ic.idxname
ELSE ct.relname
END AS attrelname
FROM (
SELECT idxname, reltuples, relpages, tbloid, idxoid, fillfactor, indkey,
pg_catalog.generate_series(1,indnatts) AS attpos
FROM (
SELECT ci.relname AS idxname, ci.reltuples, ci.relpages, i.indrelid AS tbloid,
i.indexrelid AS idxoid,
coalesce(substring(
array_to_string(ci.reloptions, ' ')
from 'fillfactor=([0-9]+)')::smallint, 90) AS fillfactor,
i.indnatts,
pg_catalog.string_to_array(pg_catalog.textin(
pg_catalog.int2vectorout(i.indkey)),' ')::int[] AS indkey
FROM pg_catalog.pg_index i
JOIN pg_catalog.pg_class ci ON ci.oid = i.indexrelid
WHERE ci.relam=(SELECT oid FROM pg_am WHERE amname = 'btree')
AND ci.relpages > 0
) AS idx_data
) AS ic
JOIN pg_catalog.pg_class ct ON ct.oid = ic.tbloid
LEFT JOIN pg_catalog.pg_attribute a1 ON
ic.indkey[ic.attpos] <> 0
AND a1.attrelid = ic.tbloid
AND a1.attnum = ic.indkey[ic.attpos]
LEFT JOIN pg_catalog.pg_attribute a2 ON
ic.indkey[ic.attpos] = 0
AND a2.attrelid = ic.idxoid
AND a2.attnum = ic.attpos
) i
JOIN pg_catalog.pg_namespace n ON n.oid = i.relnamespace
JOIN pg_catalog.pg_stats s ON s.schemaname = n.nspname
AND s.tablename = i.attrelname
AND s.attname = i.attname
GROUP BY 1,2,3,4,5,6,7,8,9,10,11
) AS rows_data_stats
) AS rows_hdr_pdg_stats
) AS relation_stats
)
SELECT schemaname "Schema",
tblname "Table",
' ' "Index",
pg_size_pretty(real_size::bigint) "Active size",
pg_size_pretty(bloat_size::bigint) "Bloat size",
CASE WHEN bloat_pct <= 0 THEN '0%' ELSE to_char(bloat_pct, 'fm999D99%') END "Percentage Bloat"
FROM preprettied_table_bloat
WHERE real_size > 1073741824
AND schemaname NOT IN ('pg_catalog', 'information_schema')
AND bloat_pct >= 10
UNION
SELECT schemaname "Schema",
tblname "Table",
idxname "Index",
pg_size_pretty(real_size) "Active size",
pg_size_pretty(bloat_size::bigint) "Bloat size",
CASE WHEN bloat_pct <= 0 THEN '0%' ELSE to_char(bloat_pct, 'fm999D99%') END "Percentage Bloat"
FROM preprettied_index_bloat
WHERE real_size > 1073741824
AND schemaname NOT IN ('pg_catalog', 'information_schema')
AND bloat_pct >= 10
ORDER BY "Percentage Bloat" desc nulls last;

For a quicker check, you can use a query to estimate how many dead rows are waiting for the autovacuum "robot" to clean them up.
Use this query to see which tables are nearing their autovacuum thresholds:
SELECT
relname AS table_name,
n_dead_tup AS dead_rows,
last_autovacuum,
last_vacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

Beyond the numbers, look for these signs:
Unusually high disk usage: Disk usage is higher than normal compared to row count and table size.
Slow queries: Performance is dipping and bloat is high.
Stuck cleanup: If the oldest xmin in your logs stays the same, something (like a long-running transaction) is blocking the vacuum from doing its job.
IO contention: If you notice sudden I/O pressure, you might need to adjust your cost-based delay settings.
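To hunt down the long-running transactions that can pin the oldest xmin, you can query pg_stat_activity — a sketch:

```sql
-- Long-running transactions hold back the oldest xmin and prevent
-- vacuum from reclaiming dead tuples created after they started.
SELECT pid,
       state,
       age(backend_xmin) AS xmin_age,
       now() - xact_start AS xact_duration,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY xmin_age DESC;
```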
What can you tune with autovacuum?
The defaults are usually fine for small tables, but they can be a bit sluggish as you scale. If you have a table with 100 million rows, waiting for 20 million dead rows (20%) to accumulate before cleaning up is too long for many workloads. A more typical starting point is tuning so that vacuum triggers at around 2 million dead rows.
Here are the levers you can pull.
Scale factors
You can lower the autovacuum_vacuum_scale_factor so that vacuuming happens more frequently. In practice, many teams aim for a vacuum around every 2 million deleted or updated rows.
-- Example: Setting a scale factor for a 20M row table to trigger at 2M dead rows
ALTER TABLE your_table_name SET (autovacuum_vacuum_scale_factor = 0.1);

Cost-based throttling to slow down the vacuum
If a vacuum job is hogging all your disk I/O, you can force it to "pause" more frequently by increasing the delay or decreasing the cost limit. In Postgres, manual vacuum settings usually default to no delay, but autovacuum defaults to a 2 ms delay. You can use autovacuum_vacuum_cost_delay and autovacuum_vacuum_cost_limit to slow down the process so it doesn't starve your application of resources.
Use this to increase the delay to 5 ms for a specific table, which effectively slows it down 2.5x compared to the default:
-- Slows down vacuuming for this specific table to reduce I/O impact
ALTER TABLE your_table_name
SET (autovacuum_vacuum_cost_delay = '5ms');

Alternatively, you can adjust the cost limit. Decreasing this number makes the vacuum "hit the limit" sooner, resulting in more frequent pauses:
-- Vacuum pauses more often because the "cost bucket" fills up faster
ALTER TABLE your_table_name
SET (autovacuum_vacuum_cost_limit = 100);

Freeze max age to manage wraparound risk
When a table's oldest transaction ID (XID) exceeds a certain age, Postgres forces an "aggressive" vacuum that visits every page that’s not marked as frozen. While the default is 200 million, increasing this for very busy tables can prevent these heavy I/O operations from happening during peak times.
To increase this threshold to 400 million or 500 million transactions, do the following:
-- Gives the table more time before a forced "anti-wraparound" vacuum occurs
ALTER TABLE your_table_name
SET (autovacuum_freeze_max_age = 400000000);

Summary
For many use cases, the default autovacuum settings are sufficient. But as your workload scales, knowing which knobs to turn can save you from performance hits or disk-space panics.
In short, if every bloat check turns up a few million dead rows, it’s likely time to tune autovacuum. And if bloat consistently sits above 50%, it’s worth investigating and tuning.



