Thursday, January 29, 2026

Compression for Postgres tables

In PostgreSQL you don’t create a “compressed table” in the Oracle sense. Rather, compression is achieved per-column compression via TOAST, plus (if desirable) a few extension/filesystem tricks.

Postgres compresses large variable-length columns using TOAST:
text, varchar, bytea, jsonb, xml, numeric, etc.
TOAST will be applied when a row is too big to fit in an 8kB page; large values are stored in a separate TOAST table, optionally compressed.

There is no built-in heap/row compression for normal fixed-width columns (e.g. integer, bigint) in vanilla Postgres.

From PostgreSQL 14 onward you can choose compression algorithm per column: pglz (classic) or lz4 (faster, generally preferred).

Here is how I did it in my test environment:

First, ensure your PostgreSQL server was compiled with LZ4 support (use pg_config --configure and look for --with-lz4)
pg_config --configure | grep lz4
It will show you a long list of options that was used when PostgreSQL was built. Look for '--with-lz4'

Set compression globally:
show default_toast_compression;
 default_toast_compression
---------------------------
 pglz
(1 row)

postgres=# ALTER SYSTEM SET default_toast_compression = 'lz4';
ALTER SYSTEM
postgres=# SELECT pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)

postgres=# show default_toast_compression;
 default_toast_compression
---------------------------
 lz4
Optional: default to LZ4 for this session
  SET default_toast_compression = 'lz4';
Yet another option is to set LZ4 for a specific database:
 ALTER DATABASE mydb SET default_toast_compression = 'lz4';
Create the table:
CREATE TABLE app_logs (
    log_id      bigserial PRIMARY KEY,
    log_time    timestamptz NOT NULL,
    level       text        NOT NULL,
    message     text        COMPRESSION lz4,
    details     jsonb       COMPRESSION lz4
);
Note:
  • COMPRESSION lz4 / COMPRESSION pglz is a column option.
  • Only matters for TOAST-able types; it won’t change anything for integer, date, etc.
  • Compression only happens when the row gets large enough for TOAST to kick in (roughly when row > ~2kB).

    You can switch existing columns to LZ4 (or back to pglz):
     ALTER TABLE app_logs
        ALTER COLUMN message SET COMPRESSION lz4,
        ALTER COLUMN details SET COMPRESSION lz4;
    
     
    Note that an ALTER TABLE only changes the future TOAST entries. To actually recompress existing rows you need to cause a rewrite. Common options:
    -- 1) Table rewrite (heavy, but clean)
    ALTER TABLE app_logs SET (toast_tuple_target = 2040);  -- optional tweak
    VACUUM FULL app_logs;
    
    -- or 2) Cluster on some index (also rewrites)
    CLUSTER app_logs USING app_logs_pkey;
    ANALYZE app_logs;
    
    Any bulk rewrite (incl. CREATE TABLE AS ..., INSERT INTO new SELECT ... FROM old) will store new TOAST values using the new compression method.


  • Check that the table is using column compression for TOAST values:

    -- Main table vs TOAST table sizes
    SELECT
        relname,
        pg_size_pretty(pg_relation_size(oid))          AS heap_size,
        pg_size_pretty(pg_total_relation_size(oid))    AS total_with_indexes_toast
    FROM pg_class
    WHERE relname IN ('app_logs2','app_logs3','app_logs4');
    
    -- Look at TOAST table directly
    SELECT
        c1.relname       AS main_table,
        c2.relname       AS toast_table,
        pg_size_pretty(pg_total_relation_size(c2.oid)) AS toast_total
    FROM pg_class c1
    JOIN pg_class c2 ON c1.reltoastrelid = c2.oid
    WHERE c1.relname IN ('app_logs2','app_logs3','app_logs4');
    
    In a simple test, I created three tables with three different compression directives and created one long value that would make sure it was TOASTED:
    CREATE TABLE app_logs2 (
        log_id      bigserial PRIMARY KEY,
        log_time    timestamptz NOT NULL,
        level       text        NOT NULL,
        message     text,
        details     jsonb
    );
    
    CREATE TABLE app_logs3 (
        log_id      bigserial PRIMARY KEY,
        log_time    timestamptz NOT NULL,
        level       text        NOT NULL,
        message     text        COMPRESSION lz4,
        details     jsonb       COMPRESSION lz4
    );
    
    CREATE TABLE app_logs4 (
        log_id      bigserial PRIMARY KEY,
        log_time    timestamptz NOT NULL,
        level       text        NOT NULL,
        message     text        COMPRESSION pglz,
        details     jsonb       COMPRESSION pglz
    );
    
    INSERT INTO app_logs2 (log_time, level, message, details)
    VALUES (
        now(),
        'INFO',
        repeat('x', 100000),                -- make it large enough to be TOASTed
        jsonb_build_object('k', repeat('y', 100000))
    );
    
    INSERT INTO app_logs3 (log_time, level, message, details)
    VALUES (
        now(),
        'INFO',
        repeat('x', 100000),                -- make it large enough to be TOASTed
        jsonb_build_object('k', repeat('y', 100000))
    );
    
    INSERT INTO app_logs4 (log_time, level, message, details)
    VALUES (
        now(),
        'INFO',
        repeat('x', 100000),                -- make it large enough to be TOASTed
        jsonb_build_object('k', repeat('y', 100000))
    );
    
    
    As expected, the app_logs2 defaulted to lz4 (set globally):
    SELECT
        relname,
        pg_size_pretty(pg_relation_size(oid))          AS heap_size,
        pg_size_pretty(pg_total_relation_size(oid))    AS total_with_indexes_toast
    FROM pg_class
    WHERE relname IN ('app_logs2','app_logs3','app_logs4');
    
    -- Look at TOAST table directly
    SELECT
        c1.relname       AS main_table,
        c2.relname       AS toast_table,
        pg_size_pretty(pg_total_relation_size(c2.oid)) AS toast_total
    FROM pg_class c1
    JOIN pg_class c2 ON c1.reltoastrelid = c2.oid
    WHERE c1.relname IN ('app_logs2','app_logs3','app_logs4');
    
     relname  | heap_size  | total_with_indexes_toast
    -----------+------------+--------------------------
     app_logs2 | 8192 bytes | 32 kB
     app_logs3 | 8192 bytes | 32 kB
     app_logs4 | 8192 bytes | 48 kB
    (3 rows)
    
     main_table |   toast_table    | toast_total
    ------------+------------------+-------------
     app_logs2  | pg_toast_2510179 | 8192 bytes
     app_logs3  | pg_toast_2510188 | 8192 bytes
     app_logs4  | pg_toast_2510197 | 24 kB
    (3 rows)
    
    Remember, per-column compression via default_toast_compression doesn not show up in \d+ unless it was explicitly set in the column definition.

    So even if table app_logs2 uses compression for TOASTable columns, it does not reveal this fact when being described:
    CREATE TABLE app_logs2 (
        log_id      bigserial PRIMARY KEY,
        log_time    timestamptz NOT NULL,
        level       text        NOT NULL,
        message     text,
        details     jsonb
    );
    
     \d+ app_logs2
                                                                     Table "myschema.app_logs2"
      Column  |           Type           | Collation | Nullable |                  Default                  | Storage  | Compression | Stats target | Description
    ----------+--------------------------+-----------+----------+-------------------------------------------+----------+-------------+--------------+-------------
     log_id   | bigint                   |           | not null | nextval('app_logs2_log_id_seq'::regclass) | plain    |             |              |
     log_time | timestamp with time zone |           | not null |                                           | plain    |             |              |
     level    | text                     |           | not null |                                           | extended |             |              |
     message  | text                     |           |          |                                           | extended |             |              |
     details  | jsonb                    |           |          |                                           | extended |             |              |
     

    No comments:

    Post a Comment