Postgres compresses large variable-length columns using TOAST:
text, varchar, bytea, jsonb, xml, numeric, etc. TOAST kicks in when a row version exceeds the TOAST threshold (about 2 kB by default), well before it would overflow an 8 kB page; large values are then compressed and/or moved out of line into a separate TOAST table.
There is no built-in heap/row compression for normal fixed-width columns (e.g. integer, bigint) in vanilla Postgres.
From PostgreSQL 14 onward you can choose the compression algorithm per column: pglz (the classic default) or lz4 (faster, generally preferred).
Here is how I did it in my test environment:
First, ensure your PostgreSQL server was compiled with LZ4 support:

pg_config --configure | grep lz4

This prints the options that were used when PostgreSQL was built; look for '--with-lz4'.
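If you would rather check from SQL, the pg_config system view (available since PostgreSQL 9.6, and typically readable only by superusers) exposes the same build information. A small sketch of that alternative:

SELECT setting
FROM pg_config
WHERE name = 'CONFIGURE';   -- full configure line; should contain --with-lz4 on an LZ4-enabled build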
Set compression globally:
postgres=# show default_toast_compression;
 default_toast_compression
---------------------------
 pglz
(1 row)

postgres=# ALTER SYSTEM SET default_toast_compression = 'lz4';
ALTER SYSTEM
postgres=# SELECT pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)

postgres=# show default_toast_compression;
 default_toast_compression
---------------------------
 lz4
(1 row)

Optional: default to LZ4 for this session:
SET default_toast_compression = 'lz4';

Yet another option is to set LZ4 for a specific database:

ALTER DATABASE mydb SET default_toast_compression = 'lz4';
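The database-level setting only applies to sessions started after the ALTER DATABASE. A quick way to confirm what is stored for a database is to look at pg_db_role_setting (a sketch, using the example database name mydb from above):

SELECT d.datname, s.setconfig
FROM pg_db_role_setting s
JOIN pg_database d ON d.oid = s.setdatabase
WHERE d.datname = 'mydb';

Create the table: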
CREATE TABLE app_logs (
log_id bigserial PRIMARY KEY,
log_time timestamptz NOT NULL,
level text NOT NULL,
message text COMPRESSION lz4,
details jsonb COMPRESSION lz4
);
Note: you can switch existing columns to LZ4 (or back to pglz):
ALTER TABLE app_logs
ALTER COLUMN message SET COMPRESSION lz4,
ALTER COLUMN details SET COMPRESSION lz4;
Note that ALTER TABLE ... SET COMPRESSION only affects future TOAST entries. To actually recompress existing rows you need to force a rewrite. Common options:
-- 1) Table rewrite (heavy, but clean)
ALTER TABLE app_logs SET (toast_tuple_target = 2040);  -- optional tweak
VACUUM FULL app_logs;

-- or 2) Cluster on some index (also rewrites)
CLUSTER app_logs USING app_logs_pkey;
ANALYZE app_logs;

Any bulk rewrite (incl. CREATE TABLE AS ..., INSERT INTO new SELECT ... FROM old) will store new TOAST values using the new compression method.
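One way to spot-check whether existing values actually use the expected method after a rewrite is pg_column_compression() (PostgreSQL 14+), which reports the compression method of an individual stored value (NULL means the value is not compressed). A minimal sketch against the app_logs table above:

SELECT log_id,
       pg_column_compression(message) AS message_method,
       pg_column_compression(details) AS details_method
FROM app_logs
ORDER BY log_id
LIMIT 10;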
Check that the table is using column compression for TOAST values (the queries below use the test tables app_logs2, app_logs3 and app_logs4 created in the next step):
-- Main table vs TOAST table sizes
SELECT
relname,
pg_size_pretty(pg_relation_size(oid)) AS heap_size,
pg_size_pretty(pg_total_relation_size(oid)) AS total_with_indexes_toast
FROM pg_class
WHERE relname IN ('app_logs2','app_logs3','app_logs4');
-- Look at TOAST table directly
SELECT
c1.relname AS main_table,
c2.relname AS toast_table,
pg_size_pretty(pg_total_relation_size(c2.oid)) AS toast_total
FROM pg_class c1
JOIN pg_class c2 ON c1.reltoastrelid = c2.oid
WHERE c1.relname IN ('app_logs2','app_logs3','app_logs4');
In a simple test, I created three tables with three different compression directives and inserted one value long enough to make sure it would be TOASTed:
CREATE TABLE app_logs2 (
log_id bigserial PRIMARY KEY,
log_time timestamptz NOT NULL,
level text NOT NULL,
message text,
details jsonb
);
CREATE TABLE app_logs3 (
log_id bigserial PRIMARY KEY,
log_time timestamptz NOT NULL,
level text NOT NULL,
message text COMPRESSION lz4,
details jsonb COMPRESSION lz4
);
CREATE TABLE app_logs4 (
log_id bigserial PRIMARY KEY,
log_time timestamptz NOT NULL,
level text NOT NULL,
message text COMPRESSION pglz,
details jsonb COMPRESSION pglz
);
INSERT INTO app_logs2 (log_time, level, message, details)
VALUES (
now(),
'INFO',
repeat('x', 100000), -- make it large enough to be TOASTed
jsonb_build_object('k', repeat('y', 100000))
);
INSERT INTO app_logs3 (log_time, level, message, details)
VALUES (
now(),
'INFO',
repeat('x', 100000), -- make it large enough to be TOASTed
jsonb_build_object('k', repeat('y', 100000))
);
INSERT INTO app_logs4 (log_time, level, message, details)
VALUES (
now(),
'INFO',
repeat('x', 100000), -- make it large enough to be TOASTed
jsonb_build_object('k', repeat('y', 100000))
);
As expected, app_logs2 defaulted to lz4 (set globally):
SELECT
relname,
pg_size_pretty(pg_relation_size(oid)) AS heap_size,
pg_size_pretty(pg_total_relation_size(oid)) AS total_with_indexes_toast
FROM pg_class
WHERE relname IN ('app_logs2','app_logs3','app_logs4');
-- Look at TOAST table directly
SELECT
c1.relname AS main_table,
c2.relname AS toast_table,
pg_size_pretty(pg_total_relation_size(c2.oid)) AS toast_total
FROM pg_class c1
JOIN pg_class c2 ON c1.reltoastrelid = c2.oid
WHERE c1.relname IN ('app_logs2','app_logs3','app_logs4');
relname | heap_size | total_with_indexes_toast
-----------+------------+--------------------------
app_logs2 | 8192 bytes | 32 kB
app_logs3 | 8192 bytes | 32 kB
app_logs4 | 8192 bytes | 48 kB
(3 rows)
main_table | toast_table | toast_total
------------+------------------+-------------
app_logs2 | pg_toast_2510179 | 8192 bytes
app_logs3 | pg_toast_2510188 | 8192 bytes
app_logs4 | pg_toast_2510197 | 24 kB
(3 rows)
Remember, compression applied via default_toast_compression does not show up in \d+ unless it was set explicitly in the column definition. So even though table app_logs2 uses LZ4 compression for its TOASTable columns, describing it does not reveal that fact:
CREATE TABLE app_logs2 (
log_id bigserial PRIMARY KEY,
log_time timestamptz NOT NULL,
level text NOT NULL,
message text,
details jsonb
);
\d+ app_logs2
Table "myschema.app_logs2"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
----------+--------------------------+-----------+----------+-------------------------------------------+----------+-------------+--------------+-------------
log_id | bigint | | not null | nextval('app_logs2_log_id_seq'::regclass) | plain | | |
log_time | timestamp with time zone | | not null | | plain | | |
level | text | | not null | | extended | | |
message | text | | | | extended | | |
details | jsonb | | | | extended | | |
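To see what was actually used for the stored values, pg_column_compression() is more telling than \d+ here; run against the row inserted into app_logs2 above, it should report lz4 for both columns (a sketch, output not shown):

SELECT log_id,
       pg_column_compression(message) AS message_method,
       pg_column_compression(details) AS details_method
FROM app_logs2;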