I previously blogged about using the “ON CONFLICT” directive to keep failed inserts from creating additional work for vacuum. I later demonstrated how the MERGE statement accomplishes the same thing.
In another recent customer case, I was chasing down why an application was invoking tens of thousands of foreign key and constraint violations per day, and I began to wonder whether these kinds of errors also cause the extra vacuum work described in those previous posts. Sure enough: it DEPENDS.
Let’s set up a quick test to demonstrate:
/* Create related tables: */
CREATE TABLE public.uuid_product_value (
id int PRIMARY KEY,
pkid text,
value numeric,
product_id int,
effective_date timestamp(3)
);
CREATE TABLE public.uuid_product (
product_id int PRIMARY KEY
);
ALTER TABLE uuid_product_value
ADD CONSTRAINT uuid_product_value_product_id_fk
FOREIGN KEY (product_id)
REFERENCES uuid_product (product_id) ON DELETE CASCADE;
/* Insert some mocked up data */
INSERT INTO public.uuid_product VALUES (
generate_series(0,200));
INSERT INTO public.uuid_product_value VALUES (
generate_series(0,10000),
gen_random_uuid()::text,
random()*1000,
ROUND(random()*100),
current_timestamp(3));
/* Vacuum Analyze Both tables */
VACUUM (VERBOSE, ANALYZE) uuid_product;
VACUUM (VERBOSE, ANALYZE) uuid_product_value;
/* Verify that there are no dead tuples: */
SELECT
schemaname,
relname,
n_live_tup,
n_dead_tup
FROM
pg_stat_all_tables
WHERE
relname in ('uuid_product_value', 'uuid_product');
schemaname | relname | n_live_tup | n_dead_tup
------------+--------------------+------------+------------
public | uuid_product_value | 10001 | 0
public | uuid_product | 201 | 0
Then, let’s issue a simple insert that will violate the FK and check to see if dead tuples were generated:
/* Insert a row that violates the FK, without the ON CONFLICT directive */
INSERT INTO public.uuid_product_value VALUES (
generate_series(10001,10001),
gen_random_uuid()::text,
random()*1000,
202, /* we know this product_id doesn't exist in the parent */
current_timestamp(3));
ERROR: insert or update on table "uuid_product_value" violates foreign key constraint "uuid_product_value_product_id_fk"
DETAIL: Key (product_id)=(202) is not present in table "uuid_product".
Time: 3.065 ms
And now check the tuple stats:
SELECT
schemaname,
relname,
n_live_tup,
n_dead_tup
FROM
pg_stat_all_tables
WHERE
relname in ('uuid_product_value', 'uuid_product');
schemaname | relname | n_live_tup | n_dead_tup
------------+--------------------+------------+------------
public | uuid_product_value | 10001 | 1
public | uuid_product | 201 | 0
Sure enough, we now have a dead row as a result of the FK violation on the insert. But will an “ON CONFLICT” directive help us in this scenario as it did in the others?
/* Insert a row that violates the FK, but with the ON CONFLICT directive */
INSERT INTO public.uuid_product_value VALUES (
generate_series(10001,10001),
gen_random_uuid()::text,
random()*1000,
202, /* we know this product_id doesn't exist in the parent */
current_timestamp(3)) ON CONFLICT DO NOTHING;
ERROR: insert or update on table "uuid_product_value" violates foreign key constraint "uuid_product_value_product_id_fk"
DETAIL: Key (product_id)=(202) is not present in table "uuid_product".
/* Verify the tuple stats: */
SELECT
schemaname,
relname,
n_live_tup,
n_dead_tup
FROM
pg_stat_all_tables
WHERE
relname in ('uuid_product_value', 'uuid_product');
schemaname | relname | n_live_tup | n_dead_tup
------------+--------------------+------------+------------
public | uuid_product_value | 10001 | 2
public | uuid_product | 201 | 0
Unfortunately, it does not solve this problem: ON CONFLICT only handles unique and exclusion constraint conflicts, so the FK violation still errors out and still leaves a dead tuple behind. We need to be cognizant of FK violations and their effect on vacuum. Now what about trying to insert a NULL into a NOT NULL column? Will that result in a dead row? Let’s check.
/* Alter a column to NOT NULL */
ALTER TABLE public.uuid_product_value
ALTER COLUMN pkid SET NOT NULL;
/* Check the table definition */
Table "public.uuid_product_value"
Column | Type | Collation | Nullable | Default
----------------+--------------------------------+-----------+----------+---------
id | integer | | not null |
pkid | text | | not null |
value | numeric | | |
product_id | integer | | |
effective_date | timestamp(3) without time zone | | |
Indexes:
    "uuid_product_value_pkey" PRIMARY KEY, btree (id)
    "uuid_product_value_product_id_idx" btree (product_id) WHERE id >= 1 AND id <= 1000
    "uuid_product_value_product_id_idx1" hash (product_id)
Foreign-key constraints:
    "uuid_product_value_product_id_fk" FOREIGN KEY (product_id) REFERENCES uuid_product(product_id) ON DELETE CASCADE
/* Insert a row that violates the NOT NULL constraint */
INSERT INTO public.uuid_product_value VALUES (
generate_series(10001,10001),
NULL,
random()*1000,
200,
current_timestamp(3));
ERROR: null value in column "pkid" of relation "uuid_product_value" violates not-null constraint
DETAIL: Failing row contains (10001, null, 613.162063338205, 200, 2026-03-13 14:25:28.758).
/* Verify the tuple stats: */
SELECT
schemaname,
relname,
n_live_tup,
n_dead_tup
FROM
pg_stat_all_tables
WHERE
relname in ('uuid_product_value', 'uuid_product');
schemaname | relname | n_live_tup | n_dead_tup
------------+--------------------+------------+------------
public | uuid_product_value | 10001 | 2
public | uuid_product | 201 | 0
As you can see, a violation of the NOT NULL constraint does not behave the same way as a violation of the FK constraint: no dead tuple is created. It’s always good to know, and to relay to the application development staff, which operations will result in more work for the database, and to adjust the code accordingly. Enjoy!
The short answer is always “maybe”. In this post, I hope to demonstrate what creates a sub-transaction and what happens to overall transaction ID utilization when sub-transactions are invoked. I will also show how performance is affected when there are lots of connections creating and consuming sub-transactions.
First, it is important to understand which statements will utilize a transaction ID and which ones are more expensive than others:
Calling Nested Procedures: 🟢 Free. No new XIDs are used. They share the parent’s transaction.
BEGIN…END (No Exception Block): 🟢 Free. Just organization.
COMMIT: 🟡 Expensive. Burns a main Transaction ID (finite resource, leads to Vacuum Freeze).
EXCEPTION: 🔴 Dangerous. Creates a subtransaction (performance killer).
So why is misplaced EXCEPTION logic a potential performance killer? PostgreSQL is optimized to handle a small number of open subtransactions very efficiently. Each backend process has a fixed-size array in shared memory (part of the PGPROC structure) that can hold up to 64 open subtransaction IDs (XIDs) for the current top-level transaction. As long as your nesting depth stays below 64, PostgreSQL manages everything in this fast, local memory array and never needs to touch the subtransaction SLRU (simple least-recently-used) subsystem, which is what pg_stat_slru tracks. The problem is that subtransaction XID consumption can get out of hand quickly if you are not paying attention to your application flow. Once PostgreSQL runs out of fast in-memory slots (the PGPROC array) and spills tracking data to the slower SLRU cache (pg_subtrans), performance degrades non-linearly (often 50x–100x slower) and causes global locking contention that can freeze other sessions.
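The same machinery can be exercised by hand with explicit SAVEPOINTs, which create subtransactions exactly as a plpgsql EXCEPTION block does. A minimal sketch (the temp table is purely illustrative):

```sql
-- Every SAVEPOINT opens a subtransaction, which consumes a sub-XID as
-- soon as it modifies data. More than 64 open at once within a single
-- top-level transaction overflows the PGPROC cache and forces lookups
-- through the pg_subtrans SLRU.
BEGIN;
CREATE TEMP TABLE savepoint_demo (id int);
SAVEPOINT s1;                       -- subtransaction #1
INSERT INTO savepoint_demo VALUES (1);
SAVEPOINT s2;                       -- subtransaction #2, nested under s1
INSERT INTO savepoint_demo VALUES (2);
ROLLBACK TO SAVEPOINT s2;           -- undoes only the work done under s2
RELEASE SAVEPOINT s1;               -- closes s1 (its sub-XID is still burned)
COMMIT;
```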
One of the other byproducts of utilizing too many sub-transactions is the additional WAL that will be generated. All of this can be demonstrated by a simple block of code:
------------------------------------------------------------------
-- 1. PREP: Create the helper function
------------------------------------------------------------------
CREATE OR REPLACE FUNCTION generate_subtrans_load(p_id int, depth int)
RETURNS void AS $$
BEGIN
IF depth > 0 THEN
BEGIN
PERFORM generate_subtrans_load(p_id, depth - 1);
EXCEPTION WHEN OTHERS THEN NULL;
END;
ELSE
INSERT INTO penalty_test VALUES (p_id, 'Deep Stack');
END IF;
END;
$$ LANGUAGE plpgsql;
------------------------------------------------------------------
-- 2. THE COMPLETE TEST SUITE
------------------------------------------------------------------
DO $$
DECLARE
-- Timing Variables
v_start_ts timestamp;
v_t_base interval; v_t_single interval;
v_t_48 interval; v_t_128 interval;
-- Row Verification
v_rows_base int; v_rows_single int;
v_rows_48 int; v_rows_128 int;
-- WAL Variables
v_start_lsn pg_lsn; v_end_lsn pg_lsn;
v_wal_base numeric; v_wal_single numeric;
v_wal_48 numeric; v_wal_128 numeric;
-- XID Variables
v_start_xid xid8; v_end_xid xid8;
v_xid_base bigint; v_xid_single bigint;
v_xid_48 bigint; v_xid_128 bigint;
i int;
v_target_rows int := 5000;
-- Helper calc vars
v_us_48 numeric;
v_us_128 numeric;
BEGIN
---------------------------------------------------
-- A. SETUP
---------------------------------------------------
DROP TABLE IF EXISTS penalty_test;
-- Explicitly preserve rows so COMMIT doesn't empty the table
CREATE TEMP TABLE penalty_test (id int, val text) ON COMMIT PRESERVE ROWS;
---------------------------------------------------
-- B. TEST 1: BASELINE (Standard Loop)
---------------------------------------------------
-- 1. Force fresh start
COMMIT;
v_start_ts := clock_timestamp();
v_start_lsn := pg_current_wal_insert_lsn();
v_start_xid := pg_snapshot_xmax(pg_current_snapshot());
FOR i IN 1..v_target_rows LOOP
INSERT INTO penalty_test VALUES (i, 'Standard');
END LOOP;
-- 2. Force snapshot refresh to see XID consumption
COMMIT;
v_end_lsn := pg_current_wal_insert_lsn();
v_end_xid := pg_snapshot_xmax(pg_current_snapshot());
v_t_base := clock_timestamp() - v_start_ts;
SELECT count(*) INTO v_rows_base FROM penalty_test;
v_wal_base := v_end_lsn - v_start_lsn;
v_xid_base := (v_end_xid::text::bigint - v_start_xid::text::bigint);
---------------------------------------------------
-- C. TEST 2: SINGLE TRAP (Depth 1)
---------------------------------------------------
TRUNCATE TABLE penalty_test;
COMMIT; -- Clear stats
v_start_ts := clock_timestamp();
v_start_lsn := pg_current_wal_insert_lsn();
v_start_xid := pg_snapshot_xmax(pg_current_snapshot());
FOR i IN 1..v_target_rows LOOP
BEGIN
INSERT INTO penalty_test VALUES (i, 'Single Trap');
EXCEPTION WHEN OTHERS THEN NULL;
END;
END LOOP;
COMMIT; -- Force refresh
v_end_lsn := pg_current_wal_insert_lsn();
v_end_xid := pg_snapshot_xmax(pg_current_snapshot());
v_t_single := clock_timestamp() - v_start_ts;
SELECT count(*) INTO v_rows_single FROM penalty_test;
v_wal_single := v_end_lsn - v_start_lsn;
v_xid_single := (v_end_xid::text::bigint - v_start_xid::text::bigint);
---------------------------------------------------
-- D. TEST 3: SAFE ZONE (Depth 48)
---------------------------------------------------
TRUNCATE TABLE penalty_test;
COMMIT;
v_start_ts := clock_timestamp();
v_start_lsn := pg_current_wal_insert_lsn();
v_start_xid := pg_snapshot_xmax(pg_current_snapshot());
FOR i IN 1..v_target_rows LOOP
PERFORM generate_subtrans_load(i, 48);
END LOOP;
COMMIT;
v_end_lsn := pg_current_wal_insert_lsn();
v_end_xid := pg_snapshot_xmax(pg_current_snapshot());
v_t_48 := clock_timestamp() - v_start_ts;
SELECT count(*) INTO v_rows_48 FROM penalty_test;
v_wal_48 := v_end_lsn - v_start_lsn;
v_xid_48 := (v_end_xid::text::bigint - v_start_xid::text::bigint);
---------------------------------------------------
-- E. TEST 4: OVERFLOW ZONE (Depth 128)
---------------------------------------------------
TRUNCATE TABLE penalty_test;
COMMIT;
v_start_ts := clock_timestamp();
v_start_lsn := pg_current_wal_insert_lsn();
v_start_xid := pg_snapshot_xmax(pg_current_snapshot());
FOR i IN 1..v_target_rows LOOP
PERFORM generate_subtrans_load(i, 128);
END LOOP;
COMMIT;
v_end_lsn := pg_current_wal_insert_lsn();
v_end_xid := pg_snapshot_xmax(pg_current_snapshot());
v_t_128 := clock_timestamp() - v_start_ts;
SELECT count(*) INTO v_rows_128 FROM penalty_test;
v_wal_128 := v_end_lsn - v_start_lsn;
v_xid_128 := (v_end_xid::text::bigint - v_start_xid::text::bigint);
---------------------------------------------------
-- F. THE REPORT
---------------------------------------------------
RAISE NOTICE '===================================================';
RAISE NOTICE ' POSTGRESQL SUBTRANSACTION IMPACT REPORT ';
RAISE NOTICE '===================================================';
RAISE NOTICE 'Target Rows: %', v_target_rows;
RAISE NOTICE '---------------------------------------------------';
RAISE NOTICE 'METRIC 1: EXECUTION TIME & VERIFICATION';
RAISE NOTICE ' 1. Baseline: % (Rows: %)', v_t_base, v_rows_base;
RAISE NOTICE ' 2. Single Exception: % (Rows: %)', v_t_single, v_rows_single;
RAISE NOTICE ' 3. Safe Zone (48): % (Rows: %)', v_t_48, v_rows_48;
RAISE NOTICE ' 4. Overflow (128): % (Rows: %)', v_t_128, v_rows_128;
RAISE NOTICE '---------------------------------------------------';
v_us_48 := extract(epoch from v_t_48) * 1000000;
v_us_128 := extract(epoch from v_t_128) * 1000000;
RAISE NOTICE 'METRIC 2: AVERAGE COST PER SUBTRANSACTION';
RAISE NOTICE ' - Safe Zone (48): % us per subtrans', round(v_us_48 / (v_target_rows * 48), 2);
RAISE NOTICE ' - Overflow Zone (128): % us per subtrans', round(v_us_128 / (v_target_rows * 128), 2);
IF (v_us_128 / (v_target_rows * 128)) > (v_us_48 / (v_target_rows * 48)) THEN
RAISE NOTICE ' -> RESULT: Overflow subtransactions were % %% slower per unit.',
round( ( ((v_us_128 / (v_target_rows * 128)) - (v_us_48 / (v_target_rows * 48))) / (v_us_48 / (v_target_rows * 48)) * 100)::numeric, 1);
ELSE
RAISE NOTICE ' -> RESULT: Overhead appears linear.';
END IF;
RAISE NOTICE '---------------------------------------------------';
RAISE NOTICE 'METRIC 3: DISK USAGE (WAL WRITTEN)';
RAISE NOTICE ' 1. Baseline: % bytes', v_wal_base;
RAISE NOTICE ' 2. Single Exception: % bytes', v_wal_single;
RAISE NOTICE ' 3. Safe Zone (48): % bytes', v_wal_48;
RAISE NOTICE ' 4. Overflow (128): % bytes', v_wal_128;
RAISE NOTICE '---------------------------------------------------';
RAISE NOTICE 'METRIC 4: TRANSACTION ID CONSUMPTION';
RAISE NOTICE ' 1. Baseline: % XIDs', v_xid_base;
RAISE NOTICE ' 2. Single Exception: % XIDs', v_xid_single;
RAISE NOTICE ' 3. Safe Zone (48): % XIDs', v_xid_48;
RAISE NOTICE ' 4. Overflow (128): % XIDs', v_xid_128;
RAISE NOTICE '===================================================';
END $$;
The code block above shows 4 different scenarios:
Baseline: Simple insert that inserts a number of rows in a loop with no exception logic
Single Exception (Depth 1): The same insert loop but with an exception block that fires after every insert
48 Sub-transaction Loop (Depth 48): A function call that arbitrarily creates 48 subtransactions (well below the 64 limit) and then completes the same insert
128 Sub-transaction Loop (Depth 128): A function call that arbitrarily creates 128 subtransactions (well above the 64 limit) and then completes the same insert
What you will see is that when run by a single user, the impact is not terribly great. Beyond utilizing more transaction IDs (XIDs) and generating significantly more WAL, the system and performance impact does not appear to be a big deal. Here is a run with a 5000-row insert:
NOTICE: ===================================================
NOTICE: POSTGRESQL SUBTRANSACTION IMPACT REPORT
NOTICE: ===================================================
NOTICE: Target Rows: 5000
NOTICE: ---------------------------------------------------
NOTICE: METRIC 1: EXECUTION TIME & VERIFICATION
NOTICE: 1. Baseline: 00:00:00.020292 (Rows: 5000)
NOTICE: 2. Single Exception: 00:00:00.033419 (Rows: 5000)
NOTICE: 3. Safe Zone (48): 00:00:01.414132 (Rows: 5000)
NOTICE: 4. Overflow (128): 00:00:03.817267 (Rows: 5000)
NOTICE: ---------------------------------------------------
NOTICE: METRIC 2: AVERAGE COST PER SUBTRANSACTION
NOTICE: - Safe Zone (48): 5.89 us per subtrans
NOTICE: - Overflow Zone (128): 5.96 us per subtrans
NOTICE: -> RESULT: Overflow subtransactions were 1.2 % slower per unit.
NOTICE: ---------------------------------------------------
NOTICE: METRIC 3: DISK USAGE (WAL WRITTEN)
NOTICE: 1. Baseline: 48 bytes
NOTICE: 2. Single Exception: 43936 bytes
NOTICE: 3. Safe Zone (48): 2076216 bytes
NOTICE: 4. Overflow (128): 5536832 bytes
NOTICE: ---------------------------------------------------
NOTICE: METRIC 4: TRANSACTION ID CONSUMPTION
NOTICE: 1. Baseline: 1 XIDs
NOTICE: 2. Single Exception: 5001 XIDs
NOTICE: 3. Safe Zone (48): 240001 XIDs
NOTICE: 4. Overflow (128): 640002 XIDs
NOTICE: ===================================================
DO
Time: 5294.689 ms (00:05.295)
As you can see in the results above, each scenario burns more XIDs and creates more WAL due to the additional transaction tracking that must be persisted in case the WAL ever needs to be replayed. The XID math is straightforward: the Depth-48 test consumes 5,000 rows x 48 subtransactions = 240,000 sub-XIDs plus the top-level transaction, matching the 240,001 reported. Where the real killer comes in is when multiple connections all begin to execute this same code. When I ran a synthetic test using Apache JMeter, the results were astounding. The test was completed on a 4 vCPU C4A AlloyDB instance:
Test                     | Average Time Per Exec | Min Exec Time | Max Exec Time
-------------------------+-----------------------+---------------+--------------
1 User x 50 iterations   | 2137 ms               | 2069 ms       | 3477 ms
4 Users x 50 iterations  | 3171 ms               | 2376 ms       | 4080 ms
10 Users x 50 iterations | 9041 ms               | 4018 ms       | 47880 ms
As more users are added, execution time increases due to the work needed to manage the XID space and write the additional WAL; some of the increase also comes from longer connection times caused by that same XID-space pressure. When the same test was executed through a managed connection pooler, times came down a bit:
Test                     | Average Time Per Exec | Min Exec Time | Max Exec Time
-------------------------+-----------------------+---------------+--------------
10 Users x 50 iterations | 6399 ms               | 2690 ms       | 10624 ms
Now you ask: how can I monitor for this performance impact? The verdict is that it depends. Wide variations in execution time depending on database load may be one indication. Another may be activity in the pg_stat_slru view (although, depending on available RAM, metrics may never appear there), and a final indication will always be WAL usage. In summary:
Metric         | What it tells you    | What it hides
---------------+----------------------+--------------------------------------------------------------
Execution Time | “My query is slow.”  | It doesn’t explain why (CPU vs. lock vs. I/O); additional investigation may be required.
pg_stat_slru   | “Disk is thrashing.” | It reads 0 if you have enough RAM, hiding the fact that your CPU is burning up managing memory locks.
WAL Volume     | The real truth.      | It proves you are writing massive amounts of metadata (savepoint markers) to disk, even if the data volume is small.
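As a monitoring sketch, the SLRU and WAL indicators above can be checked directly. Note that the subtransaction SLRU name varies by major version (“Subtrans” through PostgreSQL 16, “subtransaction” from 17 on), and the \gset variable below is a psql convenience:

```sql
-- 1) Subtransaction SLRU activity: non-zero blks_read/blks_written
--    means backends are spilling past the 64-slot PGPROC cache.
SELECT name, blks_hit, blks_read, blks_written, blks_zeroed
FROM pg_stat_slru
WHERE name ILIKE '%subtrans%';

-- 2) WAL volume over an interval: capture the insert LSN, run the
--    suspect workload, then diff (pg_lsn subtraction yields bytes).
SELECT pg_current_wal_insert_lsn() AS lsn_before \gset
-- ... run the workload here ...
SELECT pg_current_wal_insert_lsn() - :'lsn_before' AS wal_bytes;
```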
When considering the three scenarios:
Tier 1: Standard Loop (Baseline)
Mechanism: One single transaction for the whole batch.
Overhead: Near zero.
Verdict: 🟢 Safe. This is how Postgres is designed to work.
Tier 2: The “Safety Trap” (Exception Block)
Mechanism: Uses BEGIN…EXCEPTION inside the loop.
Hidden Cost: Every single iteration creates a Subtransaction. This burns a Subtransaction ID and forces a WAL write to create a “savepoint” it can roll back to.
Verdict: 🟡 Risky. It is 3x–5x slower and generates massive Write-Ahead Log (WAL) bloat, even for successful inserts.
Tier 3: The “Overflow” Disaster (Depth > 64)
Mechanism: Nesting subtransactions deeper than 64 levels (or having more than 64 open savepoints in one transaction).
The Cliff: PostgreSQL runs out of fast RAM slots (PGPROC array) and must spill tracking data to the slow SLRU cache (pg_subtrans).
Verdict: 🔴 Catastrophic. Performance degrades non-linearly (often 50x–100x slower) and causes global locking contention that can freeze other users.
Final Recommendation: If you need to handle errors in a bulk load (e.g., “Insert 10,000 rows, skip the ones that fail”):
DO validate data before the insert to filter out bad rows in the application layer or a staging table
Do NOT wrap every insert inside a loop in its own EXCEPTION block
Use EXCEPTION logic purposefully and avoid catch-alls like WHEN OTHERS
DO use INSERT … ON CONFLICT DO NOTHING (when the expected error is a unique-constraint violation)
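Putting these recommendations together, a bulk load can filter bad rows up front with a join against the parent and absorb duplicates with ON CONFLICT, so no row ever reaches an error path. A sketch reusing the demo tables from the FK section (staging_product_value is a hypothetical staging table):

```sql
-- Only rows whose product_id exists in the parent are attempted,
-- eliminating FK violations; any PK duplicates are absorbed by
-- ON CONFLICT DO NOTHING instead of erroring and leaving dead tuples.
INSERT INTO public.uuid_product_value (id, pkid, value, product_id, effective_date)
SELECT s.id, s.pkid, s.value, s.product_id, s.effective_date
FROM staging_product_value s
JOIN public.uuid_product p USING (product_id)
ON CONFLICT (id) DO NOTHING;
```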
I recently had a customer that wanted to leverage read replicas to ensure their read queries would not be impeded by work being done on the primary instance, with an SLA of, at worst, a few seconds. Ultimately they weren’t meeting the SLA, and my colleagues and I were asked to look at what was going on.
The first thing we came to understand is that the pattern of work on the primary is a somewhat frequent large DELETE statement followed by a data refresh accomplished by a COPY from STDIN command against a partitioned table with 16 hash partitions.
The problem being observed was that, periodically, the SELECTs occurring on the read replica would time out and miss the SLA. Upon investigation, we found that the “startup” process on the read replica would periodically request an exclusive lock on some seemingly random partition. This exclusive lock would block the SELECT (which is partition unaware) and cause the timeout. But what was triggering the lock?
After spending some time investigating, the team was able to correlate the exclusive lock with a routine autovacuum occurring on the primary. But why was it locking? After inspecting the WAL, it turned out the issue was a step in the vacuum process that tries to return free pages at the end of the table back to the OS: truncation of the High Water Mark (HWM). Essentially, the lock is requested on the primary and then transmitted to the replica via the WAL so that the tables are kept consistent.
To confirm that it was in fact the step in VACUUM that truncates the HWM, we decided to alter each partition of the table to allow VACUUM to skip that step:
ALTER TABLE [table name / partition name] SET (vacuum_truncate = false);
After letting this run for 24 hours, we saw no further blocking locks causing SLA misses on the replicas. Should we worry about never shrinking the High Water Mark (HWM)? As with everything in IT, it depends. Other DBMS engines such as Oracle do not shrink the HWM automatically; maintenance operations such as DBMS_REDEFINITION or ALTER TABLE … SHRINK SPACE / SHRINK SPACE COMPACT deal with that. So, back in PostgreSQL, do we need to worry about it? This is where the pg_freespacemap extension can help. We can use this extension and a script to check whether the HWM is in fact growing or staying put. If it is growing, we can execute a regular VACUUM with the additional TRUNCATE option to handle it:
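Spelled out, that one-off command looks like this (the explicit TRUNCATE option overrides a vacuum_truncate = false storage setting for that run; large_table is the demo table from the output below):

```sql
-- One-time truncation pass; autovacuum still honors the
-- vacuum_truncate = false reloption afterwards.
VACUUM (VERBOSE, TRUNCATE true) public.large_table;
```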
When you do this, you will see one additional message in the VACUUM output signifying that the VACUUM truncated the High Water Mark (HWM):
INFO: table "large_table": truncated 302534 to 302233 pages
As I stated earlier, we can use pg_freespacemap to see whether we actually need to worry about the High Water Mark (HWM) growing. I could have taken a lot of time to write a script to figure it out; instead, I enlisted Google Gemini to see what it would come up with. After a few iterations, the output was nearly perfect!
CREATE EXTENSION pg_freespacemap;
CREATE OR REPLACE FUNCTION show_empty_pages(p_table_name TEXT)
RETURNS VOID AS $$
DECLARE
-- Core processing variables
table_oid_regclass REGCLASS;
block_size BIGINT;
fsm_granularity BIGINT;
max_fsm_free_space BIGINT;
total_pages BIGINT;
high_water_mark BIGINT := 0;
-- Variables for the final summary
first_empty_block BIGINT;
free_pages_at_end BIGINT;
free_space_at_end TEXT;
BEGIN
-- Setup
table_oid_regclass := p_table_name::regclass;
block_size := current_setting('block_size')::bigint;
SELECT relpages INTO total_pages FROM pg_class WHERE oid = table_oid_regclass;
fsm_granularity := block_size / 256;
max_fsm_free_space := floor((block_size - 24) / fsm_granularity) * fsm_granularity;
--------------------------------------------------------------------------------
-- PASS 1: FIND THE HIGH-WATER MARK (last page with data)
--------------------------------------------------------------------------------
FOR i IN REVERSE (total_pages - 1)..0 LOOP
IF pg_freespace(table_oid_regclass, i) < max_fsm_free_space THEN
high_water_mark := i;
EXIT;
END IF;
END LOOP;
--------------------------------------------------------------------------------
-- FINAL STEP: CALCULATE AND RAISE THE SUMMARY NOTICE
--------------------------------------------------------------------------------
first_empty_block := high_water_mark + 1;
free_pages_at_end := total_pages - first_empty_block;
IF free_pages_at_end < 0 THEN
free_pages_at_end := 0;
END IF;
free_space_at_end := pg_size_pretty(free_pages_at_end * block_size);
RAISE NOTICE '-------------------------------------------------------------';
RAISE NOTICE 'Summary for table: %', p_table_name;
RAISE NOTICE '-------------------------------------------------------------';
RAISE NOTICE 'The High Water Mark (HWM) is at page: %', high_water_mark;
IF total_pages <> first_empty_block THEN
RAISE NOTICE 'First potentially empty page is at: %', first_empty_block;
RAISE NOTICE 'Total Pages in Table: %', total_pages;
RAISE NOTICE 'Number of potentially truncatable pages at the end: %', free_pages_at_end;
RAISE NOTICE 'Amount of free space at the end of the table: %', free_space_at_end;
ELSE
RAISE NOTICE 'There are no empty pages to truncate';
END IF;
RAISE NOTICE '-------------------------------------------------------------';
END;
$$ LANGUAGE plpgsql;
This handy script could be periodically executed to check the High Water Mark (HWM) and will produce the following output:
(postgres@10.3.1.17:5432) [postgres] > SELECT * FROM show_empty_pages('public.large_table');
NOTICE: -------------------------------------------------------------
NOTICE: Summary for table: public.large_table
NOTICE: -------------------------------------------------------------
NOTICE: The High Water Mark (HWM) is at page: 302232
NOTICE: First potentially empty page is at: 302233
NOTICE: Total Pages in Table: 302534
NOTICE: Number of potentially truncatable pages at the end: 301
NOTICE: Amount of free space at the end of the table: 2408 kB
NOTICE: -------------------------------------------------------------
If there is no free space after the last full block, the output will look like this:
NOTICE: -------------------------------------------------------------
NOTICE: Summary for table: public.large_table
NOTICE: -------------------------------------------------------------
NOTICE: The High Water Mark (HWM) is at page: 302232
NOTICE: There are no empty pages to truncate
NOTICE: -------------------------------------------------------------
So while there is no single right answer on how to deal with this, make sure you know the implications of each step in the process. In this case, we decided to set the vacuum_truncate option to false, but another option might be to tune vacuum differently, such as making it more or less frequent. Always evaluate your own situation; in any case, it’s good to know what happens in your database when certain commands are executed.
I previously blogged about using the “ON CONFLICT” directive to keep failed inserts from creating additional work for vacuum. You can read the original blog here: Reduce Vacuum by Using “ON CONFLICT” Directive
Now that Postgres 15 and above include the MERGE statement, I wanted to ensure there was no “strange” behavior as it relates to vacuum when using MERGE. As you can see here, MERGE performs exactly as expected: when a merge updates on a match and inserts otherwise, exactly one row is marked dead when the update path fires against an existing row.
/* Create the table: */
CREATE TABLE public.pk_violation_test (
id int PRIMARY KEY,
value numeric,
product_id int,
effective_date timestamp(3)
);
/* Insert some mocked up data */
INSERT INTO public.pk_violation_test VALUES (
generate_series(0,10000),
random()*1000,
random()*100,
current_timestamp(3));
/* Verify that there are no dead tuples: */
SELECT
schemaname,
relname,
n_live_tup,
n_dead_tup
FROM
pg_stat_all_tables
WHERE
relname = 'pk_violation_test';
schemaname | relname | n_live_tup | n_dead_tup
------------+-------------------+------------+------------
public | pk_violation_test | 10001 | 0
Then, create a simple merge and check the results:
WITH insert_query AS (
SELECT
0 AS id,
44.33893489873 AS value,
46 AS product_id,
now() AS effective_date) MERGE INTO pk_violation_test pkt
USING insert_query i ON pkt.id = i.id
WHEN MATCHED THEN
UPDATE SET
value = i.value, product_id = i.product_id, effective_date = i.effective_date
WHEN NOT MATCHED THEN
INSERT (id, value, product_id, effective_date)
VALUES (i.id, i.value, i.product_id, i.effective_date);
MERGE 1
And then check the dead tuple count:
SELECT
schemaname,
relname,
n_live_tup,
n_dead_tup
FROM
pg_stat_all_tables
WHERE
relname = 'pk_violation_test';
schemaname | relname | n_live_tup | n_dead_tup
------------+-------------------+------------+------------
public | pk_violation_test | 10001 | 1
(1 row)
As expected, only one row is marked dead. MERGE is great functionality and I am glad to see it in Postgres. As time permits, consider converting your “ON CONFLICT” statements to use it. Enjoy!
I’m always working with customers migrating from Oracle to PostgreSQL. In both DBMS systems there is I/O impact when using exception handlers, such as when handling a PK constraint violation, but the impact in PostgreSQL is different and you should be aware of what is actually going on. For example, when an exception occurs, redo is generated in Oracle (WAL in Postgres), and additional catalog queries are issued in both systems to get pertinent data about the constraint violation. But what actually happens in Postgres as it relates to MVCC?
Let’s use a simple test to demonstrate. First, create a table with some mocked up “product” data:
/* Create the table: */
CREATE TABLE public.pk_violation_test (
id int PRIMARY KEY,
value numeric,
product_id int,
effective_date timestamp(3)
);
/* Insert some mocked up data */
INSERT INTO public.pk_violation_test VALUES (
generate_series(0,10000),
random()*1000,
random()*100,
current_timestamp(3));
/* Verify that there are no dead tuples: */
SELECT
schemaname,
relname,
n_live_tup,
n_dead_tup
FROM
pg_stat_all_tables
WHERE
relname = 'pk_violation_test';
schemaname | relname | n_live_tup | n_dead_tup
------------+-------------------+------------+------------
public | pk_violation_test | 10001 | 0
Now, we will execute a simple insert statement with no ON CONFLICT directive:
/* Perform a simple insert: */
INSERT INTO pk_violation_test
VALUES (0, 44.33893489873, 46, now());
ERROR: duplicate key value violates unique constraint "pk_violation_test_pkey"
DETAIL: Key (id)=(0) already exists.
Time: 1.292 ms
/* Verify the dead tuple count: */
SELECT
schemaname,
relname,
n_live_tup,
n_dead_tup
FROM
pg_stat_all_tables
WHERE
relname = 'pk_violation_test';
schemaname | relname | n_live_tup | n_dead_tup
------------+-------------------+------------+------------
public | pk_violation_test | 10001 | 1
As you can see, the error shows that the insert violated the PK, and the tuple that was in violation is now a dead tuple.
Now change the insert to use the “ON CONFLICT” directive and check the dead tuple count:
/* Perform a simple insert using the directive: */
INSERT INTO pk_violation_test
VALUES (0, 44.33893489873, 46, now())
ON CONFLICT
DO NOTHING;
INSERT 0 0
Time: 0.889 ms
/* Verify the unchanged dead tuple count: */
SELECT
schemaname,
relname,
n_live_tup,
n_dead_tup
FROM
pg_stat_all_tables
WHERE
relname = 'pk_violation_test';
schemaname | relname | n_live_tup | n_dead_tup
------------+-------------------+------------+------------
public | pk_violation_test | 100001 | 1
As you can see, the dead tuple count did not increase, thus reducing the amount of vacuum work needed!
Now, most code conversion tools will try to do something with the Oracle exception block. Maybe it gets converted into something like this:
/* Create a simple function with exception logic: */
CREATE OR REPLACE FUNCTION pk_violation_test_func (p_id int, p_value numeric, p_product_id int)
RETURNS VOID
AS $$
BEGIN
BEGIN
INSERT INTO pk_violation_test (id, value, product_id, effective_date)
VALUES (p_id, p_value, p_product_id, now());
RETURN;
EXCEPTION
WHEN unique_violation THEN
-- try an update
UPDATE
pk_violation_test
SET
value = p_value,
product_id = p_product_id,
effective_date = now()
WHERE
id = p_id;
IF found THEN
RETURN;
END IF;
END;
END;
$$
LANGUAGE plpgsql;
So what happens in this case? Watch how we now get TWO dead tuples: one for the failed insert and one for the update (if it’s not a HOT update). I will vacuum the table first so that there is no question about how many dead tuples there are:
/* Vacuum the table: */
vacuum pk_violation_test;
/* Verify the dead tuple count: */
SELECT
schemaname,
relname,
n_live_tup,
n_dead_tup
FROM
pg_stat_all_tables
WHERE
relname = 'pk_violation_test';
schemaname | relname | n_live_tup | n_dead_tup
------------+-------------------+------------+------------
public | pk_violation_test | 10001 | 0
/* Call the sample function */
select * from pk_violation_test_func(0, 44.33893489873, 46);
pk_violation_test_func
------------------------
(1 row)
/* Verify the dead tuple count: */
SELECT
schemaname,
relname,
n_live_tup,
n_dead_tup
FROM
pg_stat_all_tables
WHERE
relname = 'pk_violation_test';
schemaname | relname | n_live_tup | n_dead_tup
------------+-------------------+------------+------------
public | pk_violation_test | 10001 | 2
Contrast this with the earlier insert attempt using the “ON CONFLICT” directive, which did not create a dead tuple. The directive makes inserts that would violate a PK more efficient and faster, and avoids unnecessary vacuum work. Remember the log message that resulted from the unhandled PK violation? With the directive, that log message is gone too. A WIN all around!
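A conversion-friendly alternative to the exception-block function above is a single upsert, which creates no subtransaction and produces at most one dead tuple (on the update path). A sketch using the same table and parameters as the demo:

```sql
-- One statement, no EXCEPTION block: no subtransaction is created,
-- and a conflicting insert never leaves a dead tuple behind.
CREATE OR REPLACE FUNCTION pk_violation_test_func (p_id int, p_value numeric, p_product_id int)
RETURNS void
AS $$
    INSERT INTO pk_violation_test (id, value, product_id, effective_date)
    VALUES (p_id, p_value, p_product_id, now())
    ON CONFLICT (id) DO UPDATE
        SET value          = EXCLUDED.value,
            product_id     = EXCLUDED.product_id,
            effective_date = now();
$$ LANGUAGE sql;
```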
The point being: it is very important to understand whichever DBMS you are running. Things that seem very simple can have pros and cons that need to be dealt with.