Tag Archives: AlloyDB

More Obscure Things That Make It Go “Vacuum” in PostgreSQL

Do PostgreSQL Sub-Transactions Hurt Performance?

The short answer is always “maybe”. However, in the following post, I hope to demonstrate what creates a sub-transaction and what happens to overall transaction ID utilization when sub-transactions are invoked. I will also show how performance is affected when many connections are creating and consuming sub-transactions.

First, it is important to understand what statements will utilize a transaction id and which ones may be more critical (expensive) than others:

Calling Nested Procedures: 🟢 Free. No new XIDs are used. They share the parent’s transaction.
BEGIN…END (No Exception Block): 🟢 Free. Just organization.
COMMIT: 🟡 Expensive. Burns a main Transaction ID (finite resource, leads to Vacuum Freeze).
EXCEPTION: 🔴 Dangerous. Creates a subtransaction (performance killer).

So why is misplaced EXCEPTION logic possibly a performance killer? PostgreSQL is optimized to handle a small number of subtransactions very efficiently. Each backend process has a fixed-size array in shared memory (part of the PGPROC structure) that can hold up to 64 subtransaction IDs (XIDs) for the current top-level transaction. As long as the transaction stays within those 64 slots, PostgreSQL manages everything in this fast, local memory array. It does not need to use the Subtrans SLRU (Simple Least Recently Used) subsystem, which is what pg_stat_slru tracks. The problem is that the utilization of subtransaction XIDs can get out of hand rather quickly if you are not paying attention to your application flow. Once PostgreSQL runs out of fast RAM slots (the PGPROC array) and spills tracking data to the slow SLRU cache (pg_subtrans), performance degrades non-linearly (often 50x–100x slower) and causes global locking contention that can freeze other users.
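To make the 64-slot limit a little more concrete, here is a toy Python model of the per-backend cache. It is only a sketch of the bookkeeping described above, not PostgreSQL code: the class name, the starting XID of 1000 and the `simulate` helper are made up for illustration, and the only value taken from PostgreSQL is the cache size of 64.

```python
from dataclasses import dataclass, field

# Size of the per-backend subtransaction XID cache described above.
MAX_CACHED_SUBXIDS = 64


@dataclass
class BackendSubxidCache:
    """Toy model of one backend's subtransaction XID cache (illustration only)."""

    subxids: list = field(default_factory=list)
    overflowed: bool = False

    def assign_subxid(self, xid: int) -> None:
        # While a slot is free, the XID is tracked in the fast local array.
        if len(self.subxids) < MAX_CACHED_SUBXIDS:
            self.subxids.append(xid)
        else:
            # No free slot: the cache is marked overflowed, and lookups
            # must fall back to the pg_subtrans SLRU from here on.
            self.overflowed = True


def simulate(subxid_count: int) -> BackendSubxidCache:
    """Assign subxid_count hypothetical XIDs inside one top-level transaction."""
    cache = BackendSubxidCache()
    for xid in range(1000, 1000 + subxid_count):
        cache.assign_subxid(xid)
    return cache


if __name__ == "__main__":
    for count in (1, 48, 64, 65, 128):
        cache = simulate(count)
        print(f"subxids={count:>3} cached={len(cache.subxids):>2} overflowed={cache.overflowed}")
```

The point of the model is simply that the 65th subtransaction XID is the one that flips the overflow flag; everything before it stays in local memory.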

One of the other byproducts of utilizing too many sub-transactions is the additional WAL that will be generated. All of these can be demonstrated by a simple block of code:

------------------------------------------------------------------
-- 1. PREP: Create the helper function
------------------------------------------------------------------
CREATE OR REPLACE FUNCTION generate_subtrans_load(p_id int, depth int) 
RETURNS void AS $$
BEGIN
    IF depth > 0 THEN
        BEGIN
            PERFORM generate_subtrans_load(p_id, depth - 1);
        EXCEPTION WHEN OTHERS THEN NULL;
        END;
    ELSE
        INSERT INTO penalty_test VALUES (p_id, 'Deep Stack');
    END IF;
END;
$$ LANGUAGE plpgsql;

------------------------------------------------------------------
-- 2. THE COMPLETE TEST SUITE
------------------------------------------------------------------
DO $$ 
DECLARE
    -- Timing Variables
    v_start_ts  timestamp;
    v_t_base    interval; v_t_single interval; 
    v_t_48      interval; v_t_128    interval;

    -- Row Verification
    v_rows_base int;      v_rows_single int;
    v_rows_48   int;      v_rows_128    int;

    -- WAL Variables
    v_start_lsn pg_lsn;   v_end_lsn  pg_lsn;
    v_wal_base  numeric;  v_wal_single numeric;
    v_wal_48    numeric;  v_wal_128    numeric;

    -- XID Variables
    v_start_xid xid8;     v_end_xid  xid8;
    v_xid_base  bigint;   v_xid_single bigint;
    v_xid_48    bigint;   v_xid_128    bigint;

    i int;
    v_target_rows int := 5000; 
    
    -- Helper calc vars
    v_us_48 numeric;
    v_us_128 numeric;
BEGIN
    ---------------------------------------------------
    -- A. SETUP
    ---------------------------------------------------
    DROP TABLE IF EXISTS penalty_test;
    -- Explicitly preserve rows so COMMIT doesn't empty the table
    CREATE TEMP TABLE penalty_test (id int, val text) ON COMMIT PRESERVE ROWS;

    ---------------------------------------------------
    -- B. TEST 1: BASELINE (Standard Loop)
    ---------------------------------------------------
    -- 1. Force fresh start
    COMMIT; 
    v_start_ts  := clock_timestamp();
    v_start_lsn := pg_current_wal_insert_lsn();
    v_start_xid := pg_snapshot_xmax(pg_current_snapshot());

    FOR i IN 1..v_target_rows LOOP
        INSERT INTO penalty_test VALUES (i, 'Standard');
    END LOOP;
    
    -- 2. Force snapshot refresh to see XID consumption
    COMMIT; 
    v_end_lsn := pg_current_wal_insert_lsn();
    v_end_xid := pg_snapshot_xmax(pg_current_snapshot());
    v_t_base  := clock_timestamp() - v_start_ts;
    
    SELECT count(*) INTO v_rows_base FROM penalty_test;
    v_wal_base := v_end_lsn - v_start_lsn;
    v_xid_base := (v_end_xid::text::bigint - v_start_xid::text::bigint);

    ---------------------------------------------------
    -- C. TEST 2: SINGLE TRAP (Depth 1)
    ---------------------------------------------------
    TRUNCATE TABLE penalty_test;
    COMMIT; -- Clear stats
    
    v_start_ts  := clock_timestamp();
    v_start_lsn := pg_current_wal_insert_lsn();
    v_start_xid := pg_snapshot_xmax(pg_current_snapshot());

    FOR i IN 1..v_target_rows LOOP
        BEGIN
            INSERT INTO penalty_test VALUES (i, 'Single Trap');
        EXCEPTION WHEN OTHERS THEN NULL;
        END;
    END LOOP;
    
    COMMIT; -- Force refresh
    v_end_lsn := pg_current_wal_insert_lsn();
    v_end_xid := pg_snapshot_xmax(pg_current_snapshot());
    v_t_single := clock_timestamp() - v_start_ts;

    SELECT count(*) INTO v_rows_single FROM penalty_test;
    v_wal_single := v_end_lsn - v_start_lsn;
    v_xid_single := (v_end_xid::text::bigint - v_start_xid::text::bigint);

    ---------------------------------------------------
    -- D. TEST 3: SAFE ZONE (Depth 48)
    ---------------------------------------------------
    TRUNCATE TABLE penalty_test;
    COMMIT;
    
    v_start_ts  := clock_timestamp();
    v_start_lsn := pg_current_wal_insert_lsn();
    v_start_xid := pg_snapshot_xmax(pg_current_snapshot());

    FOR i IN 1..v_target_rows LOOP
        PERFORM generate_subtrans_load(i, 48);
    END LOOP;
    
    COMMIT;
    v_end_lsn := pg_current_wal_insert_lsn();
    v_end_xid := pg_snapshot_xmax(pg_current_snapshot());
    v_t_48    := clock_timestamp() - v_start_ts;

    SELECT count(*) INTO v_rows_48 FROM penalty_test;
    v_wal_48 := v_end_lsn - v_start_lsn;
    v_xid_48 := (v_end_xid::text::bigint - v_start_xid::text::bigint);

    ---------------------------------------------------
    -- E. TEST 4: OVERFLOW ZONE (Depth 128)
    ---------------------------------------------------
    TRUNCATE TABLE penalty_test;
    COMMIT;
    
    v_start_ts  := clock_timestamp();
    v_start_lsn := pg_current_wal_insert_lsn();
    v_start_xid := pg_snapshot_xmax(pg_current_snapshot());

    FOR i IN 1..v_target_rows LOOP
        PERFORM generate_subtrans_load(i, 128);
    END LOOP;
    
    COMMIT;
    v_end_lsn := pg_current_wal_insert_lsn();
    v_end_xid := pg_snapshot_xmax(pg_current_snapshot());
    v_t_128   := clock_timestamp() - v_start_ts;

    SELECT count(*) INTO v_rows_128 FROM penalty_test;
    v_wal_128 := v_end_lsn - v_start_lsn;
    v_xid_128 := (v_end_xid::text::bigint - v_start_xid::text::bigint);

    ---------------------------------------------------
    -- F. THE REPORT
    ---------------------------------------------------
    RAISE NOTICE '===================================================';
    RAISE NOTICE '    POSTGRESQL SUBTRANSACTION IMPACT REPORT      ';
    RAISE NOTICE '===================================================';
    RAISE NOTICE 'Target Rows: %', v_target_rows;
    RAISE NOTICE '---------------------------------------------------';
    
    RAISE NOTICE 'METRIC 1: EXECUTION TIME & VERIFICATION';
    RAISE NOTICE '  1. Baseline:         % (Rows: %)', v_t_base, v_rows_base;
    RAISE NOTICE '  2. Single Exception: % (Rows: %)', v_t_single, v_rows_single;
    RAISE NOTICE '  3. Safe Zone (48):   % (Rows: %)', v_t_48, v_rows_48;
    RAISE NOTICE '  4. Overflow (128):   % (Rows: %)', v_t_128, v_rows_128;
    RAISE NOTICE '---------------------------------------------------';

    v_us_48  := extract(epoch from v_t_48) * 1000000;
    v_us_128 := extract(epoch from v_t_128) * 1000000;

    RAISE NOTICE 'METRIC 2: AVERAGE COST PER SUBTRANSACTION';
    RAISE NOTICE '   - Safe Zone (48):     % us per subtrans', round(v_us_48 / (v_target_rows * 48), 2);
    RAISE NOTICE '   - Overflow Zone (128): % us per subtrans', round(v_us_128 / (v_target_rows * 128), 2);
    
    IF (v_us_128 / (v_target_rows * 128)) > (v_us_48 / (v_target_rows * 48)) THEN
        RAISE NOTICE '   -> RESULT: Overflow subtransactions were % %% slower per unit.', 
            round( ( ((v_us_128 / (v_target_rows * 128)) - (v_us_48 / (v_target_rows * 48))) / (v_us_48 / (v_target_rows * 48)) * 100)::numeric, 1);
    ELSE
         RAISE NOTICE '   -> RESULT: Overhead appears linear.';
    END IF;
    RAISE NOTICE '---------------------------------------------------';
    
    RAISE NOTICE 'METRIC 3: DISK USAGE (WAL WRITTEN)';
    RAISE NOTICE '  1. Baseline:         % bytes', v_wal_base;
    RAISE NOTICE '  2. Single Exception: % bytes', v_wal_single;
    RAISE NOTICE '  3. Safe Zone (48):   % bytes', v_wal_48;
    RAISE NOTICE '  4. Overflow (128):   % bytes', v_wal_128;
    RAISE NOTICE '---------------------------------------------------';

    RAISE NOTICE 'METRIC 4: TRANSACTION ID CONSUMPTION';
    RAISE NOTICE '  1. Baseline:         % XIDs', v_xid_base;
    RAISE NOTICE '  2. Single Exception: % XIDs', v_xid_single;
    RAISE NOTICE '  3. Safe Zone (48):   % XIDs', v_xid_48;
    RAISE NOTICE '  4. Overflow (128):   % XIDs', v_xid_128;
    
    RAISE NOTICE '===================================================';
END $$;

The code block above shows 4 different scenarios:

Baseline: Simple insert that inserts a number of rows in a loop with no exception logic
Single Exception (Depth 1): The same insert loop, but with every insert wrapped in its own exception block
48 Sub-transaction Loop (Depth 48): A function call that arbitrarily creates 48 subtransactions (well below the 64 limit) and then completes the same insert
128 Sub-transaction Loop (Depth 128): A function call that arbitrarily creates 128 subtransactions (well above the 64 limit) and then completes the same insert

What you will see is that when run by a single user the impact is not terribly great. Besides utilizing more transaction IDs (XIDs) and generating much more WAL, the system and performance impact does not appear to be a big deal. When running for a 5000 row insert:

NOTICE:  ===================================================
NOTICE:      POSTGRESQL SUBTRANSACTION IMPACT REPORT
NOTICE:  ===================================================
NOTICE:  Target Rows: 5000
NOTICE:  ---------------------------------------------------
NOTICE:  METRIC 1: EXECUTION TIME & VERIFICATION
NOTICE:    1. Baseline:         00:00:00.020292 (Rows: 5000)
NOTICE:    2. Single Exception: 00:00:00.033419 (Rows: 5000)
NOTICE:    3. Safe Zone (48):   00:00:01.414132 (Rows: 5000)
NOTICE:    4. Overflow (128):   00:00:03.817267 (Rows: 5000)
NOTICE:  ---------------------------------------------------
NOTICE:  METRIC 2: AVERAGE COST PER SUBTRANSACTION
NOTICE:     - Safe Zone (48):     5.89 us per subtrans
NOTICE:     - Overflow Zone (128): 5.96 us per subtrans
NOTICE:     -> RESULT: Overflow subtransactions were 1.2 % slower per unit.
NOTICE:  ---------------------------------------------------
NOTICE:  METRIC 3: DISK USAGE (WAL WRITTEN)
NOTICE:    1. Baseline:         48 bytes
NOTICE:    2. Single Exception: 43936 bytes
NOTICE:    3. Safe Zone (48):   2076216 bytes
NOTICE:    4. Overflow (128):   5536832 bytes
NOTICE:  ---------------------------------------------------
NOTICE:  METRIC 4: TRANSACTION ID CONSUMPTION
NOTICE:    1. Baseline:         1 XIDs
NOTICE:    2. Single Exception: 5001 XIDs
NOTICE:    3. Safe Zone (48):   240001 XIDs
NOTICE:    4. Overflow (128):   640002 XIDs
NOTICE:  ===================================================
DO
Time: 5294.689 ms (00:05.295)
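If you want to double check the per-subtransaction numbers in Metric 2, the arithmetic is easy to redo outside the database. The short Python sketch below only re-derives the figures from the elapsed times printed in the report; the variable and function names are my own.

```python
from datetime import timedelta

TARGET_ROWS = 5000

# Elapsed times copied from METRIC 1 in the report above.
elapsed = {
    48: timedelta(seconds=1, microseconds=414132),
    128: timedelta(seconds=3, microseconds=817267),
}


def cost_per_subtransaction_us(depth: int) -> float:
    """Average microseconds per subtransaction for one test run."""
    total_us = elapsed[depth] / timedelta(microseconds=1)
    # Each of the TARGET_ROWS inserts opens `depth` nested exception blocks.
    return total_us / (TARGET_ROWS * depth)


safe = cost_per_subtransaction_us(48)
overflow = cost_per_subtransaction_us(128)
pct_slower = (overflow - safe) / safe * 100

print(f"Safe zone (48):  {safe:.2f} us per subtransaction")
print(f"Overflow (128):  {overflow:.2f} us per subtransaction")
print(f"Difference:      {pct_slower:.1f} %")
print(f"Subtransactions: {TARGET_ROWS * 48} vs {TARGET_ROWS * 128}")
```

The subtransaction counts (240000 and 640000) also line up with the XID consumption in Metric 4, give or take the one or two XIDs used by the top-level transactions.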

As you can see in the results above, each scenario burns more XIDs and creates more WAL than the one before it. This is due to the additional transaction tracking that must be available should the WAL need to be replayed. The real killer comes when multiple connections all begin to execute this same code. When I ran a synthetic test using Apache JMeter, the results were astounding. The test was completed on a 4 vCPU C4A AlloyDB instance:

Test	Average Time Per Exec	Min Exec Time	Max Exec Time
1 User x 50 iterations	2137ms	2069ms	3477ms
4 Users x 50 iterations	3171ms	2376ms	4080ms
10 Users x 50 iterations	9041ms	4018ms	47880ms

As more users are added, execution time increases because of the time needed to manage the XID space and write the additional WAL. Some of this increased time is due to the elongated connection time caused by the XID space. When the same test was executed with a Managed Connection Pooler, times came down a little bit:

Test	Average Time Per Exec	Min Exec Time	Max Exec Time
10 Users x 50 iterations	6399ms	2690ms	10624ms

Now you ask, how can I monitor for this performance impact? The verdict is that it depends. Wide variations in execution time depending on database load may be one indication. Another indication may be entries in the pg_stat_slru view (however, depending on the available RAM, metrics may not appear here), and a final indication will always be WAL usage. In summary:

Metric	What it tells you	What it hides
Execution Time	“My query is slow.”	It doesn’t explain why (CPU vs Lock vs I/O). Additional investigation may be required.
pg_stat_slru	“Disk is thrashing.”	It reads 0 if you have enough RAM, hiding the fact that your CPU is burning up managing memory locks.
WAL Volume	The Real Truth.	It proves you are writing massive amounts of metadata (savepoint markers) to disk, even if the data volume is small.
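For the pg_stat_slru check, a small helper can make the counters easier to read. The Python sketch below is an assumption-laden example rather than a finished monitoring script: it builds the query text and summarizes one row passed in as a dict, and you would run the query with whatever driver you already use. As far as I know the subtransaction row is named 'Subtrans' through PostgreSQL 16 and 'subtransaction' from version 17, so the query asks for both; verify the name on your version.

```python
def subtrans_slru_query() -> str:
    """SQL for the subtransaction SLRU counters in pg_stat_slru."""
    # Row name assumed to be 'Subtrans' (up to v16) or 'subtransaction' (v17+).
    return (
        "SELECT name, blks_hit, blks_read, blks_written "
        "FROM pg_stat_slru "
        "WHERE name IN ('Subtrans', 'subtransaction')"
    )


def summarize_slru(row: dict) -> str:
    """Turn one pg_stat_slru row (as a dict) into a short status line."""
    lookups = row["blks_hit"] + row["blks_read"]
    if lookups == 0:
        # Matches the "reads 0" case called out in the table above.
        return f"{row['name']}: no SLRU lookups recorded"
    hit_pct = 100.0 * row["blks_hit"] / lookups
    return (
        f"{row['name']}: {lookups} lookups, {hit_pct:.1f}% served from cache, "
        f"{row['blks_read']} read from disk"
    )


if __name__ == "__main__":
    # Hypothetical counter values, only to show the output format.
    sample = {"name": "Subtrans", "blks_hit": 90, "blks_read": 10, "blks_written": 0}
    print(subtrans_slru_query())
    print(summarize_slru(sample))
```

A row that stays at zero does not prove you are safe, as noted above, so treat this as one signal alongside execution time and WAL volume.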

When considering the three scenarios:

Tier 1: Standard Loop (Baseline)
- Mechanism: One single transaction for the whole batch.
- Overhead: Near zero.
- Verdict: 🟢 Safe. This is how Postgres is designed to work.
Tier 2: The “Safety Trap” (Exception Block)
- Mechanism: Uses BEGIN…EXCEPTION inside the loop.
- Hidden Cost: Every single iteration creates a Subtransaction. This burns a Subtransaction ID and forces a WAL write to create a “savepoint” it can roll back to.
- Verdict: 🟡 Risky. It is 3x–5x slower and generates massive Write-Ahead Log (WAL) bloat, even for successful inserts.
Tier 3: The “Overflow” Disaster (Depth > 64)
- Mechanism: Nesting subtransactions deeper than 64 layers (or having >64 active savepoints).
- The Cliff: PostgreSQL runs out of fast RAM slots (PGPROC array) and must spill tracking data to the slow SLRU cache (pg_subtrans).
- Verdict: 🔴 Catastrophic. Performance degrades non-linearly (often 50x–100x slower) and causes global locking contention that can freeze other users.

Final Recommendation:
If you need to handle errors in a bulk load (e.g., “Insert 10,000 rows, skip the ones that fail”):

DO validate data before the insert to filter out bad rows in the application layer or a staging table
Do NOT wrap every insert in an EXCEPTION block (i.e., an EXCEPTION block inside a LOOP)
Use EXCEPTION logic purposefully and avoid the need for CATCH ALL like “WHEN OTHERS”
DO use INSERT … ON CONFLICT DO NOTHING (if the error is unique constraint)
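As a rough illustration of the first and last recommendations, the Python sketch below filters rows in the application layer before they ever reach the database. The `split_rows` helper, its rejection reasons and the `bulk_target` table name are all hypothetical; adapt the checks to whatever actually makes your rows fail.

```python
def split_rows(rows, existing_ids):
    """Separate rows that should load cleanly from rows that would be rejected."""
    good, rejected = [], []
    seen = set(existing_ids)
    for row_id, val in rows:
        if row_id is None or val is None:
            rejected.append((row_id, val, "missing value"))
        elif row_id in seen:
            rejected.append((row_id, val, "duplicate id"))
        else:
            seen.add(row_id)
            good.append((row_id, val))
    return good, rejected


# For unique-key conflicts the database can skip rows without an EXCEPTION block.
# bulk_target is a hypothetical table with a unique constraint on id.
INSERT_SQL = (
    "INSERT INTO bulk_target (id, val) VALUES (%s, %s) "
    "ON CONFLICT DO NOTHING"
)

if __name__ == "__main__":
    rows = [(1, "a"), (2, None), (1, "b"), (3, "c")]
    good, rejected = split_rows(rows, existing_ids=[3])
    print("load:", good)
    print("skip:", rejected)
```

Only the `good` list would be sent to the database in one plain transaction, so no subtransactions are created for rows that were never going to succeed.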

Would you like to read more and get additional perspective?
https://postgres.ai/blog/20210831-postgresql-subtransactions-considered-harmful

Hope this helps you understand the effect of subtransactions in PostgreSQL!

Understanding High Water Mark Locking Issues in PostgreSQL Vacuums

Follow-Up: Reduce Vacuum by Using “ON CONFLICT” Directive

I previously blogged about ensuring that the “ON CONFLICT” directive is used in order to keep vacuum from having to do additional work. You can read the original blog here: Reduce Vacuum by Using “ON CONFLICT” Directive

Now that “MERGE” is available in Postgres 15 and above, I wanted to ensure that there was no “strange” behavior as it relates to vacuum when using merge. As you can see here, the “MERGE” functionality does perform exactly as expected. For example, when a merge is written to insert the row if it is missing and update it if it already exists, exactly one row is marked dead when the row exists and the update is applied.

/* Create the table: */
CREATE TABLE public.pk_violation_test (
        id int PRIMARY KEY, 
        value numeric,
        product_id int,
        effective_date timestamp(3)
        );
 
 
/* Insert some mocked up data */
INSERT INTO public.pk_violation_test VALUES ( 
        generate_series(0,10000), 
        random()*1000,
        random()*100,
        current_timestamp(3));
 
/* Verify that there are no dead tuples: */
SELECT
    schemaname,
    relname,
    n_live_tup,
    n_dead_tup
FROM
    pg_stat_all_tables
WHERE
    relname = 'pk_violation_test';
 
 schemaname |      relname      | n_live_tup | n_dead_tup
------------+-------------------+------------+------------
 public     | pk_violation_test |      10001 |          0

Then, create a simple merge and check the results:

WITH insert_query AS (
    SELECT
        0 AS id,
        44.33893489873 AS value,
        46 AS product_id,
        now() AS effective_date)
MERGE INTO pk_violation_test pkt
    USING insert_query i ON pkt.id = i.id
    WHEN MATCHED THEN
        UPDATE SET
            value = i.value, product_id = i.product_id, effective_date = i.effective_date
    WHEN NOT MATCHED THEN
        INSERT (id, value, product_id, effective_date)
            VALUES (i.id, i.value, i.product_id, i.effective_date);
MERGE 1

And then check the dead tuple count:

SELECT
    schemaname,
    relname,
    n_live_tup,
    n_dead_tup
FROM
    pg_stat_all_tables
WHERE
    relname = 'pk_violation_test';
 schemaname |      relname      | n_live_tup | n_dead_tup
------------+-------------------+------------+------------
 public     | pk_violation_test |      10001 |          1
(1 row)

As expected only one row is marked dead. Merge is such great functionality and I am glad to see it in Postgres. As you get time, all of your “ON CONFLICT” statements should be converted to use this functionality. Enjoy!

Using the “hint_plan” Table Provided by the PostgreSQL Extension “pg_hint_plan”

Introduction

For those who have worked with Oracle, the pg_hint_plan extension is one that will allow you to hint plans in patterns that you are likely very familiar with:

sql_patch
sql_profile
sql_plan_baselines

While currently, the functionality provided by pg_hint_plan is not nearly as robust (hints list), it does provide most of what you would encounter day to day as a DBA. That being said, one thing that is currently missing is the ability to easily add hints without changing code via stored_procedures / functions like in Oracle. The only way to currently do this in Open Source PostgreSQL is to manually manipulate a table named “hints” typically located in the “hint_plan” schema.

The “hints” table which is provided by the extension is highly dependent (just like Oracle) on a normalized SQL statement. A normalized SQL statement in PostgreSQL is one that has all carriage returns removed, all spaces converted to single spaces and all literals and parameters replaced with a “?”. Typically you have to do this manually, but in this blog post, I am going to show how I have leveraged entries in “pg_stat_statements” along with custom written functions to normalize the statement and place it into the “hints” table. To use this “hints” table feature, the following setting must be enabled at either the session or system level:

set session pg_hint_plan.enable_hint_table to on;
or
in the postgresql.conf:
pg_hint_plan.enable_hint_table = on

What Does a Normalized Statement Look Like?

Typically, when you receive code from a developer or even code that you work on yourself, you format it in order to make it human readable and easier to interpret. For example, you might want your statement to look like this (notice the parameters / literals in the statement):

SELECT
    b.bid,
    sum(abalance)
FROM
    pgbench_branches b
    JOIN pgbench_accounts a ON (b.bid = a.bid)
WHERE
    b.bid = 12345
    AND a.aid BETWEEN 100 AND 200
GROUP BY
    b.bid
ORDER BY
    1;

Now to normalize the statement for use with the “hints” table it needs to look like this:

select b.bid, sum(abalance) from pgbench_branches b join pgbench_accounts a on (b.bid = a.bid) where b.bid = ? and a.aid between ? and ? group by b.bid order by 1;

You can either manually manipulate the statement to get it into this format or attempt to do it programmatically. I prefer as much as possible to let the system format it for me, so I have written a few helper scripts to do this:

Helper Queries:

**** Feel free to utilize these functions, however they may contain errors or may not normalize all statements. They depend on the pg_stat_statements table and if the entire statement will not fit within the query field of that table, then these functions will not produce the correct output. I will also place them on my public github. If you find any errors or omissions, please let me know. ****

hint_plan.display_candidate_pg_hint_plan_queries

While you can easily select from the “hints” table on your own, this query will show what a normalized statement will look like before loading it to the table. You can leave the “p_query_id” parameter null to return all queries present in the pg_stat_statements in a normalized form or you can populate it with a valid “query_id” and it will return a single normalized statement:

CREATE OR REPLACE FUNCTION hint_plan.display_candidate_pg_hint_plan_queries(
  p_query_id bigint default null
  )
  RETURNS TABLE(queryid bigint, norm_query_string text)
  LANGUAGE 'plpgsql'
  COST 100
  VOLATILE PARALLEL UNSAFE
AS $BODY$
 DECLARE 
 	pg_stat_statements_exists boolean := false;
 BEGIN
   -- Confirm the pg_stat_statements view exists before querying it
   SELECT EXISTS (
    SELECT FROM 
        information_schema.tables 
    WHERE 
        table_schema LIKE 'public' AND 
        table_type LIKE 'VIEW' AND
        table_name = 'pg_stat_statements'
    ) INTO pg_stat_statements_exists;
   IF NOT pg_stat_statements_exists THEN
    RAISE NOTICE 'pg_stat_statements extension has not been loaded, exiting';
    RETURN;
   END IF;
   -- Normalize: $n placeholders become ?, then CR, tab and LF become spaces,
   -- runs of whitespace collapse to a single space and a trailing ; is added.
   -- A null p_query_id returns every statement in pg_stat_statements.
   RETURN QUERY
   SELECT pss.queryid,
          regexp_replace(
             regexp_replace(
                regexp_replace(
                   regexp_replace(
                      regexp_replace(pss.query, '\$\d+', '?', 'g'),
                                E'\r', ' ', 'g'),
                              E'\t', ' ', 'g'),
                           E'\n', ' ', 'g'),
                         '\s+', ' ', 'g') || ';'
 	FROM pg_stat_statements pss
 	WHERE p_query_id IS NULL OR pss.queryid = p_query_id;
 END; 
$BODY$;

If our candidate query was this:

select queryid, query from pg_stat_statements where queryid =  -8949523101378282526;
       queryid        |            query
----------------------+-----------------------------
 -8949523101378282526 | select b.bid, sum(abalance)+
                      | from pgbench_branches b    +
                      | join pgbench_accounts a    +
                      | on (b.bid = a.bid)         +
                      | where b.bid = $1           +
                      | group by b.bid             +
                      | order by 1
(1 row)

The display function would return the following normalized query:

SELECT hint_plan.display_candidate_pg_hint_plan_queries(p_query_id => -8949523101378282526);
-[ RECORD 1 ]--------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------
display_candidate_pg_hint_plan_queries | (-8949523101378282526,"select b.bid, sum(abalance) from pgbench_branches b join pgbench_accounts a on (b.bid = a.bid) where b.bid = ? group by b.bid order by 1;")

You can then verify that the query is normalized properly and then move on toward using the next function to add the normalized query to the “hints” table.
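If you would rather sanity check the normalization outside of the database, the same steps are easy to mirror in a few lines of Python. This is only a sketch of the regexp_replace chain used by the helper function above (the `normalize_statement` name is my own), and like the SQL version it assumes the text comes from pg_stat_statements, where literals have already been replaced with $n placeholders.

```python
import re


def normalize_statement(query: str) -> str:
    """Mirror the SQL helper: $n -> ?, collapse whitespace, append ';'."""
    # Replace pg_stat_statements placeholders such as $1, $2 with ?
    no_params = re.sub(r"\$\d+", "?", query)
    # Carriage returns, tabs and newlines all collapse into single spaces
    single_spaced = re.sub(r"\s+", " ", no_params)
    return single_spaced + ";"


if __name__ == "__main__":
    candidate = (
        "select b.bid, sum(abalance)\n"
        "from pgbench_branches b\n"
        "join pgbench_accounts a\n"
        "on (b.bid = a.bid)\n"
        "where b.bid = $1\n"
        "group by b.bid\n"
        "order by 1"
    )
    print(normalize_statement(candidate))
```

Running it against the candidate query shown earlier should give the same single-line statement that the display function returned.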

hint_plan.add_stored_pg_hint_plan

Using the same query in the previous section, we will now add it to the “hints” table. This is where it is important to understand what hint you want to add.

CREATE OR REPLACE FUNCTION hint_plan.add_stored_pg_hint_plan(
  p_query_id bigint,
  p_hint_text text,
  p_application_name text default ''
  )
  RETURNS varchar
  LANGUAGE 'plpgsql'
  COST 100
  VOLATILE PARALLEL UNSAFE
AS $BODY$
-- p_hint_text can contain one or more hints either separated by a space or
-- a newline (line feed) character.  Examples include:
-- Space Separated: SeqScan(a) Parallel(a 0 hard)
-- Newline (LF) Separated: SeqScan(a)'||chr(10)||'Parallel(a 0 hard)
-- Single Hint: SeqScan(a)
-- 
-- Escaped text does not work: E'SeqScan(a)\nParallel(a 0 hard)'
 DECLARE 
 	hint_id hint_plan.hints.id%TYPE;
 	normalized_query_text hint_plan.hints.norm_query_string%TYPE;
 	pg_stat_statements_exists boolean := false;
 BEGIN
   SELECT EXISTS (
    SELECT FROM 
        information_schema.tables 
    WHERE 
        table_schema LIKE 'public' AND 
        table_type LIKE 'VIEW' AND
        table_name = 'pg_stat_statements'
    ) INTO pg_stat_statements_exists;
   IF NOT pg_stat_statements_exists THEN
    RAISE NOTICE 'pg_stat_statements extension has not been loaded, exiting';
    RETURN 'error';
   ELSE
    SELECT regexp_replace(
             regexp_replace(
                regexp_replace(
                   regexp_replace(
                      regexp_replace(query, '\$\d+', '?', 'g'),
                                E'\r', ' ', 'g'),
                              E'\t', ' ', 'g'),
                           E'\n', ' ', 'g'),
                         '\s+', ' ', 'g') || ';'
 	 INTO normalized_query_text
 	 FROM pg_stat_statements where queryid = p_query_id;
     IF normalized_query_text IS NOT NULL THEN
		INSERT INTO hint_plan.hints(norm_query_string, application_name, hints)
    	VALUES (normalized_query_text,
    			p_application_name,
    			p_hint_text
    	)
    	-- Capture the id of the row just inserted instead of re-querying by text
    	RETURNING id INTO hint_id;
 	    RETURN cast(hint_id as text);
     ELSE
 		RAISE NOTICE 'Query ID % does not exist in pg_stat_statements', cast(p_query_id as text);
 		RETURN 'error';
     END IF;
   END IF;
 END; 
$BODY$;

Hint text can contain one or more hints either separated by a space or a newline (line feed) character. Examples include:

Space Separated: SeqScan(a) Parallel(a 0 hard)
Newline (LF) Separated: SeqScan(a)'||chr(10)||'Parallel(a 0 hard)
Single Hint: SeqScan(a)
Escaped text does not work in the context of this function although this can be used if you are inserting manually to the “hints” table: E'SeqScan(a)\nParallel(a 0 hard)'

SELECT hint_plan.add_stored_pg_hint_plan(p_query_id => -8949523101378282526,
						p_hint_text => 'SeqScan(a) Parallel(a 0 hard)',
						p_application_name => '');

-[ RECORD 1 ]-----------+---
add_stored_pg_hint_plan | 28

Time: 40.889 ms

select * from hint_plan.hints where id = 28;
-[ RECORD 1 ]-----+------------------------------------------------------------------------------------------------------------------------------------------
id                | 28
norm_query_string | select b.bid, sum(abalance) from pgbench_branches b join pgbench_accounts a on (b.bid = a.bid) where b.bid = ? group by b.bid order by 1;
application_name  |
hints             | SeqScan(a) Parallel(a 0 hard)

In the above example, we are forcing a serial sequential scan of the “pgbench_accounts”. We left the “application name” parameter empty so that the hint applies to any calling application.

hint_plan.delete_stored_pg_hint_plan

You could easily just issue a delete against the “hints” table, but in keeping with the function-based approach used so far, a delete helper has also been developed:

CREATE OR REPLACE FUNCTION hint_plan.delete_stored_pg_hint_plan(
  p_hint_id bigint
  )
  RETURNS TABLE(id integer, norm_query_string text, application_name text, hints text)
  LANGUAGE 'plpgsql'
  COST 100
  VOLATILE PARALLEL UNSAFE
AS $BODY$
 BEGIN
    RETURN QUERY
    DELETE FROM hint_plan.hints h WHERE h.id = p_hint_id RETURNING *;
 END; 
$BODY$;

To delete a hint you can call the function as follows:

 SELECT hint_plan.delete_stored_pg_hint_plan(p_hint_id => 28);
-[ RECORD 1 ]--------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
delete_stored_pg_hint_plan | (28,"select b.bid, sum(abalance) from pgbench_branches b join pgbench_accounts a on (b.bid = a.bid) where b.bid = ? group by b.bid order by 1;","","SeqScan(a) Parallel(a 0 hard)")

Time: 33.685 ms
select * from hint_plan.hints where id = 28;
(0 rows)

Time: 24.868 ms

As you can see the “hints” table is very useful and can help you emulate many parts of SQL Plan Management just like in Oracle.

Enjoy and all feedback is welcomed!!!

Leverage Google Cloud Logging + Monitoring for Custom Cloud SQL for Postgres or AlloyDB Alerts

As migrations to Cloud SQL and AlloyDB pick up speed, inevitably you will run into a condition where the cloud tooling has not quite caught up with exposing custom alerts and incidents that you may be exposing on-premises with tools such as Nagios or Oracle Enterprise Manager. One such example is monitoring of replication tools such as the GoldenGate Heartbeat table. While there are many ways that you may be able to implement this, I wanted to demonstrate a way to leverage Google Cloud Logging + Google Cloud Monitoring. Using this method will allow us to keep a long term log of certain parameters like lag or anything else you have built into the heartbeat mechanism. To demonstrate, let's use Python to query the database and create a Cloud Logging entry:

import argparse
from datetime import datetime, timedelta
from sqlalchemy import create_engine, text
from google.cloud import logging


def retrievePgAlert(
    username: str,
    password: str,
    hostname: str,
    portNumber: int,
    databaseName: str,
    alertType: str,
) -> None:

    alertList: list = []

    conn_string = f"postgresql+psycopg2://{username}:{password}@{hostname}:{portNumber}/{databaseName}?client_encoding=utf8"
    engine = create_engine(conn_string)
    with engine.connect() as con:

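        if alertType == "ogg-lag":
            # Bind both values as parameters; make_interval() avoids placing a
            # bind parameter inside a quoted interval literal.
            sqlQuery = text(
                "select replicat, effective_date, lag from ogg.heartbeat "
                "where lag >= :lagAmt "
                "and effective_date >= now() - make_interval(mins => :intervalAmt)"
            )
        else:
            # Without this guard an unknown alert type leaves sqlQuery undefined.
            raise ValueError(f"Unsupported alert type: {alertType}")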

        result = con.execute(
            sqlQuery, {"lagAmt": oggLagAmt, "intervalAmt": checkIntervalMinutes}
        ).fetchall()
        for row in result:
            alertList.append(row)

        if not alertList:
            print(f"No alerts as of {datetime.now().strftime('%m/%d/%Y %H:%M:%S')}")
        else:
            for alertText in alertList:
                print(
                    f"Replicat: {alertText[0]} at date {alertText[1]} has a total lag of: {alertText[2]} seconds"
                )

            writeGcpCloudLoggingAlert(
                logger_alert_type=alertType,
                loggerName=args.loggerName,
                logger_message=alertList,
            )

    # The "with" block has already closed the connection; release the pool.
    engine.dispose()


def writeGcpCloudLoggingAlert(
    logger_alert_type: str,
    loggerName: str,
    logger_message: list,
) -> None:

    # Writes log entries to the given logger.
    logging_client = logging.Client()

    # This log can be found in the Cloud Logging console under 'Custom Logs'.
    logger = logging_client.logger(loggerName)

    # Struct log. The struct can be any JSON-serializable dictionary.
    if logger_alert_type == "ogg-lag":
        # Each row is (replicat, effective_date, lag) from the heartbeat query.
        for alertFields in logger_message:
            logger.log_struct(
                {
                    "alertType": logger_alert_type,
                    "replicat": str(alertFields[0]),
                    "alertDate": alertFields[1].strftime("%m/%d/%Y, %H:%M:%S"),
                    "alertRetrievalDate": datetime.now().strftime("%m/%d/%Y, %H:%M:%S"),
                    "lagInSeconds": int(alertFields[2]),
                },
                severity="ERROR",
            )

    print("Wrote logs to {}.".format(logger.name))


def delete_logger(loggerName):
    """Deletes a logger and all its entries.

    Note that a deletion can take several minutes to take effect.
    """
    logging_client = logging.Client()
    logger = logging_client.logger(loggerName)

    logger.delete()

    print("Deleted all logging entries for {}".format(logger.name))


if __name__ == "__main__":

    cloudSQLHost: str = "127.0.0.1"
    hostname: str
    portNumber: str
    database: str
    username: str
    password: str
    oggLagAmt: int = 15
    checkIntervalMinutes: int = 20

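    # open() does not expand "~", so resolve the home directory explicitly.
    # The import is local so that this block stays self-contained.
    import os

    with open(os.path.expanduser("~/.pgpass"), "r") as pgpassfile:
        for line in pgpassfile:
            if line.strip().split(":")[0] == cloudSQLHost:
                # maxsplit=4 keeps a password that contains ":" in one piece.
                hostname, portNumber, database, username, password = line.strip().split(
                    ":", 4
                )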

    parser = argparse.ArgumentParser(
        description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter
    )
    parser.add_argument(
        "-loggerName",
        "--loggerName",
        type=str,
        help="GCP Cloud Log Namespace",
        default="postgres-alert",
    )
    parser.add_argument(
        "-alertType",
        "--alertType",
        type=str,
        help="Type of alert to log",
        default="ogg-lag",
    )
    args = parser.parse_args()

    if args.alertType == "ogg-lag":
        retrievePgAlert(
            hostname=hostname,
            username=username,
            password=password,
            portNumber=portNumber,
            databaseName=database,
            alertType=args.alertType,
        )

In this script we use the Google Cloud Logging API, SQLAlchemy, and a few standard Python imports to query the heartbeat table for rows at or above the lag threshold we are looking for.

Note: The Python code could check for any other condition by changing the query, or by leveraging “gcloud” commands or REST API calls instead.

If the condition is met, the script creates a JSON message which is then written to the appropriate Google Cloud Logging namespace. An example of the JSON message is below (sensitive information such as the project id and instance id has been redacted):

{
  "insertId": "1b6fb35g18b606n",
  "jsonPayload": {
    "alertRetrievalDate": "01/20/2023, 18:47:20",
    "lagInSeconds": 15,
    "alertType": "ogg-lag",
    "alertDate": "01/20/2023, 18:34:55",
    "replicat": "r_hr"
  },
  "resource": {
    "type": "gce_instance",
    "labels": {
      "project_id": "[project id]",
      "instance_id": "****************",
      "zone": "projects/[project id]/zones/us-central1-c"
    }
  },
  "timestamp": "2023-01-20T18:47:20.103058301Z",
  "severity": "ERROR",
  "logName": "projects/[project id]/logs/postgres-alert",
  "receiveTimestamp": "2023-01-20T18:47:20.103058301Z"
}
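
The jsonPayload portion of that entry can be built and checked locally before anything is sent to Cloud Logging. The following is a minimal sketch, assuming each heartbeat row arrives as a (replicat, effective_date, lag) tuple as in the script above; the helper name build_lag_payload is hypothetical and not part of the original script.

```python
from datetime import datetime


def build_lag_payload(row, retrieved_at):
    """Turn one (replicat, effective_date, lag) row into a log_struct payload."""
    replicat, effective_date, lag = row
    return {
        "alertType": "ogg-lag",
        "replicat": str(replicat),
        # Same date format the script writes to Cloud Logging.
        "alertDate": effective_date.strftime("%m/%d/%Y, %H:%M:%S"),
        "alertRetrievalDate": retrieved_at.strftime("%m/%d/%Y, %H:%M:%S"),
        "lagInSeconds": int(lag),
    }


if __name__ == "__main__":
    # Sample values taken from the JSON message shown above.
    sample_row = ("r_hr", datetime(2023, 1, 20, 18, 34, 55), 15)
    print(build_lag_payload(sample_row, datetime(2023, 1, 20, 18, 47, 20)))
```

Keeping the payload construction in a plain function like this makes it possible to test the message shape without GCP credentials; the returned dictionary is what would be handed to logger.log_struct.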

Create a Cloud Logging Alert

Now that we have published a message to Cloud Logging, what can we do with it? Generally there are two paths: a Cloud Metric or a Cloud Alert. For this demonstration, we will use the Cloud Alert. To start the setup, navigate to the console page “Operations Logging” -> “Logs Explorer” and click the “Create alert” function, which opens the alert dialog. In step 2, double-check that the query retrieves the appropriate logs. In step 3, choose the time between notifications (this mutes alerts that occur within the interval) and how long after the last alert an incident stays open. In this case, we will mute duplicate alerts for 5 minutes after the first alert (an alert at 6 minutes fires another notification), and incidents will remain open for 30 minutes past the last alert (no new incident is logged unless an alert occurs after that time frame). The query used within the alert is as follows:


logName="projects/[project id]/logs/postgres-alert"
AND severity="ERROR"
AND (jsonPayload.alertType = "ogg-lag")
AND (jsonPayload.lagInSeconds >= 15)
AND resource.labels.instance_id = "[instance id]"
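
If the same filter is needed for more than one project, instance, or lag threshold, it can be generated rather than edited by hand. This is a rough sketch only; build_alert_filter and its parameter names are hypothetical, and the output is simply the query text above with the placeholders filled in.

```python
def build_alert_filter(
    project_id,
    instance_id,
    log_name="postgres-alert",
    alert_type="ogg-lag",
    min_lag_seconds=15,
):
    """Assemble the Logs Explorer filter used by the alert, one clause per line."""
    clauses = [
        f'logName="projects/{project_id}/logs/{log_name}"',
        'severity="ERROR"',
        f'(jsonPayload.alertType = "{alert_type}")',
        f"(jsonPayload.lagInSeconds >= {int(min_lag_seconds)})",
        f'resource.labels.instance_id = "{instance_id}"',
    ]
    return "\nAND ".join(clauses)


if __name__ == "__main__":
    # "my-project" and "1234567890" are placeholder values for illustration.
    print(build_alert_filter("my-project", "1234567890"))
```

The resulting string can be pasted into the Logs Explorer query box to confirm it matches the entries written by the script before the alert is created.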

The following dialogues outline the screens used to set up the alert.

The last step will be to choose your notification method, which is managed by different notification channels. The different types of notification channels include:

Mobile Devices
PagerDuty Services
PagerDuty Sync
Slack
Webhooks
E-Mail
SMS
Pub/Sub

Once all of this is defined, the alert is ready to notify as soon as you place the Python script on an appropriate schedule, such as Linux cron or Google Cloud Scheduler. We then wait for an issue that matches the alert to occur. When it does, an email like the following is sent to the notification channel:

As your migration to cloud continues, keep an open mind and look for alternative ways to handle all of the operational “things” you are accustomed to in your on-premises environment. Most of the time there is a way in cloud to handle it!

Tuning the PostgreSQL “random_page_cost” Parameter

Three Configuration Parameters for PostgreSQL That Are Worth Further Investigation!


In my new role at Google, not only am I still working with lots of Oracle and replication tools, I am also expanding more into moving Oracle systems to Google Cloud on either CloudSQL for PostgreSQL or AlloyDB for PostgreSQL. After looking at these systems for a little while, I have found a few out-of-the-box values that seem worth tweaking. It is my goal to discuss some of those things now and in future blog posts.

Let me start off by saying that managed PostgreSQL products such as Google’s CloudSQL for PostgreSQL and AlloyDB for PostgreSQL (in Preview as of this post) are designed to be low maintenance and to fit many different types of workloads. That being said, there are a few configuration parameters that you should really look at tuning, because the defaults (as of PostgreSQL version 14) are in most cases not set to the most efficient value for anything more than a VERY light workload.

work_mem

Sets the base maximum amount of memory to be used by a query operation (such as a sort or hash table) before writing to temporary disk files; the default value is four megabytes (4MB). People coming from the Oracle world will equate this setting with PGA, however keep in mind that work_mem is a per-operation limit on each backend’s private memory rather than an instance-wide target like Oracle’s PGA_AGGREGATE_TARGET, so a query with several sorts or hashes, multiplied across many sessions, can consume many times this amount. You must take care not to over-configure this setting in PostgreSQL.

A full description of the parameter can be found here.
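
To see why over-configuring work_mem is risky, a rough upper bound can be worked out by multiplying the setting by the number of sessions and the number of sort or hash operations each query might run at once. The sketch below is only a back-of-the-envelope estimate; the function name, the 200 sessions, and the 3 operations per query are assumptions chosen for illustration.

```python
def worst_case_work_mem_mb(work_mem_mb, sessions, operations_per_query):
    """Rough upper bound on memory used if every session hits the limit at once."""
    return work_mem_mb * sessions * operations_per_query


if __name__ == "__main__":
    # Compare the 4MB default with a hypothetical 64MB setting.
    for setting in (4, 64):
        total = worst_case_work_mem_mb(setting, sessions=200, operations_per_query=3)
        print(f"work_mem={setting}MB -> up to {total} MB across 200 sessions")
```

Real usage is normally far below this bound, since few queries reach the limit at the same moment, but the estimate is a useful sanity check against the instance's available memory.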

random_page_cost

Sets the planner’s estimate of the cost of a non-sequentially-fetched disk page; the default is 4.0. In reality this setting suits a system in which disk performance is a concern (i.e., a system with HDDs rather than SSDs), as the default models random disk access as 40x slower than sequential access while expecting 90% of random reads to be served from cache. Essentially, if you want your system to prefer index and cache reads, lower this number from the default, but no lower than the setting for seq_page_cost. For normal CloudSQL for PostgreSQL deployments that use SSD, I like to set this to 2. In deployments that use AlloyDB for PostgreSQL, an even lower setting of 1.1 can be used due to the efficient Colossus storage implementation.

For those that have been around Oracle for a while, this parameter behaves much like the “optimizer_index_cost_adj” parameter.

A full description of the parameter can be found here.
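
The arithmetic behind the 4.0 default for random_page_cost can be sketched in a few lines. This is a simplified model, assuming a cached random read costs nothing; the function name is hypothetical and the second set of inputs is purely illustrative, not a measurement of any storage system.

```python
def effective_random_cost(raw_penalty, cached_fraction):
    """Average cost of a random page read relative to a sequential read.

    Simplifying assumption: a random read served from cache is treated as free.
    """
    return raw_penalty * (1.0 - cached_fraction)


if __name__ == "__main__":
    # Default model: 40x penalty with 90% of random reads cached.
    print(round(effective_random_cost(40.0, 0.90), 2))
    # Hypothetical faster storage; these figures are illustrative only.
    print(round(effective_random_cost(11.0, 0.90), 2))
```

The point of the exercise is that the parameter blends raw device behavior with cache hit rate, so both faster storage and a well-cached working set argue for a lower value.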

effective_io_concurrency

Sets the number of concurrent disk I/O operations that PostgreSQL expects can be executed simultaneously. Raising this value increases the number of I/O operations that any individual PostgreSQL session attempts to initiate in parallel. The default is 1, and at this point the setting only affects bitmap heap scans. Bitmap heap scans, while efficient, by nature have to look at the index as well as the corresponding heap blocks. If that data has to be read from disk and your system can handle the parallelism, as it can with SSD storage, you should increase this to a more meaningful value. I will do a separate blog post to show the effects of this, but in general, as this number is increased beyond half the number of CPUs available, diminishing returns are observed.

A full description of the parameter can be found here.
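
As a starting point for experimentation with effective_io_concurrency, the half-the-CPUs observation above can be turned into a session-level SET statement. This is only a sketch of that rule of thumb, not a recommendation from the PostgreSQL documentation; the helper name and the 16-CPU example are assumptions.

```python
def starting_io_concurrency(cpu_count):
    """Half the CPU count, clamped between 1 and the parameter's maximum of 1000."""
    return max(1, min(1000, cpu_count // 2))


if __name__ == "__main__":
    # Use the CPU count of the database instance, not of the client machine.
    database_cpus = 16  # hypothetical instance size
    print(f"SET effective_io_concurrency = {starting_io_concurrency(database_cpus)};")
```

Running the generated statement in a test session and comparing a bitmap heap scan before and after is a low-risk way to see whether the change helps a given workload.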

In closing, just like Oracle and other RDBMSs, PostgreSQL has numerous configuration parameters, all of which can affect the workload. However, the three parameters above are the ones where I most often find opportunities for optimization, especially on more modern platforms.

In future posts I will detail how each one of these can change a workload.

Enjoy!

🛩️ Shane Borden's Technology Blog

My Experiences Navigating Technology – All Views Are My Own
