
Understanding High Water Mark Locking Issues in PostgreSQL Vacuums

I recently had a customer who wanted to use read replicas so that their read queries would not be impeded by work being done on the primary instance. They also required an SLA of, at worst, a few seconds. Ultimately they weren’t meeting the SLA, and my colleagues and I were asked to look at what was going on.

The first thing we came to understand was the pattern of work on the primary: a somewhat frequent, large DELETE statement followed by a data refresh done with a COPY FROM STDIN command against a partitioned table with 16 hash partitions.

The problem being observed was that periodically the SELECTs running on the read replica would time out and miss the SLA. Upon investigation, we found that the “startup” process on the read replica would periodically request an “exclusive lock” (an ACCESS EXCLUSIVE lock, the only lock mode that blocks a plain SELECT) on some random partition. This lock would block the SELECT (which is partition unaware and so touches every partition) and then cause the timeout. But what was causing the lock?

After spending some time investigating, the team was able to correlate the exclusive lock with a routine “autovacuum” occurring on the primary. But why was it locking? Inspection of the WAL showed that the issue was due to a step in the vacuum process in which it tries to return free pages at the end of the table to the OS, that is, truncation of the High Water Mark (HWM). Essentially, the lock is requested on the primary and then transmitted to the replica via the WAL so that the tables can be kept consistent.

To confirm that it was in fact the step in VACUUM that truncates the HWM, we decided to alter each partition of the table to allow VACUUM to skip that step:

ALTER TABLE [table name / partition name] SET (vacuum_truncate = false);
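
Because the parameter was set on each partition individually, it can be convenient to generate the statements instead of typing them 16 times. The following is only a minimal Python sketch, not the script we used: the partition names (`large_table_p0` through `large_table_p15`), the `public` schema, and both helper functions are assumptions for illustration, so substitute the names from your own catalog.

```python
def quote_ident(name: str) -> str:
    """Double-quote a SQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def vacuum_truncate_statements(schema, partitions, enabled=False):
    """Build one ALTER TABLE ... SET (vacuum_truncate = ...) per partition."""
    value = "true" if enabled else "false"
    return [
        f"ALTER TABLE {quote_ident(schema)}.{quote_ident(p)} "
        f"SET (vacuum_truncate = {value});"
        for p in partitions
    ]


if __name__ == "__main__":
    # Hypothetical names for 16 hash partitions; replace with your own.
    partitions = [f"large_table_p{i}" for i in range(16)]
    for stmt in vacuum_truncate_statements("public", partitions):
        print(stmt)
```

The printed statements can be reviewed and then run through psql. Passing `enabled=True` generates the statements that turn truncation back on.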

After letting this run for 24 hours, we saw no further blocking locks causing SLA misses on the replicas.

Should we worry about shrinking the High Water Mark (HWM)? As with everything in IT, it depends. Other DBMS engines like Oracle do not shrink the HWM automatically; typically, maintenance operations such as DBMS_REDEFINITION or ALTER TABLE … SHRINK SPACE / SHRINK SPACE COMPACT deal with that. So in PostgreSQL, do we need to worry about it? This is where the pg_freespacemap extension can help. We can use this extension and a script to check whether the HWM is in fact growing or staying put. If it is growing, we can execute a regular VACUUM with an additional option called TRUNCATE to handle it:

VACUUM (verbose, truncate true) [schema].[table name];

When you do this, you will see one additional message in the VACUUM output signifying that the VACUUM truncated the High Water Mark (HWM):

INFO:  table "large_table": truncated 302534 to 302233 pages

As I stated earlier, we can use pg_freespacemap to see if we actually need to worry about the High Water Mark (HWM) growing. I could have taken a lot of time to write a script to figure it out, but instead, I enlisted Google Gemini to see what it would come up with. After a few iterations, the output was nearly perfect!

CREATE EXTENSION pg_freespacemap;

CREATE OR REPLACE FUNCTION show_empty_pages(p_table_name TEXT)
RETURNS VOID AS $$
DECLARE
    -- Core processing variables
    table_oid_regclass  REGCLASS;
    block_size          BIGINT;
    fsm_granularity     BIGINT;
    max_fsm_free_space  BIGINT;
    total_pages         BIGINT;
    high_water_mark     BIGINT := -1;  -- -1 means no page with data was found

    -- Variables for the final summary
    first_empty_block   BIGINT;
    free_pages_at_end   BIGINT;
    free_space_at_end   TEXT;
BEGIN
    -- Setup
    table_oid_regclass := p_table_name::regclass;
    block_size  := current_setting('block_size')::bigint;
    SELECT relpages INTO total_pages FROM pg_class WHERE oid = table_oid_regclass;
    fsm_granularity    := block_size / 256;
    max_fsm_free_space := floor((block_size - 24) / fsm_granularity) * fsm_granularity;

    --------------------------------------------------------------------------------
    -- PASS 1: FIND THE HIGH-WATER MARK (last page with data)
    --------------------------------------------------------------------------------
    FOR i IN REVERSE (total_pages - 1)..0 LOOP
        IF pg_freespace(table_oid_regclass, i) < max_fsm_free_space THEN
            high_water_mark := i;
            EXIT;
        END IF;
    END LOOP;

    --------------------------------------------------------------------------------
    -- FINAL STEP: CALCULATE AND RAISE THE SUMMARY NOTICE
    --------------------------------------------------------------------------------
    first_empty_block := high_water_mark + 1;
    free_pages_at_end := total_pages - first_empty_block;
    IF free_pages_at_end < 0 THEN
        free_pages_at_end := 0;
    END IF;
    free_space_at_end := pg_size_pretty(free_pages_at_end * block_size);

    RAISE NOTICE '-------------------------------------------------------------';
    RAISE NOTICE 'Summary for table: %', p_table_name;
    RAISE NOTICE '-------------------------------------------------------------';
    RAISE NOTICE 'The High Water Mark (HWM) is at page: %', total_pages;
    IF total_pages <> first_empty_block THEN
        RAISE NOTICE 'First potentially empty page is at: %', first_empty_block;
        RAISE NOTICE 'Total Pages in Table: %', total_pages;
        RAISE NOTICE 'Number of potentially truncatable pages at the end: %', free_pages_at_end;
        RAISE NOTICE 'Amount of free space at the end of the table: %', free_space_at_end;
    ELSE
        RAISE NOTICE 'There are no empty pages to truncate';
    END IF;
    RAISE NOTICE '-------------------------------------------------------------';
END;
$$ LANGUAGE plpgsql;

This handy script can be executed periodically to check the High Water Mark (HWM) and produces the following output:

(postgres@10.3.1.17:5432) [postgres] > SELECT * FROM show_empty_pages('public.large_table');
NOTICE:  -------------------------------------------------------------
NOTICE:  Summary for table: public.large_table
NOTICE:  -------------------------------------------------------------
NOTICE:  The High Water Mark (HWM) is at page: 302534
NOTICE:  First potentially empty page is at: 302233
NOTICE:  Total Pages in Table: 302534
NOTICE:  Number of potentially truncatable pages at the end: 301
NOTICE:  Amount of free space at the end of the table: 2408 kB
NOTICE:  -------------------------------------------------------------

If there is no free space after the last block that contains data, the output will look like this:

NOTICE:  -------------------------------------------------------------
NOTICE:  Summary for table: public.large_table
NOTICE:  -------------------------------------------------------------
NOTICE:  The High Water Mark (HWM) is at page: 302233
NOTICE:  There are no empty pages to truncate
NOTICE:  -------------------------------------------------------------
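
To make the arithmetic inside the function easier to follow, here is a minimal Python sketch of the same backward scan. It is an illustration only: the list `free_space` stands in for the per-page values that `pg_freespace` would return, the helper names are my own, and the 8192-byte default block size is an assumption. Unlike the function above, the sketch starts from -1 so that a table whose pages are all empty is reported correctly.

```python
def max_fsm_free_space(block_size: int = 8192) -> int:
    """Value that the scan treats as a completely empty page."""
    granularity = block_size // 256  # free space is tracked in 1/256-block steps
    # 24 bytes are subtracted exactly as in the PL/pgSQL function above.
    return ((block_size - 24) // granularity) * granularity


def trailing_empty_pages(free_space, block_size=8192):
    """Return (first_empty_page, empty_page_count, empty_bytes) for the tail.

    free_space[i] stands in for pg_freespace(table, i).
    """
    limit = max_fsm_free_space(block_size)
    last_used = -1  # -1 means no page holds data
    # Walk backwards from the last page until a page with data is found.
    for page in range(len(free_space) - 1, -1, -1):
        if free_space[page] < limit:
            last_used = page
            break
    first_empty = last_used + 1
    count = len(free_space) - first_empty
    return first_empty, count, count * block_size


if __name__ == "__main__":
    # Five pages with data followed by three empty pages.
    pages = [0, 120, 4000, 64, 0, 8160, 8160, 8160]
    print(trailing_empty_pages(pages))
```

With an 8 kB block size, the 301 trailing pages in the first sample output work out to 301 × 8 kB = 2408 kB, which matches the reported figure.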

So while there is no single right answer on how to deal with this, make sure you know the implications of each step in the process. In this case, we decided to set the “vacuum_truncate” option to false, but another option might be to tune vacuum in another way, such as making it more or less frequent. Always evaluate your own situation; in any case, it’s always good to know what happens in your database when certain commands are executed.

Enjoy!

Use of “pgtt” Extension in Self Managed vs. Cloud Products

There are several benefits to using temp tables in SQL DML operations. Given my long relationship and work on Oracle databases, I am highly familiar with the Oracle style “Global Temporary Tables” and have used them to solve many performance issues when complex joins are involved. That’s why I was excited when Gilles Darold released the extension “pgtt” for PostgreSQL. I was even more excited when Google released the extension for use in the Google CloudSQL for Postgres and AlloyDB for Postgres products.

In my opinion, native Postgres temporary tables leave a lot to be desired. For example, each table has to be created upon use. Nothing is persistent. This takes time and leads to Postgres catalog bloat (looking to do a post on this soon). That being said, in self managed instances, ‘pgtt’ is very easy to use. You can have the library loaded at connection time either for a specific user or for anyone who connects to a database. For example (make sure the extension is created in the database before setting these):

alter role test_user SET session_preload_libraries to 'pgtt';

alter database test_pgtt SET session_preload_libraries = 'pgtt';

This will, by default, use the ‘pgtt’ version of temporary tables instead of the default temporary tables Postgres offers. You can then create ‘pgtt’ style tables and refer to them via the pgtt_schema with no further modifications.

To revert to the “old” style temporary tables, alter the user / database to reset the setting and then disconnect and reconnect the session, since session_preload_libraries is only read at connection start. To my knowledge, if the library is enabled this way there is no way to use both styles of temporary tables at the same time.

alter role test_user RESET session_preload_libraries;

alter database test_pgtt RESET session_preload_libraries;

The Set Up…..

/* set flag "pgtt.enabled" on alloydb / cloudsql postgres console  to on */

/* create the extension in every database you need to use global temp tables */
CREATE EXTENSION pgtt;
LOAD 'pgtt';

/* Additional extensions needed for this test */
CREATE EXTENSION "uuid-ossp";

/* create the persistent global temporary table */
CREATE /*GLOBAL*/ TEMPORARY TABLE test_gtt_table (
id integer,
lbl text) ON COMMIT DELETE ROWS;

/* create an index on the global temp table */
CREATE INDEX ON pgtt_schema.test_gtt_table (id);

Use of ‘pgtt’ in Cloud Database Products

When you look at managed Cloud databases (from any vendor), you have to understand that some items need to be locked down in order to safeguard the “managed” nature of the product. Unfortunately (as of this writing), at least in Google Cloud Postgres products, you cannot manipulate the parameter “session_preload_libraries”. So then you ask, “How can I programmatically use the extension?” As of this writing, I have found only two ways. Either the user has to explicitly execute “LOAD 'pgtt'” at the beginning of, or at some point in, their session, or the “LOAD” must be embedded in a function / procedure similar to the following:

/* function definition with loading of extension as a work around */
create or replace function public.test_gtt_function(P_IN_IDS integer[])
    returns table (
        id integer,
        lbl text
    )
    language plpgsql
AS $func$
BEGIN
    /* load pgtt so the table is handled as a global temporary table */
    LOAD 'pgtt';
    /* revised query below */
    INSERT INTO pgtt_schema.test_gtt_table
        select generate_series,substr(uuid_generate_v4()::text,1,15) FROM generate_series(1, 5000);
    RETURN QUERY
    SELECT mygtt.id,mygtt.lbl from pgtt_schema.test_gtt_table mygtt where mygtt.id = ANY (P_IN_IDS);
END
$func$;

As you can see in the above function, the “LOAD” directive is the first thing in the function. While I am not a fan of having to do it this way, the performance impact does not seem to be huge, but it is one that you must account for nonetheless:

(postgres@##########:5432) [tpcc] > LOAD 'pgtt';
LOAD
Time: 3.811 ms
(postgres@##########:5432) [tpcc] > \watch 1
LOAD
Time: 2.922 ms
LOAD
Time: 1.219 ms
LOAD
Time: 1.335 ms
LOAD
Time: 1.300 ms
LOAD
Time: 1.282 ms
LOAD
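
To put those numbers in context, the following is a small Python sketch that averages the timings shown above and scales them to a call volume. The timings are copied from the psql output; the figure of 100,000 calls is purely an assumed workload, and the helper name is my own. Note that these are repeated LOADs in a single session, so the cost in a fresh session may differ.

```python
from statistics import mean

# Timings in milliseconds, copied from the psql output above.
LOAD_TIMINGS_MS = [3.811, 2.922, 1.219, 1.335, 1.300, 1.282]


def load_overhead_seconds(calls: int, per_call_ms: float) -> float:
    """Total time spent in LOAD for a given number of function calls."""
    return calls * per_call_ms / 1000.0


if __name__ == "__main__":
    avg = mean(LOAD_TIMINGS_MS)
    print(f"average LOAD time: {avg:.3f} ms")
    # 100_000 calls is an assumed workload, not a measured one.
    total = load_overhead_seconds(100_000, avg)
    print(f"overhead for 100000 calls: {total:.1f} s")
```

At roughly 2 ms per call the overhead is small for occasional calls, but it adds up for a function that is executed at high frequency.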

One reason I do like putting it in the function is that the directive is easier to remove there than when it is embedded in application code. Either way, the right direction is to get the vendor to allow targeted values for this parameter. This would allow the most flexibility and require the least awareness and hassle when using this extension.

So What Happens If You Don’t Load The Library??

If you do not load the library with one of the options above, the function still works. But what happens? The table actually behaves as a “real” table and the data is never truncated, even after disconnection:

create or replace function public.test_gtt_wo_load_function(P_IN_IDS integer[])
    returns table (
        id integer,
        lbl text
    )
    language plpgsql
AS $func$
BEGIN
    /* revised query below */
    INSERT INTO pgtt_schema.test_gtt_table
        select generate_series,substr(uuid_generate_v4()::text,1,15) FROM generate_series(1, 5000);
    RETURN QUERY
    SELECT mygtt.id,mygtt.lbl from pgtt_schema.test_gtt_table mygtt where mygtt.id = ANY (P_IN_IDS);
END
$func$;

(postgres@#######:5432) [tpcc] > \set ids ARRAY[3,28,69,235,389,512,678,899]
(postgres@#######:5432) [tpcc] > select * from public.test_gtt_wo_load_function(:ids);
 id  |       lbl
-----+-----------------
   3 | 2f82b04f-db13-4
  28 | 49dfea9a-5c41-4
  69 | d2d27343-a8d4-4
 235 | d5f32cd9-9495-4
 389 | 69842fc5-f0e5-4
 512 | e52c1625-0fab-4
 678 | 0fd320a9-8f08-4
 899 | 452a5f95-6e16-4
(8 rows)

(postgres@#######:5432) [tpcc] > select * from public.test_gtt_wo_load_function(:ids);
 id  |       lbl
-----+-----------------
   3 | 2f82b04f-db13-4
   3 | 632e097d-0f80-4
  28 | 49dfea9a-5c41-4
  28 | bf9ed2c4-14d2-4
  69 | d2d27343-a8d4-4
........
 899 | 452a5f95-6e16-4
 899 | d40c915f-04c7-4
(16 rows)

(postgres@#######:5432) [tpcc] > select * from public.test_gtt_wo_load_function(:ids);
 id  |       lbl
-----+-----------------
   3 | 2f82b04f-db13-4
   3 | 632e097d-0f80-4
   3 | 9cd65fda-ab4a-4
  28 | 49dfea9a-5c41-4
  28 | bf9ed2c4-14d2-4
  28 | 6cdd9a22-5c60-4
.........
 899 | 452a5f95-6e16-4
 899 | d40c915f-04c7-4
 899 | 0319d855-a935-4
(24 rows)

(postgres@#######:5432) [tpcc] > \q
shaneborden_google_com@replication-instance-1:/home/shaneborden_google_com $ psql -h ###### -p 5432 -U postgres -d tpcc
Password for user postgres:
(postgres@#######:5432) [tpcc] > \set ids ARRAY[3,28,69,235,389,512,678,899]
(postgres@#######:5432) [tpcc] > select * from public.test_gtt_wo_load_function(:ids);
 id  |       lbl
-----+-----------------
   3 | 82f66c20-f484-4
   3 | 01ad1b4d-2790-4
   3 | a3c38d05-fb7d-4
   3 | 7a4b2ff8-3e32-4
   3 | af8a96e1-194c-4
   3 | 02586796-1cc6-4
  28 | 62b1fb31-d3c0-4
  28 | 3acfe744-574f-4
  28 | 1e2d64fa-1106-4
  28 | d92a9321-4330-4
  28 | f3b1d45e-d4a7-4
  28 | 270146eb-1098-4
  69 | 506ad97c-fdad-4
  69 | bf3340d5-17a2-4
  69 | 47cbd9eb-83da-4
  69 | 487e8b22-4a05-4
  69 | b17314bc-5822-4
.......
 678 | a8d7dc45-c027-4
 678 | 17b666e5-1fc1-4
 678 | 4386eeb4-7b4e-4
 678 | 2942f25f-3d13-4
 678 | 3853bac0-e0ba-4
 678 | 95499bef-4440-4
 899 | 4503351c-4fe4-4
 899 | 900337bf-a658-4
 899 | 5f1f02f8-29a3-4
 899 | d635c921-8f9b-4
 899 | 0dd42f22-0f8e-4
 899 | 16bb8dcf-4e35-4
(48 rows)

Pretty crazy. So if for some reason the LOAD does not occur or fails, there could be an issue where the temp table has duplicate rows.
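
The growth pattern in the output above (8, 16, then 24 rows) can be reproduced with a toy model. This Python sketch is not how pgtt works internally; it only mimics the behavior observed here, under the assumption that rows are cleared at commit only when the library has been loaded. The class and method names are hypothetical.

```python
class GttModel:
    """Toy model of the table used by the test functions above."""

    def __init__(self, library_loaded: bool):
        self.library_loaded = library_loaded
        self.rows = []

    def call_function(self, ids, generated=range(1, 5001)):
        """Mimic one call: insert 5000 rows, return the matches, then commit."""
        self.rows.extend(generated)
        result = [r for r in self.rows if r in ids]
        if self.library_loaded:
            # Assumed: ON COMMIT DELETE ROWS only applies once pgtt is loaded.
            self.rows.clear()
        return result


if __name__ == "__main__":
    ids = {3, 28, 69, 235, 389, 512, 678, 899}
    without_load = GttModel(library_loaded=False)
    with_load = GttModel(library_loaded=True)
    for _ in range(3):
        print(len(without_load.call_function(ids)),
              len(with_load.call_function(ids)))
```

Each call without the library adds another 5000 rows that are never cleared, so the number of matching rows grows by 8 per call, while the loaded case keeps returning 8.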

So what happens if the “pgtt” temp table was never created? You get a failure, as you would expect:

(postgres@#######:5432) [tpcc] > select * from public.test_gtt_wo_load_function(:ids);
ERROR:  relation "pgtt_schema.test_gtt_table" does not exist
LINE 1: INSERT INTO pgtt_schema.test_gtt_table
                    ^
QUERY:  INSERT INTO pgtt_schema.test_gtt_table
select generate_series,substr(uuid_generate_v4()::text,1,15) FROM generate_series(1, 5000)
CONTEXT:  PL/pgSQL function test_gtt_wo_load_function(integer[]) line 4 at SQL statement
Time: 5.204 ms

So if you intend to use “pgtt”, it is important to ensure that you have a programmatic method to load the extension. In the meantime, I will be working on my end to get a cleaner way to load it in the managed products I have access to.