Extension | 🛩️ Shane Borden's Technology Blog

CREATE EXTENSION pg_freespacemap; CREATE OR REPLACE FUNCTION show_empty_pages(p_table_name TEXT) RETURNS VOID AS $$ DECLARE -- Core processing variables table_oid_regclass REGCLASS; block_size BIGINT; fsm_granularity BIGINT; max_fsm_free_space BIGINT; total_pages BIGINT; high_water_mark BIGINT := 0; -- Variables for the final summary first_empty_block BIGINT; free_pages_at_end BIGINT; free_space_at_end TEXT; BEGIN -- Setup table_oid_regclass := p_table_name::regclass; block_size := current_setting('block_size')::bigint; SELECT relpages INTO total_pages FROM pg_class WHERE oid = table_oid_regclass; fsm_granularity := block_size / 256; max_fsm_free_space := floor((block_size - 24) / fsm_granularity) * fsm_granularity; -------------------------------------------------------------------------------- -- PASS 1: FIND THE HIGH-WATER MARK (last page with data) -------------------------------------------------------------------------------- FOR i IN REVERSE (total_pages - 1)..0 LOOP IF pg_freespace(table_oid_regclass, i) < max_fsm_free_space THEN high_water_mark := i; EXIT; END IF; END LOOP; -------------------------------------------------------------------------------- -- FINAL STEP: CALCULATE AND RAISE THE SUMMARY NOTICE -------------------------------------------------------------------------------- first_empty_block := high_water_mark + 1; free_pages_at_end := total_pages - first_empty_block; IF free_pages_at_end < 0 THEN free_pages_at_end := 0; END IF; free_space_at_end := pg_size_pretty(free_pages_at_end * block_size); RAISE NOTICE '-------------------------------------------------------------'; RAISE NOTICE 'Summary for table: %', p_table_name; RAISE NOTICE '-------------------------------------------------------------'; RAISE NOTICE 'The High Water Mark (HWM) is at page: %', total_pages; IF total_pages <> first_empty_block THEN RAISE NOTICE 'First potentially empty page is at: %', first_empty_block; RAISE NOTICE 'Total Pages in Table: %', total_pages; RAISE NOTICE 'Number of potentially truncatable pages at the end: %', free_pages_at_end; RAISE NOTICE 'Amount of free space at the end of the table: %', free_space_at_end; ELSE RAISE NOTICE 'There are no empty pages to truncate'; END IF; RAISE NOTICE '-------------------------------------------------------------'; END; $$ LANGUAGE plpgsql;

(postgres@10.3.1.17:5432) [postgres] > SELECT * FROM show_empty_pages('public.large_table'); NOTICE: ------------------------------------------------------------- NOTICE: Summary for table: public.large_table NOTICE: ------------------------------------------------------------- NOTICE: The High Water Mark (HWM) is at page: 302534 NOTICE: First potentially empty page is at: 302233 NOTICE: Total Pages in Table: 302534 NOTICE: Number of potentially truncatable pages at the end: 301 NOTICE: Amount of free space at the end of the table: 2408 kB NOTICE: -------------------------------------------------------------

NOTICE: ------------------------------------------------------------- NOTICE: Summary for table: public.large_table NOTICE: ------------------------------------------------------------- NOTICE: The High Water Mark (HWM) is at page: 302233 NOTICE: There are no empty pages to truncate NOTICE: -------------------------------------------------------------

There are several benefits to using temp tables in SQL DML operations. Given my long relationship and work on Oracle databases, I am highly familiar with the Oracle style “Global Temporary Tables” and have used them to solve many performance issues when complex joins are involved. That’s why I was excited when Gilles Darold released the extension “pgtt” for PostgreSQL. I was even more excited when Google released the extension for use in the Google CloudSQL for Postgres and AlloyDB for Postgres products.

In my opinion, native Postgres temporary tables leave a lot to be desired. For example, each table has to be created upon use. Nothing is persistent. This takes time and leads to Postgres catalog bloat (looking to do a post on this soon). That being said, in self managed instances, ‘pgtt’ is very easy to use. You can load the extension either at connection time by user, or for anyone that connects to a database. For example (make sure the extension is created in the database before setting these):

alter role test_user SET session_preload_libraries to 'pgtt';

alter database test_pgtt SET session_preload_libraries = 'pgtt';

This will, by default, use the ‘pgtt’ version of temporary tables instead of the default temporary tables Postgres offers. You can then create ‘pgtt’ style tables and refer to them via the pgtt_schema with no further modifications.

To revert back in the session, you must “disconnect” the session alter the user / database to revert the behavior to “old” style temporary tables. To my knowledge, if the library is enabled this way there is no way to use both styles of temporary tables at the same time.

alter role test_user RESET session_preload_libraries;

alter database pgtt_test RESET session_preload_libraries;

The Set Up…..

/* set flag "pgtt.enabled" on alloydb / cloudsql postgres console  to on */

/* create the extension in every database you need to use global temp tables */
CREATE EXTENSION pgtt;
LOAD 'pgtt';

/* Additional extensions needed for this test */
CREATE EXTENSION "uuid-ossp";

/* create the persistent global temporary table */
CREATE /*GLOBAL*/ TEMPORARY TABLE test_gtt_table (
id integer,
lbl text) ON COMMIT DELETE ROWS;

/* create an index on the global temp table */
CREATE INDEX ON pgtt_schema.test_gtt_table (id);

Use of ‘pgtt’ in Cloud Database Products

When you look at managed Cloud databases (from any vendor), you have to understand that some items need to be locked down in order to safeguard against the “managed” nature of the product. Unfortunately (as of this writing), at least in Google Cloud Postgres products, you cannot manipulate the parameter “session_preload_libraries”. So then you ask “How can I programmatically use the extension”? As the time of this writing, I have found only two ways. Either the user has to explicitly execute “LOAD ‘pgtt'” at the beginning / at some point in their session or the “LOAD” must be embedded in a function / procedure similar to the following:

/* function definition with loading of extension as a work around */
create or replace function public.test_gtt_function(P_IN_IDS integer[])
	returns table (
		id integer,
		lbl text
	)
    language plpgsql 
AS $func$
BEGIN
LOAD 'pgtt';
	/* revised query below */
	INSERT INTO pgtt_schema.test_gtt_table
		select generate_series,substr(uuid_generate_v4()::text,1,15) FROM generate_series(1, 5000);
	RETURN QUERY
	SELECT mygtt.id,mygtt.lbl from pgtt_schema.test_gtt_table mygtt where mygtt.id = ANY (P_IN_IDS);
END
$func$;

As you can see in the above function, the “LOAD” directive is the first thing in the function. While I am not a fan of having to do it this way, there doesn’t seem to be a huge performance impact but one nonetheless that you must account for:

(postgres@##########:5432) [tpcc] > LOAD 'pgtt';
LOAD
Time: 3.811 ms
(postgres@##########:5432) [tpcc] > \watch 1
LOAD
Time: 2.922 ms
LOAD
Time: 1.219 ms
LOAD
Time: 1.335 ms
LOAD
Time: 1.300 ms
LOAD
Time: 1.282 ms
LOAD

One reason I do like the implementation within the function is that it is easier to remove the directive vs the directive being embedded within the code. Either way the right direction is to get the vendor to allow targeted values for this parameter. This will allow the most flexibility and the least amount of awareness and hassle when using this extension.

So What Happens If You Don’t Load The Library??

So if you do not load the library with one of the options above, the functions still works. But what happens? The table actually behaves as a “real” table and the data is never truncated… Even after disconnection:

create or replace function public.test_gtt_wo_load_function(P_IN_IDS integer[])
	returns table (
		id integer,
		lbl text
	)
    language plpgsql 
AS $func$
BEGIN
	/* revised query below */
	INSERT INTO pgtt_schema.test_gtt_table
		select generate_series,substr(uuid_generate_v4()::text,1,15) FROM generate_series(1, 5000);
	RETURN QUERY
	SELECT mygtt.id,mygtt.lbl from pgtt_schema.test_gtt_table mygtt where mygtt.id = ANY (P_IN_IDS);
END
$func$;

(postgres@#######:5432) [tpcc] > \set ids ARRAY[3,28,69,235,389,512,678,899]
(postgres@#######:5432) [tpcc] > select * from public.test_gtt_wo_load_function(:ids);
 id  |       lbl
-----+-----------------
   3 | 2f82b04f-db13-4
  28 | 49dfea9a-5c41-4
  69 | d2d27343-a8d4-4
 235 | d5f32cd9-9495-4
 389 | 69842fc5-f0e5-4
 512 | e52c1625-0fab-4
 678 | 0fd320a9-8f08-4
 899 | 452a5f95-6e16-4
(8 rows)

(postgres@#######:5432) [tpcc] > select * from public.test_gtt_wo_load_function(:ids);
 id  |       lbl
-----+-----------------
   3 | 2f82b04f-db13-4
   3 | 632e097d-0f80-4
  28 | 49dfea9a-5c41-4
  28 | bf9ed2c4-14d2-4
  69 | d2d27343-a8d4-4
........
 899 | 452a5f95-6e16-4
 899 | d40c915f-04c7-4
(16 rows)

(postgres@#######:5432) [tpcc] > select * from public.test_gtt_wo_load_function(:ids);
 id  |       lbl
-----+-----------------
   3 | 2f82b04f-db13-4
   3 | 632e097d-0f80-4
   3 | 9cd65fda-ab4a-4
  28 | 49dfea9a-5c41-4
  28 | bf9ed2c4-14d2-4
  28 | 6cdd9a22-5c60-4
.........
 899 | 452a5f95-6e16-4
 899 | d40c915f-04c7-4
 899 | 0319d855-a935-4
(24 rows)

(postgres@#######:5432) [tpcc] > \q
shaneborden_google_com@replication-instance-1:/home/shaneborden_google_com $ psql -h ###### -p 5432 -U postgres -d tpcc
Password for user postgres:
(postgres@#######:5432) [tpcc] > \set ids ARRAY[3,28,69,235,389,512,678,899]
(postgres@#######:5432) [tpcc] > select * from public.test_gtt_wo_load_function(:ids);
 id  |       lbl
-----+-----------------
   3 | 82f66c20-f484-4
   3 | 01ad1b4d-2790-4
   3 | a3c38d05-fb7d-4
   3 | 7a4b2ff8-3e32-4
   3 | af8a96e1-194c-4
   3 | 02586796-1cc6-4
  28 | 62b1fb31-d3c0-4
  28 | 3acfe744-574f-4
  28 | 1e2d64fa-1106-4
  28 | d92a9321-4330-4
  28 | f3b1d45e-d4a7-4
  28 | 270146eb-1098-4
  69 | 506ad97c-fdad-4
  69 | bf3340d5-17a2-4
  69 | 47cbd9eb-83da-4
  69 | 487e8b22-4a05-4
  69 | b17314bc-5822-4
.......
 678 | a8d7dc45-c027-4
 678 | 17b666e5-1fc1-4
 678 | 4386eeb4-7b4e-4
 678 | 2942f25f-3d13-4
 678 | 3853bac0-e0ba-4
 678 | 95499bef-4440-4
 899 | 4503351c-4fe4-4
 899 | 900337bf-a658-4
 899 | 5f1f02f8-29a3-4
 899 | d635c921-8f9b-4
 899 | 0dd42f22-0f8e-4
 899 | 16bb8dcf-4e35-4
(48 rows)

Pretty crazy. So if for some reason the LOAD does not occur or fails, there could be an issue where the temp table has duplicate rows.

So what happens if the “pgtt” temp table was never created? You get a failure as you expect:

(postgres@#######:5432) [tpcc] > select * from public.test_gtt_wo_load_function(:ids);
ERROR:  relation "pgtt_schema.test_gtt_table" does not exist
LINE 1: INSERT INTO pgtt_schema.test_gtt_table
                    ^
QUERY:  INSERT INTO pgtt_schema.test_gtt_table
select generate_series,substr(uuid_generate_v4()::text,1,15) FROM generate_series(1, 5000)
CONTEXT:  PL/pgSQL function test_gtt_wo_load_function(integer[]) line 4 at SQL statement
Time: 5.204 ms

So it is important to ensure that if you intend to use “pgtt” you have a programmatic method to load the extension….. and in the meantime, I will be working on my end to get a cleaner way to load it in the managed products I have access to.

🛩️ Shane Borden's Technology Blog

My Experiences Navigating Technology – All Views Are My Own

Category Archives: Extension

Understanding High Water Mark Locking Issues in PostgreSQL Vacuums

Use of “pgtt” Extension in Self Managed vs. Cloud Products

The Set Up…..

Use of ‘pgtt’ in Cloud Database Products

So What Happens If You Don’t Load The Library??

Share this:

The Set Up…..

Use of ‘pgtt’ in Cloud Database Products

So What Happens If You Don’t Load The Library??

Share this: