BLOGGERS NOTE… The original posting had verbiage where it wasn’t clear that I was referring to the planner not partition pruning vs the execution not partition pruning. Apologies for the error. The below post has been corrected:
Recently I had a customer come to me with a poorly performing query that had a perfectly fine execution time on Oracle, but once migrated to Postgres the query was slow. Upon investigation, it was found that the planning portion of the query was not pruning partitions causing a much longer planning time. That said even though the planning portion was not accomplishing partition pruning, the execution portion was (This is indicated by the portion of the explain plan notating “never executed”) .
The query had “WHERE” and “JOIN” clauses which explicitly specified partition keys and were joined on the partition key, so why wasn’t pruning happening during planning time?
The setup for the test is at the bottom of the blog post: Jump to Test Case Setup
Original Query
The tables in question are all partitioned by “hire_date” as as you can see in the “WHERE” clause below, the planner should project partition pruning because the “hire_date” from the driving table is equal to that of the joined table, yet the planner did not prune partitions:
WITH top_emp AS (
SELECT
DISTINCT
id,
first_name,
last_name,
hire_date,
birth_date
FROM
employees.employee_part emp
INNER JOIN employees.salary sal on (sal.employee_id = emp.id)
WHERE
emp.hire_date = '1985-01-01'::date
and emp.gender = 'M'
and emp.birth_date between '1957-01-01'::date and '1970-01-01'::date
and sal.amount > 50000
)
SELECT
dept.employee_id,
dept.department_id,
dept.hire_date
FROM
top_emp emp1,
employees.department_employee_part dept
WHERE
emp1.hire_date = dept.hire_date and
emp1.id = dept.employee_id;
/* Explain Plan */
Nested Loop (cost=1467.16..2646.20 rows=307 width=17) (actual time=4.565..4.613 rows=4 loops=1)
Output: dept.employee_id, dept.department_id, dept.hire_date
Buffers: shared hit=330
-> Unique (cost=1466.87..1471.11 rows=37 width=31) (actual time=4.537..4.560 rows=4 loops=1)
Output: emp.id, emp.first_name, emp.last_name, emp.hire_date, emp.birth_date
Buffers: shared hit=318
-> Sort (cost=1466.87..1467.72 rows=339 width=31) (actual time=4.534..4.539 rows=69 loops=1)
Output: emp.id, emp.first_name, emp.last_name, emp.hire_date, emp.birth_date
Sort Key: emp.id, emp.first_name, emp.last_name, emp.birth_date
Sort Method: quicksort Memory: 30kB
Buffers: shared hit=318
-> Nested Loop (cost=0.43..1452.62 rows=339 width=31) (actual time=1.524..4.450 rows=69 loops=1)
Output: emp.id, emp.first_name, emp.last_name, emp.hire_date, emp.birth_date
Buffers: shared hit=309
-> Seq Scan on employees.employee_part_p1985 emp (cost=0.00..999.32 rows=37 width=31) (actual time=1.500..4.380 rows=4 loops=1)
Output: emp.id, emp.first_name, emp.last_name, emp.hire_date, emp.birth_date
Filter: ((emp.birth_date >= '1957-01-01'::date) AND (emp.birth_date <= '1970-01-01'::date) AND (emp.hire_date = '1985-01-01'::date) AND (emp.gender = 'M'::employee_gender))
Rows Removed by Filter: 35312
Buffers: shared hit=293
-> Index Scan using idx_16991_primary on employees.salary sal (cost=0.43..12.16 rows=9 width=8) (actual time=0.010..0.014 rows=17 loops=4)
Output: sal.employee_id, sal.amount, sal.from_date, sal.to_date
Index Cond: (sal.employee_id = emp.id)
Filter: (sal.amount > 50000)
Rows Removed by Filter: 1
Buffers: shared hit=16
-> Append (cost=0.29..31.59 rows=16 width=17) (actual time=0.010..0.011 rows=1 loops=4)
Buffers: shared hit=12
-> Index Scan using contract_employees_p1985_pkey on employees.department_employee_part_p1985 dept_1 (cost=0.29..2.33 rows=1 width=17) (actual time=0.006..0.006 rows=1 loops=4)
Output: dept_1.employee_id, dept_1.department_id, dept_1.hire_date
Index Cond: (dept_1.employee_id = emp.id)
Filter: (emp.hire_date = dept_1.hire_date)
Buffers: shared hit=12
-> Index Scan using contract_employees_p1986_pkey on employees.department_employee_part_p1986 dept_2 (cost=0.29..2.36 rows=1 width=17) (never executed)
Output: dept_2.employee_id, dept_2.department_id, dept_2.hire_date
Index Cond: (dept_2.employee_id = emp.id)
Filter: (emp.hire_date = dept_2.hire_date)
-> Index Scan using contract_employees_p1987_pkey on employees.department_employee_part_p1987 dept_3 (cost=0.29..2.33 rows=1 width=17) (never executed)
Output: dept_3.employee_id, dept_3.department_id, dept_3.hire_date
Index Cond: (dept_3.employee_id = emp.id)
Filter: (emp.hire_date = dept_3.hire_date)
......... LOTS OF PARTITIONS ........
-> Index Scan using contract_employees_p2000_pkey on employees.department_employee_part_p2000 dept_16 (cost=0.14..0.24 rows=1 width=17) (never executed)
Output: dept_16.employee_id, dept_16.department_id, dept_16.hire_date
Index Cond: (dept_16.employee_id = emp.id)
Filter: (emp.hire_date = dept_16.hire_date)
Settings: effective_cache_size = '3256704kB', effective_io_concurrency = '128', random_page_cost = '1.1', search_path = '"$user", public, employees', temp_buffers = '128MB', work_mem = '256MB'
Query Identifier: -1847211568235447798
Planning:
Buffers: shared hit=1313
Planning Time: 4.111 ms
Execution Time: 5.176 ms
So if the partition keys are equal and the tables are partitioned the exact same way, then why is the planner not projecting partition pruning? The answer can be found in the Postgres Listserv by clicking here.
The planner can only push quals down into a subquery, it cannot pull
quals from a subquery into the outer query.
So I want (and need the planner to partition prune) so how can I get it? Both alternatives require re-writing the query although one is a little less invasive than the other. If you know that the partition keys are exactly equal and there can only be one value, then you can modify the the where clause to have an additional entry like this (notice the additional entry in the lower “where” clause):
WITH top_emp AS (
SELECT
DISTINCT
id,
first_name,
last_name,
hire_date,
birth_date
FROM
employees.employee_part emp
INNER JOIN employees.salary sal on (sal.employee_id = emp.id)
WHERE
emp.hire_date = '1985-01-01'::date
and emp.gender = 'M'
and emp.birth_date between '1957-01-01'::date and '1970-01-01'::date
and sal.amount > 50000
)
SELECT
dept.employee_id,
dept.department_id,
dept.hire_date
FROM
top_emp emp1,
employees.department_employee_part dept
WHERE
emp1.hire_date = dept.hire_date and
dept.hire_date = '1985-01-01'::date and
emp1.id = dept.employee_id;
Or, you could totally re-write the query like this so, just in case you could match multiple partitions in the other table, that situation is handled:
SELECT
employee_id,
department_id,
hire_date
FROM (
SELECT
dept.employee_id,
dept.department_id,
dept.hire_date,
count(1)
FROM
employees.employee_part emp
INNER JOIN employees.salary sal on (sal.employee_id = emp.id)
JOIN employees.department_employee_part dept on (emp.hire_date = dept.hire_date and emp.id = dept.employee_id)
WHERE
emp.hire_date = '1985-01-01'::date
and emp.gender = 'M'
and emp.birth_date between '1957-01-01'::date and '1970-01-01'::date
and sal.amount > 50000
GROUP BY
dept.employee_id,
dept.department_id,
dept.hire_date) a;
In this case, the following plan is used and partition pruning occurs. Both planning and execution time are significantly less as well:
Subquery Scan on a (cost=1273.80..1273.83 rows=1 width=17) (actual time=2.629..2.641 rows=4 loops=1)
Output: a.employee_id, a.department_id, a.hire_date
Buffers: shared hit=331
-> GroupAggregate (cost=1273.80..1273.82 rows=1 width=25) (actual time=2.628..2.639 rows=4 loops=1)
Output: dept.employee_id, dept.department_id, dept.hire_date, NULL::bigint
Group Key: dept.employee_id, dept.department_id, dept.hire_date
Buffers: shared hit=331
-> Sort (cost=1273.80..1273.80 rows=1 width=17) (actual time=2.620..2.624 rows=69 loops=1)
Output: dept.employee_id, dept.department_id, dept.hire_date
Sort Key: dept.employee_id, dept.department_id
Sort Method: quicksort Memory: 29kB
Buffers: shared hit=331
-> Nested Loop (cost=0.74..1273.79 rows=1 width=17) (actual time=0.836..2.604 rows=69 loops=1)
Output: dept.employee_id, dept.department_id, dept.hire_date
Buffers: shared hit=331
-> Nested Loop (cost=0.30..967.66 rows=43 width=25) (actual time=0.828..2.557 rows=4 loops=1)
Output: emp.id, dept.employee_id, dept.department_id, dept.hire_date
Buffers: shared hit=315
-> Seq Scan on employees.department_employee_part_p1985 dept (cost=0.00..776.50 rows=109 width=17) (actual time=0.801..2.515 rows=9 loops=1)
Output: dept.employee_id, dept.department_id, dept.hire_date
Filter: (dept.hire_date = '1985-01-01'::date)
Rows Removed by Filter: 39071
Buffers: shared hit=288
-> Memoize (cost=0.30..2.03 rows=1 width=12) (actual time=0.004..0.004 max time=0.020 rows=0 loops=9)
Output: emp.id, emp.hire_date
Cache Key: dept.employee_id
Cache Mode: logical
Hits: 0 Misses: 9 Evictions: 0 Overflows: 0 Memory Usage: 1kB
Buffers: shared hit=27
-> Index Scan using employee_part_p1985_pkey on employees.employee_part_p1985 emp (cost=0.29..2.02 rows=1 width=12) (actual time=0.002..0.003 max time=0.012 rows=0 loops=9)
Output: emp.id, emp.hire_date
Index Cond: (emp.id = dept.employee_id)
Filter: ((emp.birth_date >= '1957-01-01'::date) AND (emp.birth_date <= '1970-01-01'::date) AND (emp.hire_date = '1985-01-01'::date) AND (emp.gender = 'M'::employees.employee_gender))
Rows Removed by Filter: 1
Buffers: shared hit=27
-> Memoize (cost=0.44..8.11 rows=9 width=8) (actual time=0.004..0.009 max time=0.014 rows=17 loops=4)
Output: sal.employee_id
Cache Key: emp.id
Cache Mode: logical
Hits: 0 Misses: 4 Evictions: 0 Overflows: 0 Memory Usage: 3kB
Buffers: shared hit=16
-> Index Scan using idx_16991_primary on employees.salary sal (cost=0.43..8.10 rows=9 width=8) (actual time=0.003..0.006 max time=0.009 rows=17 loops=4)
Output: sal.employee_id
Index Cond: (sal.employee_id = emp.id)
Filter: (sal.amount > 50000)
Rows Removed by Filter: 1
Buffers: shared hit=16
Settings: effective_cache_size = '9830040kB', random_page_cost = '1.1', work_mem = '256MB'
Query Identifier: 2871877186944609726
Planning:
Buffers: shared hit=26
Planning Time: 0.460 ms
Execution Time: 2.707 ms
The Verdict
So as you can see it is very important to get the planner to project what you want. While ultimately the execution phase was not fully executed, it still had to look at the stats of the partition to decide not to execute it resulting in some wasted time. Also, keep in mind that this test case is a very simple one. The customer issue I worked on had a lot more partitions and a lot more data so the planning time being wasted was much higher resulting in a much higher impact.
Test Case Setup
To create a test case for this example, I used the employee database located on this Github site:
https://github.com/h8/employees-database
I then created some partitioned versions of the tables from the sample database as shown below:
-- Table: test_schema.contract_employees
CREATE TABLE employees.department_employee_part (
employee_id int8 NOT NULL,
department_id bpchar(4) NOT NULL,
from_date date NOT NULL,
to_date date NOT NULL,
hire_date date NOT NULL
) PARTITION BY RANGE (hire_date);
-- Partitions SQL
CREATE TABLE employees.department_employee_part_p1985 PARTITION OF employees.department_employee_part
(
CONSTRAINT department_employee_part_p1985_pkey PRIMARY KEY (employee_id, department_id)
)
FOR VALUES FROM ('1985-01-01') TO ('1986-01-01');
CREATE TABLE employees.department_employee_part_p1986 PARTITION OF employees.department_employee_part
(
CONSTRAINT department_employee_part_p1986_pkey PRIMARY KEY (employee_id, department_id)
)
FOR VALUES FROM ('1986-01-01') TO ('1987-01-01');
CREATE TABLE employees.department_employee_part_p1987 PARTITION OF employees.department_employee_part
(
CONSTRAINT department_employee_part_p1987_pkey PRIMARY KEY (employee_id, department_id)
)
FOR VALUES FROM ('1987-01-01') TO ('1988-01-01');
CREATE TABLE employees.department_employee_part_p1988 PARTITION OF employees.department_employee_part
(
CONSTRAINT department_employee_part_p1988_pkey PRIMARY KEY (employee_id, department_id)
)
FOR VALUES FROM ('1988-01-01') TO ('1989-01-01');
CREATE TABLE employees.department_employee_part_p1989 PARTITION OF employees.department_employee_part
(
CONSTRAINT department_employee_part_p1989_pkey PRIMARY KEY (employee_id, department_id)
)
FOR VALUES FROM ('1989-01-01') TO ('1990-01-01');
CREATE TABLE employees.department_employee_part_p1990 PARTITION OF employees.department_employee_part
(
CONSTRAINT department_employee_part_p1990_pkey PRIMARY KEY (employee_id, department_id)
)
FOR VALUES FROM ('1990-01-01') TO ('1991-01-01');
CREATE TABLE employees.department_employee_part_p1991 PARTITION OF employees.department_employee_part
(
CONSTRAINT department_employee_part_p1991_pkey PRIMARY KEY (employee_id, department_id)
)
FOR VALUES FROM ('1991-01-01') TO ('1992-01-01');
CREATE TABLE employees.department_employee_part_p1992 PARTITION OF employees.department_employee_part
(
CONSTRAINT department_employee_part_p1992_pkey PRIMARY KEY (employee_id, department_id)
)
FOR VALUES FROM ('1992-01-01') TO ('1993-01-01');
CREATE TABLE employees.department_employee_part_p1993 PARTITION OF employees.department_employee_part
(
CONSTRAINT department_employee_part_p1993_pkey PRIMARY KEY (employee_id, department_id)
)
FOR VALUES FROM ('1993-01-01') TO ('1994-01-01');
CREATE TABLE employees.department_employee_part_p1994 PARTITION OF employees.department_employee_part
(
CONSTRAINT department_employee_part_p1994_pkey PRIMARY KEY (employee_id, department_id)
)
FOR VALUES FROM ('1994-01-01') TO ('1995-01-01');
CREATE TABLE employees.department_employee_part_p1995 PARTITION OF employees.department_employee_part
(
CONSTRAINT department_employee_part_p1995_pkey PRIMARY KEY (employee_id, department_id)
)
FOR VALUES FROM ('1995-01-01') TO ('1996-01-01');
CREATE TABLE employees.department_employee_part_p1996 PARTITION OF employees.department_employee_part
(
CONSTRAINT department_employee_part_p1996_pkey PRIMARY KEY (employee_id, department_id)
)
FOR VALUES FROM ('1996-01-01') TO ('1997-01-01');
CREATE TABLE employees.department_employee_part_p1997 PARTITION OF employees.department_employee_part
(
CONSTRAINT department_employee_part_p1997_pkey PRIMARY KEY (employee_id, department_id)
)
FOR VALUES FROM ('1997-01-01') TO ('1998-01-01');
CREATE TABLE employees.department_employee_part_p1998 PARTITION OF employees.department_employee_part
(
CONSTRAINT department_employee_part_p1998_pkey PRIMARY KEY (employee_id, department_id)
)
FOR VALUES FROM ('1998-01-01') TO ('1999-01-01');
CREATE TABLE employees.department_employee_part_p1999 PARTITION OF employees.department_employee_part
(
CONSTRAINT department_employee_part_p1999_pkey PRIMARY KEY (employee_id, department_id)
)
FOR VALUES FROM ('1999-01-01') TO ('2000-01-01');
CREATE TABLE employees.department_employee_part_p2000 PARTITION OF employees.department_employee_part
(
CONSTRAINT department_employee_part_p2000_pkey PRIMARY KEY (employee_id, department_id)
)
FOR VALUES FROM ('2000-01-01') TO ('2001-01-01');
CREATE INDEX ON employees.department_employee_part USING btree (department_id);
insert into employees.department_employee_part
select de.*,emp.hire_date
from employees.employee emp join employees.department_employee de on (emp.id = de.employee_id);
vacuum (verbose, analyze) employees.department_employee_part;
CREATE TABLE employees.employee_part (
id int8 DEFAULT nextval('employees.id_employee_seq'::regclass) NOT NULL,
birth_date date NOT NULL,
first_name varchar(14) NOT NULL,
last_name varchar(16) NOT NULL,
gender employees.employee_gender NOT NULL,
hire_date date NOT NULL
) PARTITION BY RANGE (hire_date);
CREATE TABLE employees.employee_part_p1985 PARTITION OF employees.employee_part
(
CONSTRAINT employee_part_p1985_pkey PRIMARY KEY (id)
)
FOR VALUES FROM ('1985-01-01') TO ('1986-01-01');
CREATE TABLE employees.employee_part_p1986 PARTITION OF employees.employee_part
(
CONSTRAINT employee_part_p1986_pkey PRIMARY KEY (id)
)
FOR VALUES FROM ('1986-01-01') TO ('1987-01-01');
CREATE TABLE employees.employee_part_p1987 PARTITION OF employees.employee_part
(
CONSTRAINT employee_part_p1987_pkey PRIMARY KEY (id)
)
FOR VALUES FROM ('1987-01-01') TO ('1988-01-01');
CREATE TABLE employees.employee_part_p1988 PARTITION OF employees.employee_part
(
CONSTRAINT employee_part_p1988_pkey PRIMARY KEY (id)
)
FOR VALUES FROM ('1988-01-01') TO ('1989-01-01');
CREATE TABLE employees.employee_part_p1989 PARTITION OF employees.employee_part
(
CONSTRAINT employee_part_p1989_pkey PRIMARY KEY (id)
)
FOR VALUES FROM ('1989-01-01') TO ('1990-01-01');
CREATE TABLE employees.employee_part_p1990 PARTITION OF employees.employee_part
(
CONSTRAINT employee_part_p1990_pkey PRIMARY KEY (id)
)
FOR VALUES FROM ('1990-01-01') TO ('1991-01-01');
CREATE TABLE employees.employee_part_p1991 PARTITION OF employees.employee_part
(
CONSTRAINT employee_part_p1991_pkey PRIMARY KEY (id)
)
FOR VALUES FROM ('1991-01-01') TO ('1992-01-01');
CREATE TABLE employees.employee_part_p1992 PARTITION OF employees.employee_part
(
CONSTRAINT employee_part_p1992_pkey PRIMARY KEY (id)
)
FOR VALUES FROM ('1992-01-01') TO ('1993-01-01');
CREATE TABLE employees.employee_part_p1993 PARTITION OF employees.employee_part
(
CONSTRAINT employee_part_p1993_pkey PRIMARY KEY (id)
)
FOR VALUES FROM ('1993-01-01') TO ('1994-01-01');
CREATE TABLE employees.employee_part_p1994 PARTITION OF employees.employee_part
(
CONSTRAINT employee_part_p1994_pkey PRIMARY KEY (id)
)
FOR VALUES FROM ('1994-01-01') TO ('1995-01-01');
CREATE TABLE employees.employee_part_p1995 PARTITION OF employees.employee_part
(
CONSTRAINT employee_part_p1995_pkey PRIMARY KEY (id)
)
FOR VALUES FROM ('1995-01-01') TO ('1996-01-01');
CREATE TABLE employees.employee_part_p1996 PARTITION OF employees.employee_part
(
CONSTRAINT employee_part_p1996_pkey PRIMARY KEY (id)
)
FOR VALUES FROM ('1996-01-01') TO ('1997-01-01');
CREATE TABLE employees.employee_part_p1997 PARTITION OF employees.employee_part
(
CONSTRAINT employee_part_p1997_pkey PRIMARY KEY (id)
)
FOR VALUES FROM ('1997-01-01') TO ('1998-01-01');
CREATE TABLE employees.employee_part_p1998 PARTITION OF employees.employee_part
(
CONSTRAINT employee_part_p1998_pkey PRIMARY KEY (id)
)
FOR VALUES FROM ('1998-01-01') TO ('1999-01-01');
CREATE TABLE employees.employee_part_p1999 PARTITION OF employees.employee_part
(
CONSTRAINT employee_part_p1999_pkey PRIMARY KEY (id)
)
FOR VALUES FROM ('1999-01-01') TO ('2000-01-01');
CREATE TABLE employees.employee_part_p2000 PARTITION OF employees.employee_part
(
CONSTRAINT employee_part_p2000_pkey PRIMARY KEY (id)
)
FOR VALUES FROM ('2000-01-01') TO ('2001-01-01');
insert into employees.employee_part select * from employees.employee;
vacuum (verbose, analyze) employees.employee_part;
