On Github robuye / ghg-postgres-plans-in-ruby
Robert Ulejczyk
Limit (cost=251.68..334.94 rows=10 width=1634) -> Nested Loop (cost=168.43..5938.23 rows=693 width=1634) -> Bitmap Heap Scan on users (cost=168.00..2804.95 rows=711 width=1634) Recheck Cond: ((lower((username)::text) ~~ 'truerob'::text) OR ((email)::text ~~ 'truerob'::text)) Filter: ((lower((username)::text) ~~ 'truerob'::text) OR ((email)::text ~~ 'truerob'::text)) -> BitmapOr (cost=168.00..168.00 rows=711 width=0) -> Bitmap Index Scan on index_users_lowercase_username (cost=0.00..4.44 rows=1 width=0) Index Cond: (lower((username)::text) = 'truerob'::text) -> Bitmap Index Scan on index_users_email_text_pattern_ops (cost=0.00..4.56 rows=1 width=0) Index Cond: ((email)::text = 'truerob'::text) -> Index Only Scan using index_profiles_on_user_id on profiles (cost=0.43..4.40 rows=1 width=4) Index Cond: (user_id = users.id)
world=# explain select * from city inner join country on country.code = city.countrycode; QUERY PLAN ----------------------------------------------------------------------- Hash Join (cost=10.38..173.65 rows=4180 width=143) Hash Cond: (city.countrycode = country.code) -> Seq Scan on city (cost=0.00..105.80 rows=4180 width=30) -> Hash (cost=7.39..7.39 rows=239 width=113) -> Seq Scan on country (cost=0.00..7.39 rows=239 width=113)
world=# explain (costs false) select * from city ci inner join country co on co.code = ci.countrycode inner join countrylanguage l on co.code = l.countrycode; QUERY PLAN ---------------------------------------------------- Hash Join Hash Cond: (ci.countrycode = co.code) -> Seq Scan on city ci -> Hash -> Hash Join Hash Cond: (l.countrycode = co.code) -> Seq Scan on countrylanguage l -> Hash -> Seq Scan on country co
Format in YAML if text is not readable.
world=# explain (format yaml, costs false) select * from city inner join country on country.code = city.countrycode; QUERY PLAN ---------------------------------------------------- - Plan: Node Type: "Hash Join" Join Type: "Inner" Hash Cond: "(city.countrycode = country.code)" Plans: - Node Type: "Seq Scan" Parent Relationship: "Outer" Relation Name: "city" Alias: "city" - Node Type: "Hash" Parent Relationship: "Inner" Plans: - Node Type: "Seq Scan" Parent Relationship: "Outer" Relation Name: "country" Alias: "country"
What if I have no join but Postgres plan shows join?
world=# explain (costs false) select * from city, country where city.countrycode = country.code; ------------------------------------------------ Hash Join Hash Cond: (city.countrycode = country.code) -> Seq Scan on city -> Hash -> Seq Scan on country (5 rows)
Technically this is a join.
Postgres knows better.
Accept it and move on.
Countries:
@countries = [ { code: "NLD", name: "Netherlands" }, { code: "ALB", name: "Albania" }, { code: "AND", name: "Andorra" }, { code: "BEL", name: "Belgium" }, { code: "GER", name: "Germany" }, # No match { code: "BIH", name: "Bosnia and Herzegovina" } ]
Cities:
@cities = [ { country_code: "ALB", name: "Tirana" }, { country_code: "AND", name: "Andorra la Vella" }, { country_code: "BEL", name: "Schaerbeek" }, { country_code: "BEL", name: "Brugge" }, { country_code: "BIH", name: "Zenica" }, { country_code: "BIH", name: "Banja Luka" }, { country_code: "BIH", name: "Sarajevo" }, { country_code: "NLD", name: "Alkmaar" }, { country_code: "NLD", name: "Ede" }, { country_code: "POL", name: "Krakow" } # No match ]
Cities index on code:
@cities_code_index = @cities.inject({}) do |memo, city| memo[city[:country_code]] ||= [] memo[city[:country_code]] << @cities.index(city) memo end
Cities index on name:
@cities_name_index = @cities.inject({}) do |memo, city| memo[city[:name]] ||= [] memo[city[:name]] << @cities.index(city) memo end
SELECT * FROM city WHERE country_code = 'BEL';
Three methods are supported:
Reads the data directly from the disk page by page
results = [] i = 0 @cities.each do |city| i = i + 1 results << city if city[:country_code] == 'BEL' end
Reads the index directly from the disk sequentially, then reads individual records from the heap using random access
results = [] i = 0 @cities_code_index['BEL'].each do |row_position| i = i + 1 results << @cities[row_position] end
Reads whole index, builds a bitmap, sorts it and then reads the heap
SELECT * FROM city WHERE country_code = 'BEL' OR name = 'Tirana' OR name = 'Brugge';
name_bitmap_tirana = @cities_name_index['Tirana'] name_bitmap_brugge = @cities_name_index['Brugge'] code_bitmap_bel = @cities_code_index['BEL'] merged_bitmap = name_bitmap_tirana | name_bitmap_brugge | code_bitmap_bel sorted_bitmap = merged_bitmap.sort sorted_bitmap.each do |row_position| i = i + 1 results << @cities[row_position] end
EXPLAIN SELECT * FROM countrylanguage WHERE language = '--language--';
But first create an index
CREATE INDEX idx_countrylanguage_language ON countrylanguage (language text_pattern_ops); ANALYZE;
Capture the explain results
CREATE OR REPLACE FUNCTION parse_explain(IN query text, -- http://stackoverflow.com/questions/7682102/putting-explain-results-into-a-table OUT startup numeric, OUT totalcost numeric, OUT planrows numeric, OUT planwidth numeric, OUT type text) AS $BODY$ DECLARE query_explain text; explanation xml; nsarray text[][]; BEGIN nsarray := ARRAY[ARRAY['x', 'http://www.postgresql.org/2009/explain']]; query_explain :=e'EXPLAIN(FORMAT XML) ' || query; EXECUTE query_explain INTO explanation; startup := (xpath('/x:explain/x:Query/x:Plan/x:Startup-Cost/text()', explanation, nsarray))[1]; totalcost := (xpath('/x:explain/x:Query/x:Plan/x:Total-Cost/text()', explanation, nsarray))[1]; planrows := (xpath('/x:explain/x:Query/x:Plan/x:Plan-Rows/text()', explanation, nsarray))[1]; planwidth := (xpath('/x:explain/x:Query/x:Plan/x:Plan-Width/text()', explanation, nsarray))[1]; type := (xpath('/x:explain/x:Query/x:Plan/x:Node-Type/text()', explanation, nsarray))[1]; RETURN; END; $BODY$ LANGUAGE plpgsql;
Run it for all countries
with langs as ( select count(*), language from countrylanguage group by 2 order by 1 desc ), queries as ( select sub.planrows as plan_count, sub.type, sub.totalcost, langs.language, count as real_count from langs inner join lateral ( select totalcost, langs.language, planrows, type from parse_explain( concat(E'select * from countrylanguage where language = \'', langs.language, E'\'')) ) sub on langs.language = sub.language ) select language, real_count, plan_count, totalcost, type from queries;
language | real_count | plan_count | totalcost | type -----------------+------------+------------+-----------+------------------ English | 60 | 60 | 11.49 | Bitmap Heap Scan Arabic | 33 | 33 | 10.94 | Bitmap Heap Scan Spanish | 28 | 28 | 10.84 | Bitmap Heap Scan French | 25 | 25 | 10.78 | Bitmap Heap Scan -- snap -- -- snap -- -- snap -- Wolof | 3 | 3 | 9.97 | Bitmap Heap Scan Shona | 3 | 3 | 9.97 | Bitmap Heap Scan Gurma | 3 | 3 | 9.97 | Bitmap Heap Scan Hebrew | 2 | 1 | 8.29 | Index Scan Arawakan | 2 | 1 | 8.29 | Index Scan Teke | 2 | 1 | 8.29 | Index Scan Papuan Languages | 2 | 1 | 8.29 | Index Scan
Will use only inner join as an example.
SELECT country.code FROM country INNER JOIN city ON country.code = city.countrycode;
Three methods are supported:
For each row r in R do For each row s in S do If r and s satisfy the join condition Then output the row
results = [] i = 0 @countries.each do |country| @cities.each do |city| i = i + 1 results << country if city[:country_code] == country[:code] end end
["NLD", "NLD", "ALB", "AND", "BEL", "BEL", "BIH", "BIH", "BIH"]
for each row R1 in the build table begin calculate hash value on R1 join key(s) insert R1 into the appropriate hash bucket end for each row R2 in the probe table begin calculate hash value on R2 join key(s) for each row R1 in the corresponding hash bucket if R1 joins with R2 return (R1, R2) end
results = [] i = 0 temp_hash = @cities.inject(Hash.new([])) do |memo, city| memo[city[:country_code]] = memo[city[:country_code]] + [city] memo end @countries.each do |country| temp_hash[country[:code]].each do i = i + 1 results << country end end
["NLD", "NLD", "ALB", "AND", "BEL", "BEL", "BIH", "BIH", "BIH"]
get first row R1 from input 1 get first row R2 from input 2 while not at the end of either input begin if R1 joins with R2 begin return (R1, R2) get next row R2 from input 2 end else if R1 < R2 get next row R1 from input 1 else get next row R2 from input 2 end
results = [] i = 0 sorted_countries = @countries.sort_by {|country| country[:code]} sorted_cities = @cities.sort_by {|city| city[:country_code]} x, y = 0, 0 loop do i = i + 1 country = sorted_countries[x] city = sorted_cities[y] if country[:code] == city[:country_code] results << country y = y + 1 elsif country[:code] < city[:country_code] x = x + 1 elsif country[:code] > city[:country_code] y = y + 1 end break if (sorted_countries[x].nil? || sorted_cities[y].nil?) end
["ALB", "AND", "BEL", "BEL", "BIH", "BIH", "BIH", "NLD", "NLD"]
Aggregate function computes a single result from multiple rows
clear the current aggregate results clear the current group by columns for each input row begin if the input row does not match the current group by columns begin output the aggregate results clear the current aggregate results set the current group by columns to the input row end update the aggregate results with the input row end
results = [] i = 0 sorted_cities = @cities.sort_by {|city| city[:country_code]} current_aggregate = {} sorted_cities.each do |row| i = i + 1 if (current_aggregate[:country_code] != row[:country_code]) results << current_aggregate unless current_aggregate[:count].nil? current_aggregate = {} end current_aggregate[:country_code] = row[:country_code] current_aggregate[:count] = current_aggregate[:count].to_i + 1 end results << current_aggregate
Window function performs a calculation across a set of table rows that are somehow related to the current row. Unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row — the rows retain their separate identities.
Out of scope, sorry!
Query Optimizer decides on Scan Method, Join Method and Join Order.
Postgres does not provide syntax for giving hints to it.
But some sort of control is still possible.
There are no statistics available for expressions so Postgres assumes default estimations.
world=# explain analyze select * from city where countrycode = 'POL'; QUERY PLAN ----------------------------------------------------------------------------------- Index Scan using idx_city_countrycode on city (cost=0.28..22.87 rows=44 width=31) (actual time=0.031..0.044 rows=44 loops=1) Index Cond: (countrycode = 'POL'::bpchar) Total runtime: 0.075 ms (3 rows)
world=# explain analyze select * from city where countrycode = concat('POL'); QUERY PLAN ----------------------------------------------------------------------------------- Seq Scan on city (cost=0.00..135.38 rows=20 width=31) (actual time=6.926..9.500 rows=44 loops=1) Filter: ((countrycode)::text = concat('POL')) Rows Removed by Filter: 4035 Total runtime: 9.546 ms (4 rows)
We know there is only one row matching the condition so we give a hint to fetch only one row.
world=# explain analyze select * from city where countrycode = concat('POL'); ----------------------------------------------------------------------------- Seq Scan on city (cost=0.00..135.38 rows=20 width=31) (actual time=7.154..8.968 rows=44 loops=1) Filter: ((countrycode)::text = concat('POL')) Rows Removed by Filter: 4035 Total runtime: 9.011 ms
world=# explain analyze select * from city where countrycode = concat('POL') limit 1; ------------------------------------------------------------------------------------- Limit (cost=0.00..6.77 rows=1 width=31) (actual time=7.254..7.255 rows=1 loops=1) -> Seq Scan on city (cost=0.00..135.38 rows=20 width=31) (actual time=7.252..7.252 rows=1 loops=1) Filter: ((countrycode)::text = concat('POL')) Rows Removed by Filter: 2994 Total runtime: 7.297 ms
You can use CTE, Query Optimizer won't inline WITH queries.
It's a feature, not a bug.
Details:
http://blog.2ndquadrant.com/postgresql-ctes-are-optimization-fences/