Don't Forget The Elephant

A Review of Modern PostgreSQL

Presented by Keith Fiske / @keithf4

Database Administrator @ OmniTI

Follow along at http://slides.keithf4.com/dontforget

OmniTI, Inc

Full-stack support for high-traffic websites & applications
- Millions of users
- Terabytes of data
- Gilt, Etsy, Ora.TV, Freelotto
Surge Conference
- http://surge.omniti.com
- Annually in Sept
We're hiring!
- http://omniti.com/is/hiring
- SysAdmins, DBAs, Developers

Also mention Linux kernel engineers for hire

What is PostgreSQL?

Open Source RDBMS
Started at UC Berkley 1986, open sourced in 1996
BSD-type License
Follows SQL Standard very closely
Yearly major version releases (x.x)
- Minor (x.x.x) patch releases as needed
- Generally patch all supported versions at one time
Third-party Plugin Support
- Procedural Languages
  - C, Java, Python, Perl, JavaScript, PHP, R, Ruby, etc
- Extensions
- Background Workers
Mailing Lists, IRC, Wiki, planet.postgresql.org, Conferences & User Groups

A Brief History of Releases

Version Released Final Version EOL PostgreSQL 8.3 Feb 2008 8.3.23 Feb 2013 PostgreSQL 8.2 Dec 2006 8.2.23 Dec 2011 PostgreSQL 8.1 Nov 2005 8.1.23 Oct 2010 PostgreSQL 8.0 Jan 2005 8.0.26 Oct 2010 PostgreSQL 7.4 Nov 2003 7.4.30 Oct 2010 PostgreSQL 7.3 Nov 2002 7.3.21 Nov 2007 PostgreSQL 7.2 Feb 2002 7.2.8 Feb 2007 PostgreSQL 7.1 Apr 2001 7.1.3 Apr 2006 PostgreSQL 7.0 May 2000 7.0.3 May 2005 PostgreSQL 6.5 June 1999 6.5.3 June 2004 PostgreSQL 6.4 Oct 1998 6.4.2 Oct 2003 PostgreSQL 6.3 Mark 1998 6.3.2 March 2003

8.4 Features

Released July 2009 - EOL July 2014 Final Version 8.4.22

Window Functions

Performs a calculation across a set of table rows that are somehow related to the current row
The OVER clause determines exactly how the rows of the query are split up for processing by the window function

SELECT salary, sum(salary) OVER () FROM empsalary;

 salary |  sum  
--------+-------
   5200 | 47100
   5000 | 47100
   3500 | 47100
   4800 | 47100
   3900 | 47100
   4200 | 47100
   4500 | 47100
   4800 | 47100
   6000 | 47100
   5200 | 47100
(10 rows)


SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary;

 salary |  sum  
--------+-------
   3500 |  3500
   3900 |  7400
   4200 | 11600
   4500 | 16100
   4800 | 25700
   4800 | 25700
   5000 | 30700
   5200 | 41100
   5200 | 41100
   6000 | 47100
(10 rows)

For each row, there is a set of rows within its partition called its window frame
Many (but not all) window functions act only on the rows of the window frame, rather than of the whole partition. By default, if ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause.

Window Functions

The PARTITION BY list within OVER specifies dividing the rows into groups, or partitions, that share the same values of the PARTITION BY expression(s).

SELECT depname, empno, salary, avg(salary) 
OVER (PARTITION BY depname) 
FROM empsalary;

  depname  | empno | salary |          avg          
-----------+-------+--------+-----------------------
 develop   |    11 |   5200 | 5020.0000000000000000
 develop   |     7 |   4200 | 5020.0000000000000000
 develop   |     9 |   4500 | 5020.0000000000000000
 develop   |     8 |   6000 | 5020.0000000000000000
 develop   |    10 |   5200 | 5020.0000000000000000
 personnel |     5 |   3500 | 3700.0000000000000000
 personnel |     2 |   3900 | 3700.0000000000000000
 sales     |     3 |   4800 | 4866.6666666666666667
 sales     |     1 |   5000 | 4866.6666666666666667
 sales     |     4 |   4800 | 4866.6666666666666667
(10 rows)


SELECT depname, empno, salary, rank() 
OVER (PARTITION BY depname ORDER BY salary DESC) 
FROM empsalary;

  depname  | empno | salary | rank 
-----------+-------+--------+------
 develop   |     8 |   6000 |    1
 develop   |    10 |   5200 |    2
 develop   |    11 |   5200 |    2
 develop   |     9 |   4500 |    4
 develop   |     7 |   4200 |    5
 personnel |     2 |   3900 |    1
 personnel |     5 |   3500 |    2
 sales     |     1 |   5000 |    1
 sales     |     4 |   4800 |    2
 sales     |     3 |   4800 |    2
(10 rows)

Normally the avg() would be for all rows in the output. The first three output columns come directly from the table empsalary, and there is one output row for each row in the table. The fourth column represents an average taken across all the table rows that have the same depname value as the current row.
The rank function produces a numerical rank within the current row's partition for each distinct ORDER BY value, in the order defined by the ORDER BY clause. rank needs no explicit parameter, because its behavior is entirely determined by the OVER clause.

Common Table Expression

WITH provides a way to write auxiliary statements for use in a larger query
These statements, referred to as Common Table Expressions or CTEs, can be thought of as defining temporary tables that exist just for one query
WITH queries are useful because they are evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries.

WITH regional_sales AS (
        SELECT region, SUM(amount) AS total_sales
        FROM orders
        GROUP BY region
     ), top_regions AS (
        SELECT region
        FROM regional_sales
        WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales)
     )
SELECT region,
       product,
       SUM(quantity) AS product_units,
       SUM(amount) AS product_sales
FROM orders
WHERE region IN (SELECT region FROM top_regions)
GROUP BY region, product;

Each auxiliary statement in a WITH clause can be its own SELECT statement
The WITH clause itself is attached to a primary statement that can also be a SELECT

Common Table Expressions

Allow recursive queries

WITH RECURSIVE t(n) AS (
    VALUES (1)
  UNION ALL
    SELECT n+1 FROM t WHERE n < 100
)
SELECT sum(n) FROM t;

 sum  
------
 5050
(1 row)

Like a normal recursive function, there is an initial condition (VALUES(1)) followed by step to perform that refers back to itself. In this case the WHERE condition provides the exit from recursion.

Parallel pg_restore

Prior versions, each object was restored serially one after the other
Parallel option tells it to always keep restoring that many objects at all times until complete.
Significantly reduced downtime for major version upgrades

Not just telling it parallel batch sizes. Like multiple assembly lines all running at once. If parallel is set to 10, there are always that many objects being restored. This allows it to keep working on larger tables in the background while smaller tables and all other objects continue to keep restoring and finish.

Additional

Default & variadic parameters for functions
Column permissions
Per-database locale settings
SSL certificate authentication
Automatic sizing of Free Space Map (eliminated significant administrative headache)
Live editing of functions (\ef in psql)
New contrib modules: pg_stat_statements, auto_explain, btree_gin

Friendly Reminder

If you are running any version <= 8.4, you are no longer receiving support
No security updates & no data corruption fixes!
Significant performance increases from 8.4 and up
Getting to new version now will make future upgrades much easier (pg_upgrade)
Time to get it over with and pull the bandaid off now!
We can help! - omniti.com

9.0 Features

Released Sept 2010 - EOL Sept 2015 Current Release 9.0.18

Streaming Replication

Prior replication methods required significant setup and relied on third-party tools (commands given to archive_command)
Simple, TCP-based connection method
Significantly reduced lag time between master & slave systems

Hot Standby

Allow readonly connections to slave servers
Offset load on master systems by sending non-write queries to slave(s)
Better security: allow access to data but not to a system that permits writes

pg_upgrade

Previously, all major version (x.x) upgrades required a full dump & restore
Can either copy existing files to new data directory or can use hard links to reuse old files
Link method can allow very quick upgrade of large databases (ex: 700GB in < 5min)
Minimum version to use: 8.4.7
- Anything older still requires dump & restore

Additional

Enhanced permissions management
- Grant to all objects in schema with single command
- Alter initial, default privileges of objects created by specific roles
Better VACUUM FULL
Windows 64-bit support
Anonymous code blocks (DO statement)
Deferrable unique constraints
Enhanced EXPLAIN plans. Export in JSON, XML or YAML

VACUUM FULL now rewrites entire table and indexes rather than moving rows. Avoids index bloat that used to be caused pre-9.0
Deferred unique constraint means allowing a constraint to be checked at the end of a multi-row update instead of checking as each row is inserted

9.1 Features

Released Sept 2011 - EOL Sept 2016 Current verison 9.1.14

Foreign Data Wrappers

SQL/MED (Management of External Data) part of SQL Standard
Connect to remote databases/filesystems/services as if they were local tables inside PostgreSQL
Oracle, MySQL, SQL Server, SQLite, MongoDB, Redis, CSV Files, Twitter, etc
Joins & conditions pushed to remote system
Still performance issues joining local tables w/ remote
Ease the migration to PostgreSQL from different databases

Extensions

Package related objects into a managed bundle
Versioned
- Ensure all systems have the same code
- Controlled upgrades & downgrades
All contrib modules are now extensions
Examples: PostGIS, pg_repack, pg_partman

Writable CTE

WITH statements can now contain INSERT, UPDATE & DELETE
Use RETURNING clause to do fun things

WITH moved_rows AS (
    DELETE FROM products
    WHERE
        "date" >= '2010-10-01' AND
        "date" < '2010-11-01'
    RETURNING *
)
INSERT INTO products_log
SELECT * FROM moved_rows;

Rows deleted in WITH statement are returned by RETURNING to be used by the SELECT statement to populate another table that tracks deletes

Writable CTE

Simplify transactional logic that contains multiple, related insert/update/delete statements
As a side-effect, significantly improves performance!

BEFORE:
BEGIN;
INSERT INTO users (name, email) VALUES (?,?) RETURNING userid;

INSERT INTO addresses (userid, address, city, state, zip) 
VALUES (?,?,?,?,?) RETURNING addressid;

INSERT INTO user_history (userid, addressid, action) VALUES (?,?,?) RETURNING historyid;
COMMIT;

AFTER:
WITH userdata AS (
  INSERT INTO users (name, email) VALUES (?,?)
  RETURNING userid
), addressdata AS (
  INSERT INTO addresses (userid, address, city, state, zip)
  SELECT userid,?,?,?,?
  FROM userdata
  RETURNING addressid 
), historydata AS (
  INSERT INTO user_history (userid, addressid, action)
  SELECT userid, addressid,?
  FROM userdata, addressdata 
  RETURNING historyid
)
SELECT userid, addressid, history id 
FROM userdata, addressdata, historydata;

BEFORE set of queries is even slower if coming from an application. Roundtrip on network for each statement.
Be careful with multiple update statements. If CTE doesn't depend on another running first, postgres doesn't guarentee order of execution. Can lead to deadlocks.

Additional

Synchronous Replication
True serializable transaction isolation
Unlogged tables

9.2 Features

Released Sept 2012 - EOL Sept 2017 Current version 9.2.9

Range Data Type

Data type representing a range of values of some element type (time, id, custom)
Scheduling, probablilty, intersection of ordered data

-- includes 3, does not include 7, and does include all points in between
SELECT '[3,7)'::int4range;

-- does not include either 3 or 7, but includes all points in between
SELECT '(3,7)'::int4range;

-- includes only the single point 4
SELECT '[4,4]'::int4range;

-- includes no points (and will be normalized to 'empty')
SELECT '[4,4)'::int4range;


CREATE TABLE reservation (room int, during tsrange);
INSERT INTO reservation VALUES
    (1108, '[2010-01-01 14:30, 2010-01-01 15:30)');


-- Containment
SELECT int4range(10, 20) @> 3;
 ?column? 
----------
 f

-- Overlaps
SELECT numrange(11.1, 22.2) && numrange(20.0, 30.0);
 ?column? 
----------
 t

-- Extract the upper bound
SELECT upper(int8range(15, 25));
 upper 
-------
    25

-- Compute the intersection
SELECT int4range(10, 20) * int4range(15, 25);
 ?column? 
----------
 [15,20)

Range Types

Practical Example: Buying a car in budget (Thank you Jonathan Katz)
Without range type, table below would have had to have a min_price & max_price column to track its possible range of values

test=# SELECT * FROM cars ORDER BY lower(cars.price_range);
 id  | name                | price_range
-----+---------------------+-----------------
 2   | Buick Skylark       | [2000,4001)
 3   | Pontiac GTO         | [5000,7501)
 4   | Chevrolet Camero    | [10000,12001)
 5   | Ford Mustang        | [11000,15001)
 6   | Lincoln Continental | [12000,14001)
 7   | BMW M3              | [35000,42001)
 8   | Audi RS4            | [41000,45001)
 9   | Porsche 911         | [47000,58001)
 10  | Lamborghini LP700   | [385000,400001)

Range Type

Budget of $13,000 - $15,000
Find all cars that have any values in that price range
With old min_price / max_price columns, you write fun queries like this:

SELECT *
FROM cars
WHERE
    (
        cars.min_price ≤ 13000 AND
        cars.min_price ≤ 15000 AND
        cars.max_price ≥ 13000 AND
        cars.max_price ≤ 15000
    ) OR
    (
        cars.min_price ≤ 13000 AND
        cars.min_price ≤ 15000 AND
        cars.max_price ≥ 13000 AND
        cars.max_price ≥ 15000
    ) OR
    (
        cars.min_price ≥ 13000 AND
        cars.min_price ≤ 15000 AND
        cars.max_price ≥ 13000 AND
        cars.max_price ≤ 15000
    ) OR
    (
        cars.min_price ≥ 13000 AND
        cars.min_price ≤ 15000 AND
        cars.max_price ≥ 13000 AND
        cars.max_price ≥ 15000
    )
ORDER BY cars.min_price;

Range Types

Range types & operators make this just slightly easier

SELECT *
FROM cars
WHERE cars.price_range && int4range(13000, 15000, '[]')
ORDER BY lower(cars.price_range);

 id | name                | price_range
----+---------------------+---------------
 5  | Ford Mustang        | [11000,15001)
 6  | Lincoln Continental | [12000,14001)

Cascading Replication

Database can be a streaming slave of another slave
Every streaming slave off of a master causes additional load for reading its WAL stream
Distribute slave load among several systems in large, multi-cluster setups

Additional

Index-Only scan
JSON Data Type (limited)

9.3 Features

Released Sept 2013 - EOL Sept 2018 Current Version 9.3.5

Reduce System V shared memory

Install OS Install PostgreSQL package Do tuning adjustments to fix your shared_buffers (32mb is enough for anyone right?) Restart to put new settings into place Uh oh...

IpcMemoryCreate: shmget(key=5432001, size=415776768, 03600) failed: Invalid argument 

This error usually means that PostgreSQL's request for a shared memory 
segment exceeded your kernel's SHMMAX parameter. You can either 
reduce the request size or reconfigure the kernel with larger SHMMAX. 
To reduce the request size (currently 415776768 bytes), reduce 
PostgreSQL's shared_buffers parameter (currently 50000) and/or 
its max_connections parameter (currently 12).

No longer an issue with 9.3
Blog post by Robert Haas explains the original cause and fix

Custom Background Worker

PostgreSQL already used distinct background processes for its own inner workings (postmaster, writer, logger, autovacuum, stats collector, etc)
Now able to program your own workers and let PostgreSQL manage them for you
Auto-start & stop along with the database
Use custom configuration options in postgresql.conf
Example: Working on allowing pg_partman partition maintenance to run as a BGW so separate cronjob is no longer required

Additional

JSON Operators - Actually useful!
LATERAL queries/joins
Writable Foreign Data Wrappers
An actual PostgreSQL FDW!
Parallel pg_dump
Materialized Views (kind of useless)
Checksum data pages

9.4 Features

ETA 4Q 2014 Current Version 9.4.beta3

Upcoming 9.4 Features

Binary JSON data type. Even more useful! Replace MongoDB.
- Benchmark PG vs Mongo - https://github.com/EnterpriseDB/pg_nosql_benchmark
ALTER SYSTEM command to dynamically change postgresql.conf within the database
Materialzed view refresh no longer blocks reads.
- Actually useful! Still not auto-refreshed, tho
Dynamic loading/unloading of Background Workers
Logical decoding in WAL stream. First steps to native logical replication & multi-master

PostgreSQL Home Page - postgresql.org
Planet PostgreSQL Community News Feed - planet.postgresql.org
PostgreSQL Extension Network - pgxn.org
OmniTI - omniti.com
Me! - keithf4.com / @keithf4
Make cool presentations like this - reveal.js

Don't Forget The Elephant – A Brief History of Releases – 8.0 Features

keithf4

Don't Forget The Elephant – A Brief History of Releases – 8.0 Features

0 1 (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/platform.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

dont_forget_the_elephant

Don't Forget The Elephant

A Review of Modern PostgreSQL

OmniTI, Inc

What is PostgreSQL?

A Brief History of Releases

8.4 Features

Window Functions

Window Functions

Common Table Expression

Common Table Expressions

Parallel pg_restore

Additional

Friendly Reminder

9.0 Features

Streaming Replication

Hot Standby

pg_upgrade

Additional

9.1 Features

Foreign Data Wrappers

Extensions

Writable CTE

Writable CTE

Additional

9.2 Features

Range Data Type

Range Types

Range Type

Range Types

Cascading Replication

Additional

9.3 Features

Reduce System V shared memory

Custom Background Worker

Additional

9.4 Features

Upcoming 9.4 Features

0 1