A set of related information.
Something of interest to the database user community. Examples include customers, parts, geographic locations, etc.
An individual piece of data stored in a table.
A set of columns that together completely describe an entity or some action on an entity. Also called a record or a tuple.
A set of rows, held either in memory (nonpersistent) or on permanent storage (persistent).
Another name for a nonpersistent table, generally the result of an SQL query.
One or more columns that can be used as a unique identifier for each row in a table.
One or more columns that can be used together to identify a single row in another table.
Copy this code in the CLI
$ sudo service postgresql start $ sudo sudo -u postgres psql
-- Get the current date SELECT CURRENT_DATE; /** * date * ------------ * 2016-02-16 * (1 row) */
$ sudo sudo -u postgres psql
postgres=# \i PATH/to-script.sql
postgres=# \q
All the values of a column have to be of the same type
I want to store some information about the students of GA
A Student has only one City
Add a Foreign Key in the Student table referring to the City table
A Student has one or many Classes
Create a pivot table with 2 Foreign Keys referring to Student table and to Class table
The Foreign key couple is in fact the Primary key of this pivot table
-- Sample query select o.order_id "Order ID", count(distinct i.id) "Quantity" from orders o inner join items i on o.order_id = i.order_id where o.delivered_at is not null and not i.is_sample group by o.order_id order by 1 desc limit 100;
-- Indented sample query SELECT o.order_id AS "Order ID", COUNT(DISTINCT i.id) AS "Quantity" FROM orders o INNER JOIN items i ON o.order_id = i.order_id WHERE o.delivered_at IS NOT NULL AND NOT i.is_sample GROUP BY o.order_id ORDER BY o.created_at DESC LIMIT 100;
Not only for you, but also for everyone else!
-- Create table query DROP TABLE IF EXISTS student; CREATE TABLE student ( id SERIAL PRIMARY KEY, first_name VARCHAR(255), last_name VARCHAR(255), email VARCHAR(255), age INTEGER );
-- Insert rows INSERT INTO student ( first_name, last_name, email, age ) VALUES ( 'Leo', 'Di Caprio', 'leo@gmail.com', 40 ), ( 'Angelina', 'Joli', 'angi@gmail.com', 42 );
-- Select rows SELECT id, first_name, last_name FROM student LIMIT 10;
Create the tables City and Class and insert some data inside
\d+ city
-- Update a table ALTER TABLE student ADD city_id INTEGER REFERENCES city(id);
-- Update rows UPDATE student -- Table name SET city_id = 1 -- Field to update WHERE id = 1 -- Row to update ;
Create the pivot table student_class
-- Select some data SELECT s.first_name AS "Student first name", s.last_name AS "Student last name", c.name AS "City" FROM student s INNER JOIN city c ON c.id = s.city_id WHERE c.name = 'NYC' ORDER BY s.first_name DESC LIMIT 10;
# Leave PSQL if opened with \q $ wget https://raw.githubusercontent.com/aeud/sql-pres/master/movielens.sql # Download the dataset $ sudo sudo -u postgres psql < movielens.sql # Add the dataset to your database $ sudo sudo -u postgres psql # Restart PSQL client
A User has one Occupation
A Rating has one User and one Movie
A Movie has many Genres (via genre_movie)
-- PostgreSQL \d+ occupations \d+ users \d+ ratings \d+ movies \d+ genres \d+ genres_movies
-- Indented sample query SELECT o.order_id AS "Order ID", COUNT(DISTINCT i.id) AS "Quantity" FROM orders o INNER JOIN items i ON o.order_id = i.order_id WHERE o.delivered_at IS NOT NULL AND NOT i.is_sample GROUP BY o.order_id ORDER BY o.created_at DESC LIMIT 100;
-- Apply these queries SELECT * FROM users LIMIT 5; SELECT id, age, gender FROM users LIMIT 5; SELECT id AS "User ID", age AS "Age", gender AS "Gender" FROM users LIMIT 5;
-- Apply these queries SELECT * FROM users LIMIT 5; SELECT * FROM movies LIMIT 5;
-- Apply these queries SELECT * FROM users WHERE id = 1 LIMIT 5; SELECT * FROM users WHERE gender = 'M' LIMIT 5;
How to determine how many females / males are recorded in our dataset?
Gender Count M 670 F 273-- Apply this queries SELECT u.gender AS "Gender", -- Dimension(s) count(u.*) AS "Count" -- Aggregate function FROM users AS u GROUP BY gender -- Dimension(s) ;
-- Apply these queries SELECT * FROM users ORDER BY age ASC LIMIT 5; SELECT * FROM users ORDER BY age DESC LIMIT 5; SELECT * FROM users ORDER BY gender ASC LIMIT 5;
-- Apply these queries SELECT * FROM users ORDER BY age ASC LIMIT 5; SELECT * FROM users ORDER BY age ASC LIMIT 10; SELECT * FROM users ORDER BY age ASC LIMIT 100; -- use \q to escape
How can I get the distribution of the movies, by release year?
The WHERE clause is made of a serie of expressions, linked by logical operators
-- Conditions SELECT id, age, gender FROM users u WHERE u.gender = 'M' AND u.age < 35 ORDER BY id desc LIMIT 5;
With the documentation, write a query to fetch the movies released in 1996 with a title starting with the letter D or T
Get the distribution of movies released in 1996, by title first letter
Get the distribution of movies with a title starting with P, by year of release
How to get how many time a movie has been rated? And by which user?
-- Join, use aliases!! SELECT m.title, -- 1 count(r.id) reviews, -- 2 count(distinct r.user_id) unique_reviews -- 3 FROM movies m INNER JOIN ratings r ON r.movie_id = m.id ORDER BY 2 desc LIMIT 5;
-- Join, use aliases!! SELECT m.title, -- 1 count(r.id) reviews, -- 2 count(distinct r.user_id) unique_reviews -- 3 FROM movies m INNER JOIN ratings r ON r.movie_id = m.id GROUP BY 1 ORDER BY 2 desc LIMIT 5;
Get the top 10 movies, by average rate
Get some statistics by year, about the number of rates, the average rates, etc.. Be creative
Try to find an answer to this question: Does users give rates to movies only if they liked it? only if they hated it? only if they liked or hated it? (tip: analyse the distribution of rates)