SQLAlchemy Core – An Introduction – Differences between Core and ORM



SQLAlchemy Core – An Introduction – Differences between Core and ORM

3 9


pycon2014

My Intro to SQLAlchemy Core Slides

On Github jasonamyers / pycon2014

SQLAlchemy Core

An Introduction

Jason Myers / @jasonamyers

Background by maul555

Welcome, How do I describe SQLAlchemy? Well?
It has a ton of very, very useful functions and while not everything is not a nail it solves a ton of common problems for me. Let’s talk about about how its put together.

Differences between Core and ORM

ORM - Domain Model

class User(Base):
    __tablename__ = 'users'

    id = Column(Integer, primary_key=True)
    name = Column(String)
    fullname = Column(String)
    password = Column(String)
Focused around business objects and models

Core - Schema-centric Model

from sqlalchemy import Table, Column, Integer, String, MetaData
metadata = MetaData()
users = Table('users', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
    Column('fullname', String),
)
Focused around the database

Structure

Copyright © 2014 Mochimochi Land

The base of SQLAlchemy is the dialect and it’s handshake with the actual drivers via connection pools, the dialect along with schema introspection and typing enables us to use a python friendly SQL Expression language. All of these components are considered the Core of SQL Alchemy. Then we have a very rich ORM built on top of that foundation. If you TL;DR that ... it’s like GIT

Structure

The base of SQLAlchemy is the dialect and it’s handshake with the actual drivers via connection pools, the dialect along with schema introspection and typing enables us to use a python friendly SQL Expression language. All of these components are considered the Core of SQL Alchemy. Then we have a very rich ORM built on top of that foundation. If you TL;DR that ... it’s like GIT

Installing

pip install sqlalchemy

pip install flask-sqlalchemy

bin/paster create -t pyramid_alchemy tutorial

In many cases you’ll just pip install SQLAlchemy. but if you’re using flask I recommend using the variant just for it and pyramid has a scaffold for SQLAlchemy as well

Initializing

import sqlalchemy
from sqlalchemy import create_engine
engine = create_engine('sqlite:///:memory:')
The first thing we need to do is setup an engine and give it the connection information. In this case, we're actually using an in memory SQLite database

Defining a Table

from sqlalchemy import Table, Column, Integer, String, MetaData, ForeignKey
metadata = MetaData()
actors = Table('actors', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
    Column('fullname', String),
    Column('body_count', Integer)
)
roles = Table('roles', metadata,
    Column('id', Integer, primary_key=True),
    Column('actor_id', None, ForeignKey('actors.id')),
    Column('character_name', String, nullable=False)
)
It’s not like a normal class definition, but a more list a list of table columns. The metadata is used as a catalog to provide a mapping between the actual tables and our representation of them.

Create the tables

metadata.create_all(engine)
We take the MetaData object we created early, and bind it to the engine as see below. This will create our tables in the actually database. This is conditional by default, will not attempt to recreate tables already present in the target database. NOTE: It doesn't check to make sure they are the same just that an existing table is already present.

Table Objects

actors.columns.items()

[
    ('id', Column('id', Integer(), table=actors, primary_key=True...)),
    ('name', Column('name', String(), table=actors)),
    ('fullname', Column('fullname', String(), table=actors)),
    ('body_count', Column('body_count', Integer(), table=actors))
]
We can look at the tables attributes, and use this to reference parts of them later. In this example, we're looking at the columns of the table. This can also be abbreviated as just actors.c.items()

Opening a connection

conn = engine.connect()
Next we're going to open a connection to that engine so we can make queries etc.

Single Insert

ins = actors.insert().\
    values(name='Graham', fullname='Graham Chapman', body_count=3)
result = conn.execute(ins)
result.inserted_primary_key

[1]
First we're going to build an insert statement for putting a single record in our actors table. Then, we're excute that statement and store the result. Next, we'll get the primary key of that record back

Looking at what was executed

print str(ins)
ins.compile().params

INSERT INTO actors (name, fullname, body_count) VALUES (:name, :fullname, :body_count)
{'body_count': 3, 'fullname': 'Graham Chapman', 'name': 'Graham'}
Now, we're going to print the SQL statement as build via the dialect we choose above, and then print the params from that statement

Multiple Insert

results = conn.execute(roles.insert(), [
    {'actor_id': 1, 'character_name' : 'King Arthur'},
    {'actor_id': 1, 'character_name' : 'Voice of God'},
    {'actor_id': 2, 'character_name' : 'Sir Lancelot'},
    {'actor_id': 2, 'character_name' : 'Black Knight'},
    {'actor_id': 3, 'character_name' : 'Patsy'},
    {'actor_id': 3, 'character_name' : 'Sir Bors'},
])
results.rowcount

6
We can also insert multiple records like above. Transactions can be manually started, but by default wrap each execute.

Update

stmt = actors.update().where(actors.c.name == 'Graham').values(name='Gram')
result = conn.execute(stmt)
result.rowcount

1
When looking at updating data we often want to do it to just a particular record, or a certain group of records. The use of the .where() clause let's us do just that. Then we supply a list of values to update.

Delete

result = conn.execute(actors.delete().where(actors.c.name == 'Terry'))
result.rowcount

1
Very similar to update just it will remove the record.

Selecting

s = select([actors.c.name, actors.c.fullname])
result = conn.execute(s)
for row in result:
    print row

(u'Graham', u'Graham Chapman')
(u'John', u'John Cleese')
(u'Terry', u'Terry Gilliam')
So unlike the other statements that were clauses off of the table object, this time we're going to use a new select stmt and operate on the table that way. You can use a clause off the table object; however, this is more commonly used as you'll normally be joining data and doing other things. Also notice how I specified the columns.

Ordering

stmt = select([actors.c.name]).order_by(actors.c.name.desc())
conn.execute(stmt).fetchall()

[(u'Terry',), (u'John',), (u'Graham',)]
Here we're sorting by the name in descending alphabetical order. .asc() will do the accending order

Limiting

stmt = select([actors.c.name, actors.c.fullname]).limit(1).offset(1)
conn.execute(stmt).first()

(u'John', u'John Cleese')
Here we are limiting the query to return only one record and offset it by 1. So it should be the second record in the table. Notice the use of the ``.first()`` clause on the statement execution. This gives us back the first result instead of the list we had been dealing with previously.

Count

from sqlalchemy.sql import func
stmt = select([func.count(actors)])
conn.execute(stmt).scalar()

2
Here we are limiting the query to return only one record and offset it by 1. So it should be the second record in the table. Notice the use of the .first() clause on the statement execution. This gives us back the first result instead of the list we had been dealing with previously.

Sum

stmt = select([func.count(actors), func.sum(actors.c.body_count)])
conn.execute(stmt).first()

(2, 5)
Here we are building a select statement that counts the actors in our table, and sums their body counts

Joins

s = select([actors, roles]).where(actors.c.id == roles.c.actor_id)
for row in conn.execute(s):
    print row

(1, u'Graham', u'Graham Chapman', 1, 1, u'King Arthur')
(1, u'Graham', u'Graham Chapman', 2, 1, u'Voice of God')
(2, u'John', u'John Cleese', 3, 2, u'Sir Lancelot')
(2, u'John', u'John Cleese', 4, 2, u'Black Knight')
(3, u'Terry', u'Terry Gilliam', 5, 3, u'Patsy')
(3, u'Terry', u'Terry Gilliam', 6, 3, u'Sir Bors')
Now let's join the actors and the roles tables on the ID of the actor.

Grouping

stmt = select([actors.c.name, func.count(roles.c.id)]).\
    select_from(actors.join(roles)).\
    group_by(actors.c.name)
conn.execute(stmt).fetchall()

[(u'Graham', 2), (u'John', 2), (u'Terry', 2)]
We can also use the .join() clause on the table object to join to tables, and then group them to get a count of roles by actor.

Filtering

from sqlalchemy.sql import and_, or_, not_
stmt = select([actors.c.name, roles.c.character_name]).\
    where(
        and_(
          actors.c.name.like('Gra%'),
          roles.c.character_name.like('Vo%'),
          actors.c.id == roles.c.actor_id
        )
    )
conn.execute(stmt).fetchall()

[(u'Graham', u'Voice of God')]
First, let's pull in some additional parts of the sqlalchemy.sql module. NOTE: we're only using and_() in this example, but they all work the same way and can be nested. Next we're going to use the .where() clause and nest an and_() cause inside of it.

And So on...

This is really just the beginning of the types of querying that can be done with sqlalchemy, but I want to move on to two other areas of sqlalchemy awesomeness.
in addition to column types, the dialects define sql commands, their parameters loosely they are validated by the database driver and by sending the actual command to the DB.

Common Dialects

  • Informix
  • MS SQL
  • Oracle
  • Postgres
  • SQLite
  • Custom
Most common systems are covered, and customizing them is fairly easy. For example at work we use redshift, and we’ve been writing a sqlalchemy_redshift dialect. Which you can download on github. So let’s query some data...

But what if...

class UnloadFromSelect(Executable, ClauseElement):

    def __init__(self, select, bucket, access_key, secret_key):
        self.select = select
        self.bucket = bucket
        self.access_key = access_key
        self.secret_key = secret_key

@compiles(UnloadFromSelect)
def visit_unload_from_select(element, compiler, **kw):
    return "unload ('%(query)s') to '%(bucket)s'
        credentials 'aws_access_key_id=%(access_key)s;
        aws_secret_access_key=%(secret_key)s' delimiter ','
        addquotes allowoverwrite" % {
        'query': compiler.process(element.select,
              unload_select=True, literal_binds=True),
        'bucket': element.bucket,
        'access_key': element.access_key,
        'secret_key': element.secret_key,
    }
Here is an example of a single statement for a custom dialect. It establishes an unload from command as seen in Amazon Redshift

Example Statement

unload = UnloadFromSelect(
    select([fields]),
    '/'.join(['s3:/', BUCKET, filename]),
    ACCESS_KEY,
    SECRET_KEY
)
Example Usage

Example Usage

unload (
  'select * from venue where venueid in (
        select venueid from venue order by venueid desc limit 10)'
)
to 's3://mybucket/venue_pipe_'
credentials 'aws_access_key_id=ACCESS_KEY;
  aws_secret_access_key=SECRET_KEY';
Generated Statement

Dynamic Table Introspection

def build_table(engine, table_name):
    return Table(table_name, metadata, autoload=True, autoload_with=engine)
This introspection makes a table from a given engine and metadata collection.

Chaining

s = select(
        [
            t.c.race,
            t.c.factor,
            func.sum(g.t.c.value).label('summed')
        ], t.c.race > 0
    ).where(
        and_(
            t.c.type == 'POVERTY',
            t.c.value != 0
        )
    ).group_by(
        t.c.race,
        t.c.factor
    ).order_by(
        t.c.race,
        t.c.factor)

Conditionals

s = select(
    [
        table.c.discharge_year,
        func.count(1).label(
            'patient_discharges'),
        table.c.zip_code,
    ], table.c.discharge_year.in_(years)
).group_by(table.c.discharge_year)
s = s.where(table.c.hospital_name == provider)

if 'total_charges' not in unavailable_fields:
    s = s.column(
            func.sum(table.c.total_charges
            ).label('patient_charges')
    )
What’s great is we can do this a step at a time and add conditionals to create kind of a query builder. Some datasets have total_charges if they do we add those to the result.
s = s.group_by(table.c.zip_code)
s = s.order_by('discharges DESC')

cases = conn.execute(s).fetchall()

questions

Thank you

Jason Myers / @jasonamyers

Slides and IPython Notebook on http://bit.ly/pycon2014slides