battling_a_legacy_schema_with

Battling a Legacy Schema With DBIx::Class

Lee Johnson

YAPC::{NA,EU} 2016

who has experience with DBIx::Class?
with what they would call a legacy schema?
there will be live demos!
suggest following along on screen so code examples are clear

Me?

http://leejo.github.io/code

briefly introduce Givengain
gave a talk on the frontend stuff
this is about the backend
examples here taken from work we have done

Why DBIx::Class?

For any non-trivial application you're going to write a model.

Most trivial applications turn into non-trivial applications.

And you're probably sick of writing trivial SQL.

So your trivial SQL statements may become non-trivial SQL statements.

So "don't leak the abstraction"

DBIx::Class offers many useful features, as we shall see.

Other ORMs/toolkits are available.

my first intro to dbic, don't want to go back
no writing same SQL over and over
quoting Ovid, who was quoting Joel Spolsky
concentrate on the business logic objects, the "model" (somewhat overloaded term)
DBIx::Class is not just an ORM, it's a toolkit (plugins, helpers, etc)

More About That Model

The ORM isn't your model, it helps your model.

MVC

OMVC

thin controllers, fat model + ORM

More About That Model

Abstract this away into your model:

$model->resultset( "Piste" )->find( 17 )->name;

Like so:

SkiResort::Model::Piste->new( id => 17 )->name;

Your controllers are then not coupled to the data store

And they know nothing about the ORM

example: ./examples/slides/more_about_that_model.sh 17 (+ exception)
controller(s) not tightly coupled to the schema
model code makes the decisions on storage

More About That Model

And you can better utilise exceptions:

try sub {
    ...
    my $piste = SkiResort::Model::Piste->new( id => $piste_id );
    ...
},
catch_when 'SkiResort::Exception::Database' => sub {
    # 500 error?
},
catch_when 'SkiResort::Exception::Input' => sub {
    # 400 error?
},
catch_default sub {
    # something else?
};

Legacy Schema?

Anything you didn't design yourself, right?

Well, anything that's grown organically.

Or used older tech and didn't keep up.

Maybe just full of technical debt?

Technical debt is hard to fix in your data store.

It's more like a mortgage on your stack.

So lets look at some examples.

tech debt: because often the code is tightly coupled to it
mortgage: takes a long time to pay off, makes moving difficult, missing a payment can be catastrophic.
a well designed schema: you can just about point DBIx::Class at it and run
but we can still use dbic with a legacy schema, just needs some help
following: how we can abstract problems away to ease paying off that debt.

Legacy Schema?

many more examples can be found in this book
legacy schema probably contains many of these
that's not to say a well designed schema won't contain any

A Ski Resort

Simple, right?

This is done "right" (potentially subjective)
Show villars_piste_map.pdf
Because, for example, resort can be made of other resorts
And a piste can be made of other pistes
And you can have lifts that link resorts
But keeping this simple enough for the examples
And this is not the "legacy" schema...

The "legacy" version:

Eh, not so good.

This is the legacy schema
Purposefully simplistic

High Level Overview

11:31 <@ribasushi> if you have an existing gnarly database you want to wrap your head
                   around, vanstyn's rdbic is a superb tool building on top of the
                   ecosystem: http://www.catalystframework.org/calendar/2014/16

RapidApp

rdbic.pl examples/db/legacy/resorts_legacy.db

example: ./examples/slides/high_level_overview.sh

Confusing Terminology

If you don't leak the abstraction then this becomes a non-issue.

Because your SkiResort::Model::Piste class could be querying a table named potato.

But if doesn't matter, the confusing terminology is safely contained and you only ever interact with the correctly named class.

a re-purposed schema
no, we don't have any tables named "potato"

Relationships (or lack thereof)

there was once an aversion to these
"they're slow"
"they don't give us anything"
"they make queries over complicated"
some storage engines didn't support them
The R in "RDBMS" is important

Generating Classes

#!/bin/bash

set -e -x -u

folder=$1
db_path=$2
overwrite=$3

dbicdump \
        -o debug=1 -o generate_pod=0 -o preserve_case=1 \
        -o dump_directory=$folder \
        -o components="[qw{InflateColumn::DateTime}]" \
        -o overwrite_modifications=$overwrite \
        -o datetime_timezone=UTC \
                'SkiResort::Model::LegacySchema' \
                'dbi:SQLite:dbname='$db_path''

Result source classes
cd examples; sh db/gen_dbic_classes_legacy.sh lib db/legacy/resorts_legacy.db 1
useful to have this as a script to rerun as required
safe to rerun on modified classes (assuming no changes above the md5sum)
result_base_class=DBIx::Class::Code (the default, can change)
saves trouble/time down the road

Generating Classes

use utf8;
package SkiResort::Model::LegacySchema::Result::ResortItem;

# Created by DBIx::Class::Schema::Loader
# DO NOT MODIFY THE FIRST PART OF THIS FILE

use strict;
use warnings;

use base 'DBIx::Class::Core';
__PACKAGE__->load_components("InflateColumn::DateTime");
__PACKAGE__->table("resort_item");
__PACKAGE__->add_columns(
  "resort_id",
  { data_type => "integer", default_value => \"null", is_nullable => 1 },
  "item_source",
  { data_type => "varchar( 255 )", is_nullable => 1 },
  "item_id",
  { data_type => "integer", is_nullable => 0 },
);


# Created by DBIx::Class::Schema::Loader v0.07036 @ 2016-04-07 14:16:33
# DO NOT MODIFY THIS OR ANYTHING ABOVE! md5sum:kOaW4xm7IT+Y3qVDDBy2hg

# You can replace this text with custom code or comments, and it will be
# preserved on regeneration
1;

gives us the base but we need to add relationships
the relationships are where we can really use the power of dbic

Adding Relationships

# You can replace this text with custom code or comments, and it will be
# preserved on regeneration

__PACKAGE__->belongs_to(
  "resort",
  "SkiResort::Model::LegacySchema::Result::Resort",
  { id => "resort_id" },
  { is_deferrable => 0, on_delete => "CASCADE", on_update => "CASCADE" },
);

Which gives us:

$model->resultset( "ResortItem" )
    ->first->resort->name;

And of course:

$model->resultset( "ResortItem" )->search(
    {},{ prefetch => [ qw/ resort / ] }
)->first->resort->name;

example: ./examples/slides/adding_relationships.sh
probably only want to add relationships as you need them
i.e. no point in spending time adding them all
but prefetch is an essential concept in optimisation

Adding Relationships

DBIx::Class::Relationship

might_have = 0 .. 1
has_many = 0 .. n
belongs_to = The right hand side of the above:
- If a Resort has_many Piste(s) then a Piste belongs_to a Resort
(many_to_many = convenience)

belongs_to is really the one true representative relationship
as foreign keys in a schema definition only ever say what they "belong to"
many_to_many is literally an accessor, there is no metadata behind it
see this

Polymorphic Relationships?

if you've ever seen a table that has a column to store the name of another table, then...
seems like an elegant solution, but:
can't enforce referential integrity in the storage engine
pushes the relational implementation into your application code
OK, but what if we have one of these and we can't yet fix it?

Polymorphic Relationships?

package SkiResort::Model::LegacySchema::Result::ResortItem;

...

__PACKAGE__->set_primary_key( "resort_id","item_source","item_id" );

__PACKAGE__->belongs_to(
  piste => 'SkiResort::Model::LegacySchema::Result::Piste',
  sub {
    my ( $args ) = @_;

    return {
      "$args->{self_alias}.item_source" => 'piste',
      "$args->{self_alias}.item_id"   => { -ident => "$args->{foreign_alias}.id" },
    };
  },
);

Allows:

$model->resultset( "PisteItem" )->search(
    { item_source => 'piste' },
    { prefetch => [ qw/ piste / ] }
)->first->piste->name;

example: ./examples/slides/polymorphic_relationships.sh

Polymorphic Relationships?

Using search_related bridge having added the previous belongs_to:

package SkiResort::Model::LegacySchema::Result::Resort;

...

__PACKAGE__->has_many(
  resort_items => 'SkiResort::Model::LegacySchema::Result::ResortItem',
  { 'foreign.resort_id' => 'self.id' }
);

Allows:

$model->resultset( "Resort" )
    ->search_related( 'resort_items' )
    ->search_related( 'piste' )->first->name;

example: ./examples/slides/polymorphicrelationships_mtm.sh
note in showing the example that we only run one query
the query happens nowhere during search*
we can walk anywhere

Polymorphic Relationships?

And if you have many of these:

package SkiResort::Model::LegacySchema::Result::ResortItem;

...

foreach my $source ( qw/ piste lift / ) {

  __PACKAGE__->belongs_to(
    $source => 'SkiResort::Model::LegacySchema::Result::' . ucfirst( $source ),
    sub {
      my ( $args ) = @_;

      return {
        "$args->{self_alias}.item_source" => $source,
        "$args->{self_alias}.item_id"   => { -ident => "$args->{foreign_alias}.id" },
      };
    },
  );

}

example: ./examples/slides/polymorphic_relationships_lift.sh

Fixing Column Data

Inflating Column Data

We can fix data and/or get objects from column data.

DateTime - You'll almost certainly want this one:

__PACKAGE__->load_components(qw/InflateColumn::DateTime/);

When $column is a date, timestamp or datetime data type:

$model->resultset( "Foo" )->first->$column->subtract->( months => 1 )->ymd( '-' );

DATE_FORMAT(DATE_SUB(c,INTERVAL 1 MONTH),'%Y-%m-%d')

Stop using your RDMS for date calculations / localisation.

we can both inflate and deflate
e.g. ensuring data is encrypted
DBIx::Class::InflateColumn::*
a couple of these are core to DBIx::Class, e.g. InflateColumn::DateTime
- has ways to handle timezones, locale, etc
stop usings RDMS, see next slide

DBIx::Class::Helper::ResultSet::DateMethods

# get count per year/month
$rs->search(undef, {
   columns => {
      count => '*',
      year  => $rs->dt_SQL_pluck({ -ident => '.start' }, 'year'),
      month => $rs->dt_SQL_pluck({ -ident => '.start' }, 'month'),
   },
   group_by => [
      $rs->dt_SQL_pluck({ -ident => '.start' }, 'year'),
      $rs->dt_SQL_pluck({ -ident => '.start' }, 'month'),
   ],
});

# mysql
(SELECT `me`.*, EXTRACT(MONTH FROM `me`.`start`), EXTRACT(YEAR FROM `me`.`start`)
FROM `HasDateOps` `me`
GROUP BY EXTRACT(YEAR FROM `me`.`start`), EXTRACT(MONTH FROM `me`.`start`))

# SQLite
(SELECT "me".*, STRFTIME('%m', "me"."start"), STRFTIME('%Y', "me"."start")
FROM "HasDateOps" "me"
GROUP BY STRFTIME('%Y', "me"."start"), STRFTIME('%m', "me"."start"))

if you already have lots of RDMS-side date code
and can't move away because you're dependent on the DB time sync, etc

Fixing Column Data With Filters

The resort table contains an active column that is a char(1).

It should be a boolean, so:

package SkiResort::Model::LegacySchema::Result::Resort;

...

__PACKAGE__->load_components( "FilterColumn" );

__PACKAGE__->filter_column( active => {
    filter_to_storage   => sub { return $_[1] ? 'Y' : 'N'; },
    filter_from_storage => sub { return defined $_[1] && $_[1] =~ /Y/i ? 1 : 0; },
});

Cleans up significantly:

if ( $model->resultset( "Resort" )->first->active ) {
    ...
}

DBIx::Class::FilterColumn
./examples/slides/fixing_column_data_with_filters.sh
en/decryption of columns

Fixing Column Data With Filters

How about sanity checking a column that contains a CSV list?

__PACKAGE__->filter_column( column_with_csv => {

    filter_from_storage   => sub { return [ split( ',',$_[1] ) ]; },

    filter_to_storage => sub {
        my ( $self,$values ) = @_;

        foreach my $value ( @{ $values // [] } ) {
            # maybe we can check $value here - if it's supposed
            # to be a reference to another table then we could
            # check $value exists in the child table? if not we
            # throw an exception
        }

        return join( ',',@{ $values // [] } ) || undef;
    },
});

Text::CSV
Manually enforce referential integrity

Complex Queries

my $sql = "Some complex SQL we don't want to rewrite as SQL::Abstract";

Options:

Not a sign of a legacy schema
You can rewrite most (if not all) SQL, but you need to find a balance
DBIC_TRACE if you need to debug your SQL::Abstract struct (more later)

dbh_do

my @results = $schema->storage->dbh_do(
    sub {
        my ( $storage,$dbh,@binds ) = @_;

        # this is just pure DBI
        $dbh->selectrow_array( "Complex SQL Here",{},@binds );
    },
    @binds,
);

dbh_do, low level db handle (with exceptions)
straight out of the docs
using this rather than ->_dbh or ->dbh ensures correct exception handling and reconnection

Virtual Views

package SkiResort::Model::LegacySchema::Result::PistesForRating;

use base qw/DBIx::Class::Core/;

__PACKAGE__->table_class('DBIx::Class::ResultSource::View');
__PACKAGE__->table('pistes_for_rating');
__PACKAGE__->result_source_instance->is_virtual(1);
__PACKAGE__->add_columns( "name", {
    data_type     => "varchar( 255 )",
    default_value => \"null",
    is_nullable   => 1,
} );

__PACKAGE__->result_source_instance->view_definition( "
    SELECT piste.name
    FROM   piste
    WHERE  rating = ?
" );

1;

note the is_virtual flag

Virtual Views

$model->resultset( "PistesForRating" )
    ->search( {},{ bind => [ $rating ] } )

example: ./examples/slides/virtual_views.sh Easy
restricted to a specific number of binds, or are we? (next slide)

Virtual Views Extended

package SkiResort::Model::LegacySchema::Result::PistesForRatingMatchingString;

use base qw/SkiResort::Model::LegacySchema::Result::PistesForRating/;

__PACKAGE__->table('pistes_for_rating_like');
__PACKAGE__->result_source_instance->view_definition(
    __PACKAGE__->SUPER::result_source_instance->view_definition . "
    AND piste.name like ?
" );

1;

And:

$model->resultset( "PistesForRatingMatchingString" )
    ->search( {},{ bind => [ $rating,"%$string%" ] } )

you might want to avoid any pistes next to lakes, so:
example: ./examples/slides/virtual_views_extended.sh Easy Lac
virtual views are just classes, so we can trivially extend them

Reports

DBIx::Class::Report

Just link and show perldoc here, explain ALPHA code
It's just building up virtual views at runtime

Query Tracing and Profiling

DBIC_TRACE=1 or DBIC_TRACE=1=/path/to/output/file
DBIC_TRACE_PROFILE=console - for pretty printing
DBI_TRACE=1 # 2,3,SQL,... - if you're using dbh_do

use Carp qw/ cluck longmess shortmess /;

sub resultset {
    my ( $self,$table ) = @_;

    if ( my $trace = $ENV{DBIC_TRACE} ) {
        my ( $level,$trace_file ) = split( /=/,$trace );
        my $mess = $level == 1
            ? shortmess( "RESULTSET: $table" )
            : longmess( "RESULTSET: $table" );
        if ( $trace_file ) {
            open( my $fh,'>>',$trace_file ) || cluck( ... );
            print $fh $mess;
            close( $fh );
        } else {
            print STDERR $mess;
        }
    }

    return $self->SUPER::resultset( $table );
}

what i've been using in the examples
console_monochrome if you don't want colours/want to see exact bind sites
put the sub in your Schema.pm module (or base class)
shows you potential ->resultset calls missing prefetch
as seen in examples, e.g. ./examples/slides/adding_relationships.sh
been meaning to write something to parse the out of this, but...

Connection Options

UTF-8

mysql_enable_utf8 => 1,

Almost certainly

mysql_auto_reconnect => 1,

If only new code is going to use the dbic route:

$schema->storage->on_connect_do( [
    "SET sql_mode=STRICT_ALL_TABLES",
] );

or whatever the flag for other db engines is
see the perldoc for your DBD driver
STRICT_ALL_TABLES - for mysql

Gotchas

Prefetch prefetch prefetch.

Test your app with a representative dataset (where possible).

Keep your ResultSource classes up to date.

"ORMs are slow", no you're probably not using it correctly
representative data set + DBIC_TRACE to find optimisation points
up to date using the script shown in "Generating Classes"

In Summary

Thin controllers, fat model
- The ORM is not your model
- Use it in your model
DBIx::Class can be thought of as a toolkit not just an ORM, nor an opinionated framework
Define relationships even if they don't exist as FKs in the schema
Filters can help you fix problems with data
- So you don't need to faff about downstream
- And give you objects
- Or you can sanity check data before it gets into the db
You can get to the low level handle if you have a complex query
- Or use virtual views so you DRY
- And again get objects

Stuff I Didn't Cover

Logging

Deployment

Helpers

DBIx-Class-Helpers

show metacpan page for Helpers (there are loads)

There's Loads Of Other Stuff

curl -XPOST api.metacpan.org/v0/release/_search?size=100 -d '{
    "query": {
        "wildcard" : {
            "release.distribution" : "DBIx-Class*"
        }
    },
    "size" : 5000,
    "filter" : {
        "term" : {
            "status" : "latest"
        }
    },
    "fields": [
        "release.distribution",
        "release.date",
        "provides"
    ]
}' | jq -r '.hits.hits[].fields.distribution' | sort

100 distributions of the form DBIx::Class:: when i last checked

Questions?

Links and resources:

DBIx::Class on metacpan.
fREW's blog, lots of DBI(x::Class) tips.
#dbix-class on irc.perl.org

Battling a Legacy Schema With DBIx::Class Lee Johnson YAPC::{NA,EU} 2016 who has experience with DBIx::Class? with what they would call a legacy schema? there will be live demos! suggest following along on screen so code examples are clear

battling_a_legacy_schema_with_dbic

leejo

battling_a_legacy_schema_with_dbic

0 6