battling_a_legacy_schema_with_dbic



battling_a_legacy_schema_with_dbic

0 6


battling_a_legacy_schema_with_dbic

A talk planned for 2016 YAPC::{NA,EU} and possibly others

On Github leejo / battling_a_legacy_schema_with_dbic

Battling a Legacy Schema With DBIx::Class

Lee Johnson

YAPC::{NA,EU} 2016

  • who has experience with DBIx::Class?
  • with what they would call a legacy schema?
  • there will be live demos!
  • suggest following along on screen so code examples are clear

Me?

http://leejo.github.io/code

  • briefly introduce Givengain
  • gave a talk on the frontend stuff
  • this is about the backend
  • examples here taken from work we have done

Why DBIx::Class?

For any non-trivial application you're going to write a model.

Most trivial applications turn into non-trivial applications.

And you're probably sick of writing trivial SQL.

So your trivial SQL statements may become non-trivial SQL statements.

So "don't leak the abstraction"

DBIx::Class offers many useful features, as we shall see.

Other ORMs/toolkits are available.

  • my first intro to dbic, don't want to go back
  • no writing same SQL over and over
  • quoting Ovid, who was quoting Joel Spolsky
  • concentrate on the business logic objects, the "model" (somewhat overloaded term)
  • DBIx::Class is not just an ORM, it's a toolkit (plugins, helpers, etc)

More About That Model

The ORM isn't your model, it helps your model.

MVC

MVC

OMVC

  • thin controllers, fat model + ORM

More About That Model

Abstract this away into your model:

$model->resultset( "Piste" )->find( 17 )->name;

Like so:

SkiResort::Model::Piste->new( id => 17 )->name;

Your controllers are then not coupled to the data store

And they know nothing about the ORM

  • example: ./examples/slides/more_about_that_model.sh 17 (+ exception)
  • controller(s) not tightly coupled to the schema
  • model code makes the decisions on storage

More About That Model

And you can better utilise exceptions:

try sub {
    ...
    my $piste = SkiResort::Model::Piste->new( id => $piste_id );
    ...
},
catch_when 'SkiResort::Exception::Database' => sub {
    # 500 error?
},
catch_when 'SkiResort::Exception::Input' => sub {
    # 400 error?
},
catch_default sub {
    # something else?
};

Legacy Schema?

Anything you didn't design yourself, right?

Well, anything that's grown organically.

Or used older tech and didn't keep up.

Maybe just full of technical debt?

Technical debt is hard to fix in your data store.

It's more like a mortgage on your stack.

So lets look at some examples.

  • tech debt: because often the code is tightly coupled to it
  • mortgage: takes a long time to pay off, makes moving difficult, missing a payment can be catastrophic.
  • a well designed schema: you can just about point DBIx::Class at it and run
  • but we can still use dbic with a legacy schema, just needs some help
  • following: how we can abstract problems away to ease paying off that debt.

Legacy Schema?

  • many more examples can be found in this book
  • legacy schema probably contains many of these
  • that's not to say a well designed schema won't contain any

A Ski Resort

Simple, right?

  • This is done "right" (potentially subjective)
  • Show villars_piste_map.pdf
  • Because, for example, resort can be made of other resorts
  • And a piste can be made of other pistes
  • And you can have lifts that link resorts
  • But keeping this simple enough for the examples
  • And this is not the "legacy" schema...

The "legacy" version:

Eh, not so good.

  • This is the legacy schema
  • Purposefully simplistic

High Level Overview

11:31 <@ribasushi> if you have an existing gnarly database you want to wrap your head
                   around, vanstyn's rdbic is a superb tool building on top of the
                   ecosystem: http://www.catalystframework.org/calendar/2014/16

RapidApp

rdbic.pl examples/db/legacy/resorts_legacy.db
  • example: ./examples/slides/high_level_overview.sh

Confusing Terminology

If you don't leak the abstraction then this becomes a non-issue.

Because your SkiResort::Model::Piste class could be querying a table named potato.

But if doesn't matter, the confusing terminology is safely contained and you only ever interact with the correctly named class.

  • a re-purposed schema
  • no, we don't have any tables named "potato"

Relationships (or lack thereof)

  • there was once an aversion to these
  • "they're slow"
  • "they don't give us anything"
  • "they make queries over complicated"
  • some storage engines didn't support them
  • The R in "RDBMS" is important

Generating Classes

#!/bin/bash

set -e -x -u

folder=$1
db_path=$2
overwrite=$3

dbicdump \
        -o debug=1 -o generate_pod=0 -o preserve_case=1 \
        -o dump_directory=$folder \
        -o components="[qw{InflateColumn::DateTime}]" \
        -o overwrite_modifications=$overwrite \
        -o datetime_timezone=UTC \
                'SkiResort::Model::LegacySchema' \
                'dbi:SQLite:dbname='$db_path''
  • Result source classes
  • cd examples; sh db/gen_dbic_classes_legacy.sh lib db/legacy/resorts_legacy.db 1
  • useful to have this as a script to rerun as required
  • safe to rerun on modified classes (assuming no changes above the md5sum)
  • result_base_class=DBIx::Class::Code (the default, can change)
  • saves trouble/time down the road

Generating Classes

use utf8;
package SkiResort::Model::LegacySchema::Result::ResortItem;

# Created by DBIx::Class::Schema::Loader
# DO NOT MODIFY THE FIRST PART OF THIS FILE

use strict;
use warnings;

use base 'DBIx::Class::Core';
__PACKAGE__->load_components("InflateColumn::DateTime");
__PACKAGE__->table("resort_item");
__PACKAGE__->add_columns(
  "resort_id",
  { data_type => "integer", default_value => \"null", is_nullable => 1 },
  "item_source",
  { data_type => "varchar( 255 )", is_nullable => 1 },
  "item_id",
  { data_type => "integer", is_nullable => 0 },
);


# Created by DBIx::Class::Schema::Loader v0.07036 @ 2016-04-07 14:16:33
# DO NOT MODIFY THIS OR ANYTHING ABOVE! md5sum:kOaW4xm7IT+Y3qVDDBy2hg

# You can replace this text with custom code or comments, and it will be
# preserved on regeneration
1;
  • gives us the base but we need to add relationships
  • the relationships are where we can really use the power of dbic

Adding Relationships

# You can replace this text with custom code or comments, and it will be
# preserved on regeneration

__PACKAGE__->belongs_to(
  "resort",
  "SkiResort::Model::LegacySchema::Result::Resort",
  { id => "resort_id" },
  { is_deferrable => 0, on_delete => "CASCADE", on_update => "CASCADE" },
);

Which gives us:

$model->resultset( "ResortItem" )
    ->first->resort->name;

And of course:

$model->resultset( "ResortItem" )->search(
    {},{ prefetch => [ qw/ resort / ] }
)->first->resort->name;
  • example: ./examples/slides/adding_relationships.sh
  • probably only want to add relationships as you need them
  • i.e. no point in spending time adding them all
  • but prefetch is an essential concept in optimisation

Adding Relationships

DBIx::Class::Relationship

  • might_have = 0 .. 1
  • has_many = 0 .. n
  • belongs_to = The right hand side of the above:
    • If a Resort has_many Piste(s) then a Piste belongs_to a Resort
  • (many_to_many = convenience)
  • belongs_to is really the one true representative relationship
  • as foreign keys in a schema definition only ever say what they "belong to"
  • many_to_many is literally an accessor, there is no metadata behind it
  • see this

Polymorphic Relationships?

  • if you've ever seen a table that has a column to store the name of another table, then...
  • seems like an elegant solution, but:
  • can't enforce referential integrity in the storage engine
  • pushes the relational implementation into your application code
  • OK, but what if we have one of these and we can't yet fix it?

Polymorphic Relationships?

package SkiResort::Model::LegacySchema::Result::ResortItem;

...

__PACKAGE__->set_primary_key( "resort_id","item_source","item_id" );

__PACKAGE__->belongs_to(
  piste => 'SkiResort::Model::LegacySchema::Result::Piste',
  sub {
    my ( $args ) = @_;

    return {
      "$args->{self_alias}.item_source" => 'piste',
      "$args->{self_alias}.item_id"   => { -ident => "$args->{foreign_alias}.id" },
    };
  },
);

Allows:

$model->resultset( "PisteItem" )->search(
    { item_source => 'piste' },
    { prefetch => [ qw/ piste / ] }
)->first->piste->name;
  • example: ./examples/slides/polymorphic_relationships.sh

Polymorphic Relationships?

Using search_related bridge having added the previous belongs_to:

package SkiResort::Model::LegacySchema::Result::Resort;

...

__PACKAGE__->has_many(
  resort_items => 'SkiResort::Model::LegacySchema::Result::ResortItem',
  { 'foreign.resort_id' => 'self.id' }
);

Allows:

$model->resultset( "Resort" )
    ->search_related( 'resort_items' )
    ->search_related( 'piste' )->first->name;
  • example: ./examples/slides/polymorphicrelationships_mtm.sh
  • note in showing the example that we only run one query
  • the query happens nowhere during search*
  • we can walk anywhere

Polymorphic Relationships?

And if you have many of these:

package SkiResort::Model::LegacySchema::Result::ResortItem;

...

foreach my $source ( qw/ piste lift / ) {

  __PACKAGE__->belongs_to(
    $source => 'SkiResort::Model::LegacySchema::Result::' . ucfirst( $source ),
    sub {
      my ( $args ) = @_;

      return {
        "$args->{self_alias}.item_source" => $source,
        "$args->{self_alias}.item_id"   => { -ident => "$args->{foreign_alias}.id" },
      };
    },
  );

}
  • example: ./examples/slides/polymorphic_relationships_lift.sh

Fixing Column Data

Inflating Column Data

We can fix data and/or get objects from column data.

DateTime - You'll almost certainly want this one:

__PACKAGE__->load_components(qw/InflateColumn::DateTime/);

When $column is a date, timestamp or datetime data type:

$model->resultset( "Foo" )->first->$column->subtract->( months => 1 )->ymd( '-' );

DATE_FORMAT(DATE_SUB(c,INTERVAL 1 MONTH),'%Y-%m-%d')

Stop using your RDMS for date calculations / localisation.

  • we can both inflate and deflate
  • e.g. ensuring data is encrypted
  • DBIx::Class::InflateColumn::*
  • a couple of these are core to DBIx::Class, e.g. InflateColumn::DateTime
    • has ways to handle timezones, locale, etc
  • stop usings RDMS, see next slide

DBIx::Class::Helper::ResultSet::DateMethods

# get count per year/month
$rs->search(undef, {
   columns => {
      count => '*',
      year  => $rs->dt_SQL_pluck({ -ident => '.start' }, 'year'),
      month => $rs->dt_SQL_pluck({ -ident => '.start' }, 'month'),
   },
   group_by => [
      $rs->dt_SQL_pluck({ -ident => '.start' }, 'year'),
      $rs->dt_SQL_pluck({ -ident => '.start' }, 'month'),
   ],
});

# mysql
(SELECT `me`.*, EXTRACT(MONTH FROM `me`.`start`), EXTRACT(YEAR FROM `me`.`start`)
FROM `HasDateOps` `me`
GROUP BY EXTRACT(YEAR FROM `me`.`start`), EXTRACT(MONTH FROM `me`.`start`))

# SQLite
(SELECT "me".*, STRFTIME('%m', "me"."start"), STRFTIME('%Y', "me"."start")
FROM "HasDateOps" "me"
GROUP BY STRFTIME('%Y', "me"."start"), STRFTIME('%m', "me"."start"))
  • if you already have lots of RDMS-side date code
  • and can't move away because you're dependent on the DB time sync, etc

Fixing Column Data With Filters

The resort table contains an active column that is a char(1).

It should be a boolean, so:

package SkiResort::Model::LegacySchema::Result::Resort;

...

__PACKAGE__->load_components( "FilterColumn" );

__PACKAGE__->filter_column( active => {
    filter_to_storage   => sub { return $_[1] ? 'Y' : 'N'; },
    filter_from_storage => sub { return defined $_[1] && $_[1] =~ /Y/i ? 1 : 0; },
});

Cleans up significantly:

if ( $model->resultset( "Resort" )->first->active ) {
    ...
}
  • DBIx::Class::FilterColumn
  • ./examples/slides/fixing_column_data_with_filters.sh
  • en/decryption of columns

Fixing Column Data With Filters

How about sanity checking a column that contains a CSV list?

__PACKAGE__->filter_column( column_with_csv => {

    filter_from_storage   => sub { return [ split( ',',$_[1] ) ]; },

    filter_to_storage => sub {
        my ( $self,$values ) = @_;

        foreach my $value ( @{ $values // [] } ) {
            # maybe we can check $value here - if it's supposed
            # to be a reference to another table then we could
            # check $value exists in the child table? if not we
            # throw an exception
        }

        return join( ',',@{ $values // [] } ) || undef;
    },
});
  • Text::CSV
  • Manually enforce referential integrity

Complex Queries

my $sql = "Some complex SQL we don't want to rewrite as SQL::Abstract";

Options:

  • Not a sign of a legacy schema
  • You can rewrite most (if not all) SQL, but you need to find a balance
  • DBIC_TRACE if you need to debug your SQL::Abstract struct (more later)

dbh_do

my @results = $schema->storage->dbh_do(
    sub {
        my ( $storage,$dbh,@binds ) = @_;

        # this is just pure DBI
        $dbh->selectrow_array( "Complex SQL Here",{},@binds );
    },
    @binds,
);
  • dbh_do, low level db handle (with exceptions)
  • straight out of the docs
  • using this rather than ->_dbh or ->dbh ensures correct exception handling and reconnection

Virtual Views

package SkiResort::Model::LegacySchema::Result::PistesForRating;

use base qw/DBIx::Class::Core/;

__PACKAGE__->table_class('DBIx::Class::ResultSource::View');
__PACKAGE__->table('pistes_for_rating');
__PACKAGE__->result_source_instance->is_virtual(1);
__PACKAGE__->add_columns( "name", {
    data_type     => "varchar( 255 )",
    default_value => \"null",
    is_nullable   => 1,
} );

__PACKAGE__->result_source_instance->view_definition( "
    SELECT piste.name
    FROM   piste
    WHERE  rating = ?
" );

1;
  • note the is_virtual flag

Virtual Views

$model->resultset( "PistesForRating" )
    ->search( {},{ bind => [ $rating ] } )
  • example: ./examples/slides/virtual_views.sh Easy
  • restricted to a specific number of binds, or are we? (next slide)

Virtual Views Extended

package SkiResort::Model::LegacySchema::Result::PistesForRatingMatchingString;

use base qw/SkiResort::Model::LegacySchema::Result::PistesForRating/;

__PACKAGE__->table('pistes_for_rating_like');
__PACKAGE__->result_source_instance->view_definition(
    __PACKAGE__->SUPER::result_source_instance->view_definition . "
    AND piste.name like ?
" );

1;

And:

$model->resultset( "PistesForRatingMatchingString" )
    ->search( {},{ bind => [ $rating,"%$string%" ] } )
  • you might want to avoid any pistes next to lakes, so:
  • example: ./examples/slides/virtual_views_extended.sh Easy Lac
  • virtual views are just classes, so we can trivially extend them

Reports

DBIx::Class::Report

  • Just link and show perldoc here, explain ALPHA code
  • It's just building up virtual views at runtime

Query Tracing and Profiling

  • DBIC_TRACE=1 or DBIC_TRACE=1=/path/to/output/file
  • DBIC_TRACE_PROFILE=console - for pretty printing
  • DBI_TRACE=1 # 2,3,SQL,... - if you're using dbh_do
use Carp qw/ cluck longmess shortmess /;

sub resultset {
    my ( $self,$table ) = @_;

    if ( my $trace = $ENV{DBIC_TRACE} ) {
        my ( $level,$trace_file ) = split( /=/,$trace );
        my $mess = $level == 1
            ? shortmess( "RESULTSET: $table" )
            : longmess( "RESULTSET: $table" );
        if ( $trace_file ) {
            open( my $fh,'>>',$trace_file ) || cluck( ... );
            print $fh $mess;
            close( $fh );
        } else {
            print STDERR $mess;
        }
    }

    return $self->SUPER::resultset( $table );
}
  • what i've been using in the examples
  • console_monochrome if you don't want colours/want to see exact bind sites
  • put the sub in your Schema.pm module (or base class)
  • shows you potential ->resultset calls missing prefetch
  • as seen in examples, e.g. ./examples/slides/adding_relationships.sh
  • been meaning to write something to parse the out of this, but...

Connection Options

UTF-8

mysql_enable_utf8 => 1,

Almost certainly

mysql_auto_reconnect => 1,

If only new code is going to use the dbic route:

$schema->storage->on_connect_do( [
    "SET sql_mode=STRICT_ALL_TABLES",
] );
  • or whatever the flag for other db engines is
  • see the perldoc for your DBD driver
  • STRICT_ALL_TABLES - for mysql

Gotchas

Prefetch prefetch prefetch.

Test your app with a representative dataset (where possible).

Keep your ResultSource classes up to date.

  • "ORMs are slow", no you're probably not using it correctly
  • representative data set + DBIC_TRACE to find optimisation points
  • up to date using the script shown in "Generating Classes"

In Summary

  • Thin controllers, fat model
    • The ORM is not your model
    • Use it in your model
  • DBIx::Class can be thought of as a toolkit not just an ORM, nor an opinionated framework
  • Define relationships even if they don't exist as FKs in the schema
  • Filters can help you fix problems with data
    • So you don't need to faff about downstream
    • And give you objects
    • Or you can sanity check data before it gets into the db
  • You can get to the low level handle if you have a complex query
    • Or use virtual views so you DRY
    • And again get objects

Stuff I Didn't Cover

Logging

Deployment

Helpers

  • show metacpan page for Helpers (there are loads)

There's Loads Of Other Stuff

curl -XPOST api.metacpan.org/v0/release/_search?size=100 -d '{
    "query": {
        "wildcard" : {
            "release.distribution" : "DBIx-Class*"
        }
    },
    "size" : 5000,
    "filter" : {
        "term" : {
            "status" : "latest"
        }
    },
    "fields": [
        "release.distribution",
        "release.date",
        "provides"
    ]
}' | jq -r '.hits.hits[].fields.distribution' | sort
  • 100 distributions of the form DBIx::Class:: when i last checked

Questions?

Links and resources:

Battling a Legacy Schema With DBIx::Class Lee Johnson YAPC::{NA,EU} 2016 who has experience with DBIx::Class? with what they would call a legacy schema? there will be live demos! suggest following along on screen so code examples are clear