a-complex-orm-faster-than-sql



a-complex-orm-faster-than-sql

0 11


a-complex-orm-faster-than-sql

:sparkles: :boom: :fire: :zap:

On Github Ocramius / a-complex-orm-faster-than-sql

Marco Pivetta

I annoy people on twitter as @Ocramius.

I write dangerous software on Github.

I contribute to

WTH is Doctrine?

An incubator for persistence-oriented libraries

WTH is Doctrine 2 ORM?

Object Relational Mapper

Inspired by Hibernate and the JPA (JSR-317)

Based on a DBAL (DataBase Abstraction Layer)

Saves/Loads POPO through an SQL RDBMS

What is a POPO?

POPO enable DDD

As long as your domain logic is free from dependencies, you can model it as you prefer

You don't really need Doctrine ORM for that, but it helps.

Not a talk about DDD

If you want more of this, check DDDinPHP!

Using the ORM's EntityManager

Persisting an Invoice

Requires precise order of operations!

COMPLICATED!

1 - save the Address

INSERT INTO address
(street, city, zip, country)
VALUES
('Zemědělská 1', 'Brno', '61300', 'CZ');

2 - save the User

INSERT INTO user
(username, name, surname, address_id)
VALUES
('Ocramius', 'Marco', 'Pivetta', LAST_INSERT_ID());

3 - save the Invoice

INSERT INTO invoice
(created, user_id)
VALUES
(NOW(), LAST_INSERT_ID());

4 - save the Line Items

SET @invoice_id = LAST_INSERT_ID();
INSERT INTO line_item
(invoice_id, product_id)
VALUES
(@invoice_id, 123),
(@invoice_id, 456);

Operation Order = bugs!

Saving an Invoice with the ORM

$user = new User(
    'Ocramius',
    'Marco',
    'Pivetta',
    new Address('Zemědělská 1', 'Brno', '61300', 'CZ')
);

$product1 = $entityManager->find('Product', 123);
$product2 = $entityManager->find('Product', 456);

$invoice = new Invoice($user, [
    new LineItem($product1),
    new LineItem($product2)
]);

$entityManager->persist($invoice);
$entityManager->flush();

The ORM is also transactional

Calling Doctrine\ORM\EntityManager#flush() with invalid data causes a ROLLBACK and an exception.

ORMs are cool, but...

ORMs are SLOW

Abstraction costs

It's pretty much always at the Performance cost.

ORMs are (typically) leaky abstractions

Systems like Hibernate are intentionally designed as "leaky abstractions" so that it's possible to easily mix in native SQL where necessary. Gavin King

An API for every use-case

Why hit screws with a hammer?

Why use nails with a screwdriver?

Pure OOP API

$user = $entityManager->find('User', 123);

$entityManager->remove($user);
$entityManager->persist($thing);

$entityManager->flush();
$user        = $repository->find(123);
$activeUsers = $repository->findBy(['active' => true]);

Pure OOP graph Traversal

$user = $entityManager->find('User', 123);

$city = $user->getAddress()->getCity();

Criteria API

$adultBirthDate = new DateTime('-18 years');

$adultsCriteria = new Doctrine\Common\Collections\Criteria();

$adultsCriteria->andWhere(
    $criteria->expr()->lte('birthDate', $adultBirthDate)
);

$adults = $repository->matching($criteria);

DQL: Doctrine Query Language

$users = $entityManager->createQuery('
SELECT
    u
FROM
    Users u
JOIN
    u.address a
JOIN
    a.city c
WHERE
    c.name = :cityName
    ')
    ->setParameter('cityName', 'Brno)
    ->getResult();

Native SQL

(if all else fails, and you use voodoo)

$query = $entityManager->createNativeQuery(
    <<<'SQL'
WITH RECURSIVE t(n) AS (
    VALUES (1)
    UNION ALL
    SELECT n + 1 FROM t WHERE n < 100
)
SELECT sum(n) AS foo FROM t;
'SQL'
    );

$rsm = new ResultSetMappingBuilder($entityManager);

$rsm->addScalarResult('foo', 'sum');
$query->setResultSetMapping($rsm);


$sum = $query->getResult();

Some ORM internals knowledge needed!

The ORM Hydrates and Extracts data

Hydration

Extraction

Everything you read from DB is kept in memory

Flushing is expensive!

When you call EntityManager#flush(), the UnitOfWork is iterated!

Only 1 object per ID in memory

You can avoid a lot of queries!

Objects must be in a valid state

Caching

Caching is at the core of the ORM architecture

Doctrine ORM has 4 caches

Metadata Cache

You need this in production!

Caches ORM bootstrapping information

Query Cache

Caches query generation:

SELECT u FROM Doctrine\Tests\Models\CMS\CmsUser u
SELECT
    c0_.id AS id_0,
    c0_.status AS status_1,
    c0_.username AS username_2,
    c0_.name AS name_3
FROM cms_users c0_

Do not use it with dynamic DQL!

Result Cache

Caches query results:

$usersCount = $entityManager
    ->createQuery('SELECT COUNT(u) FROM User u')
    ->setResultCacheDriver($redisCache)
    ->useResultCache(true)
    ->setResultCacheLifeTime(7200)
    ->getResult();

Cache ID depends on query string and parameters!

Second Level Cache

Also called SLC or L2 Cache

New as of Doctrine ORM 2.5-DEV *

* To be released soon

Why the SLC?

SQL RDBMS are slow at executing simple fetches

SELECT * FROM articles WHERE id = :id

That's part of why MongoDB, CouchDB and other NoSQL DBs are very successful!

What is a L2 Cache?

How does the SLC work?

Saves entity state in the DB and in the Cache

Tries to load entities from cache first

Assumptions:

Only the ORM writes to DB

Inserts and updates handled by EntityManager#flush()

Setting up the SLC:

When setting up the ORM:

$configuration->setSecondLevelCacheEnabled();

$configuration
    ->getSecondLevelCacheConfiguration()
    ->setCacheFactory(new DefaultCacheFactory(
        new RegionsConfiguration(),
        new RedisCache()
    ));

Mark your entities as "cacheable"

/**
 * @Cache(usage="NONSTRICT_READ_WRITE")
 */
class MyEntity
{
    // ...
}

SLC caching modes

READ_ONLY - no updates allowed, good for immutable data READ_WRITE - doctrine locks the cache internally, preventing race conditions NONSTRICT_READ_WRITE - allows updates, no locking, for sites with rare updates

SLC regions

Very useful when different data types require different cache backends

DEMO TIME!

AWESOME, HUH?

CONCLUSION:

You saw the tools, so please, don't do this:

Bad ORM usage:

$users = $entityManager->getRepository('User')->findAll();

$approvedUsers = array_filter(
    $users,
    function (User $user) {
        return $user->getEmailVerification()->isApproved();
    }
);

Know the tool, and how to use it

Also know when to not use an ORM

  • Reporting
  • Dealing with huge amounts of data
  • Dealing with dynamic data types in the schema
ORMs don't kill your DB: developers do Guilherme Blanco

Thanks for listening!

https://joind.in/16256

@Ocramius