No Right Way

A discussion on Databases and a guide to CouchDB and Cloudant

ANY VIEWS OR OPINIONS EXPRESSED IN THIS PRESENTATION ARE THOSE OF THE AUTHOR, AND DO NOT NECESSARILY REPRESENT OFFICIAL POSITIONS, STRATEGIES OR OPINIONS OF INTERNATIONAL BUSINESS MACHINES (IBM) CORPORATION.

THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.

WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.

IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.

IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.

NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF: CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.

Who are you…?

Mike Elsmore

Developer Advocate for Cloudant

mike.elsmore@uk.ibm.com

@ukmadlz

Kinds of database

Currently, databases are classified as either relational or NoSQL

NoSQL itself is just a catch-all term for non-relational data structures.

As with RDBs, NoSQL comes in multiple flavours and brands

Relational Databases

MySQL
MariaDB
MS SQL
Oracle
IBM DB2
PostgreSQL

NoSQL

MongoDB
CouchDB
Couchbase
Riak
Cassandra

What is NoSQL

The main things separating NoSQL from RDBs, apart from the obvious lack of SQL:

Variation: Graph, XML, JSON & triples

New query languages: you can have an alternative to SQL, so possibly simpler or structure-specific

'Schema less': no rigid schema enforced by the DBMS

Programmer friendly (we hope): easy to navigate the structure programmatically

May not guarantee full ACID behavior

Atomicity – Consistency – Isolation – Durability

May have a distributed, fault-tolerant, elastic architecture
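
To make the 'schema less' point concrete, here is a minimal sketch in plain JavaScript. The array stands in for a collection in some document store, and the second document ("Comet") and its `loves` field are made up for illustration. The point is only that nothing forces both documents to share the same fields, so the application has to cope with absent ones.

```javascript
// A plain array standing in for a collection in a document store (assumption).
const unicorns = [
  { name: "Aurora", gender: "f", weight: 450 },
  { name: "Comet", weight: 600, loves: ["apples", "carrots"] }, // hypothetical document
];

// No schema is enforced, so code reading the data supplies defaults itself.
const summary = unicorns.map((u) => ({
  name: u.name,
  gender: u.gender || "unknown",
  loves: u.loves || [],
}));

console.log(summary);
```

The flexibility is real, but so is the cost: checks a relational schema would do for you move into application code.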

What's the appeal

Data Model Flexibility

Elastic (automatic) scale in/out

Lower-cost operational data management platform for thousands & millions of users

Data Model Flexibility

Data models that are native to the application space (e.g. JSON)
No “schema-first” requirement: rapid and agile development process

Elastic (automatic) scale in/out

Easy elasticity and scalability to multiple racks (10s to 100s of servers)
Supports dynamic workloads
Optimized for web scale and extreme performance
Ease of replication

Lower-cost operational data management platform for thousands & millions of users

Increase in volumes of data, retention requirements (3-15 years)
Commodity hardware and pay-for services
Fault-tolerance and high availability

CAP Theorem

Taken from

CAP Theorem, also known as Brewer's Theorem, from Eric Brewer

Consistency: all nodes belonging to system see the same data at the same time
Availability: a guarantee that every request receives a response of success or failure
Partition tolerance: system continues to operate despite message loss or failure of parts of system

Given a data partition, will you prioritize consistency or availability?
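
The question above can be sketched with a toy model. This is not how any real database is implemented: `ToyCluster`, its two replicas and the `partitioned` flag are all invented for illustration. It only shows the choice a system faces when a write arrives during a partition: refuse it (CP) or accept it locally and let the replicas diverge (AP).

```javascript
// Toy model only: two replicas and a flag that simulates a network partition.
class ToyCluster {
  // mode is "CP" (refuse writes while partitioned) or "AP" (accept them locally).
  constructor(mode) {
    this.mode = mode;
    this.partitioned = false;
    this.replicas = { a: null, b: null };
  }

  // Writes arrive at replica "a" and are copied to "b" only if the link is up.
  write(value) {
    if (this.partitioned) {
      if (this.mode === "CP") {
        // Stay consistent by giving up availability for this request.
        return { ok: false, reason: "partitioned" };
      }
      // Stay available by letting the replicas diverge.
      this.replicas.a = value;
      return { ok: true };
    }
    this.replicas.a = value;
    this.replicas.b = value;
    return { ok: true };
  }

  read(name) {
    return this.replicas[name];
  }
}

const cp = new ToyCluster("CP");
cp.write("v1");
cp.partitioned = true;
const cpResult = cp.write("v2");

const ap = new ToyCluster("AP");
ap.write("v1");
ap.partitioned = true;
const apResult = ap.write("v2");

console.log(cpResult.ok, cp.read("a"), cp.read("b")); // false v1 v1
console.log(apResult.ok, ap.read("a"), ap.read("b")); // true v2 v1
```

In the CP run both replicas still agree but the client was turned away; in the AP run the client succeeded but a read from replica "b" is now stale.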

And this applies to relational how?

When to use Relational Database Management Systems (RDBMS)

Data normalization is critical for elimination of redundancy and ensuring master data consistency
Many “justifications” for using relational databases are cultural, not technical, e.g. the solution is already built on a relational DB; resistance to change
Prioritization of data availability and consistency makes RDBMS well suited to handling transactional, reporting, log, and warehouse data
Analytics and BI tooling are valid reasons for maintaining a relational database, but this too is rapidly changing

According to the CAP Theorem, a relational database that prioritizes consistency and availability cannot also be partition tolerant

So long as an RDBMS prioritizes availability and consistency, it is unable to scale out (horizontally)!
Vertical scaling is the alternative, but this practice becomes prohibitively expensive and is not sustainable

And NoSQL fits in here…?

Most NoSQL technologies have been built for scale, which means that they fit either a CP or AP model

Popular datastores like Couchbase & MongoDB try to be CP, but fall back to AP when things get tough

CouchDB follows an AP approach from the start, using eventual consistency

Key Value Stores

Columnar Stores

Graph Stores

Document Stores

Key Value Stores

Cassandra
Riak
MemcacheDB
HBase
pickleDB

Columnar Stores

Cassandra
HBase

Document Stores - 50% of NoSQL DBs are document based

CouchDB (and, because of it, Cloudant)
MongoDB
Redis
Couchbase
Engine Yard

Graph Store

*dex
Neo4j
InfiniteGraph
Sesame

                
        {
             "firstName" : "John",
             "lastName" : "Smith",
             "age" : 25,
             "address" :
             {
                 "streetAddress" : "21 2nd Street",
                 "city" : "New York",
                 "state" : "NY",
                 "postalCode" : "10021"
             },
             "phoneNumber":
             [
                 {
                   "type" : "home",
                   "number" : "212 555-1234"
                 },
                 {
                   "type" : "fax",
                   "number" : "646 555-4567"
                 }
             ]
         }

And they all use JSON, or some derivative that's basically JSON under a different name
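
As a small illustration of why JSON documents are easy to work with from code, this sketch parses a trimmed copy of the sample document and navigates it with ordinary property access. The variable names are mine; nothing here is specific to any one datastore.

```javascript
// A trimmed copy of the sample document, as it might arrive from a datastore.
const raw = `{
  "firstName": "John",
  "lastName": "Smith",
  "address": { "city": "New York", "state": "NY" },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}`;

const person = JSON.parse(raw);

// Nested objects are reached with plain dot access.
const city = person.address.city;

// Arrays of sub-documents can be filtered like any other array.
const faxNumbers = person.phoneNumber
  .filter((p) => p.type === "fax")
  .map((p) => p.number);

console.log(city, faxNumbers);
```

There is no mapping layer between the stored shape and the in-memory shape, which is most of the appeal.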

How does this apply to the CAP theorem

As I said, all these datastores use or fall back to an AP approach, which means…

Instead of prioritizing consistency and availability, shift focus towards ensuring availability and partition tolerance

Unlikely to find a scenario where loss in availability would be tolerable
Selecting for partition tolerance opens up possibility for horizontal scaling!
Distribution over a cluster also improves availability: in aggregate, the cluster is more reliable than the individual nodes that comprise it

This comes at the cost of a weakened consistency model

Can no longer guarantee that all nodes (and clients connected to these nodes) share identical versions of the same data at a given moment
The result is an “eventual consistency” model: the premise that all nodes in a distributed system will eventually share the same versioning of all data, given sufficient time
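
A rough sketch of what "eventually" means in practice. This is a deliberately simplified toy in which the replica with the higher version number wins on sync. It is an assumption made for illustration, not how CouchDB or any other datastore actually tracks revisions or resolves conflicts, and it ignores the case where two replicas reach the same version with different values.

```javascript
// Each replica stores a value plus a version counter (toy model).
function makeReplica() {
  return { value: null, version: 0 };
}

// A write lands on one replica only.
function writeLocal(replica, value) {
  replica.value = value;
  replica.version += 1;
}

// Sync step: copy the newer state onto the older replica.
function sync(x, y) {
  if (x.version > y.version) {
    y.value = x.value;
    y.version = x.version;
  } else if (y.version > x.version) {
    x.value = y.value;
    x.version = y.version;
  }
}

const nodes = [makeReplica(), makeReplica(), makeReplica()];
writeLocal(nodes[0], "hello");

// Before any sync, a client reading from another node sees old data.
const staleRead = nodes[2].value;

// Given time (here, two sync steps), every node ends up with the same data.
sync(nodes[0], nodes[1]);
sync(nodes[1], nodes[2]);
const converged = nodes.every((n) => n.value === "hello");

console.log(staleRead, converged); // null true
```

The window between the write and the last sync is exactly the weakened consistency the slide describes.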

Why use MongoDB

It's quick

It's easy to use

            
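              // Insert a document; the collection is created on first use
              db.unicorns.insert({name: 'Aurora', gender: 'f', weight: 450});
              // List every document in the collection
              db.unicorns.find();
              // List the collection's indexes (older shells used db.system.indexes.find())
              db.unicorns.getIndexes();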

It's quick to get started; you:

download the binaries (server and client)
create and set the config file
launch the DB binary

Written around JS, which is why it works AMAZINGLY within the MEAN stack, means simple chained

Why use Cloudant / CouchDB

It works as an HTTP API

It's eventually consistent

It's managed - if you use Cloudant
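
To show what "works as an HTTP API" looks like, this sketch builds (but does not send) the requests for storing and fetching a document. CouchDB addresses a document as `/{db}/{docid}`; the base URL, the `unicorns` database and the `aurora` id are placeholders I have assumed, and the helper functions are hypothetical rather than part of any client library.

```javascript
// Build a request description for creating or updating a document.
function putDocRequest(baseUrl, db, id, doc) {
  return {
    method: "PUT",
    url: `${baseUrl}/${encodeURIComponent(db)}/${encodeURIComponent(id)}`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc),
  };
}

// Build a request description for reading a document back.
function getDocRequest(baseUrl, db, id) {
  return {
    method: "GET",
    url: `${baseUrl}/${encodeURIComponent(db)}/${encodeURIComponent(id)}`,
    headers: { Accept: "application/json" },
  };
}

// Placeholder values: a local CouchDB on its default port 5984.
const base = "http://localhost:5984";
const put = putDocRequest(base, "unicorns", "aurora", {
  name: "Aurora",
  gender: "f",
  weight: 450,
});
const get = getDocRequest(base, "unicorns", "aurora");

console.log(put.method, put.url);
console.log(get.method, get.url);
```

Any HTTP client can send these, for example `fetch(put.url, { method: put.method, headers: put.headers, body: put.body })`. A real server will usually also want credentials, which are left out here.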

Why carry on with Relational

http://sqoop.apache.org/

All the NoSQL DBs have merits, and they work well when implemented well; they just aren't cut out for everything

Because SQL databases take consistency and availability as their primary factors, they are the only real choice for sensitive transactional data

For example, with financial information the data must be available to read, but a change such as taking money out must be consistent (you don't want the same money taken out in 2 locations)

Also, for all the Big Data stuff you REALLY can just use Sqoop and process it in Hadoop separately

The End
