On Github ukmadlz / no-right-way
A discussion on Databases and a guide to CouchDB and Cloudant
ANY VIEWS OR OPINIONS EXPRESSED IN THIS PRESENTATION ARE THOSE OF THE AUTHOR, AND DO NOT NECESSARILY REPRESENT OFFICIAL POSITIONS, STRATEGIES OR OPINIONS OF INTERNATIONAL BUSINESS MACHINES (IBM) CORPORATION.
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.
WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.
IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.
IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.
NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF: CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.
Currently Databases are classified as either Relational or NoSQL
NoSQL itself is just a catch-all term for non-relational data structures.
Just as RDBs come in multiple flavours and brands, so do NoSQL databases
The main differences between RDB and NoSQL, apart from the obvious lack of SQL:
Variation: Graph, XML, JSON & triples
New query languages: you can have an alternative to SQL, possibly simpler or structure-specific
'Schema-less': no rigid schema enforced by the DBMS
Programmer friendly (we hope): easy to navigate the structure programmatically
May not guarantee full ACID behavior
Atomicity – Consistency – Isolation – Durability
May have a distributed, fault-tolerant, elastic architecture
Data Model Flexibility
Elastic (automatic) scale in/out
Lower-cost operational data management platform for thousands & millions of users
CAP Theorem, also known as Brewer's Theorem, from Eric Brewer
Given a network partition, will you prioritize consistency or availability?
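The consistency-or-availability question can be made concrete with a toy model. This is a minimal sketch, not a real database API: `TinyCluster`, `Replica`, the `"CP"`/`"AP"` mode labels and the `partitioned` flag are all names invented for illustration. It shows only what happens to a write that arrives while the two replicas cannot talk to each other.

```javascript
// Toy model: two replicas of a single value, plus a flag marking a network partition.
// "CP" and "AP" are labels for the two policies, not settings of any real product.
class Replica {
  constructor(name) {
    this.name = name;
    this.value = null;
  }
}

class TinyCluster {
  constructor(mode) {
    this.mode = mode; // "CP" or "AP"
    this.partitioned = false;
    this.a = new Replica("a");
    this.b = new Replica("b");
  }

  // Write through replica "a"; replica "b" may be unreachable.
  write(value) {
    if (!this.partitioned) {
      this.a.value = value;
      this.b.value = value;
      return { ok: true };
    }
    if (this.mode === "CP") {
      // Consistency first: refuse the write rather than let the replicas diverge.
      return { ok: false, reason: "partitioned" };
    }
    // Availability first: accept the write locally and let the replicas diverge for now.
    this.a.value = value;
    return { ok: true };
  }

  isConsistent() {
    return this.a.value === this.b.value;
  }
}

function demo(mode) {
  const cluster = new TinyCluster(mode);
  cluster.write("v1");        // healthy network: both replicas get "v1"
  cluster.partitioned = true; // the link between the replicas goes down
  const result = cluster.write("v2");
  return { accepted: result.ok, consistent: cluster.isConsistent() };
}

console.log("CP:", demo("CP"));
console.log("AP:", demo("AP"));
```

In CP mode the write is rejected and the replicas stay identical; in AP mode the write is accepted and the replicas disagree until they can sync again.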
When to use Relational Database Management Systems (RDBMS)
In CAP terms, traditional relational databases are usually classed as CA: they prioritize consistency and availability rather than partition tolerance
Most NoSQL technologies have been built for scale, which means that they fit either a CP or AP model
Popular datastores like Couchbase & MongoDB try to be CP, but fall back to AP when things get tough
CouchDB follows an AP approach from the start, using eventual consistency
Key Value Stores
Columnar Stores
Document Stores - 50% of NoSQL DBs are document based
Graph Store
{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}
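To show why a document like this is pleasant to work with from code, here is a small sketch that parses it and walks the nested structure. The helper name `describePerson` is made up for this example; everything else is plain JavaScript and the standard `JSON.parse`.

```javascript
// The same document as on the slide, parsed from JSON text.
const person = JSON.parse(`{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}`);

// Hypothetical helper: nested objects and arrays are reached with ordinary
// property access and array methods, with no joins or mapping layer.
function describePerson(p) {
  return {
    fullName: `${p.firstName} ${p.lastName}`,
    city: p.address.city,
    phones: p.phoneNumber.map((entry) => `${entry.type}: ${entry.number}`),
  };
}

console.log(describePerson(person));
```

The address and the list of phone numbers live inside the document itself, which is the main practical difference from spreading them across related tables.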
And they all use JSON, or some derivative that is basically JSON under a different name
As I said, all these datastores use or fall back to an AP approach, which means…
Instead of prioritizing consistency and availability, shift focus towards ensuring availability and partition tolerance
This comes at the cost of a weakened consistency model
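What a "weakened consistency model" looks like in practice can be sketched with two replicas that both accept writes and reconcile later. This is an illustration only: it uses a simple last-writer-wins rule with a logical clock, which is an assumption chosen for brevity and not the conflict handling of CouchDB or any other specific product. The names `Replica`, `wins` and `sync` are invented here.

```javascript
// Each replica stores, per key, the value plus a logical clock and the id of the writer.
class Replica {
  constructor(id) {
    this.id = id;
    this.clock = 0;
    this.data = new Map();
  }

  put(key, value) {
    this.clock += 1;
    this.data.set(key, { value, clock: this.clock, writer: this.id });
  }

  get(key) {
    const entry = this.data.get(key);
    return entry ? entry.value : undefined;
  }

  // Pull entries from another replica and keep whichever entry wins the comparison.
  mergeFrom(other) {
    for (const [key, theirs] of other.data) {
      const mine = this.data.get(key);
      if (!mine || wins(theirs, mine)) this.data.set(key, { ...theirs });
    }
    this.clock = Math.max(this.clock, other.clock);
  }
}

// Higher clock wins; ties are broken by writer id so both sides pick the same winner.
function wins(a, b) {
  if (a.clock !== b.clock) return a.clock > b.clock;
  return a.writer > b.writer;
}

function sync(x, y) {
  x.mergeFrom(y);
  y.mergeFrom(x);
}

function demo() {
  const east = new Replica("east");
  const west = new Replica("west");
  east.put("stock", 10);
  sync(east, west);      // both replicas agree on 10

  east.put("stock", 9);  // partition: each side accepts its own write
  west.put("stock", 8);
  const before = [east.get("stock"), west.get("stock")];

  sync(east, west);      // partition heals: replicas converge on one value
  const after = [east.get("stock"), west.get("stock")];
  return { before, after };
}

console.log(demo());
```

Both replicas stayed available throughout, and they do end up agreeing, but one of the two concurrent writes is silently discarded. That loss is the price paid for availability in this sketch.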
It's quick
It's easy to use
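db.unicorns.insert({name: 'Aurora', gender: 'f', weight: 450}); // insert a document; the collection is created on first use
db.unicorns.find(); // list every document in the collection
db.system.indexes.find(); // list the database's indexes (older MongoDB releases expose them through this collection)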
It's quick, to get started you:
Written around JS, which is why it works AMAZINGLY within the MEAN stack, means simple chained
It works as an HTTP API
It's eventually consistent
It's managed - if you use Cloudant
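The "HTTP API" point means that databases and documents are just URLs you `PUT` and `GET`. The sketch below builds the three most basic requests without sending them. The values are assumptions for illustration: a local CouchDB on its default port 5984, a database named `unicorns`, and a document id `aurora`. The helpers `couchRequest` and `send` are hypothetical names, and authentication is left out, although a real server will normally require it.

```javascript
// Assumed for illustration: a local CouchDB on its default port.
const BASE_URL = "http://localhost:5984";

// Hypothetical helper: describe an HTTP request as a plain object.
function couchRequest(method, path, body) {
  const request = {
    method,
    url: `${BASE_URL}/${path}`,
    headers: { Accept: "application/json" },
  };
  if (body !== undefined) {
    request.headers["Content-Type"] = "application/json";
    request.body = JSON.stringify(body);
  }
  return request;
}

// PUT /{db} creates a database.
const createDatabase = couchRequest("PUT", "unicorns");
// PUT /{db}/{docid} creates (or updates) a document with that id.
const saveDocument = couchRequest("PUT", "unicorns/aurora", {
  name: "Aurora",
  gender: "f",
  weight: 450,
});
// GET /{db}/{docid} reads the document back.
const readDocument = couchRequest("GET", "unicorns/aurora");

// Sending is left to the caller, for example with fetch (global in Node 18+).
// This function is defined but not called here, so nothing touches the network.
async function send(request) {
  const { url, ...options } = request;
  const response = await fetch(url, options);
  return response.json();
}

console.log(createDatabase, saveDocument, readDocument);
```

Because the interface is plain HTTP and JSON, the same three calls work from a browser, from `curl`, or from any language with an HTTP client.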
All the NoSQL DBs have merits, and they work well when implemented well; they just aren't cut out for everything
Because SQL databases take consistency and availability as their primary factors, they are the only real choice for sensitive transactional data
For example, with financial information the data must be available to read, but a change such as taking money out must be consistent (you don't want the same money taken out in 2 locations)
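The "money out in 2 locations" problem can be shown in a few lines. This is a hypothetical example with made-up numbers (a balance of 100 and two withdrawals of 80) and invented function names; it models only the balance check, not a real banking or database system.

```javascript
// Weakly consistent: each location checks its own, possibly stale, copy of the balance.
function withdrawFromLocalCopies(balance, amounts) {
  const results = amounts.map((amount) => {
    const localCopy = balance; // every location still sees the full starting balance
    return localCopy >= amount;
  });
  const paidOut = amounts
    .filter((_, i) => results[i])
    .reduce((sum, amount) => sum + amount, 0);
  return { results, paidOut };
}

// Strongly consistent: every withdrawal is checked against one shared balance, in order.
function withdrawFromSharedBalance(balance, amounts) {
  let remaining = balance;
  const results = amounts.map((amount) => {
    if (remaining < amount) return false;
    remaining -= amount;
    return true;
  });
  return { results, paidOut: balance - remaining };
}

const weak = withdrawFromLocalCopies(100, [80, 80]);
const strong = withdrawFromSharedBalance(100, [80, 80]);
console.log("local copies:", weak);
console.log("shared balance:", strong);
```

With stale local copies both withdrawals are approved and 160 leaves an account that held 100; with a single consistent balance the second withdrawal is refused.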
Also for all the Big Data stuff you REALLY can just use Sqoop and process it in Hadoop separately