nosql-is-a-lie-osdc-2016



On Github ukmadlz / nosql-is-a-lie-osdc-2016


NoSQL is a lie

Legal Disclaimer

  • © IBM Corporation 2015. All Rights Reserved.
  • The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.
  • References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.
  • Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or both.
  • Other company, product, or service names may be trademarks or service marks of others.

Who am I?

Mike Elsmore

Developer Advocate

mike.elsmore@uk.ibm.com

< rant >

NoSQL

- Originally coined by Carlo Strozzi in 1998 for Strozzi NoSQL, a relational database that simply had no SQL interface

Catch All

try {
    SQL
} catch (Exception $e) {
    Must be NoSQL
}

Not SQL

It's a nasty backronym

- Reintroduced by Johan Oskarsson of Last.fm in 2009 to describe the wave of emerging, non-relational database technologies

SQL on NoSQL

SPARQL

CQL (Cassandra Query Language)

N1QL (Couchbase's SQL for JSON)

Schemaless

Yes, they'll accept anything…but

Schema

Because how else do you know what you’re getting out?
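Here is a minimal Python sketch of the point. A plain list of dicts stands in for a schemaless collection. `insert`, `read_valid_users` and `REQUIRED_FIELDS` are hypothetical names, not any database's API. Every write succeeds, but the reader still expects a particular shape, so the schema has simply moved into the application.

```python
from typing import Any

# Stand-in for a schemaless collection: any dict is accepted as-is.
store: list[dict[str, Any]] = []

def insert(doc: dict[str, Any]) -> None:
    store.append(doc)  # no validation on write

# The shape the reader expects is the real (implicit) schema.
REQUIRED_FIELDS = {"name": str, "email": str}

def read_valid_users() -> list[dict[str, Any]]:
    """Return only the documents that match the shape the application expects."""
    return [
        doc for doc in store
        if all(isinstance(doc.get(field), kind) for field, kind in REQUIRED_FIELDS.items())
    ]

insert({"name": "Ada", "email": "ada@example.com"})
insert({"name": "Bob", "mail": "bob@example.com"})  # misspelt field, still accepted
insert({"name": 42, "email": "x@example.com"})      # wrong type, still accepted

print(len(store))               # 3: every write succeeded
print(len(read_valid_users()))  # 1: only one matches the implicit schema
```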

No NoSQL Experts

Many Primary Types

So many distinct types of databases

"X" Expert

</ rant >

Enter CAP Theorem

Consistency, Availability and Partition Tolerance

Consistent

In = Out

Available

Partition Tolerant

Knows where to look

Pick two?

Uncertainty principle

Why is this important?

History

Consistent & Available

Sidestepping partition tolerance by keeping everything in one place

Distributed Systems and Databases

Needs to know which machine holds a given piece of data
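One common way for a distributed database to answer "which machine holds this key?" is consistent hashing. Dynamo-style stores such as Cassandra and Riak use variants of it. The sketch below is a simplified illustration under that assumption. `HashRing`, `node_for` and the node names are hypothetical.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    # Stable hash; Python's built-in hash() is randomised per process for strings.
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")

class HashRing:
    """Minimal consistent-hash ring: a key belongs to the first node point clockwise of its hash."""

    def __init__(self, nodes: list[str], vnodes: int = 64) -> None:
        ring = []
        for node in nodes:
            for i in range(vnodes):  # several points per node smooth out the distribution
                ring.append((_hash(f"{node}#{i}"), node))
        ring.sort()
        self._points = [point for point, _ in ring]
        self._owners = [node for _, node in ring]

    def node_for(self, key: str) -> str:
        # First point strictly after the key's hash, wrapping round to the start.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owners[idx]

ring = HashRing(["db-1", "db-2", "db-3"])
for key in ["user:1", "user:2", "order:99"]:
    print(key, "->", ring.node_for(key))
```

Adding a node only moves the keys that now land on the new node's points. A plain `hash(key) % n` reshuffles most keys whenever `n` changes.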

Partition Tolerance & ___________

Design Decision

This is why most NoSQL databases are either AP or CP
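The trade-off can be shown with a toy two-replica cluster in Python. `TwoNodeCluster` and its `mode` flag are hypothetical. Real systems use quorums, leases and retries rather than a single boolean. The choice during a partition is still the same: refuse the write (CP), or accept it and let the replicas diverge (AP).

```python
from typing import Optional

class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, str] = {}

class TwoNodeCluster:
    """Toy two-replica cluster; `mode` decides what to give up during a partition."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a, self.b = Replica("a"), Replica("b")
        self.partitioned = False  # True means the replicas cannot reach each other

    def write(self, via: Replica, key: str, value: str) -> bool:
        peer = self.b if via is self.a else self.a
        if not self.partitioned:
            via.data[key] = value
            peer.data[key] = value  # replicate synchronously to the peer
            return True
        if self.mode == "CP":
            return False            # keep replicas consistent by rejecting the write
        via.data[key] = value       # stay available; the peer now holds stale data
        return True

    def read(self, via: Replica, key: str) -> Optional[str]:
        return via.data.get(key)

for mode in ("CP", "AP"):
    cluster = TwoNodeCluster(mode)
    cluster.write(cluster.a, "x", "1")
    cluster.partitioned = True
    accepted = cluster.write(cluster.a, "x", "2")
    print(mode, accepted, cluster.read(cluster.a, "x"), cluster.read(cluster.b, "x"))
# CP False 1 1
# AP True 2 1
```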

NewSQL

Google Spanner

CockroachDB

Other Database Types

Object, Tabular, Tuple, Triple/Quad store (RDF), Multi-model, etc.

Skipping Multi-model

Different indexing & lookups on the same stored data.

Key Value Datastores

Popular Key-Value Datastores

Redis, Memcached, Riak KV, Hazelcast, Ehcache, Aerospike, Oracle Coherence, Berkeley DB, Amazon SimpleDB, Oracle NoSQL, Infinispan, LevelDB, GridGain, ZODB, GT.M, NCache, RocksDB, WiredTiger, WebSphere eXtreme Scale, Tokyo Cabinet, Project Voldemort, XAP, Hibari, MapDB, Tokyo Tyrant, STSdb, Scalaris, GlobalsDB, HyperDex, Kyoto Cabinet, Tarantool, LightCloud, ScaleOut StateServer, Upscaledb, Quasardb, Bangdb, BergDB, Cachelot.io, CodernityDB, CortexDB, Elliptics, Helium, HyperLevelDB, Kyoto Tycoon, LedisDB, Nanolat, Resin, TomP2P

What is a Key-Value Datastore?

- A data storage paradigm designed for storing, retrieving, and managing associative arrays, a data structure more commonly known today as a dictionary or hash
- Dictionaries contain a collection of objects, or records, which in turn have many different fields within them, each containing data
- Usually an AP system
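Here is a minimal in-memory sketch of the model in Python. The `put`, `get` and `delete` methods are hypothetical. The store treats each value as opaque bytes, so any structure inside the value (JSON here) is the client's business.

```python
import json
from typing import Optional

class KeyValueStore:
    """In-memory sketch: values are opaque bytes, reachable only by their key."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # True if the key existed and was removed.
        return self._data.pop(key, None) is not None

kv = KeyValueStore()
kv.put("user:42", json.dumps({"name": "Ada", "langs": ["en", "fr"]}).encode("utf-8"))
record = json.loads(kv.get("user:42"))  # the client decodes the structure itself
print(record["name"])  # Ada
```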

Why use a Key-Value store?

- The simple model makes them easy to use and powerful

Use cases
- Session stores
- Fast search lookups
- Queues
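For the session-store use case, here is a hypothetical `SessionStore` that gives each key a time-to-live. Stores such as Redis and Memcached support key expiry natively. This sketch only mimics the idea, with lazy eviction on read and an injectable clock so the example is deterministic.

```python
import time
from typing import Callable, Optional

class SessionStore:
    """Key-value session store where each entry expires `ttl` seconds after it is set."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._data: dict[str, tuple[float, dict]] = {}

    def set(self, session_id: str, session: dict) -> None:
        self._data[session_id] = (self.clock() + self.ttl, session)

    def get(self, session_id: str) -> Optional[dict]:
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, session = entry
        if self.clock() >= expires_at:
            del self._data[session_id]  # lazily evict expired sessions on read
            return None
        return session

# Fake clock so the example is deterministic.
now = [0.0]
sessions = SessionStore(ttl=30, clock=lambda: now[0])
sessions.set("abc123", {"user_id": 42})
print(sessions.get("abc123"))  # {'user_id': 42}
now[0] = 31.0
print(sessions.get("abc123"))  # None
```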

Document Datastores

- Considered to be a subclass of key-value stores
- That means:
  - Records do not need to have a uniform structure, i.e. different records may have different columns.
  - The types of the values of individual columns can be different for each record.
  - Columns can have more than one value (arrays).
  - Records can have a nested structure.
- Document stores often use internal notations that applications can process directly, mostly JSON. JSON documents can of course also be stored as plain text in key-value stores or relational database systems. That would, however, require client-side processing of the structures, with the disadvantage that the features offered by document stores (such as secondary indexes) are not available.
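The secondary-index advantage can be sketched in Python, assuming an in-memory collection of heterogeneous documents. `build_index` is a hypothetical helper, not a specific product's indexing API. It works only because the store can see inside each document, which a key-value store holding opaque JSON text cannot do. For simplicity it indexes scalar and array fields only.

```python
from collections import defaultdict
from typing import Any

# Documents in one collection need not share a structure.
docs: dict[str, dict[str, Any]] = {
    "p1": {"type": "person", "name": "Ada", "langs": ["en", "fr"]},
    "p2": {"type": "person", "name": "Bob", "address": {"city": "Leeds"}},
    "c1": {"type": "company", "title": "Acme", "langs": ["en"]},
}

def build_index(field: str) -> dict[Any, set[str]]:
    """Secondary index: field value -> ids of documents containing it.
    Array values are indexed element by element; documents lacking the field are skipped."""
    index: dict[Any, set[str]] = defaultdict(set)
    for doc_id, doc in docs.items():
        value = doc.get(field)
        if value is None:
            continue
        for item in (value if isinstance(value, list) else [value]):
            index[item].add(doc_id)
    return dict(index)

by_lang = build_index("langs")
print(sorted(by_lang["en"]))  # ['c1', 'p1']
```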

Popular Document Datastores

Cloudant, CouchDB, MongoDB, Couchbase, RethinkDB, RavenDB, GemFire, PouchDB, Microsoft Azure DocumentDB, Datameer, CloudKit, Mnesia, Google Cloud Datastore, TokuMX, Clusterpoint, Terrastore, RaptorDB, EJDB, SisoDb, WhiteDB, Sequoiadb, JasDB, LokiJS, DensoDB, Djondb, FaunaDB, FleetDB, SenseiDB

What is a Document Datastore?

- Considered a subclass of key-value datastores
- Relies on the document to provide the metadata used to optimise and build further queries
- Uses techniques like MapReduce to query
- Uses search systems like Apache Lucene for advanced querying
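The MapReduce style of querying can be sketched in Python. The shape loosely follows CouchDB/Cloudant views. There, a JavaScript map function calls `emit(key, value)` and a reduce such as the built-in `_sum` aggregates per key. In the sketch below, `map_reduce` and `by_customer` are hypothetical Python stand-ins; real views are computed incrementally inside the database.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable, Iterator

Doc = dict[str, Any]

def map_reduce(docs: Iterable[Doc],
               map_fn: Callable[[Doc], Iterator[tuple[Any, Any]]],
               reduce_fn: Callable[[list[Any]], Any]) -> dict[Any, Any]:
    """Run map over every document, group emitted values by key, then reduce each group."""
    groups: dict[Any, list[Any]] = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(groups.items())}

orders = [
    {"customer": "ada", "total": 30},
    {"customer": "bob", "total": 12},
    {"customer": "ada", "total": 8},
    {"note": "malformed, no customer"},
]

def by_customer(doc: Doc) -> Iterator[tuple[Any, Any]]:
    # Like a view's map function: emit nothing for documents that lack the fields.
    if "customer" in doc and "total" in doc:
        yield doc["customer"], doc["total"]

print(map_reduce(orders, by_customer, sum))  # {'ada': 38, 'bob': 12}
```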

Why use a Document Datastore?

- Operational datastore
- Flexibility to change the data model whilst presenting the same responses
- The majority are designed with AP in mind
- Once you model around eventual consistency, you can handle a lot of reads and writes
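That flexibility often shows up as normalising documents of different vintages at read time, so clients keep receiving the same response. Here is a minimal Python sketch. The `schema_version` field, the field names and the v1/v2 history are all hypothetical.

```python
from typing import Any

def to_response(doc: dict[str, Any]) -> dict[str, Any]:
    """Present old and new document shapes through one response format.
    Hypothetical history: v1 stored a single 'name'; v2 splits it into first/last."""
    if doc.get("schema_version", 1) >= 2:
        full_name = f'{doc["first_name"]} {doc["last_name"]}'
    else:
        full_name = doc["name"]
    return {"id": doc["_id"], "name": full_name}

old_doc = {"_id": "u1", "name": "Ada Lovelace"}
new_doc = {"_id": "u2", "schema_version": 2, "first_name": "Grace", "last_name": "Hopper"}
print(to_response(old_doc))  # {'id': 'u1', 'name': 'Ada Lovelace'}
print(to_response(new_doc))  # {'id': 'u2', 'name': 'Grace Hopper'}
```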

Column Datastores

- Wide column datastores
  - Wide column stores, also called extensible record stores, store data in records that can hold very large numbers of dynamic columns. Since neither the column names nor the record keys are fixed, and since a record can have billions of columns, wide column stores can be seen as two-dimensional key-value stores.
  - Wide column stores share the characteristic of being schema-free with document stores; however, the implementation is very different.
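Read as a "two-dimensional key-value store", a wide column table can be sketched in Python as a map of maps. `WideColumnTable` is hypothetical. It ignores what real systems add on top, such as column families, timestamps and on-disk sorting.

```python
from collections import defaultdict
from typing import Optional

class WideColumnTable:
    """Two-level map: row key -> column name -> value.
    Rows are sparse: each row stores only the columns it actually has."""

    def __init__(self) -> None:
        self._rows: dict[str, dict[str, str]] = defaultdict(dict)

    def put(self, row_key: str, column: str, value: str) -> None:
        self._rows[row_key][column] = value

    def get(self, row_key: str, column: str) -> Optional[str]:
        return self._rows.get(row_key, {}).get(column)

    def columns(self, row_key: str) -> list[str]:
        return sorted(self._rows.get(row_key, {}))

t = WideColumnTable()
t.put("user:1", "email", "ada@example.com")
t.put("user:1", "login:2016-05-01", "ok")  # column names can themselves carry data
t.put("user:2", "phone", "555-0100")
print(t.columns("user:1"))  # ['email', 'login:2016-05-01']
print(t.columns("user:2"))  # ['phone']
```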

Popular Column Datastores

Cassandra, HBase, Accumulo, Hypertable, Google Cloud Bigtable, ScyllaDB

What is a Column Datastore?

- Does use tables, rows and columns for the storage model
- Kinda relational
- However, the names and format of columns can change between rows

Why use a Column Datastore?

- Can use it for operational storage
- But due to how you model it, Relational DBs
- Is amazing for time series
- Massive distribution
- Network failure
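The time-series fit comes from a common modelling pattern. There is one wide row per series and time bucket, with readings kept sorted inside the row, so a time range is a single contiguous slice. In Cassandra terms that is roughly a partition key plus a clustering column. The Python sketch below imitates the layout with hypothetical names (`record`, `read_range`) and in-memory lists.

```python
import bisect
from collections import defaultdict

# One wide row per (sensor, day); readings inside a row stay sorted by timestamp.
rows: dict[tuple[str, str], list[tuple[int, float]]] = defaultdict(list)

def record(sensor: str, day: str, ts: int, value: float) -> None:
    bisect.insort(rows[(sensor, day)], (ts, value))

def read_range(sensor: str, day: str, start: int, end: int) -> list[tuple[int, float]]:
    """Return readings with start <= ts < end as one contiguous slice of a single row."""
    row = rows.get((sensor, day), [])
    lo = bisect.bisect_left(row, (start,))  # (start,) sorts before any (start, value)
    hi = bisect.bisect_left(row, (end,))
    return row[lo:hi]

record("temp-1", "2016-05-01", 300, 21.5)
record("temp-1", "2016-05-01", 100, 20.0)
record("temp-1", "2016-05-01", 200, 20.7)
print(read_range("temp-1", "2016-05-01", 100, 300))  # [(100, 20.0), (200, 20.7)]
```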

Graph Datastores

- Graph DBMSs, also called graph-oriented DBMSs or graph databases, represent data in graph structures as nodes and edges, which are relationships between nodes. They allow easy processing of data in that form, and simple calculation of specific properties of the graph, such as the number of steps needed to get from one node to another.
- Graph DBMSs usually don't provide indexes on all nodes, so in those cases direct access to nodes based on attribute values is not possible.
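The "number of steps" calculation is a breadth-first search over the edges. Here is a minimal Python sketch over a hypothetical in-memory adjacency list. A graph database would run the traversal itself, usually through its own query language.

```python
from collections import deque
from typing import Optional

# Undirected graph as adjacency lists: node -> neighbouring nodes (the edges).
graph: dict[str, list[str]] = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
    "frank": [],
}

def steps_between(start: str, goal: str) -> Optional[int]:
    """Breadth-first search: fewest edges from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

print(steps_between("alice", "erin"))   # 3
print(steps_between("alice", "frank"))  # None
```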

Popular Graph Datastores

Neo4j, Titan, Giraph, InfiniteGraph, Sparksee, HyperGraphDB, FlockDB, VelocityGraph, InfoGrid, GraphBase

What are Graph Datastores?

- Well, it's graph structures: nodes -> edges
- Take advantage of distributed computing to cope with X Million+ relationships
- Rely more on the relationships than the metadata

Why use Graph Datastores?

- You can do the same in SQL, if the traversal is fixed
- Allows for complex iterative and cyclical queries

Best use cases
- Social graphs (e.g. Erdős numbers)
- Recommendation engines
- Fraud detection
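To illustrate the point about fixed traversals, here is a Python sketch using the standard-library `sqlite3` module on hypothetical data. A friends-of-friends recommendation is a single self-join because the depth is known in advance. Each extra hop means another join, and variable-depth traversal needs recursive SQL (`WITH RECURSIVE`). That is where graph stores feel more natural.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friends (person TEXT, friend TEXT)")
edges = [("alice", "bob"), ("bob", "dave"), ("alice", "carol"), ("carol", "dave"), ("dave", "erin")]
# Store each friendship in both directions so the relationship is symmetric.
conn.executemany("INSERT INTO friends VALUES (?, ?)", edges + [(b, a) for a, b in edges])

# Fixed two-hop traversal: friends of friends, excluding yourself and existing friends.
QUERY = """
SELECT DISTINCT f2.friend
FROM friends AS f1
JOIN friends AS f2 ON f2.person = f1.friend
WHERE f1.person = ?
  AND f2.friend != f1.person
  AND f2.friend NOT IN (SELECT friend FROM friends WHERE person = ?)
ORDER BY f2.friend
"""

def suggest(person: str) -> list[str]:
    return [row[0] for row in conn.execute(QUERY, (person, person))]

print(suggest("alice"))  # ['dave']
```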

Questions?

© IBM Corporation 2016. All Rights Reserved.