About Me
Felipe Fernández
Some craftsmanship-related topics
- BDD
- Outside-in development
- Focus on delivery
- Minimise waste
About the talk
- 1. Cassandra rationale
- 2. Cassandra internals
- 3. Data modeling
1. Cassandra rationale
- Modern problems: scalability, high availability, locality, low latency
- Scalability because of internet scale
- HA because of internet timezones
- Locality because of the speed of light
- Low latency because of millennials
Cassandra rationale
RDBMS Approach to modern world challenges
- Third normal form
- Favour space over time
- Stack Overflow to the rescue
- NF1: A table cell must not contain more than one value.
- NF2: NF1, plus all non-primary-key columns must depend on all primary-key columns.
- NF3: NF2, plus non-primary-key columns may not depend on each other.
Cassandra rationale
ACID Properties
- Atomicity
- Consistency
- Isolation
- Durability
- ACID properties apply to a single operation or to a transaction
- Those properties are a safety net, but they don't come for free
- They are really hard to implement in a distributed system.
Cassandra rationale
Relational Data Modeling
- Normalisation
- Referential integrity
- Strong uniqueness
- Indexes
- Joins
- Aggregations
- Our tables have a strong connection with our entity/relationship model
- The application can be more relaxed about checking referential integrity, as the DB will fail explicitly
- The primary key serves the purpose of uniqueness; again, the DB will fail on violations
- Indexes give us flexibility when we design our tables, as we can add them whenever we want
- If the data is not contained in a single table we need to join several tables, creating locks and likely contention over those resources
- There are several aggregation functions: min, max, count, sum...
Cassandra rationale
RDBMS Problems
- Scale up
- Sharding
- Availability
- Locality
- The usual RDBMS approach is scaling up, buying more resources for a single machine; that's expensive and hits a limit quite soon
- A database shard is a horizontal partition of the data. Sharding makes the system tolerant to partitioning
- If you stick with a single point of failure, it's not possible to achieve HA. Distributed ACID is really hard to achieve.
- Again, if you have a single instance you can't co-locate data close to users in different regions
Cassandra rationale
Cassandra to the rescue
- Distributed peer to peer architecture
- Scale out
- Low latency
- CAP theorem
- No master/slave architecture, so no single point of failure.
- Commodity servers, perfect for cloud computing
- Denormalisation and tables based on queries. A time-over-space trade-off
- It trades consistency for partition tolerance and availability. It provides tunable consistency, though.
Cassandra rationale
Right tool for the right job
- Big data
- Higher storage costs
- Change mindset
- If your data fits on a single machine, it's not big data. Avoid the overkill and complications of distributed systems if possible
- Storage is getting cheaper and cheaper, but it's important to note that we're trading space for time; there's no such thing as a free lunch
- You can't simply plug Cassandra into your old RDBMS-based system. A massive refactor needs to be done.
2. Cassandra internals
- Based on Google BigTable and Amazon Dynamo.
- Used by companies like Netflix and Spotify.
- Open source project backed by DataStax.
Cassandra internals
Distribution internals
- Data is partitioned around the ring
- Coordinator node
- Replication factor
- Consistency level
- Every partition has a key; using a hash algorithm, we can get the location of the node where that data will live.
- Clients talk to a coordinator node, which is assigned per request. That coordinator applies the hash function.
- RF is set per keyspace. We need several replicas to ensure high availability even when some node is down.
- CL is set per query/command. The coordinator will wait until the required number of nodes ack the request. It could be ONE, QUORUM, ALL...
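A minimal sketch of how these two knobs surface in CQL and cqlsh (the keyspace, table and column names are hypothetical):

```sql
-- RF is chosen when the keyspace is created
CREATE KEYSPACE demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

-- CL is chosen per request; in cqlsh it applies to the statements that follow
CONSISTENCY QUORUM;
SELECT * FROM demo.users WHERE user_id = 42;
```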
Cassandra internals
Distribution internals
- Column family
- Keyspace
- Node
- Rack
- Data center
- Cluster
- Or table. Cassandra has some confusing names around tables, keys and so on. In the end, it's a collection of partitions.
- A collection of CFs. RF is set at this level. Similar to a schema or namespace.
- Could be a host. Every node owns a range of the partition ring, so lookups can be extremely fast.
- Analogous to availability zones in AWS. Your replication strategy should be aware of this and avoid placing replicas on the same rack, otherwise replication is pointless.
- There could be one in the USA and another one in Ireland. RF works like a mirror across data centers, and there are CLs that address this scenario.
- The whole Cassandra system.
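For example, a keyspace's replication can be made data-center aware, and there are data-center aware consistency levels such as LOCAL_QUORUM. A sketch, assuming two data centers named 'us_east' and 'eu_west':

```sql
-- Replicas per data center (DC names are illustrative and must match the snitch configuration)
CREATE KEYSPACE demo_multi_dc
  WITH replication = {'class': 'NetworkTopologyStrategy',
                      'us_east': 3, 'eu_west': 3};

-- LOCAL_QUORUM only waits for a quorum of replicas in the coordinator's own DC
CONSISTENCY LOCAL_QUORUM;
```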
Cassandra internals
Storage internals: Write path
- Commit log
- Memtable
- SSTable
- Compaction
- Tombstones
- An append-only structure. It provides durability in case of a crash; kind of a backup of the memtable
- The second write. One per CF. It acts like a first cache in front of the SSTables
- SSTables are written when memtables are full and get flushed to disk
- Merging SSTables to make reads faster, to remove old data and to remove deleted data.
- Markers for deletion; the actual removal happens on compaction
- No writes or updates in place. Append-only immutable structures allow sequential disk writes or in-memory ones. Extremely fast
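A small sketch of how this surfaces in CQL (table name and option values are illustrative, not a recommendation):

```sql
-- A delete only appends a tombstone; the data is physically removed on compaction
DELETE FROM demo.users WHERE user_id = 42;

-- Compaction strategy and tombstone grace period are table-level options
ALTER TABLE demo.users
  WITH compaction = {'class': 'SizeTieredCompactionStrategy'}
  AND gc_grace_seconds = 864000;
```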
Cassandra internals
Storage internals: Read path
- Timestamps
- Read from Memtable and SSTables
- Caches
- It's important to have synchronised clocks across hosts. Timestamps are used for read resolution.
- It chooses the latest cells/columns/values depending on the timestamp.
- There are several caches that avoid disk access, making reads extremely fast.
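A minimal sketch of last-write-wins resolution (table and column names are hypothetical): every cell carries a write timestamp, which can be inspected with WRITETIME, and the newest value wins on read.

```sql
-- Two writes to the same primary key; the one with the later timestamp wins
INSERT INTO demo.users (user_id, name) VALUES (42, 'Alice');
INSERT INTO demo.users (user_id, name) VALUES (42, 'Alicia');

-- WRITETIME exposes the timestamp Cassandra uses for resolution
SELECT name, WRITETIME(name) FROM demo.users WHERE user_id = 42;
```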
3. Data modeling
- Without proper data modeling, Cassandra won't be able to deliver the expected performance
- Some of the ideas from the relational world are still valuable
- With this section we'll understand why Cassandra aligns with the previously presented craftsmanship concepts.
Data modeling
Partition
- Partition key
- Clustering key
- Primary key
- Remember that a CF is a collection of partitions. The partition key defines on which node and in which partition of a CF our data is going to live
- Cassandra has the concept of nested data. Rows are clustered around this key. This key also defines the internal order of the rows inside a partition
- The primary key is composed of the previous keys. It ensures uniqueness of data, but through upserts: writing the same key again silently overwrites the previous row.
- A table is a two-dimensional view of a multi-dimensional column family.
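A minimal sketch of how those keys appear in a table definition (table and column names are hypothetical):

```sql
-- Partition key: user_id (placement on the ring). Clustering keys: added_date, video_id (order inside the partition).
CREATE TABLE demo.videos_by_user (
    user_id    uuid,
    added_date timestamp,
    video_id   uuid,
    title      text,
    PRIMARY KEY ((user_id), added_date, video_id)
) WITH CLUSTERING ORDER BY (added_date DESC, video_id ASC);
```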
Data modeling
Anti patterns
- Secondary indexes
- Join tables
- Allow filtering
- We can run equality queries on the partition key and inequality queries on clustering keys. Using secondary indexes on regular columns carries a big performance penalty, as the query needs to visit different nodes to find the data.
- Joins are not even possible in Cassandra. You can do application-side joins, but those are discouraged too.
- Kind of a table scan, so only useful for development purposes
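A sketch of what these anti-patterns look like in CQL, reusing the hypothetical videos_by_user table from above:

```sql
-- Anti-pattern: a secondary index on a regular column fans the query out across nodes
CREATE INDEX ON demo.videos_by_user (title);

-- Anti-pattern: filtering without the partition key forces a scan
SELECT * FROM demo.videos_by_user WHERE title = 'Intro to CQL' ALLOW FILTERING;

-- Preferred: equality on the partition key, optionally a range on a clustering key
SELECT * FROM demo.videos_by_user
 WHERE user_id = 123e4567-e89b-42d3-a456-426614174000
   AND added_date > '2016-01-01';
```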
Data modeling
Conceptual Modeling
- Entity-relationship diagrams
- Chen notation
- Technology agnostic
- Identify entities, relationships, the cardinality between them, and attributes.
- This analysis can be done without Cassandra in mind.
Data modeling
Logical Modeling
- Query based approach
- Chebotko notation
- Mapping rules and patterns
- We need to think in terms of our queries or workloads. Based on that, we can define different tables for different needs
- Different cardinalities usually map to the same logical designs. Uniqueness, ordering and key queries are easy to extract into patterns too.
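For instance (queries and table names are hypothetical), two access patterns over the same comments lead to two tables keyed differently:

```sql
-- Q1: find comments for a given video, newest first
CREATE TABLE demo.comments_by_video (
    video_id     uuid,
    comment_time timestamp,
    user_id      uuid,
    comment      text,
    PRIMARY KEY ((video_id), comment_time, user_id)
) WITH CLUSTERING ORDER BY (comment_time DESC, user_id ASC);

-- Q2: find comments made by a given user, newest first
CREATE TABLE demo.comments_by_user (
    user_id      uuid,
    comment_time timestamp,
    video_id     uuid,
    comment      text,
    PRIMARY KEY ((user_id), comment_time, video_id)
) WITH CLUSTERING ORDER BY (comment_time DESC, video_id ASC);
```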
Data modeling
Physical modeling
- Fleshing out details
- Optimizations
- Providing types for the columns
- Watch out for partitions that get too big
- Watch out for duplication that grows linearly with the data
- Watch out for tables that end up too big or too small
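One common fix when a partition would grow too big is to add an artificial bucket to the partition key; a sketch with hypothetical names:

```sql
-- Splitting a potentially huge per-sensor partition into monthly buckets
CREATE TABLE demo.readings_by_sensor (
    sensor_id  uuid,
    month      text,        -- e.g. '2016-04'; part of the partition key
    reading_ts timestamp,
    value      double,
    PRIMARY KEY ((sensor_id, month), reading_ts)
) WITH CLUSTERING ORDER BY (reading_ts DESC);
```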
Data modeling
Mindset change
- Data consistency
- Transactions
- Concurrent access
- As we have duplication, we need to keep the data consistent. You need to do it at the application level, as Cassandra doesn't give you that.
- You could use logged batches, but they carry a performance penalty.
- You could use counters or lightweight transactions. But probably the best option is to duplicate the data in different partitions.
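A sketch of the tools mentioned above, reusing the hypothetical tables from earlier sketches (the video_stats counter table is made up as well):

```sql
-- Logged batch: keeps the two denormalised comment tables in sync, at a performance cost
BEGIN BATCH
  INSERT INTO demo.comments_by_video (video_id, comment_time, user_id, comment)
  VALUES (11111111-1111-1111-1111-111111111111, '2016-04-01 10:00:00',
          22222222-2222-2222-2222-222222222222, 'Nice talk');
  INSERT INTO demo.comments_by_user (user_id, comment_time, video_id, comment)
  VALUES (22222222-2222-2222-2222-222222222222, '2016-04-01 10:00:00',
          11111111-1111-1111-1111-111111111111, 'Nice talk');
APPLY BATCH;

-- Lightweight transaction: compare-and-set semantics, paid for with a Paxos round
INSERT INTO demo.users (user_id, name) VALUES (43, 'Bob') IF NOT EXISTS;

-- Counter column: concurrent increments without read-before-write
-- (assumes video_stats has a single counter column named views)
UPDATE demo.video_stats SET views = views + 1
 WHERE video_id = 11111111-1111-1111-1111-111111111111;
```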
Cassandra for craftsmen developers
Created by Felipe Fernández / @felipefzdz