On GitHub: makmanalp / simpleflake_presentation
Created by Mali Akmanalp / @makmanalp
Mali Akmanalp Platform / Data Engineer @Custommade
Simple, right?
let's say you have a database of silly cats ...
cat1, cat2, cat3 ...
CREATE TABLE kittens ( `id` BIGINT AUTO_INCREMENT, PRIMARY KEY(`id`) );
But when you start getting more data ...
But now all your nodes have to talk to (and maybe wait on) each other!
Still goes over the net
More infrastructure
SPOF (kinda)
>>> import uuid
>>> uuid.uuid4()
UUID('74691173-e69d-440d-8172-dd63c97d1e87')
standard
great language / db support
but ...
236abc75-f7e5-11e2-bc8a-b88d1204f9a2
Number of 100-nanosecond intervals since the adoption of the Gregorian calendar in the West.
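As a quick check of that timestamp description, Python's standard `uuid` module can pull the fields back out of the UUID shown above. This is a minimal sketch: the offset constant and variable names are my own, not from the slides.

```python
import uuid
from datetime import datetime, timezone

u = uuid.UUID("236abc75-f7e5-11e2-bc8a-b88d1204f9a2")

# 100-nanosecond intervals between the Gregorian epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
GREGORIAN_TO_UNIX_100NS = 0x01B21DD213814000

# u.time is the 60-bit timestamp reassembled from the time_low,
# time_mid and time_hi fields of the UUID.
unix_seconds = (u.time - GREGORIAN_TO_UNIX_100NS) / 1e7
created = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)

print(u.version)            # UUID version (1 means time-based)
print(created.isoformat())  # when this UUID was generated
print(f"{u.node:012x}")     # 48-bit node field, typically a MAC address
```

Note that the low-order part of the timestamp comes first in the string, so sorting version 1 UUIDs as text does not sort them by creation time.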
and besides ...
236abc75-f7e5-11e2-bc8a-b88d1204f9a2
3 2 1
10765432100123456789
Timestamp + Machine ID + Sequence Number
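The layout above can be sketched with a few bit shifts. This assumes the commonly documented Twitter Snowflake split of 41 timestamp bits, 10 machine ID bits and 12 sequence bits; the slide itself gives no widths, and the function names here are hypothetical.

```python
TIMESTAMP_BITS = 41
MACHINE_BITS = 10    # assumption: width used by Twitter Snowflake
SEQUENCE_BITS = 12   # assumption: width used by Twitter Snowflake


def compose_snowflake(timestamp_ms, machine_id, sequence):
    """Pack the three fields into one integer, timestamp in the high bits."""
    if not 0 <= timestamp_ms < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp does not fit in 41 bits")
    if not 0 <= machine_id < (1 << MACHINE_BITS):
        raise ValueError("machine id does not fit in 10 bits")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence does not fit in 12 bits")
    return (
        (timestamp_ms << (MACHINE_BITS + SEQUENCE_BITS))
        | (machine_id << SEQUENCE_BITS)
        | sequence
    )


def split_snowflake(flake):
    """Unpack an ID produced by compose_snowflake."""
    sequence = flake & ((1 << SEQUENCE_BITS) - 1)
    machine_id = (flake >> SEQUENCE_BITS) & ((1 << MACHINE_BITS) - 1)
    timestamp_ms = flake >> (MACHINE_BITS + SEQUENCE_BITS)
    return timestamp_ms, machine_id, sequence
```

Because the timestamp sits in the high bits, IDs sort roughly by creation time, but every generator needs a unique machine ID and must track its own sequence counter.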
more infrastructure ...
I'm pretty sure our ops guy Wes time travels to handle what's already on his plate.
10765432100123456789
Timestamp (41b) + Random Number (23b)
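A minimal sketch of that layout in Python 3, not the simpleflake library's actual code. The function name is hypothetical, and the epoch value is an assumption inferred from the example ID and parsed timestamp shown in these slides.

```python
import random
import time

TIMESTAMP_BITS = 41
RANDOM_BITS = 23
# Assumption: epoch in Unix seconds, inferred from the example ID in the slides.
EPOCH = 946702800

_rng = random.SystemRandom()


def sketch_simpleflake(now=None):
    """Build a 64-bit ID: milliseconds since EPOCH, then 23 random bits."""
    if now is None:
        now = time.time()
    millis = round((now - EPOCH) * 1000)
    if not 0 <= millis < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp does not fit in 41 bits")
    return (millis << RANDOM_BITS) | _rng.getrandbits(RANDOM_BITS)


print(sketch_simpleflake())
```

No machine ID or shared counter is needed, so each node can generate IDs without any coordination.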
>>> from simpleflake import simpleflake
>>> simpleflake()
3594162604452825250L
Easy as pie
>>> from simpleflake import parse_simpleflake
>>> parse_simpleflake(3594162604452825250L)
SimpleFlake(timestamp=1375160370.606, random_bits=6768802L)
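The same decoding can be done by hand with a shift and a mask. This is a sketch in Python 3 (so no `L` suffix), not the library's implementation; the function name is hypothetical and the epoch is inferred from the example output above.

```python
RANDOM_BITS = 23
# Assumption: epoch in Unix seconds, inferred from the parsed example above.
EPOCH = 946702800


def sketch_parse(flake):
    """Split an ID into (Unix timestamp in seconds, random bits)."""
    millis = flake >> RANDOM_BITS                   # high 41 bits
    random_bits = flake & ((1 << RANDOM_BITS) - 1)  # low 23 bits
    return EPOCH + millis / 1000, random_bits


timestamp, random_bits = sketch_parse(3594162604452825250)
print(timestamp, random_bits)
```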
Chances of collision at 100 inserts / sec.
1.0787 × 10^-9
At avg. 100 inserts / second, chances you'll get exactly two in the same millisecond:
PDF[PoissonDistribution[0.1], 2] = 0.00452419
For two requests in the same millisecond, chances you'll get the same number out of 2^23:
n^2/2m = (2)^2/(2*2^23) = 2.3842 × 10^-7
Totally backwards compatible with snowflake.
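The collision estimate above can be reproduced with the standard library. This sketch follows the slide's own arithmetic (a Poisson term multiplied by the n^2/2m birthday approximation); the variable names are mine.

```python
import math

rate_per_ms = 100 / 1000   # 100 inserts per second = 0.1 per millisecond
random_space = 2 ** 23     # number of possible 23-bit random values

# Poisson probability of exactly two inserts in the same millisecond.
p_two_in_same_ms = (
    math.exp(-rate_per_ms) * rate_per_ms ** 2 / math.factorial(2)
)

# Birthday-style approximation n^2 / 2m from the slide, with n = 2.
n = 2
p_same_random = n ** 2 / (2 * random_space)

# Both events must happen for two IDs to collide.
p_collision = p_two_in_same_ms * p_same_random

print(f"{p_two_in_same_ms:.8f}")
print(f"{p_same_random:.4e}")
print(f"{p_collision:.4e}")
```

For n = 2 the exact chance of drawing the same value is 1/m, so the n^2/2m approximation overstates it by about a factor of two, which makes the slide's figure a conservative estimate.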
But they're a pain ...