Why would we ever build a distributed computing platform in Node?



Why would we ever build a distributed computing platform in Node?

0 1


2015-NodeSummit

Slides for our presentation at NodeSummit 2015

On Github bithound / 2015-NodeSummit

Why would we ever build a distributed computing platform in Node?

Presented by Gord Tannerwww.bithound.io / @bithoundio

Who am I?

www.bithound.io

Gord Tanner (@gordtanner) Co Founder / CTO

Founding Team

http://www.communitech.ca/main-communitech/view-from-the-loo-bithound-house-builds-on-downtown-momentum/#.VNTzk8Z-9TY
          Dan, PJ and myself
          Kitchener Waterloo ontario Canada
          Old restored house in the downtown core.
          

What do we do?

Code quality, maintainability, and stability for all of your public and private project repositories.

www.bitHound.io

Problem:

                      > time jshint .

                      ---------------------
                      (lots of errors here)
                      ---------------------

                      7368 errors

                      real  0m29.321s
                      user  0m28.627s
                      sys 0m0.508s

          
- Do this for 1K+ commits in a project
- There are many more things to look at

Waiting sucks

We knew from day 1 this was a concurrent problem

concurrency

In computer science, concurrency is a property of systems in which several computations are executing simultaneously, and potentially interacting with each other. The computations may be executing on multiple cores in the same chip, preemptively time-shared threads on the same processor, or executed on physically separated processors.

Why JavaScript?

  • go: coroutines
  • c#, java
  • ruby, python
https://chrome.google.com/webstore/detail/ripple-emulator-beta/geelfhphabnejjhdalkjhgipohgpdnoc?hl=en

is

concurrency

In computer science, concurrency is a property of systems in which several computations are executing simultaneously, and potentially interacting with each other. The computations may be executing on multiple cores in the same chip, preemptively time-shared threads on the same processor, or executed on physically separated processors.

Shared Memory

Concurrent components communicate by altering the contents of shared memory locations (exemplified by Java and C#). This style of concurrent programming usually requires the application of some form of locking (e.g., mutexes, semaphores, or monitors) to coordinate between threads. A program that properly implements any of these is said to be thread-safe . http://en.wikipedia.org/wiki/Concurrent_computing

Message Passing

Concurrent components communicate by exchanging messages (exemplified by Scala, Erlang and occam). The exchange of messages may be carried out asynchronously, or may use a synchronous "rendezvous" style in which the sender blocks until the message is received. Asynchronous message passing may be reliable or unreliable (sometimes referred to as "send and pray"). Message-passing concurrency tends to be far easier to reason about than shared-memory concurrency, and is typically considered a more robust form of concurrent programming. http://en.wikipedia.org/wiki/Concurrent_computing
Node is a platform for easily building fast, scalable network applications. Node uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices. http://nodejs.org/

The cost of I/O

    L1-cache                  3 cycles
    L2-cache                 14 cycles
    RAM                     250 cycles
    Disk             41 000 000 cycles
    Network         240 000 000 cycles

          
http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/

Async IO

        function fibonacci(n) {
          if (n < 2)
            return 1;
          else
            return fibonacci(n-2) + fibonacci(n-1);
        }

        http.createServer(function (req, res) {
          res.writeHead(200, {'Content-Type': 'text/plain'});
          res.end(fibonacci(40));
        }).listen(1337, "127.0.0.1");
        
https://www.semitwist.com/mirror/node-js-is-cancer.html

Am i going to "create" another distributed framework?

"this is a very personal choice"

https://github.com/bithound/farm.bithound.io It is a very personal choice, choose the tools that work for you, the network design that works for you.
ØMQ looks like an embeddable networking library but acts like a concurrency framework.
                  .------------.        .--------------.
                  |            |        |              |
                  | TCP socket +------->|              | ZAP!
                  |            | BOOM!  |  0MQ socket  |
                  '------------'        |              |  POW!!
                    ^    ^    ^         |              |
                    |    |    |         '--------------'
                    |    |    |
                    |    |    |
                    |    |    '--------- Spandex
                    |    |
                    |    '-------------- Cosmic rays

                   Illegal radioisotopes from
                   secret Soviet atomic city
          

Sockets that carry atomic messages across various transports like:

  • in-process
  • inter-process (shared memory)
  • TCP
  • multicast

You can connect sockets N-to-N with patterns like

  • fan-out, fan-in
  • publish-subscribe
  • task distribution
  • request-reply

Tasks

  • Think of the atomic chunks of work in your app
  • You are probably wrong
  • understand your project
  • assess your applications needs
  • only then define your tasks
Build in a way that you can adapt.

Roles

                   .------------.             .------------.
                   |            |             |            |
                   |   MASTER   |             |   SLAVE    |
                   |            |             |            |
                   .------------.             .------------.

                   - starts jobs              - works on jobs
                   - breaks down work
                   - listens to status

          
Master = web app
     .------------. Hey, can you do this thing for me?  .------------.
     |            |------------------------------------>|            |
     |   MASTER   |  Sure! here you go buddy            |   SLAVE    |
     |            |<------------------------------------|            |
     '------------'                                     '------------'
          
                                                           .------------.
                                                          .------------.|
                                                         .------------.||
     .------------.  here are a bunch of things to do!  .------------.|||
     |            |------------------------------------>|            |||'
     |   MASTER   |  Sure! here you go buddy            |   SLAVE    ||'
     |            |<------------------------------------|            |'
     '------------'                                     '------------'
          
     .--------. do this!     .-------. thats a lot of work! .-------.
     |        |------------->|       |  can you help me?    |       |
     | MASTER |  here you go | SLAVE |--------------------->| SLAVE |
     |        |<-------------|       |<---------------------|       |
     '--------'              '-------'                      '-------'
          
          .what is the difference?.
          |                       |
          V                       V
     .--------. do this!     .-------. thats a lot of work! .-------.
     |        |------------->|       |  can you help me?    |       |
     | MASTER |  here you go | SLAVE |--------------------->| SLAVE |
     |        |<-------------|       |<---------------------|       |
     '--------'              '-------'                      '-------'
     
          

There is no difference

     .--------. do this!     .--------. thats a lot of work! .--------.
     |        |------------->|        |  can you help me?    |        |
     | WORKER |  here you go | WORKER |--------------------->| WORKER |
     |        |<-------------|        |<---------------------|        |
     '--------'              '--------'                      '--------'
     
          
Except some people jsut want to tell others to do work and not work themselves (ie our webserver)
            var farm = require('farm');

            // I need someone to do something for me
            var task = {};
            farm.jobs.send(task, function (err, result) { });

            // I have a bunch stuff people can work on for me
            var jobs = [j1, j2, j3, j4, .....];
            farm.jobs.distribute(tasks, function (err, result) { });

            // global cluster pub / sub 
            farm.events.publish('new_sha', {sha: '', owner: '', repo: ''});
            farm.events.subscribe('new_sha', function (event) { });

            farm.worker(function (task, callback) {
               //do stuff
               callback(err, result);
            });
          
            var farm = require('farm');

            // I need someone to do something for me
            var task = {};
            farm.jobs.send(task, function (err, result) { });

            // I have a bunch stuff people can work on for me
            var jobs = [j1, j2, j3, j4, .....];
            farm.jobs.distribute(tasks, function (err, result) { });

            // global cluster pub / sub 
            farm.events.publish('new_sha', {sha: '', owner: '', repo: ''});
            farm.events.subscribe('new_sha', function (event) { });

            farm.worker(function (task, callback) {
               //do stuff
               callback(err, result);
            });
          
            var farm = require('farm');

            // I need someone to do something for me
            var task = {};
            farm.jobs.send(task, function (err, result) { });

            // I have a bunch stuff people can work on for me
            var jobs = [j1, j2, j3, j4, .....];
            farm.jobs.distribute(tasks, function (err, result) { });

            // global cluster pub / sub 
            farm.events.publish('new_sha', {sha: '', owner: '', repo: ''});
            farm.events.subscribe('new_sha', function (event) { });

            farm.worker(function (task, callback) {
               //do stuff
               callback(err, result);
            });
          
            var farm = require('farm');

            // I need someone to do something for me
            var task = {};
            farm.jobs.send(task, function (err, result) { });

            // I have a bunch stuff people can work on for me
            var jobs = [j1, j2, j3, j4, .....];
            farm.jobs.distribute(tasks, function (err, result) { });

            // global cluster pub / sub 
            farm.events.publish('new_sha', {sha: '', owner: '', repo: ''});
            farm.events.subscribe('new_sha', function (event) { });

            farm.worker(function (task, callback) {
               //do stuff
               callback(err, result);
            });
          
            var farm = require('farm');

            // I need someone to do something for me
            var task = {};
            farm.jobs.send(task, function (err, result) { });

            // I have a bunch stuff people can work on for me
            var jobs = [j1, j2, j3, j4, .....];
            farm.jobs.distribute(tasks, function (err, result) { });

            // global cluster pub / sub 
            farm.events.publish('new_sha', {sha: '', owner: '', repo: ''});
            farm.events.subscribe('new_sha', function (event) { });

            farm.worker(function (task, callback) {
               //do stuff
               callback(err, result);
            });
          

Launch Day!

what works on one machine well, doesn't always work as well on 200 machines

Dev Env (on OSX)

                               #-------------#
                               |             |
                               |   WEB APP   |
                               |             |
                               '-------------'
                                      |
                                      v               
                                .------------.
                                |            |
                                |   BROKER   |
                                |            |
                                #------------#
                                      |
                                      v               
                                .------------.
                                |            |
                                |   WORKER   |
                                |            |
                                #------------#

          

Prod Env

                               #-------------#
                               |             |
                               |   WEB APP   |
                               |             |
                               '-------------'
                                      |
                                      v               
                                .------------.
                                |            |
                                |   BROKER   |
                                |            |
                                #------------#
                                      |
                                .------------.
                                |  100s OF   |
                                |   BOXES    |
                                '------------'
                                      |
                      .---------------+---------------.
                      |               |               |
                      v               v               v
                .------------.  .------------.  .------------.
                |            |  |            |  |            |
                |   WORKER   |  |   WORKER   |  |   WORKER   |
                |            |  |            |  |            |
                #------------#  #------------#  #------------#

          
  • 100s of boxes that didn't talk to each other
  • 100s of boxes that tried to all clone at the same time
  • all of our "sharing" code assumptions were wrong
https://www.vagrantup.com/

PROTIP: If you are developing a distributed application, you should work in a distributed environment

Pets or Cattle?

  • Services
  • Computers
  • Jobs
You should expect things to fail. If something fails, just try again. Schedule for it to run later or try right away. If it keeps failing then you can look into it.

Rube Goldberg Machine

UNIX WAY. Develop a series of tools that can be chained together and run atomically

You are the one

"The function of the One is now to return to the Source allowing a temporary dissimination of the code you carry reinserting the Prime Program." - The Architect

So why node?

  • Async IO and simple concurrency
  • it was an amazing languages to quickly glue a wide array of tools and programs together.
  • command line integration was awesome

Thank you!

www.bithound.io

www.bithound.io / @bithoundio