On Github unicredit / spark-talk-slides
Presentation at UniCredit
Spark MapReduce + Cascading + Scalding (Pig)
Oh hey, these are some notes. They'll be hidden in your presentation, but you can see them if you open the speaker notes window (hit 's' on your keyboard).Spark MapReduce + Cascading + Scalding (Pig)
Spark Streaming Storm
Spark MapReduce + Cascading + Scalding (Pig)
Spark Streaming Storm
Spark SQL Impala (Hive)
Spark MapReduce + Cascading + Scalding (Pig)
Spark Streaming Storm
Spark SQL Impala (Hive)
GraphX Giraph
Spark MapReduce + Cascading + Scalding (Pig)
Spark Streaming Storm
Spark SQL Impala (Hive)
GraphX Giraph
MLLib Mahout
Spark MapReduce + Cascading + Scalding (Pig)
Spark Streaming Storm
Spark SQL Impala (Hive)
GraphX Giraph
MLLib Mahout
Spark JobServer ???
All of these are API compatible
The underlying abstraction is adataset
The underlying abstraction is adistributed dataset
The underlying abstraction is aresilient distributed dataset
Now go to the examplesBuilt on Scala and Akka
LOC comparisonSpeed comes as a consequence
Run by yourself the benchmark
The opposite is also true :-?
Available for
counts = file.flatMap(lambda line: line.split(" ")) \
   .map(lambda word: (word, 1)) \
   .reduceByKey(lambda a, b: a + b)
				JavaRDD<String> words = file.flatMap(new FlatMapFunction<String, String>() {
  public Iterable<String> call(String s) { return Arrays.asList(s.split(" ")); }
});
JavaPairRDD<String, Integer> pairs = words.map(new PairFunction<String, String, Integer>() {
  public Tuple2<String, Integer> call(String s) { return new Tuple2<String, Integer>(s, 1); }
});
JavaPairRDD<String, Integer> counts = pairs.reduceByKey(new Function2<Integer, Integer>() {
  public Integer call(Integer a, Integer b) { return a + b; }
});
					(if you really want)
(ns com.fire.kingdom.flambit
  (:require [flambo.api :as f]))
;; NOTE: we are using the flambo.api/fn not clojure.core/fn
(-> (f/text-file sc "data.txt")   ;; returns an unrealized lazy dataset
    (f/map (f/fn [s] (count s)))  ;; returns RDD array of length of lines
    (f/reduce (f/fn [x y] (+ x y)))) ;; returns a value, should be 1406
					See Flambo.