Data Structures & Ruby – Aarti Parikh



Data Structures & Ruby – Aarti Parikh

0 15


data-structures-ruby

Data Structures in Ruby

On Github aarti / data-structures-ruby

Data Structures & Ruby

By

Aarti Parikh

Introduction, background

Why study DS?

Search, Organize and Manipulate information Reduces complexity Efficiency in solving algorithms
Data Structures are building blocks that enable us to search, organize and manipulate information so we can discover meaning and solve problems. Maintainable code for complex problems Picking the right ds can reduces complexity

Terms, Definitions & Properties

Linear Vs Non Linear Abstract Data type Time, Space Complexity & Big O Notation Ordering in a data structure Mutable vs Immutable Concurrent Duplication

Learn more NIST- Official dictionary

Linear Data structures, Arrays & Lists Non Linear Data structures, Trees, Graphs Abstract Data type is a Model: Functional definition of a DS separate from its implementation, set of functions and constraints. Properties ( how they store data) What is the run time of ( insert, search, read ) What is the space complexity to store Is ther ordering How does the ds deal with concurrency Are duplicates allowed.

Big (O) notation

Growth rate of a function or a program

    # O(n) 
    a = [1..1000]
    a.each { |n| print n}  
    # O(n^2) 
    a,b = (1..10),(1..10)
    a.each do |m|
      b.each do |n|
        p m*n
      end  
    end 
    # O(n!) All possible permuattions of a string
    "ruby".chars.to_a.permutation.map &:join
    => ["ruby", "ruyb", "rbuy", "rbyu", "ryub", "rybu", "urby", "uryb", "ubry", "ubyr", "uyrb", "uybr", "bruy", "bryu", "bury", "buyr", "byru", "byur", "yrub", "yrbu", "yurb", "yubr", "ybru", "ybur"]
The letter O is used because the growth rate of a function is also referred to as order of the function. Performance as data increases Number of loops

Big (O) continued

Classic example Binary Search in sorted array

    # O(log(n)) 
    class Array
      def binary_search(val, low=0, high=(length - 1))
        return nil if high < low
        mid = (low + high) / 2
        case
          when self[mid] > val then binary_search(val, low, mid-1)
          when self[mid] < val then binary_search(val, mid+1, high)
          else mid
        end
      end
    end
    # O(2^n) Recursive or Naive Fibonacci
    def fibonacci(n)
       n <= 1 ? n :  fibonacci( n - 1 ) + fibonacci( n - 2 ) 
    end

Big(O) enables selection of DS

Insert Search Retrieve Space Used

Handy Cheatsheet

We look at big O properties of DS to determine the choice based on problem domain

Common DS

  • Array
  • Hash, Bloom Filter
  • Stack, Queue, Dequeue
  • Linked Lists, Circular Buffers
  • Set, BitArray
  • Tree, Binary Search Tree, Heap, Trie
  • Graph
Grouped by properties, some of the one one we will look at

DS in Ruby Core & Std

  • Array
  • Hash
  • Set, SortedSet
  • Matrix
  • Vector
  • Range
Not that many to choose from

DS in Java

Java Collections Framework

Extensive use of DS not in Java Core

Enough DS to choose from that there is a flowchart for selecting the right one.

Why DS not as distinct in Ruby

Ruby's API blends queues, stacks, sets, arrays and lists in a few classes, for a richer api.

Array

Hash

  • Key/Vaue Pair
  • O(1) insert, search, delete
  • Internally an array
  • Uses a hash function & division method to fit data into an array

How it works

How the Hash works in Ruby

How to implement one in Ruby

Hash implementation

Implementing DS in Ruby

  • Understand Ruby API's and implementation better
  • A nice refresher/ academic exercise
  • Discover libraries

Let's start

Stack

  • FIFO
  • Operations supported
    • push
    • pop
    • peek
  • Implementation varies
    • Array
    • LinkedList

Stack Implementation using Array as storage

Queue

  • LIFO
  • Operations supported
    • shift
    • unshift
  • Implementation varies
    • Array
    • LinkedList

Queue Implementation using Forwardable and delgates so Storage is again Array

Deque

Set

  • Unordered elements
  • Distinct elements
  • Tests for membership/inclusion not necessarily retrieval
  • No set in Core till 1.9.3
  • MRI Ruby implemented as a Hash
  • Faster then [1,2,3].uniq! since it is just a Hash
  • No literal notation, use Set.new
  • The expected operations in a Set -member? -union | -intersection & -difference - -xor ^
  • Used in rails for routing, caching etc.

SortedSet

  • Same properties of Set but ordered elements
  • MRI Ruby implements natively in C with a Red-Black Tree DS
  • Sets in Java are also implemented in a similiar way.
    • Java 's TreeSet ~ Ruby's SortedSet
    • Java 's HashSet ~ Set

CustomSet

  • Internally stored as an array
  • Supports instantiation with any enumerable type
  • Includes Enumerable and implements each to get some methods for free
  • Return types should enable chaining
  • Return types should not allow access to internal DS.

Implementation

MultiSet

  • members allowed to appear more than once.
    • {a, a, b}
  • the order of elements is irrelevant: The multisets
    • {a, a, b} and {a, b, a} are equal.
  • it provides the count or multiplicity of an element in the set.
    • count (a) = 2
  • it can return the distinct elements of the set.
    • {a, b}
  • it returns the cardinality of the set, total number of elements including duplicates.
    • 6

Implementation

BitArray

  • Set like DS, but dense packed bits
  • Implemented using an array as storage

Implementation

Bloom Filter

  • Check membership in a set
  • Always gives a right answer if element is not present
  • false positive if present
  • Great for disk I/O
  • Safe browsing
  • Mix of hash & Set
  • uses multiple hash functions to verify membership

LinkedList

Circular Buffer

  • Implemented using an array as storage
  • Overwrites after full

Tree

  • Hierarchical DS
  • n-Ary tree
  • Binary Tree
  • Binary Search Tree
  • Heap
  • Trie
  • quick terminolgy
    • root
    • leaf
    • level
    • path
    • siblings
  • Real world usage, File system, RDBMS

BST

  • Binary tree with ordering
  • left < right
  • no duplicates,
  • Search/Insert/delete O(log n)

Heap

  • Binary tree
  • Max/Min at root
  • No order left/right
  • Max heap, min heap

Trie

  • A tree with prefix stored
  • Space efficient when there is repetition
  • Example storing urls
     www -> meetup -> com -> ruby -> sv -> 
                          -> java
                          -> hiking
                   -> net        

Matrix

  • Creating multi dimensional arrays in Ruby
      a =[][]
      a = Array.new() {} n dimensional matrix
  • Instead use Matrix class
  Matrix.build(3, 3) {|row, col| 0 }
  => 0, 0, 0
     0, 0, 0
     0, 0, 0

Graphs & other DS

  • Graphs
    • Singly linked list is a basic graph
    • Tree with loops
  • Sparse Matrices
  • Space Partitioning Trees
  • K-D tree

Conclusion

Github repo

Please email feedback to aartiwithcode@gmail.com

Thanks for listening