Data Structures & Ruby
By
Aarti Parikh
Introduction, background
Why study DS?
Search, Organize and Manipulate information
Reduces complexity
Efficiency in solving algorithms
Data Structures are building blocks that enable us to search, organize and manipulate information so we can discover meaning and solve problems.
Maintainable code for complex problems
Picking the right ds can reduces complexity
Terms, Definitions & Properties
Linear Vs Non Linear
Abstract Data type
Time, Space Complexity & Big O Notation
Ordering in a data structure
Mutable vs Immutable
Concurrent
Duplication
Learn more NIST- Official dictionary
Linear Data structures, Arrays & Lists
Non Linear Data structures, Trees, Graphs
Abstract Data type is a Model: Functional definition of a DS
separate from its implementation, set of functions and
constraints.
Properties ( how they store data)
What is the run time of ( insert, search, read )
What is the space complexity to store
Is ther ordering
How does the ds deal with concurrency
Are duplicates allowed.
Big (O) notation
Growth rate of a function or a program
# O(n)
a = [1..1000]
a.each { |n| print n}
# O(n^2)
a,b = (1..10),(1..10)
a.each do |m|
b.each do |n|
p m*n
end
end
# O(n!) All possible permuattions of a string
"ruby".chars.to_a.permutation.map &:join
=> ["ruby", "ruyb", "rbuy", "rbyu", "ryub", "rybu", "urby", "uryb", "ubry", "ubyr", "uyrb", "uybr", "bruy", "bryu", "bury", "buyr", "byru", "byur", "yrub", "yrbu", "yurb", "yubr", "ybru", "ybur"]
The letter O is used because the growth rate of a function is also referred to as order of the function.
Performance as data increases
Number of loops
Big (O) continued
Classic example Binary Search in sorted array
# O(log(n))
class Array
def binary_search(val, low=0, high=(length - 1))
return nil if high < low
mid = (low + high) / 2
case
when self[mid] > val then binary_search(val, low, mid-1)
when self[mid] < val then binary_search(val, mid+1, high)
else mid
end
end
end
# O(2^n) Recursive or Naive Fibonacci
def fibonacci(n)
n <= 1 ? n : fibonacci( n - 1 ) + fibonacci( n - 2 )
end
Big(O) enables selection of DS
Insert
Search
Retrieve
Space Used
Handy Cheatsheet
We look at big O properties of DS to determine the choice based on problem domain
Common DS
- Array
- Hash, Bloom Filter
- Stack, Queue, Dequeue
- Linked Lists, Circular Buffers
- Set, BitArray
- Tree, Binary Search Tree, Heap, Trie
- Graph
Grouped by properties, some of the one one we will look at
DS in Ruby Core & Std
- Array
- Hash
- Set, SortedSet
- Matrix
- Vector
- Range
Not that many to choose from
DS in Java
Java Collections Framework
Extensive use of DS not in Java Core
Enough DS to choose from that there is a flowchart for selecting the right one.
Why DS not as distinct in Ruby
Ruby's API blends queues, stacks, sets, arrays and lists in a few classes, for a richer api.
Implementing DS in Ruby
- Understand Ruby API's and implementation better
- A nice refresher/ academic exercise
- Discover libraries
Let's start
Set
- Unordered elements
- Distinct elements
- Tests for membership/inclusion not necessarily retrieval
- No set in Core till 1.9.3
- MRI Ruby implemented as a Hash
- Faster then [1,2,3].uniq! since it is just a Hash
- No literal notation, use Set.new
- The expected operations in a Set
-member?
-union |
-intersection &
-difference -
-xor ^
- Used in rails for routing, caching etc.
SortedSet
- Same properties of Set but ordered elements
- MRI Ruby implements natively in C with a Red-Black Tree DS
- Sets in Java are also implemented in a similiar way.
- Java 's TreeSet ~ Ruby's SortedSet
- Java 's HashSet ~ Set
CustomSet
- Internally stored as an array
- Supports instantiation with any enumerable type
- Includes Enumerable and implements each to get some methods for free
- Return types should enable chaining
- Return types should not allow access to internal DS.
Implementation
MultiSet
- members allowed to appear more than once.
- the order of elements is irrelevant: The multisets
- {a, a, b} and {a, b, a} are equal.
- it provides the count or multiplicity of an element in the set.
- it can return the distinct elements of the set.
- it returns the cardinality of the set, total number of elements including duplicates.
Implementation
BitArray
- Set like DS, but dense packed bits
- Implemented using an array as storage
Implementation
Bloom Filter
- Check membership in a set
- Always gives a right answer if element is not present
- false positive if present
- Great for disk I/O
- Safe browsing
- Mix of hash & Set
- uses multiple hash functions to verify membership
Circular Buffer
- Implemented using an array as storage
- Overwrites after full
Tree
- Hierarchical DS
- n-Ary tree
- Binary Tree
- Binary Search Tree
- Heap
- Trie
- quick terminolgy
- root
- leaf
- level
- path
- siblings
- Real world usage, File system, RDBMS
BST
- Binary tree with ordering
- left < right
- no duplicates,
- Search/Insert/delete O(log n)
Heap
- Binary tree
- Max/Min at root
- No order left/right
- Max heap, min heap
Trie
- A tree with prefix stored
- Space efficient when there is repetition
- Example storing urls
www -> meetup -> com -> ruby -> sv ->
-> java
-> hiking
-> net
Matrix
- Creating multi dimensional arrays in Ruby
a =[][]
a = Array.new() {} n dimensional matrix
Matrix.build(3, 3) {|row, col| 0 }
=> 0, 0, 0
0, 0, 0
0, 0, 0
Graphs & other DS
- Graphs
- Singly linked list is a basic graph
- Tree with loops
- Sparse Matrices
- Space Partitioning Trees
- K-D tree
Conclusion
Github repo
Please email feedback to aartiwithcode@gmail.com
Thanks for listening