Understanding Persistent Data Structures

Lesley Lai

@LesleyLai6

http://lesleylai.info/

Referential transparency

  • An expression always evaluates to the same result in any context
  • Make both you and the compiler easier to reason about the code
  • What we want to achieve

Referential transparency

  • Not True when combine with mutation
  • Shared mutable states are hard to reason about
const xs = [1, 2, 3];
const ys = xs;
console.log(xs); // [1, 2, 3]
ys[0] = 0;
console.log(xs); // [0, 2, 3]

Persistent data structures

  • Old values are preserved
  • First studied as imperative data structures

Persistent data structures

Level of persistency

  • partially persistent
  • fully persistent
  • confluently persistent

Immutable Array

module ImmutableArray = struct
  let set (arr : 'a array) (pos : int) (value : 'a) =
    if (pos >= 0) && (pos < (Array.length arr)) then
      let copied = Array.copy arr in
      Array.unsafe_set copied pos value;
      Some copied
    else
      None
      
  ...
end

let x = [|1;2;3;4;5|] (* [1,2,3,4,5] *)
let y = ImmutableArray.set x 0 42 (* Some [42,2,3,4,5] *)

1

x0

2

3

4

5

6

7

8

9

Immutable Array

1

x0

2

3

4

5

6

7

8

9

1

3

3

4

5

6

7

8

9

x1

Immutable Array

1

x0

2

3

4

5

6

7

8

9

1

3

3

4

5

6

7

8

9

x1

Immutable Array

x0

Boxed Type

Object 1

Object 2

Object 2'

Object 3

...

...

x1

Immutable Array

  • The most efficient iteration and random access
  • Compact
  • All manipulations are O(n)

Lists

Lists

1

x0

let x0 = [1]

Lists

1

2

x0

3

x1

3 :: 2 :: x0
let x0 = [1] in
let x1 =

Lists

1

2

x0

3

x1

4

x2

4 :: x1
let x2 =
let x0 = [1] in
let x1 = 3 :: 2 :: x0 in

Lists

1

2

x0

3

x1

4

x2

5

x3

5 :: x1
let x1 = 3 :: 2 :: x0 in
let x0 = [1] in
let x3 =
let x2 = 4 :: x1 in

List Implementaion

type 'a myList =
  | Nil
  | Cons of 'a * 'a myList

Using List

let map (func: ('a -> 'b)) (list: 'a myList): 'b myList =
    let rec helper xs acc =
            match xs with
              | Nil -> acc
              | Cons (head, tail) -> helper tail (Cons (func head, acc))
    in
    helper list Nil

Pattern match on the inductive definition.

Eg:

Prefer higher-order functions (map/fold/...) to raw loop/recursion

But don't over generalize

["a"; "b"; "c"; "d"]
 |> ([1; 2; 3; 4]
     |> List.fold_left2 (fun acc x y -> (x, y) :: acc) [])
 |> List.rev

vs

let zip xs ys
  = List.fold_left2 (fun acc x y -> (x, y) :: acc) [] xs ys
  |> List.rev;

zip ["a"; "b"; "c"; "d"] [1; 2; 3; 4]

List Operations

O(1)

O(n)

  • Concat
  • Insert
  • Append
  • Random access
  • Update
  • map
  • filter
  • foldl, foldr
  • Prepend
  • Head
  • Tail

List Performance

Your lists are stacks. Using them as "random accessing sequence" is wrong!

List Performance

List.nth lst 9

Don't do the following:

if List.length xs == 0 then
   ...
else
   ...
let rec reverse (xs : 'a list) : 'a list =
  match xs with
  | [] -> []
  | head :: tail -> (reverse tail) @ [head]
  

Phil Bagwell

RRB-Trees: Efficient Immutable Vectors
https://infoscience.epfl.ch/record/169879/files/RMTrees.pdf

Relaxed Radix Balanced Tree

a b c d e f g h i j
k l m n o p q r s t
u v w
a b c d
e f g h
i j k l
m n o p
q r s t
u v w
a b c d
e f g h
i j k l
m n o p
q r s t
u v w
a b c d
e f g h
i j k l
m n o p
q r s t
u v w

Radix Balanced Tree

a b c d
e f g h
i j k l
m n o p
q r s t
u v w

Radix Balanced Tree

Implement a "Vector", "Dynamic Array"

a b c d
e f g h
i j k l
m n o p
q r s t
u v w

Radix Balanced Tree Search

17 → 01 00 01

Vector.get 17
a b c d
e f g h
i j k l
m n o p
q r s t
u v w

Radix Balanced Tree Search

Vector.get 17

17 → 01 00 01

a b c d
e f g h
i j k l
m n o p
q r s t
u v w

Radix Balanced Tree Search

Vector.get 17

17 → 01 00 01

a b c d
e f g h
i j k l
m n o p
q s t
u v w

Radix Balanced Tree Search

Vector.get 17

17 → 01 00 01

r

a b c d
e f g h
i j k l
m n o p
q s t
u v w

Radix Balanced Tree Update

Vector.set 17 "R"

r

q s t

R

a b c d
e f g h
i j k l
m n o p
q s t
u v w

Radix Balanced Tree Update

Vector.set 17 "R"

r

q s t

R

a b c d
e f g h
i j k l
m n o p
q s t
u v w

Radix Balanced Tree Update

Vector.set 17 "R"

r

q s t

R

Relaxed Radix Balanced Tree

a b c d
e f g h
i j k l
m n o p
q r s
t u
16 19 23
2 4
3
v w

RRB-tree Operations

O(log(n))

  • Random access
  • Update
  • Append
  • Slice left/right
  • Concat
  • Insert
  • Prepend

Hash Array Mapped Trie

Hash Array Mapped Trie

Implement "Unordered Set/Map", "Hash Set/Map", "Dictionary"

"Trie"

Tries were first described by René de la Briandais in 1959. The term trie was coined two years later by Edward Fredkin, who pronounces it /ˈtr/ (as "tree"), after the middle syllable of retrieval. However, other authors pronounce it /ˈtr/ (as "try"), in an attempt to distinguish it verbally from "tree".

Hash Array Mapped Trie

Using hash values as indices

n

hash('n') → 01 00 01

m
l e k
f a
g c b
o h
j i d

HAMT Operations

O(log(n))

  • Random access
  • Delete
  • Insert

Common Optimizations

Left Shadow

a b c d
a b c

v1

v1'

You don't need to allocate what is not there

Performance Tradeoff of Node Size

Advantage of Small Nodes

Advantage of Big Nodes

  • Less copy
  • Less indirection
  • Better cache locality

Hierarchical vs Flat

"Transient"

  • Convert to to imperative data structure temporarily
  • Clojure example:
(defn vrange [n]
  (loop [i 0 v []]
    (if (< i n)
      (recur (inc i) (conj v i))
      v)))

(defn vrange2 [n]
  (loop [i 0 v (transient [])]
    (if (< i n)
      (recur (inc i) (conj! v i))
      (persistent! v))))

;; benchmarked (Java 1.8, Clojure 1.7)
(def v (vrange 1000000))    ;; 73.7 ms
(def v2 (vrange2 1000000))  ;; 19.7 ms

Persistent Data Structure Advantages

  • Minimum copying
  • Compact history
  • Fast comparison
  • Thread safe for free

Persistent Data Structure Disadvantages

  • Performance overhead
  • Shared ownership (need garbage collection)

Some other interesting data structures

  • Persistent red-black tree
  • Finger Tree
  • leftist heap, Binomial Heaps, Brodal Queue (Priority Queue)
  • Physicists Queue, Banker Queue, and Real-time Queue (Lazy Queue)

Resources

Resources

Dr. Matthew Hammer
CSCI 4830-016 Principles of Functional Programming
http://matthewhammer.org/courses/pfp-s18/

Thank you!