Purely Dysfunctional Data Structures
Practical Side of Implementing Immutable Structures
Immutability
List<String> transform(List<String> list);
List<String> transform(List<String> list) {
list.add(this.state.get(0));
list.add(SOME_CONSTANT);
this.orderPizza();
if (MOON.getLunarPhase() == FULL) list.remove(7);
this.moreState.addAll(list);
OtherClass.mutableStaticField = list;
return list;
}
Immutability helps in minimizing the number of invalid states
User user = new User();
user.setName(...);
user.setSurname(...);
user.setPhone(...);
// ...
user.set(...);
User user = new User(); // 2^0
user.setName(...);
user.setSurname(...);
user.setPhone(...);
// ...
user.set(...);
User user = new User(); // 2^0
user.setName(...); // 2^1
user.setSurname(...); // 2^2
user.setPhone(...); // 2^3
// ...
user.set(...); // 2^N
User user = new User();
Set<Users> users = new HashSet<>();
users.add(user);
user.setName(42);
users.contains(user); //false
User user = new User(name, surname);
Error space reduced
Immutability enables multiple optimizations
Objects can be memoized
Properties can be precomputed (hashes, sizes, etc)
2 Element List (JDK9)
static final class List12<E> extends AbstractImmutableList<E> {
@Stable
private final E e0;
@Stable
private final E e1;
}
@Override
public E get(int index) {
if (index == 0) {
return e0;
} else if (index == 1 && e1 != null) {
return e1;
}
throw outOfBounds(index);
}
An object is immutable when:
its state cannot be modified after construction
all its fields are final*
"this" reference doesn't escape during construction
Java Concurrency in Practice, Brian Goetz
Is String immutable?
final class String {
// ...
private final byte[] value;
private final byte coder;
private int hash;
// ...
}
public int hashCode() {
int h = hash;
if (h == 0 && value.length > 0) {
hash = h = isLatin1() ? StringLatin1.hashCode(value)
: StringUTF16.hashCode(value);
}
return h;
}
It depends.
Does immutable even exist?
public static void main(String[] args) throws Exception {
VarHandle VALUE = MethodHandles.privateLookupIn(
String.class, MethodHandles.lookup())
.findVarHandle(String.class, "value", byte[].class);
Object kotlin = VALUE.get("Kotlin");
Object java = VALUE.get("Java!!");
System.arraycopy(java, 0, kotlin, 0, Array.getLength(kotlin));
System.out.println("Kotlin"); // Java!!
}
"Safely Shoot Yourself in the Foot with Java 9" by Heinz Kabutz
Immutable objects are not always thread-safe
The state of a new collection instance may not have been "published" (in the sense of the Java Memory Model specification), so that an unsynchronized non-volatile read from another thread may observe the object in an invalid state
If someone unsafely publishes new ::("", Nil) or Vector(1) to another thread,
that thread could observe a inconsistent state (...)
Designing Immutable APIs
java.util.List
static abstract class AbstractImmutableList<E>
extends AbstractImmutableCollection<E> ... {
// all mutating methods throw UnsupportedOperationException
void add(...) { throw uoe(); }
boolean addAll(...) { throw uoe(); }
E remove(...) { throw uoe(); }
void replaceAll(...) { throw uoe(); }
E set(...) { throw uoe(); }
void sort(...) { throw uoe();
}
The easy and not-very user-friendly way
And it violates Liskov's Substitution Principle
The key to user-friendly APIs of immutable structures are "mutating" methods returning a copy of the source
static abstract class AbstractImmutableList<E>
extends AbstractImmutableCollection<E> ... {
// all mutating methods return a changed copy
List<E> add(...) { return copyWith(...) }
List<E> addAll(...) { return copyWithAll(...) }
List<E> remove(...) { return copyWithout(...) }
List<E> replaceAll(...) { return copyWithReplaced(...) }
List<E> set(...) { return copyAndSet(...) }
List<E> sort(...) { return copySorted(...) }
}
static abstract class AbstractImmutableList<E>
extends AbstractImmutableCollection<E> ... {
// all mutating methods return a changed copy
List<E> add(...) { return copyWith(...) }
List<E> addAll(...) { return copyWithAll(...) }
List<E> remove(...) { return copyWithout(...) }
List<E> replaceAll(...) { return copyWithReplaced(...) }
List<E> set(...) { return copyAndSet(...) }
List<E> sort(...) { return copySorted(...) }
}
That's quite a lot of copying
... and iterating
"Haskell computations produce a lot of memory garbage - much more than conventional imperative languages."
(...)
"It's not uncommon to produce 1gb of data per second ..."
https://wiki.haskell.org/GHC/Memory_Management
Persistent Data Structures
A data structure that supports multiple versions is called persistent
while a data structure that allows only a single version at a time is called ephemeral
"Making data structures persistent" - James R. Driscoll, Neil Sarnak, Daniel D. K. Sleator, and Robert E. Tarjan
Brute-force copying
Fat nodes (think in-memory event-sourcing for data structures)
Structural sharing
Structural sharing
Gain efficiency by minimizing copying and maximizing reuse of existing elements
Rich Hickey’s “Clojure Concurrency”
Purely Functional Data Structures by Grzegorz Piwowarek
String
A Persistent Data Structure?
public String substring(int beginIndex) {
// ... JDK 8
return (beginIndex == 0)
? this
: new String(value, beginIndex, subLen);
}
Could avoid a new String allocation and would be perfectly safe
...but could create memory leaks so String's constructor does the copy anyway
ArrayList
Doesn't work with mutable structures
public List<E> subList(int fromIndex, int toIndex) {
subListRangeCheck(fromIndex, toIndex, size);
return new SubList(this, 0, fromIndex, toIndex);
}
Source can be mutated which can lead to sneaky bugs
For example:
List<Integer> numbers = // ...
List<Integer> transformed = guava.Lists.transform(numbers, i -> i);
List<Integer> someNumbers = transformed.subList(0, 3);
Collections.shuffle(numbers);
System.out.println(numbers); // [2,3,4,5...]
System.out.println(someNumbers); // [2,3,4]
Persistent Singly-Linked List
class Cons<T> implements List<T> {
private final T head;
private final List<T> tail;
}
List<T> prepend(T element) {
return new Cons<>(element, this);
}
Zero copy and constant time
"Purely Functional Data Structures in Scala", Vladimir Kostyukov
What about List#append?
List<T> append(T element) {
return foldRight(of(element), (x, xs) -> xs.prepend(x));
}
O(n)
Acceptable for lists since we have O(1) additions
Immutable Structures are thread-safe
...but not always useful for multithreaded scenarios
From the book "King Midas and the Golden Touch" by Marie C Craft
If we want to share "updates" of an immutable structure, we need to manually control access to the root reference
private volatile ImmutableList<Integer> queue;
// ..
// major contention here!
private synchronized void prepend(Integer e) {
queue = queue.prepend(e);
}
...or just use a dedicated mutable data structure
...or a different way of sharing updates
Same applies to any immutable structure
private volatile String text;
private synchronized void add(String postfix) {
text = text + postfix;
}
We could do the same lock-free with CAS but that would generate even more garbage and further stress GC
Persistent Set
class Set<T> {
private final List<T> set;
}
Highly inefficient because of O(N) element access
Can we do something... smarter?
We could use a tree
Path copying
Path copying
Subtrees not on the path can be shared
Persistent Queue
class Cons<T> implements List<T>, Queue<T> {
private final List<T> queue;
}
Works great as a LIFO stack
Queues exhibit different access patterns than lists which makes this implementation highly inefficient
Efficient Persistent Queue
class Cons<T> implements List<T>, Queue<T> {
private final List<T> front;
private final List<T> rear; // in opposite direction
}
Queue#new
private Queue(List<T> front, List<T> rear) {
this.front = front.isEmpty() ? rear.reverse() : front;
this.rear = front.isEmpty() ? front : rear;
}
Queue#enqueue
@Override
public Queue<T> enqueue(T element) {
return new Queue<>(front, rear.prepend(element));
}
Queue#dequeue
public Option<Tuple2<T, Q>> dequeue() {
return isEmpty()
? Option.none()
: Option.some(Tuple.of(head(), tail()));
}
Amortized constant-time! (mostly O(1), rarely O(n))
List#reverse is performed infrequently
Most likely not acceptable for real-time systems
Can we do something... smarter?
We could use a tree
2-3 Tree
...if it's so smart, why does Scala use Banker's Queue?
It's much more cache-friendly*
Persistent Map
class PMap<K, V> implements Map<K, V> {
private final List<Entry<K, V> map;
}
Advantages: it works
Disadvantages: performs nowhere near HashMap
Can we do something... smarter?
Lookup: log_32(N)
log_32(Integer.MAX_VALUE) ~ 6
log_32(Long.MAX_VALUE) ~ 13
Key takeaways:
Efficient immutability involves minimization of copying and maximization of structural sharing
Simple beats complex
Immutability is not a silver bullet
Trade-offs are everywhere
Purely Functional Data Structures, Chris Okasaki
Red-Black Trees in a Functional Setting, Chris Okasaki
Ideal Hash Trees, Phil Bagwell
Fast And Space Efficient Trie Searches, Phil Bagwell
Cache-Aware Lock-Free Concurrent Hash Tries, Prokopec, Bagwell, Odersky
https://shipilev.net/jvm/anatomy-quarks/11-moving-gc-locality/
Senior Software Engineer @ Hazelcast
Thank You!