In order to develop develop high-performance yet correct distributed applications, it is essential to understand how your application's needs map to consistency. This tutorial aims to clarify and give the audience a working understanding of the crowded space of consistency models. What does a given model provide? What are the costs?
A distributed system has data scattered and replicated across nodes, separated by slow and unreliable networks. Designers face an inherent trade-off between system cost and application cost. If the system is "strongly consistent," it masks the ugly details, at the cost of constantly synchronising, and even stalling when the network is down. Going to a "weakly consistent" model can significantly improve cost and performance, but weakens the guarantees, moving some of the power and of the responsibility onto the application.
The strongest consistency models have three remarkable features: atomic transactions, causal order, and absence of concurrent updates. Each of these features helps to maintain a different class of application invariants, and each has a cost. Relaxing one of the features lowers the cost and weakens the corresponding invariants, mostly independently of the others. We will illustrate each of the three axes, the associated class of invariants, and the underlying mechanisms.
Consistency models that allow concurrent updates are tricky, since data replicas may diverge. To address this issue, application developers can build upon libraries of CRDTs (Conflict-free Replicated Data Types). A CRDT is a data type that encapsulates divergence control and resolution, and that ensures that data necessarily converges to correct state, thanks to some simple mathematical properties. We will spend some time explaining the concepts, design and implementation of CRDTs.
We will also show how developers can leverage the two other dimensions, atomicity and causality, to ensure correctness of their applications.
- application developers, system developers.
The CRDT abstraction (Conflct-free Replicated Data Type) is a useful tool for building highly scalable and available applications. CRDTs combine three key intuitions: (i) Encapsulate distribution inside the boundaries of a data type, so that ordinary programmers can build applications by combining CRDTs from a library; (ii) support "write now, propagate later" concurrent updates to ensure scalability, availability and performance; (iii) some simple of mathematical properties that ensure that data replicas converge safely. This talk presents the principles and design of some useful CRDTs. We discuss the CRDT approach in the context of some real applications of our industrial partners. We present some of the difficulties and limitations of CRDTs, in the more general perspective of consistency and scalability. We present recent extensions that address some of these limitations, in particular formaintaining application invariants.
Nuno Preguiça is curently Associate Professor at FCT - Univ. NOVA de Lisboa and leads the Computer Systems group at NOVA LINCS. He does research in the area of distributed systems, mobile computing and data management. He co-proposed the concept of CRDTs and a number of extensions to CRDTs, including variants for enforcing global invariants and for efficient execution of distributed computations.