Christopher Meiklejohn

Workshop: Conflict-Free Replicated Data Types
A Brief History of Distributed Programming: RPC

LASP Creator

Workshop: Conflict-Free Replicated Data Types

  • Part 1: CRDTs

When building applications that must operate in a distributed environment, application developers have to embrace the challenges of concurrency.  One of the most difficult challenges of concurrency is maintaining consistency, when state can be accessed and modified by multiple actors.

Traditionally, developers have had two choices for dealing with the problem. The first approach is, under concurrent modification, one of the updates wins through arbitration: picking one of the concurrent values by some other metric, such as when the operation was performed. This approach is usually problematic because the arbitration mechanism is usually nondeterministic and leads to applications that are prone to race conditions or data loss.  The second approach is storing multiple values and leaving “semantic resolution”, or choosing how to merge or reduce the values to a single value, to the application developer. Again, this approach is problematic because programming deterministic merge functions is non-trivial, ad-hoc, and error-prone.

Conflict-free Replicated Data Types (CRDTs) provide an alternative to this problem. CRDTs provide data structures that are equipped with deterministic merge functions that can be applied by the system automatically during concurrent modification. This mechanism alleviates the user from having to program complicated merge functions and allows the user to work with just single values: simplifying the task of concurrent programming.

During this tutorial, we introduce a open source reference implementation of CRDTs in Erlang, and walk through understanding where the concurrency issues arise and how CRDTs can be applied to make your concurrent applications easier to program and understand.

  • Part 2: Antidote

Cloud-scale applications must be highly available and offer low latency responses while serving millions of users around the world. To meet this need, applications have to carefully choose a high performance distributed database. NoSQL-style data stores  - providing high throughput and availability even under network partition - replaced in this setting traditional databases. They usually exhibit a low-level key-value API and expose data inconsistencies that arise due to asynchronous communication among the servers. It takes significant effort and expertise from programmers to deal with these inconsistencies and develop correct applications on top of these databases.

For example, consider that your application stores a counter which counts the number of ads displayed to a user. For scalability, the database replicates all data in different locations. What will be the value of the counter, when it is incremented at two locations at the same time? As an application programmer, you have to detect such concurrent updates and resolve conflicting modifications.

The Antidote datastore provides features that aid programmers to write correct applications, while providing high performance and horizontal scalability, from a single machine to geo-replicated deployments, with the added guarantees of causal Highly-Available Transactions, and provable absence of data corruption due to concurrency.

CRDT support:
Antidote supports replicated data types such as counters, sets, maps, and sequences that tolerate concurrent updates and partial failures.

Highly-Available Transactions:
Applications often need to maintain some relation between updates to different objects. For example, in a social networking application, a reply to some post should be visible to a user only after observing the post. Antidote maintains such relations by providing causal consistency across all replicas and atomic multi-object updates. Thus, programmers can program their application on top of Antidote without worrying about the inconsistencies arising due to concurrent updates in different replicas.

Geo-replication:
Antidote is designed to run on multiple servers in geo-distributed locations. To provide fast responses to read and write requests, Antidote automatically replicates data in different locations and serves the requests from the nearest location without contacting a remote server. In this tutorial we will give a guided tour of Antidote’s API and will navigate you through its rich semantic features by hands-on demos.

A Brief History of Distributed Programming: RPC

While many of the distributed systems we operate today are built with language like Java and Go, distributed programming has a long history of innovation and adoption of its ideas. This include innovations seen all throughout the various fields of computing: novel type systems for dynamic languages; the concept of the promise, now a standard programming technique in web development;  and unified models of programming when data lives across nodes. Some of these ideas had major impact, while some fell incredibly short. Many technically superior ideas were not adopted simply because they were too “research” focused.

During this talk, we will present the history of RPC and why RPC may not be the best abstraction for building your next distributed application.