Professor Gregory Chockler


Professor in Computer Science
PhD (Hebrew University of Jerusalem)
+44 (0)1483 682651
06 BB 02

About

Areas of specialism

Trustworthy and secure distributed systems; Large-scale distributed systems and clouds; Blockchain consensus

Research

Research interests

Supervision

Postgraduate research supervision

Publications

Alejandro Naser Pastoriza, Gregory Chockler, Alexey Gotsman Fault-tolerant computing with unreliable channels, In: arXiv.org

We study implementations of basic fault-tolerant primitives, such as consensus and registers, in message-passing systems subject to process crashes and a broad range of communication failures. Our results characterize the necessary and sufficient conditions for implementing these primitives as a function of the connectivity constraints and synchrony assumptions. Our main contribution is a new algorithm for partially synchronous consensus that is resilient to process crashes and channel failures and is optimal in its connectivity requirements. In contrast to prior work, our algorithm assumes the most general model of message loss where faulty channels are flaky, i.e., can lose messages without any guarantee of fairness. This failure model is particularly challenging for consensus algorithms, as it rules out standard solutions based on leader oracles and failure detectors. To circumvent this limitation, we construct our solution using a new variant of the recently proposed view synchronizer abstraction, which we adapt to the crash-prone setting with flaky channels.

Manuel Bravo, Gregory Chockler, Alexey Gotsman Liveness and Latency of Byzantine State-Machine Replication (Extended Version), In: arXiv (Cornell University)

Byzantine state-machine replication (SMR) ensures the consistency of replicated state in the presence of malicious replicas and lies at the heart of the modern blockchain technology. Byzantine SMR protocols often guarantee safety under all circumstances and liveness only under synchrony. However, guaranteeing liveness even under this assumption is nontrivial. So far we have lacked systematic ways of incorporating liveness mechanisms into Byzantine SMR protocols, which often led to subtle bugs. To close this gap, we introduce a modular framework to facilitate the design of provably live and efficient Byzantine SMR protocols. Our framework relies on a view abstraction generated by a special SMR synchronizer primitive to drive the agreement on command ordering. We present a simple formal specification of an SMR synchronizer and its bounded-space implementation under partial synchrony. We also apply our specification to prove liveness and analyze the latency of three Byzantine SMR protocols via a uniform methodology. In particular, one of these results yields what we believe is the first rigorous liveness proof for the algorithmic core of the seminal PBFT protocol.

Joseph Izraelevitz, Gaukas Wang, Rhett Hanscom, Kayli Silvers, Tamara Silbergleit Lehman, Gregory Chockler, Alexey Gotsman (2022)Acuerdo: Fast Atomic Broadcast over RDMA, In: Proceedings of the 51st International Conference on Parallel Processing Association for Computing Machinery (ACM)

Atomic broadcast protocols ensure that messages are delivered to a group of machines in some total order, even when some of these machines can fail. These protocols are key to making distributed services fault-tolerant, as their total order guarantee allows keeping multiple service replicas in sync. But, unfortunately, atomic broadcast protocols are also notoriously expensive. We present a new protocol, called Acuerdo, that improves atomic broadcast performance by using remote direct memory addressing (RDMA). Acuerdo is built from the ground up to perform communication using one-side RDMA writes, which do not use the CPU of the remote machine, and is explicitly designed to minimize waiting on the critical path. Our experimental results demonstrate that Acuerdo provides raw throughput comparable to or exceeding other RDMA atomic broadcast protocols, while improving latency by almost 2𝑥.

Karla Vargas, Gregory Chockler (2022)Distributed Oracle for Estimating Global Network Delay with Known Error Bounds, In: Networked Systems: 10th International Conference, NETYS 2022, Virtual Event, May 17–19, 2022, Proceedingspp. 201-221 Springer

Partially synchronous models are often assumed for designing distributed protocols because they capture realistic timing assumptions, such as the asynchronous and synchronous periods that the system can experience. In some of these models, protocols need to estimate network delays. Some protocols fix the global message delay bound for all executions, which leads to sub-optimal solutions in terms of latency, because this bound must be chosen conservatively. And other protocols employ delay estimation mechanisms that only give an upper bound on the delay without quantifying the estimation error. The performance of these protocols depends on how close their estimations are in relation to the actual network delay. For instance, some Byzantine consensus protocols use timeouts based on this estimation. We formalize this problem as the Global Delay Bound Estimation (GDBE) and address it by introducing a distributed oracle that enriches partial synchronous models. This oracle produces estimates of the channel delays that allow processes to derive an efficient global bounded estimate. Oracles and global bounded estimates, provide a framework that facilitates the design of protocols for partially synchronous models and the analysis of their time complexity. We formalize the properties of the oracle and the proposed framework and show that it can be implemented in the presence of crash failures. In contrast, we prove that GDBE cannot be solved in the Byzantine failure model, and show how to circumvent this impossibility using an extra assumption. Finally, we show how to use our framework to implement a view synchronizer thus obtaining an efficient solution for Byzantine consensus.

Manuel Bravo, Gregory Chockler, Alexey Gotsman (2022)Liveness and Latency of Byzantine State-Machine Replication

Byzantine state-machine replication (SMR) ensures the consistency of replicated state in the presence of malicious replicas and lies at the heart of the modern blockchain technology. Byzantine SMR protocols often guarantee safety under all circumstances and liveness only under synchrony. However, guaranteeing liveness even under this assumption is nontrivial. So far we have lacked systematic ways of incorporating liveness mechanisms into Byzantine SMR protocols, which often led to subtle bugs. To close this gap, we introduce a modular framework to facilitate the design of provably live and efficient Byzantine SMR protocols. Our framework relies on a view abstraction generated by a special SMR synchronizer primitive to drive the agreement on command ordering. We present a simple formal specification of an SMR synchronizer and its bounded-space implementation under partial synchrony. We also apply our specification to prove liveness and analyze the latency of three Byzantine SMR protocols via a uniform methodology. In particular, one of these results yields what we believe is the first rigorous liveness proof for the algorithmic core of the seminal PBFT protocol.

Manuel Bravo, Gregory Chockler, Alexey Gotsman Making Byzantine Consensus Live (Extended Version), In: arXiv.org

Partially synchronous Byzantine consensus protocols typically structure their execution into a sequence of views, each with a designated leader process. The key to guaranteeing liveness in these protocols is to ensure that all correct processes eventually overlap in a view with a correct leader for long enough to reach a decision. We propose a simple view synchronizer abstraction that encapsulates the corresponding functionality for Byzantine consensus protocols, thus simplifying their design. We present a formal specification of a view synchronizer and its implementation under partial synchrony, which runs in bounded space despite tolerating message loss during asynchronous periods. We show that our synchronizer specification is strong enough to guarantee liveness for single-shot versions of several well-known Byzantine consensus protocols, including HotStuff, Tendermint, PBFT and SBFT. We furthermore give precise latency bounds for these protocols when using our synchronizer. By factoring out the functionality of view synchronization we are able to specify and analyze the protocols in a uniform framework, which allows comparing them and highlights trade-offs.

Manuel Bravo, Gregory Chockler, Alexey Gotsman (2022)Making Byzantine consensus live, In: Distributed Computing35pp. 503-532 Springer

Partially synchronous Byzantine consensus protocols typically structure their execution into a sequence of views, each with a designated leader process. The key to guaranteeing liveness in these protocols is to ensure that all correct processes eventually overlap in a view with a correct leader for long enough to reach a decision. We propose a simple view synchronizer abstraction that encapsulates the corresponding functionality for Byzantine consensus protocols, thus simplifying their design. We present a formal specification of a view synchronizer and its implementation under partial synchrony, which runs in bounded space despite tolerating message loss during asynchronous periods. We show that our synchronizer specification is strong enough to guarantee liveness for single-shot versions of several well-known Byzantine consensus protocols, including PBFT and HotStuff. We furthermore give precise latency bounds for these protocols when using our synchronizer. By factoring out the functionality of view synchronization we are able to specify and analyze the protocols in a uniform framework, which allows comparing them and highlights trade-offs.

Gregory Chockler, Alexey Gotsman (2021)Multi-shot distributed transaction commit, In: Distributed Computing34(4)pp. 301-318 Springer

Atomic Commit Problem (ACP) is a single-shot agreement problem similar to consensus, meant to model the properties of transaction commit protocols in fault-prone distributed systems. We argue that ACP is too restrictive to capture the complexities of modern transactional data stores, where commit protocols are integrated with concurrency control, and their executions for different transactions are interdependent. As an alternative, we introduce Transaction Certification Service (TCS), a new formal problem that captures safety guarantees of multi-shot transaction commit protocols with integrated concurrency control. TCS is parameterized by a certification function that can be instantiated to support common isolation levels, such as serializability and snapshot isolation. We then derive a provably correct crash-resilient protocol for implementing TCS through successive refinement. Our protocol achieves a better time complexity than mainstream approaches that layer two-phase commit on top of Paxos-style replication.