Professor Gregory Chockler


Professor in Computer Science
PhD (Hebrew University of Jerusalem)
+44 (0)1483 682651
06 BB 02

About

Areas of specialism

Trustworthy and secure distributed systems; Large-scale distributed systems and clouds; Blockchain consensus

Research

Research interests

Supervision

Postgraduate research supervision

Publications

Sergey Egorov, Gregory Chockler, Brijesh Dongol, Dan O'Keeffe, Seyyedmohammadsadegh Keshavarzi (2024)Mangosteen: Fast Transparent Durability for Linearizable Applications using NVM Mangosteen: Fast Transparent Durability for Linearizable Applications using NVM, In: Proceedings of the 2024 USENIX Annual Technical Conference (USENIX ATC 2024)pp. 799-815 USENIX

The advent of byte-addressable non-volatile memory (NVM) technologies has enabled the development of low-latency high-throughput durable applications, i.e., applications that are capable of recovering from full-system crashes. However, programming such applications is error-prone as efficiency gains often require fine-grained (programmer-controlled) management of low-level persistence instructions. We propose Mangosteen, a high-level programming framework that allows developers to transform an existing lineariz-able in-memory application to a corresponding durably lin-earizable version using NVM. Our framework's API consists of a set of callback hooks that interpose on an application's request processing flow with minimal developer effort. Man-gosteen executes client operations on DRAM and persists their effects using binary instrumentation and redo logging. Mangosteen's concurrency control facilitates batching of read-write requests to minimize the cost of persistence, while allowing read-only requests to execute concurrently. A novel intra-batch deduplication mechanism further reduces persistence overheads for common OLTP workloads. Our empirical evaluation results show that Mangosteen-enabled applications outperform state-of-the-art solutions across the entire spectrum of read-write ratios. In particular, the Mangosteen-based version of Redis demonstrates throughput gains of between 2×–5× in comparison to prior work.

Manuel Bravo, Gregory Chockler, Alexey Gotsman, Alejandro Naser Pastoriza, Christian Roldan (2024)Vertical Atomic Broadcast and Passive Replication, In: 38th International Symposium on Distributed Computing (DISC 2024)31910pp. 10:1-10:19 Dagstuhl Publishing

Atomic broadcast is a reliable communication abstraction ensuring that all processes deliver the same set of messages in a common global order. It is a fundamental building block for implementing fault-tolerant services using either active (aka state-machine) or passive (aka primary-backup) replication. We consider the problem of implementing reconfigurable atomic broadcast, which further allows users to dynamically alter the set of participating processes, e.g., in response to failures or changes in the load. We give a complete safety and liveness specification of this communication abstraction and propose a new protocol implementing it, called Vertical Atomic Broadcast, which uses an auxiliary service to facilitate reconfiguration. In contrast to prior proposals, our protocol significantly reduces system downtime when reconfiguring from a functional configuration by allowing it to continue processing messages while agreement on the next configuration is in progress. Furthermore, we show that this advantage can be maintained even when our protocol is modified to support a stronger variant of atomic broadcast required for passive replication. 2012 ACM Subject Classification Theory of computation → Distributed computing models

Manuel Bravo, Gregory Chockler, Alexey Gotsman (2024)Liveness and latency of Byzantine state-machine replication, In: Distributed computing37pp. 177-205 Springer

Byzantine state-machine replication (SMR) ensures the consistency of replicated state in the presence of malicious replicas and lies at the heart of the modern blockchain technology. Byzantine SMR protocols often guarantee safety under all circumstances and liveness only under synchrony. However, guaranteeing liveness even under this assumption is nontrivial. So far we have lacked systematic ways of incorporating liveness mechanisms into Byzantine SMR protocols, which often led to subtle bugs. To close this gap, we introduce a modular framework to facilitate the design of provably live and efficient Byzantine SMR protocols. Our framework relies on a view abstraction generated by a special SMR synchronizer primitive to drive the agreement on command ordering. We present a simple formal specification of an SMR synchronizer and its bounded-space implementation under partial synchrony. We also apply our specification to prove liveness and analyse the latency of three Byzantine SMR protocols via a uniform methodology. In particular, one of these results yields what we believe is the first rigorous liveness proof for the algorithmic core of the seminal PBFT protocol.

Armando Castañeda, Gregory Chockler, Brijesh Dongol, Ori Lahav (2024)What Cannot Be Implemented on Weak Memory?, In: 38th International Symposium on Distributed Computing (DISC 2024)319pp. 11:1-11:22 Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

We present a general methodology for establishing the impossibility of implementing certain concurrent objects on different (weak) memory models. The key idea behind our approach lies in characterizing memory models by their mergeability properties, identifying restrictions under which independent memory traces can be merged into a single valid memory trace. In turn, we show that the mergeability properties of the underlying memory model entail similar mergeability requirements on the specifications of objects that can be implemented on that memory model. We demonstrate the applicability of our approach to establish the impossibility of implementing standard distributed objects with different restrictions on memory traces on three memory models: strictly consistent memory, total store order, and release-acquire. These impossibility results allow us to identify tight and almost tight bounds for some objects, as well as new separation results between weak memory models, and between well-studied objects based on their implementability on weak memory models.

Alejandro Naser Pastoriza, Gregory Chockler, Alexey Gotsman (2024)Fault-Tolerant Computing with Unreliable Channels, In: 27th International Conference on Principles of Distributed Systems (OPODIS 2023)286pp. 21:1-21:21 Schloss Dagstuhl – Leibniz-Zentrum für Informatik

We study implementations of basic fault-tolerant primitives, such as consensus and registers, in message-passing systems subject to process crashes and a broad range of communication failures. Our results characterize the necessary and sufficient conditions for implementing these primitives as a function of the connectivity constraints and synchrony assumptions. Our main contribution is a new algorithm for partially synchronous consensus that is resilient to process crashes and channel failures and is optimal in its connectivity requirements. In contrast to prior work, our algorithm assumes the most general model of message loss where faulty channels are flaky, i.e., can lose messages without any guarantee of fairness. This failure model is particularly challenging for consensus algorithms, as it rules out standard solutions based on leader oracles and failure detectors. To circumvent this limitation, we construct our solution using a new variant of the recently proposed view synchronizer abstraction, which we adapt to the crash-prone setting with flaky channels. 2012 ACM Subject Classification Theory of computation → Distributed computing models

Gregory Chockler, Alexey Gotsman (2018)Multi-Shot Distributed Transaction Commit, In: 32nd International Symposium on Distributed Computing (DISC 2018)121pp. 14:1-14:18 Schloss Dagstuhl – Leibniz-Zentrum für Informatik

Atomic Commit Problem (ACP) is a single-shot agreement problem similar to consensus, meant to model the properties of transaction commit protocols in fault-prone distributed systems. We argue that ACP is too restrictive to capture the complexities of modern transactional data stores, where commit protocols are integrated with concurrency control, and their executions for different transactions are interdependent. As an alternative, we introduce Transaction Certification Service (TCS), a new formal problem that captures safety guarantees of multi-shot transaction commit protocols with integrated concurrency control. TCS is parameterized by a certification function that can be instantiated to support common isolation levels, such as serializability and snapshot isolation. We then derive a provably correct crash-resilient protocol for implementing TCS through successive refinement. Our protocol achieves a better time complexity than mainstream approaches that layer two-phase commit on top of Paxos-style replication. 2012 ACM Subject Classification Theory of computation → Distributed computing models

Alexey Gotsman, Anatole Lefort, Gregory Chockler (2019)White-Box Atomic Multicast, In: Proceedings / International Conference on Dependable Systems and Networks ; sponsored by IEEE Computer Society Technical Committee on Fault-Tolerant Computing, IFIP Working Group 104 on Dependable Computing and Fault Tolerance ; in cooperation with AT & T [and others]pp. 176-187 IEEE

Atomic multicast is a communication primitive that delivers messages to multiple groups of processes according to some total order, with each group receiving the projection of the total order onto messages addressed to it. To be scalable, atomic multicast needs to be genuine, meaning that only the destination processes of a message should participate in ordering it. In this paper we propose a novel genuine atomic multicast protocol that in the absence of failures takes as low as 3 message delays to deliver a message when no other messages are multicast concurrently to its destination groups, and 5 message delays in the presence of concurrency. This improves the latencies of both the fault-tolerant version of classical Skeen's multicast protocol (6 or 12 message delays, depending on concurrency) and its recent improvement by Coelho et al. (4 or 8 message delays). To achieve such low latencies, we depart from the typical way of guaranteeing fault-tolerance by replicating each group with Paxos. Instead, we weave Paxos and Skeen's protocol together into a single coherent protocol, exploiting opportunities for white-box optimisations. We experimentally demonstrate that the superior theoretical characteristics of our protocol are reflected in practical performance pay-offs.

Manuel Bravo, Gregory Chockler, Alexey Gotsman (2020)Making Byzantine Consensus Live, In: 34th International Symposium on Distributed Computing (DISC 2020)179pp. 23:1-23:17 Schloss Dagstuhl – Leibniz-Zentrum für Informatik

Partially synchronous Byzantine consensus protocols typically structure their execution into a sequence of views, each with a designated leader process. The key to guaranteeing liveness in these protocols is to ensure that all correct processes eventually overlap in a view with a correct leader for long enough to reach a decision. We propose a simple view synchronizer abstraction that encapsulates the corresponding functionality for Byzantine consensus protocols, thus simplifying their design. We present a formal specification of a view synchronizer and its implementation under partial synchrony, which runs in bounded space despite tolerating message loss during asynchronous periods. We show that our synchronizer specification is strong enough to guarantee liveness for single-shot versions of several well-known Byzantine consensus protocols, including HotStuff, Tendermint, PBFT and SBFT. We furthermore give precise latency bounds for these protocols when using our synchronizer. By factoring out the functionality of view synchronization we are able to specify and analyze the protocols in a uniform framework, which allows comparing them and highlights trade-offs. 2012 ACM Subject Classification Theory of computation → Distributed computing models

Alejandro Naser Pastoriza, Gregory Chockler, Alexey Gotsman Fault-tolerant computing with unreliable channels, In: arXiv.org

We study implementations of basic fault-tolerant primitives, such as consensus and registers, in message-passing systems subject to process crashes and a broad range of communication failures. Our results characterize the necessary and sufficient conditions for implementing these primitives as a function of the connectivity constraints and synchrony assumptions. Our main contribution is a new algorithm for partially synchronous consensus that is resilient to process crashes and channel failures and is optimal in its connectivity requirements. In contrast to prior work, our algorithm assumes the most general model of message loss where faulty channels are flaky, i.e., can lose messages without any guarantee of fairness. This failure model is particularly challenging for consensus algorithms, as it rules out standard solutions based on leader oracles and failure detectors. To circumvent this limitation, we construct our solution using a new variant of the recently proposed view synchronizer abstraction, which we adapt to the crash-prone setting with flaky channels.

Manuel Bravo, Gregory Chockler, Alexey Gotsman Liveness and Latency of Byzantine State-Machine Replication (Extended Version), In: arXiv (Cornell University)

Byzantine state-machine replication (SMR) ensures the consistency of replicated state in the presence of malicious replicas and lies at the heart of the modern blockchain technology. Byzantine SMR protocols often guarantee safety under all circumstances and liveness only under synchrony. However, guaranteeing liveness even under this assumption is nontrivial. So far we have lacked systematic ways of incorporating liveness mechanisms into Byzantine SMR protocols, which often led to subtle bugs. To close this gap, we introduce a modular framework to facilitate the design of provably live and efficient Byzantine SMR protocols. Our framework relies on a view abstraction generated by a special SMR synchronizer primitive to drive the agreement on command ordering. We present a simple formal specification of an SMR synchronizer and its bounded-space implementation under partial synchrony. We also apply our specification to prove liveness and analyze the latency of three Byzantine SMR protocols via a uniform methodology. In particular, one of these results yields what we believe is the first rigorous liveness proof for the algorithmic core of the seminal PBFT protocol.

Joseph Izraelevitz, Gaukas Wang, Rhett Hanscom, Kayli Silvers, Tamara Silbergleit Lehman, Gregory Chockler, Alexey Gotsman (2022)Acuerdo: Fast Atomic Broadcast over RDMA, In: Proceedings of the 51st International Conference on Parallel Processing Association for Computing Machinery (ACM)

Atomic broadcast protocols ensure that messages are delivered to a group of machines in some total order, even when some of these machines can fail. These protocols are key to making distributed services fault-tolerant, as their total order guarantee allows keeping multiple service replicas in sync. But, unfortunately, atomic broadcast protocols are also notoriously expensive. We present a new protocol, called Acuerdo, that improves atomic broadcast performance by using remote direct memory addressing (RDMA). Acuerdo is built from the ground up to perform communication using one-side RDMA writes, which do not use the CPU of the remote machine, and is explicitly designed to minimize waiting on the critical path. Our experimental results demonstrate that Acuerdo provides raw throughput comparable to or exceeding other RDMA atomic broadcast protocols, while improving latency by almost 2𝑥.

Karla Vargas, Gregory Chockler (2022)Distributed Oracle for Estimating Global Network Delay with Known Error Bounds, In: Networked Systems: 10th International Conference, NETYS 2022, Virtual Event, May 17–19, 2022, Proceedingspp. 201-221 Springer

Partially synchronous models are often assumed for designing distributed protocols because they capture realistic timing assumptions, such as the asynchronous and synchronous periods that the system can experience. In some of these models, protocols need to estimate network delays. Some protocols fix the global message delay bound for all executions, which leads to sub-optimal solutions in terms of latency, because this bound must be chosen conservatively. And other protocols employ delay estimation mechanisms that only give an upper bound on the delay without quantifying the estimation error. The performance of these protocols depends on how close their estimations are in relation to the actual network delay. For instance, some Byzantine consensus protocols use timeouts based on this estimation. We formalize this problem as the Global Delay Bound Estimation (GDBE) and address it by introducing a distributed oracle that enriches partial synchronous models. This oracle produces estimates of the channel delays that allow processes to derive an efficient global bounded estimate. Oracles and global bounded estimates, provide a framework that facilitates the design of protocols for partially synchronous models and the analysis of their time complexity. We formalize the properties of the oracle and the proposed framework and show that it can be implemented in the presence of crash failures. In contrast, we prove that GDBE cannot be solved in the Byzantine failure model, and show how to circumvent this impossibility using an extra assumption. Finally, we show how to use our framework to implement a view synchronizer thus obtaining an efficient solution for Byzantine consensus.

Manuel Bravo, Gregory Chockler, Alexey Gotsman (2022)Liveness and Latency of Byzantine State-Machine Replication

Byzantine state-machine replication (SMR) ensures the consistency of replicated state in the presence of malicious replicas and lies at the heart of the modern blockchain technology. Byzantine SMR protocols often guarantee safety under all circumstances and liveness only under synchrony. However, guaranteeing liveness even under this assumption is nontrivial. So far we have lacked systematic ways of incorporating liveness mechanisms into Byzantine SMR protocols, which often led to subtle bugs. To close this gap, we introduce a modular framework to facilitate the design of provably live and efficient Byzantine SMR protocols. Our framework relies on a view abstraction generated by a special SMR synchronizer primitive to drive the agreement on command ordering. We present a simple formal specification of an SMR synchronizer and its bounded-space implementation under partial synchrony. We also apply our specification to prove liveness and analyze the latency of three Byzantine SMR protocols via a uniform methodology. In particular, one of these results yields what we believe is the first rigorous liveness proof for the algorithmic core of the seminal PBFT protocol.

Manuel Bravo, Gregory Chockler, Alexey Gotsman Making Byzantine Consensus Live (Extended Version), In: arXiv.org

Partially synchronous Byzantine consensus protocols typically structure their execution into a sequence of views, each with a designated leader process. The key to guaranteeing liveness in these protocols is to ensure that all correct processes eventually overlap in a view with a correct leader for long enough to reach a decision. We propose a simple view synchronizer abstraction that encapsulates the corresponding functionality for Byzantine consensus protocols, thus simplifying their design. We present a formal specification of a view synchronizer and its implementation under partial synchrony, which runs in bounded space despite tolerating message loss during asynchronous periods. We show that our synchronizer specification is strong enough to guarantee liveness for single-shot versions of several well-known Byzantine consensus protocols, including HotStuff, Tendermint, PBFT and SBFT. We furthermore give precise latency bounds for these protocols when using our synchronizer. By factoring out the functionality of view synchronization we are able to specify and analyze the protocols in a uniform framework, which allows comparing them and highlights trade-offs.

Manuel Bravo, Gregory Chockler, Alexey Gotsman (2022)Making Byzantine consensus live, In: Distributed Computing35pp. 503-532 Springer

Partially synchronous Byzantine consensus protocols typically structure their execution into a sequence of views, each with a designated leader process. The key to guaranteeing liveness in these protocols is to ensure that all correct processes eventually overlap in a view with a correct leader for long enough to reach a decision. We propose a simple view synchronizer abstraction that encapsulates the corresponding functionality for Byzantine consensus protocols, thus simplifying their design. We present a formal specification of a view synchronizer and its implementation under partial synchrony, which runs in bounded space despite tolerating message loss during asynchronous periods. We show that our synchronizer specification is strong enough to guarantee liveness for single-shot versions of several well-known Byzantine consensus protocols, including PBFT and HotStuff. We furthermore give precise latency bounds for these protocols when using our synchronizer. By factoring out the functionality of view synchronization we are able to specify and analyze the protocols in a uniform framework, which allows comparing them and highlights trade-offs.

Gregory Chockler, Alexey Gotsman (2021)Multi-shot distributed transaction commit, In: Distributed Computing34(4)pp. 301-318 Springer

Atomic Commit Problem (ACP) is a single-shot agreement problem similar to consensus, meant to model the properties of transaction commit protocols in fault-prone distributed systems. We argue that ACP is too restrictive to capture the complexities of modern transactional data stores, where commit protocols are integrated with concurrency control, and their executions for different transactions are interdependent. As an alternative, we introduce Transaction Certification Service (TCS), a new formal problem that captures safety guarantees of multi-shot transaction commit protocols with integrated concurrency control. TCS is parameterized by a certification function that can be instantiated to support common isolation levels, such as serializability and snapshot isolation. We then derive a provably correct crash-resilient protocol for implementing TCS through successive refinement. Our protocol achieves a better time complexity than mainstream approaches that layer two-phase commit on top of Paxos-style replication.