Optimistic Byzantine fault tolerance

被引：17

作者：

Zhao, Wenbing ^{[1
]}

机构：

[1] Cleveland State Univ, Dept Elect Engn & Comp Sci, Cleveland, OH 44115 USA

来源：

INTERNATIONAL JOURNAL OF PARALLEL EMERGENT AND DISTRIBUTED SYSTEMS | 2016年 / 31卷 / 03期

基金：

美国国家科学基金会;

关键词：

Optimistic Byzantine fault tolerance; replica consistency; Byzantine agreement; collaborative editing; event stream processing; conflict-free replicated data types;

D O I：

10.1080/17445760.2015.1078802

中图分类号：

TP301 [理论、方法];

学科分类号：

081202 ;

摘要：

The primary concern of traditional Byzantine fault tolerance is to ensure strong replica consistency by executing incoming requests sequentially according to a total order. Speculative execution at both clients and server replicas has been proposed as a way of reducing the end-to-end latency. In this article, we introduce optimistic Byzantine fault tolerance. Optimistic Byzantine fault tolerance aims to achieve higher throughput and lower end-to-end latency by using a weaker replica consistency model. Instead of ensuring strong safety as in traditional Byzantine fault tolerance, nonfaulty replicas are brought to a consistent state periodically and on-demand in optimistic Byzantine fault tolerance. Not all applications are suitable for optimistic Byzantine fault tolerance. We identify three types of applications, namely, realtime collaborative editing, event stream processing, and services constructed with conflict-free replicated data types, as good candidates for applying optimistic Byzantine fault tolerance. Furthermore, we provide a design guideline on how to achieve eventual consistency and how to recover from conflicts at different replicas. In optimistic Byzantine fault tolerance, a replica executes a request immediately without first establishing a total order of the message, and Byzantine agreement is used only to establish a common state synchronization point and the set of individual states needed to resolve conflicts. The recovery mechanism ensures both replica consistency and the validity of the system by identifying and removing the operations introduced by faulty clients and server replicas.

引用

页码：254 / 267

页数：14

共 23 条

[1]

Baquero C., 1999, Operating Systems Review, V33, P90, DOI 10.1145/334598.334614

[2] CAP Twelve Years Later: How the "Rules" Have Changed [J].

Brewer, Eric .

COMPUTER, 2012, 45 (02) :23-29

[3] Practical byzantine fault tolerance and proactive recovery [J].

Castro, M ;

Liskov, B .

ACM TRANSACTIONS ON COMPUTER SYSTEMS, 2002, 20 (04) :398-461

[4]

CHAI H, 2014, 12 IEEE INT C DEP AU, P109, DOI DOI 10.1109/DASC.2014.28

[5]

CHAI H, 2014, 5 IEEE INT C SOFTW E, P758

[6]

Chai H., 2014, P IEEE INT C SERV CO

[7]

Cowling James, 2006, P 7 S OP SYST DES IM

[8]

ELLIS CA, 1989, SIGMOD REC, V18, P399, DOI 10.1145/66926.66963

[9]

Etzion O., 2010, EVENT PROCESSING ACT

[10] IMPOSSIBILITY OF DISTRIBUTED CONSENSUS WITH ONE FAULTY PROCESS [J].

FISCHER, MJ ;

LYNCH, NA ;

PATERSON, MS .

JOURNAL OF THE ACM, 1985, 32 (02) :374-382

← 1 2 3 →