[1] L. Alvisi and K. Marzullo, "Message Logging: Pessimistic, Optimistic, Causal and Optimal," IEEE Trans. Software Eng., vol. 24, no. 2, pp. 149-159, Feb. 1998.

[2] K. Anstreicher, N. Brixius, J.-P. Goux, and J. Linderoth, "Solving Large Quadratic Assignment Problems on Computational Grids," Math. Programming, vol. 91, no. 3, 2002.

[3] R. Baldoni, "A Communication-Induced Checkpointing Protocol That Ensures Rollback-Dependency Trackability," Proc. 27th Int'l Symp. Fault-Tolerant Computing (FTCS '97), p. 68, 1997.

[4] F. Baude, D. Caromel, C. Delbé, and L. Henrio, "A Hybrid Message Logging-CIC Protocol for Constrained Checkpointability," Proc. European Conf. Parallel Processing (EuroPar '05), pp. 644-653, 2005.

[5] G. Bosilca et al., "MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes," Proc. ACM/IEEE Conf. Supercomputing (SC '02), Nov. 2002.

[6] A. Bouteiller et al., "MPICH-V2: A Fault Tolerant MPI for Volatile Nodes Based on the Pessimistic Sender Based Message Logging," Proc. ACM/IEEE Conf. Supercomputing (SC '03), pp. 1-17, 2003.

[7] A. Bouteiller, P. Lemarinier, G. Krawezik, and F. Cappello, "Coordinated Checkpoint versus Message Log for Fault Tolerant MPI," Proc. Fifth IEEE Int'l Conf. Cluster Computing (Cluster '03), p. 242, 2003.

[9] K.M. Chandy and L. Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems," ACM Trans. Computer Systems, vol. 3, no. 1, pp. 63-75, 1985.

[10] E.N. Elnozahy, L. Alvisi, Y.-M. Wang, and D.B. Johnson, "A Survey of Rollback-Recovery Protocols in Message-Passing Systems," ACM Computing Surveys, vol. 34, no. 3, pp. 375-408, Sept. 2002.

[11] M. Frigo, C.E. Leiserson, and K.H. Randall, "The Implementation of the Cilk-5 Multithreaded Language," Proc. ACM SIGPLAN Conf. Programming Language Design and Implementation (PLDI '98), pp. 212-223, 1998.

[12] F. Galilée, J.-L. Roch, G. Cavalheiro, and M. Doreille, "Athapascan-1: On-Line Building Data Flow Graph in a Parallel Language," Proc. Seventh Int'l Conf. Parallel Architectures and Compilation Techniques (PACT '98), pp. 88-95, 1998.

[13] A Large Scale Nation-Wide Infrastructure for Grid Research, Grid5000, https://www.grid5000.fr, 2006.

[14] S. Jafar, A. Krings, T. Gautier, and J.-L. Roch, "Theft-Induced Checkpointing for Reconfigurable Dataflow Applications," Proc. IEEE Electro/Information Technology Conf. (EIT '05), May 2005.

[15] S. Jafar, T. Gautier, A. Krings, and J.-L. Roch, "A Checkpoint/Recovery Model for Heterogeneous Dataflow Computations Using Work-Stealing," Proc. European Conf. Parallel Processing (EuroPar '05), pp. 675-684, Aug.-Sept. 2005.

[16] A.W. Krings, J.-L. Roch, S. Jafar, and S. Varrette, "A Probabilistic Approach for Task and Result Certification of Large-Scale Distributed Applications in Hostile Environments," Proc. European Grid Conf. (EGC '05), P. Sloot et al., eds., Feb. 2005.

[18] L. Lamport, M. Pease, and R. Shostak, "The Byzantine Generals Problem," ACM Trans. Programming Languages and Systems, vol. 4, no. 3, pp. 382-401, July 1982.

[19] M. Litzkow, T. Tannenbaum, J. Basney, and M. Livny, "Check-point and Migration of UNIX Processes in the Condor Distributed Processing System," Technical Report CS-TR-97-1346, Univ. of Wisconsin, Madison, 1997.

[20] A. Nguyen-Tuong, A. Grimshaw, and M. Hyett, "Exploiting DataFlow for Fault-Tolerance in a Wide-Area Parallel System," Proc. 15th Symp. Reliable Distributed Systems (SRDS '96), pp. 2-11, 1996.

[21] D.A. Patterson, G. Gibson, and R.H. Katz, "A Case for Redundant Arrays of Inexpensive Disks (RAID)," Proc. ACM SIGMOD '88, pp. 109-116, 1988.

[23] B. Randell, "System Structure for Software Fault Tolerance," Proc. Int'l Conf. Reliable Software, pp. 437-449, 1975.

[24] L. Sarmenta, "Sabotage-Tolerance Mechanisms for Volunteer Computing Systems," Future Generation Computer Systems, vol. 18, no. 4, 2002.

[25] J. Silc, B. Robic, and T. Ungerer, "Asynchrony in Parallel Computing: From Dataflow to Multithreading," Progress in Computer Research, pp. 1-33, 2001.

[26] G. Stellner, "CoCheck: Checkpointing and Process Migration for MPI," Proc. 10th Int'l Parallel Processing Symp. (IPPS '96), pp. 526-531, Apr. 1996.

[27] R. Strom and S. Yemini, "Optimistic Recovery in Distributed Systems," ACM Trans. Computer Systems, vol. 3, no. 3, pp. 204-226, 1985.

[28] V. Strumpen, "Portable and Fault-Tolerant Software Systems," IEEE Micro, vol. 18, no. 5, pp. 22-32, Sept./Oct. 1998.

[31] G. Wrzesinska, R. van Nieuwpoort, J. Maassen, and H.E. Bal, "Fault-Tolerance, Malleability and Migration for Divide-and-Conquer Applications on the Grid," Proc. 19th Int'l Parallel and Distributed Processing Symp. (IPDPS '05), p. 13a, Apr. 2005.

[32] J.J. Wylie et al., "Selecting the Right Data Distribution Scheme for a Survivable Storage System," Technical Report CMU-CS-01-120, Carnegie Mellon Univ., May 2001.

[33] G. Zheng, L. Shi, and L.V. Kalé, "FTC-Charm++: An In-Memory Checkpoint-Based Fault Tolerant Runtime for Charm++ and MPI," Proc. Sixth IEEE Int'l Conf. Cluster Computing (Cluster '04), pp. 93-103, Sept. 2004.