-
2
-
-
0003820750
-
An overview of checkpointing in uniprocessor and distributed systems, focusing on implementation and performance
-
J. S. Plank, "An Overview of Checkpointing in Uniprocessor and Distributed Systems, Focusing on Implementation and Performance," Technical Report of University of Tennessee, UT-CS, 1997.
-
(1997)
Technical Report of University of Tennessee
, vol.UT-CS
-
-
Plank, J.S.1
-
3
-
-
0022020346
-
Distributed snapshots: Determining global states of distributed systems
-
M. Chandy, L. Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems," ACM Trans. on Computer Systems, 3(1), 1985.
-
(1985)
ACM Trans. on Computer Systems
, vol.3
, Issue.1
-
-
Chandy, M.1
Lamport, L.2
-
4
-
-
0023090161
-
Checkpointing and rollback-recovery for distributed systems
-
R. Koo, S. Toueg, "Checkpointing and Rollback-Recovery for Distributed Systems," IEEE Trans. on Software Engineering, Vol. SE-13, No. 1, 1987.
-
(1987)
IEEE Trans. on Software Engineering
, vol.SE-13
, Issue.1
-
-
Koo, R.1
Toueg, S.2
-
6
-
-
0026869241
-
Analysis and modeling of correlated failures in multicomputer systems
-
D. Tang, R. K. Iyer, "Analysis and Modeling of Correlated Failures in Multicomputer Systems," IEEE Trans. on Computers, Vol. 41, Num. 5, 1992.
-
(1992)
IEEE Trans. on Computers
, vol.41
, Issue.5
-
-
Tang, D.1
Iyer, R.K.2
-
7
-
-
84976846528
-
A first order approximation to the optimum checkpoint interval
-
J. W. Young, "A First Order Approximation to the Optimum Checkpoint Interval," Communications of the ACM, Vol. 17, Num. 9, 1974.
-
(1974)
Communications of the ACM
, vol.17
, Issue.9
-
-
Young, J.W.1
-
10
-
-
0032597646
-
The average availability of parallel checkpointing systems and its importance in selecting runtime parameters
-
J. S. Plank, M. G. Thomason, "The Average Availability of Parallel Checkpointing Systems and Its Importance in Selecting Runtime Parameters," IEEE Proc. Int'l Symp. on Fault-Tolerant Computing, 1999.
-
(1999)
IEEE Proc. Int'l Symp. on Fault-tolerant Computing
-
-
Plank, J.S.1
Thomason, M.G.2
-
11
-
-
9144223280
-
Checkpointing for peta-scale systems: A look into the future of practical rollback-recovery
-
E. N. Elnozahy, J. S. Plank, W. K. Fuchs, "Checkpointing for Peta-Scale Systems: A Look into the Future of Practical Rollback-Recovery," IEEE Trans. on Dependable and Secure Computing, Vol. 1, Num. 2, 2004.
-
(2004)
IEEE Trans. on Dependable and Secure Computing
, vol.1
, Issue.2
-
-
Elnozahy, E.N.1
Plank, J.S.2
Fuchs, W.K.3
-
13
-
-
0025467711
-
A bridging model for parallel computation
-
L. G. Valiant, "A Bridging Model for Parallel Computation," Communications of the ACM, Vol. 33, 1990.
-
(1990)
Communications of the ACM
, vol.33
-
-
Valiant, L.G.1
-
15
-
-
0036504721
-
Models of parallel applications with large computation and I/O requirements
-
E. Rosti, et al., "Models of Parallel Applications with Large Computation and I/O Requirements," IEEE Trans. on Software Engineering, Vol.28, Num. 3, 2002.
-
(2002)
IEEE Trans. on Software Engineering
, vol.28
, Issue.3
-
-
Rosti, E.1
-
17
-
-
0002290354
-
The completion time of a job on multimode systems
-
G. Kulkarni, V. F. Nicola, K. S. Trivedi, "The Completion Time of a Job on Multimode Systems," Advances in Applied Probability, Vol 19, 1987.
-
(1987)
Advances in Applied Probability
, vol.19
-
-
Kulkarni, G.1
Nicola, V.F.2
Trivedi, K.S.3
-
20
-
-
0022734032
-
A measurement-based model for workload dependence of CPU errors
-
R. Iyer, D. Rossetti, "A Measurement-based Model for Workload Dependence of CPU Errors," IEEE Trans. on Computers, Vol. C-35, 1986.
-
(1986)
IEEE Trans. on Computers
, vol.C-35
-
-
Iyer, R.1
Rossetti, D.2
-
22
-
-
0033314330
-
IBM S/390 parallel enterprise server G5 fault tolerance: A historical perspective
-
L. Spainhower, T. A. Gregg, "IBM S/390 Parallel Enterprise Server G5 Fault Tolerance: A Historical Perspective," IBM Journal of Research and Development, Vol. 43, Num. 5/6, 1999.
-
(1999)
IBM Journal of Research and Development
, vol.43
, Issue.5-6
-
-
Spainhower, L.1
Gregg, T.A.2
-
25
-
-
27544480041
-
Modeling coordinated checkpointing for large-scale supercomputers
-
L. Wang et al., "Modeling Coordinated Checkpointing for Large-Scale Supercomputers," Technical Report of University of Illinois, 2005.
-
(2005)
Technical Report of University of Illinois
-
-
Wang, L.1