-
2
-
-
84870548923
-
An overview of the BlueGene/L supercomputer
-
N. Adiga, G. Almasi, G. Almasi, Y. Aridor, R. Barik, D. Beece, R. Bellofatto, G. Bhanot, R. Bickford, M. Blumrich et al., "An overview of the BlueGene/L supercomputer," in Supercomputing, ACM/IEEE 2002 Conference, 2002, pp. 60-60.
-
(2002)
Supercomputing, ACM/IEEE 2002 Conference
, pp. 60-60
-
-
Adiga, N.1
Almasi, G.2
Almasi, G.3
Aridor, Y.4
Barik, R.5
Beece, D.6
Bellofatto, R.7
Bhanot, G.8
Bickford, R.9
Blumrich, M.10
-
3
-
-
0037619265
-
Web search for a planet: The Google cluster architecture
-
L. Barroso, J. Dean, and U. Holzle, "Web search for a planet: The Google cluster architecture," IEEE Micro, vol. 23, no. 2, pp. 22-28, 2003.
-
(2003)
IEEE Micro
, vol.23
, Issue.2
, pp. 22-28
-
-
Barroso, L.1
Dean, J.2
Holzle, U.3
-
5
-
-
0022020346
-
Distributed snapshots: Determining global states of distributed systems
-
M. Chandy and L. Lamport, "Distributed snapshots: Determining global states of distributed systems," ACM Transactions on Computer Systems (TOCS), vol. 3, no. 1, pp. 63-75, 1985.
-
(1985)
ACM Transactions on Computer Systems (TOCS)
, vol.3
, Issue.1
, pp. 63-75
-
-
Chandy, M.1
Lamport, L.2
-
6
-
-
77954948567
-
On Disk-based and Diskless Checkpointing for Parallel and Distributed Systems: An Empirical Analysis
-
N. Kofahi, S. Al-Bokhitan, and A. Al-Nazer, "On Disk-based and Diskless Checkpointing for Parallel and Distributed Systems: An Empirical Analysis," Information Technology Journal, vol. 4, no. 4, pp. 367-376, 2005.
-
(2005)
Information Technology Journal
, vol.4
, Issue.4
, pp. 367-376
-
-
Kofahi, N.1
Al-Bokhitan, S.2
Al-Nazer, A.3
-
7
-
-
0032179680
-
Diskless checkpointing
-
J. S. Plank, K. Li, and M. A. Puening, "Diskless checkpointing," IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 10, p. 972, 1998.
-
(1998)
IEEE Transactions on Parallel and Distributed Systems
, vol.9
, Issue.10
, p. 972
-
-
Plank, J.S.1
Li, K.2
Puening, M.A.3
-
8
-
-
0031124071
-
Consistent global checkpoints that contain a given set of local checkpoints
-
Y. M. Wang, "Consistent global checkpoints that contain a given set of local checkpoints," IEEE Transactions on Computers, vol. 46, no. 4, pp. 456-468, 1997.
-
(1997)
IEEE Transactions on Computers
, vol.46
, Issue.4
, pp. 456-468
-
-
Wang, Y.M.1
-
9
-
-
27944436654
-
Optimal asynchronous garbage collection for RDT checkpointing protocols
-
R. Schmidt, I. C. Garcia, F. Pedone, and L. E. Buzato, "Optimal asynchronous garbage collection for RDT checkpointing protocols," in ICDCS '05: Proceedings of the 25th IEEE International Conference on Distributed Computing Systems (ICDCS'05), 2005.
-
ICDCS '05: Proceedings of the 25th IEEE International Conference on Distributed Computing Systems (ICDCS'05), 2005
-
-
Schmidt, R.1
Garcia, I.C.2
Pedone, F.3
Buzato, L.E.4
-
10
-
-
70449581816
-
-
University of Tennessee, Knoxville, TN, USA, Tech. Rep.
-
J. S. Plank and K. Li, "Faster checkpointing with n + 1 parity," University of Tennessee, Knoxville, TN, USA, Tech. Rep., 1993.
-
(1993)
Faster Checkpointing with N + 1 Parity
-
-
Plank, J.S.1
Li, K.2
-
12
-
-
0024641589
-
Efficient dispersal of information for security, load balancing, and fault tolerance
-
M. Rabin, "Efficient dispersal of information for security, load balancing, and fault tolerance," Journal of the ACM (JACM), vol. 36, no. 2, pp. 335-348, 1989.
-
(1989)
Journal of the ACM (JACM)
, vol.36
, Issue.2
, pp. 335-348
-
-
Rabin, M.1
-
14
-
-
0028994249
-
Algorithm-based diskless checkpointing for fault tolerant matrix operations
-
J. Plank, Y. Kim, and J. Dongarra, "Algorithm-based diskless checkpointing for fault tolerant matrix operations," in Fault-Tolerant Computing, 1995. FTCS-25. Digest of Papers., Twenty-Fifth International Symposium on, 1995, pp. 351-360.
-
(1995)
Fault-Tolerant Computing, 1995. FTCS-25. Digest of Papers., Twenty-Fifth International Symposium on
, pp. 351-360
-
-
Plank, J.1
Kim, Y.2
Dongarra, J.3
-
16
-
-
31844451082
-
Fault tolerant high performance computing by a coding approach
-
New York, NY, USA: ACM Press
-
Z. Chen, G. E. Fagg, E. Gabriel, J. Langou, T. Angskun, G. Bosilca, and J. Dongarra, "Fault tolerant high performance computing by a coding approach," in PPoPP '05: Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming. New York, NY, USA: ACM Press, 2005, pp. 213-223.
-
(2005)
PPoPP '05: Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
, pp. 213-223
-
-
Chen, Z.1
Fagg, G.E.2
Gabriel, E.3
Langou, J.4
Angskun, T.5
Bosilca, G.6
Dongarra, J.7
-
18
-
-
0032285092
-
Using two-level stable storage for efficient checkpointing
-
L. Silva and J. Silva, "Using two-level stable storage for efficient checkpointing," Software, IEE Proceedings, vol. 145, pp. 198-201, 1998.
-
(1998)
Software, IEE Proceedings
, vol.145
, pp. 198-201
-
-
Silva, L.1
Silva, J.2
-
20
-
-
0042078549
-
A survey of rollback-recovery protocols in message-passing systems
-
E. N. Elnozahy, D. Johnson, and Y. M. Yang, "A survey of rollback-recovery protocols in message-passing systems," ACM Comput. Surv., vol. 34, no. 3, pp. 375-408, 2002.
-
(2002)
ACM Comput. Surv.
, vol.34
, Issue.3
, pp. 375-408
-
-
Elnozahy, E.N.1
Johnson, D.2
Yang, Y.M.3
-
21
-
-
0017996760
-
Time, clocks, and the ordering of events in a distributed system
-
L. Lamport, "Time, clocks, and the ordering of events in a distributed system," Communications of the ACM, vol. 21, no. 7, pp. 558-565, 1978.
-
(1978)
Communications of the ACM
, vol.21
, Issue.7
, pp. 558-565
-
-
Lamport, L.1
-
23
-
-
0029255243
-
Necessary and sufficient conditions for consistent global snapshots
-
R. H. B. Netzer and J. Xu, "Necessary and sufficient conditions for consistent global snapshots," IEEE Transactions on Parallel and Distributed Systems, vol. 6, no. 2, pp. 165-169, 1995.
-
(1995)
IEEE Transactions on Parallel and Distributed Systems
, vol.6
, Issue.2
, pp. 165-169
-
-
Netzer, R.H.B.1
Xu, J.2
-
24
-
-
0029305383
-
Checkpoint space reclamation for uncoordinated checkpointing in message-passing systems
-
Y.-M. Wang, P.-Y. Chung, I.-J. Lin, and W. Fuchs, "Checkpoint space reclamation for uncoordinated checkpointing in message-passing systems," IEEE Transactions on Parallel and Distributed Systems, vol. 6, no. 5, pp. 546-554, 1995.
-
(1995)
IEEE Transactions on Parallel and Distributed Systems
, vol.6
, Issue.5
, pp. 546-554
-
-
Wang, Y.-M.1
Chung, P.-Y.2
Lin, I.-J.3
Fuchs, W.4
-
25
-
-
85084159983
-
Libckpt: Transparent checkpointing under Unix
-
USENIX Association
-
J. S. Plank, M. Beck, G. Kingsley, and K. Li, "Libckpt: Transparent checkpointing under Unix," in USENIX 1995 Technical Conference Proceedings. USENIX Association, 1995.
-
(1995)
USENIX 1995 Technical Conference Proceedings
-
-
Plank, J.S.1
Beck, M.2
Kingsley, G.3
Li, K.4