[1] M. Baker and M. Sullivan, "The recovery box: Using fast recovery to provide high availability in the UNIX environment," in Proceedings of Summer USENIX Technical Conference, 1992.
[2] A. Bouteiller, T. Herault, G. Krawezik, P. Lemarinier and F. Cappello, "MPICH-V: A multiprotocol automatic fault tolerant MPI," International Journal of High Performance Computing and Applications, vol. 20(3), pp. 319-333, 2005.
[4] E. Elnozahy and J. Plank, "Checkpointing for peta-scale systems: A look into the future of practical rollback-recovery," IEEE Trans. on Dependable and Secure Computing, vol. 1(2), pp. 97-108, 2004.
[6] P. Hargrove and J. Duell, "Berkeley lab checkpoint/restart (BLCR) for Linux clusters," in Proceedings of SciDAC, 2006.
[8] Z. Lan and Y. Li, "Adaptive fault management of parallel applications for high performance computing," IEEE Trans. on Computers, in press.
[9] K. Li, J. Naughton and J. Plank, "Low-latency, concurrent checkpointing for parallel programs," IEEE Trans. Parallel and Distributed Systems, vol. 5(8), pp. 874-879, 1994.
[10] Y. Ling, J. Mi and X. Lin, "A variational calculus approach to optimal checkpoint placement," IEEE Trans. Computers, vol. 50(7), pp. 699-708, 2001.
[11] D. Milojičić, F. Douglis, Y. Paindaveine, R. Wheeler and S. Zhou, "Process migration," ACM Comput. Surv., vol. 32(3), pp. 241-299, 2000.
[12] NCSA web site, http://teragrid.ncsa.uiuc.edu.
[15] D. Patterson et al., "Recovery-oriented computing (ROC): Motivation, definition, techniques, and case studies," UC Berkeley Computer Science Technical Report UCB//CSD-02-1175, 2002.
[16] J. Plank, Y. Chen, K. Li, M. Beck and G. Kingsley, "Memory exclusion: Optimizing the performance of checkpointing systems," Software - Practice and Experience, vol. 29(2), pp. 125-142, 1999.
[17] J. Plank, K. Li and M. Puening, "Diskless checkpointing," IEEE Trans. Parallel and Distributed Systems, vol. 9(10), pp. 972-986, 1998.
[18] J. Plank and M. Thomason, "Processor allocation and checkpoint interval selection in cluster computing systems," Journal of Parallel and Distributed Computing, vol. 61(11), pp. 1570-1590, 2001.
[19] S. Rao, L. Alvisi and H. Vin, "The cost of recovery in message logging protocols," IEEE Trans. on Knowledge and Data Engineering, vol. 12(2), pp. 160-173, 2000.
[20] J. Sancho, F. Petrini, G. Johnson, J. Fernandez and E. Frachtenberg, "On the feasibility of incremental checkpointing for scientific computing," in Proceedings of International Parallel and Distributed Processing Symposium, 2004.
[21] SPEC CPU 2006 benchmark website, http://www.spec.org/cpu2006/.
[24] T. Tannenbaum and M. Litzkow, "The Condor distributed processing system," Dr. Dobb's Journal, vol. 227, pp. 40-48, 1995.
[25] N. Vaidya, "Impact of checkpoint latency on overhead ratio of a checkpointing scheme," IEEE Trans. on Computers, vol. 46(8), pp. 942-947, 1997.
[26] J. Young, "A first order approximation to the optimal checkpoint interval," Comm. ACM, vol. 17(9), pp. 530-531, 1974.
[27] P. Zhou, V. Pandey, J. Sundaresan, A. Raghuraman, Y. Zhou and S. Kumar, "Dynamic tracking of page miss ratio curve for memory management," in Proceedings of International Conference on Architectural Support for Programming Languages and Operating Systems, 2004.