-
1
-
-
0027802111
-
-
A. C. Palaniswamy, and P. A. Wilsey, An analytical comparison of periodic checkpointing and incremental state saving. In Proc. of the Seventh Workshop on Parallel and Distributed Simulation San Diego, California, United States, May 16 - 19, 1993. R. Bagrodia and D. Jefferson, Eds. PADS '93. ACM Press, New York, NY, pp. 127-134.
-
A. C. Palaniswamy, and P. A. Wilsey, "An analytical comparison of periodic checkpointing and incremental state saving". In Proc. of the Seventh Workshop on Parallel and Distributed Simulation (San Diego, California, United States, May 16 - 19, 1993). R. Bagrodia and D. Jefferson, Eds. PADS '93. ACM Press, New York, NY, pp. 127-134.
-
-
-
-
2
-
-
34547424386
-
Cooperative checkpointing: A robust approach to large-scale systems reliability
-
Cairns, Australia, June
-
A. J. Oliner, L. Rudolph, R. K. Sahoo, "Cooperative checkpointing: a robust approach to large-scale systems reliability". In Proc. of the 20th Annual International Conference on Supercomputing (ICS), Cairns, Australia, June 2006, pp.14-23.
-
(2006)
Proc. of the 20th Annual International Conference on Supercomputing (ICS)
, pp. 14-23
-
-
Oliner, A.J.1
Rudolph, L.2
Sahoo, R.K.3
-
4
-
-
33751062653
-
The overhead model of word-level and page-level incremental checkpointing
-
April 23-27, Dijon, France, pp
-
J. Heo, Y. Cho, G. Jeon, H. Kimm, "The overhead model of word-level and page-level incremental checkpointing". Proc. of the 2006 ACM Symposium on Applied Computing, April 23-27, 2006, Dijon, France, pp. 1493-1494.
-
(2006)
Proc. of the 2006 ACM symposium on Applied computing
, pp. 1493-1494
-
-
Heo, J.1
Cho, Y.2
Jeon, G.3
Kimm, H.4
-
5
-
-
85084159983
-
Transparent checkpointing under UNIX
-
J.S. Plank, M. Beck, G. Kingsley, and K. Li, "Transparent checkpointing under UNIX". In Proceedings of the USENIX Winter 1995 Technical Conference, pp. 213-223.
-
Proceedings of the USENIX Winter 1995 Technical Conference
, pp. 213-223
-
-
Plank, J.S.1
Beck, M.2
Kingsley, G.3
Li, K.4
-
6
-
-
0004097019
-
Compressed differences: An algorithm for fast incremental checkpointing
-
Technical Report CS-95-302, University of Tennessee at Knoxville
-
J.S. Plank, J. Xu, and R.H. Netzer, 1995a. "Compressed differences: an algorithm for fast incremental checkpointing". Technical Report CS-95-302, University of Tennessee at Knoxville.
-
(1995)
-
-
Plank, J.S.1
Xu, J.2
Netzer, R.H.3
-
7
-
-
50649089750
-
-
J.C. Sancho, F. Petrini, G. Johnson, E. Frachtenberg, On the feasibility of incremental checkpointing for scientific computing, Parallel and Distributed Processing Symposium, 2004. Proc. 18th International, pp. 26-30.
-
J.C. Sancho, F. Petrini, G. Johnson, E. Frachtenberg, "On the feasibility of incremental checkpointing for scientific computing," Parallel and Distributed Processing Symposium, 2004. Proc. 18th International, pp. 26-30.
-
-
-
-
8
-
-
84976846528
-
A first-order approximation to the optimum checkpoint interval
-
Sept
-
J.W. Young, "A first-order approximation to the optimum checkpoint interval," Communications of the ACM 17, 9 (Sept. 1974), pp. 530-531.
-
(1974)
Communications of the ACM
, vol.17
, Issue.9
, pp. 530-531
-
-
Young, J.W.1
-
9
-
-
0028485392
-
Low-latency, concurrent checkpointing for parallel programs
-
Aug
-
K. Li, J. F. Naughton, and J. S. Plank, "Low-latency, concurrent checkpointing for parallel programs". IEEE Transactions on Parallel and Distributed Systems, vol. 5, Aug. 1994.
-
(1994)
IEEE Transactions on Parallel and Distributed Systems
, vol.5
-
-
Li, K.1
Naughton, J.F.2
Plank, J.S.3
-
11
-
-
0031388399
-
Impact of checkpoint latency on overhead ratio of a checkpointing scheme
-
Aug
-
N. H. Vaidya, "Impact of checkpoint latency on overhead ratio of a checkpointing scheme". IEEE Transactions on Computers, vol. 46, no. 8, Aug. 1997, pp. 942-947.
-
(1997)
IEEE Transactions on Computers
, vol.46
, Issue.8
, pp. 942-947
-
-
Vaidya, N.H.1
-
13
-
-
33845434226
-
-
R. Gioiosa, J.C. Sancho, S. Jiang, F. Petrini, Transparent, incremental checkpointing at kernel level: a foundation for fault tolerance for parallel computers. Supercomputing, 2005. Proceedings of the ACM/IEEE SC 2005 Conference, pp. 9, 12-18.
-
R. Gioiosa, J.C. Sancho, S. Jiang, F. Petrini, "Transparent, incremental checkpointing at kernel level: a foundation for fault tolerance for parallel computers". Supercomputing, 2005. Proceedings of the ACM/IEEE SC 2005 Conference, pp. 9, 12-18.
-
-
-
-
14
-
-
77952378080
-
Critical event prediction for proactive management in large-scale computer clusters
-
Pages:, Year of Publication: ISBN:1-58113-737-0
-
R. K. Sahoo, A. J. Oliner, I. Rish, M. Gupta, J.E. Moreira, S. Ma, "Critical event prediction for proactive management in large-scale computer clusters". International Conference on Knowledge Discovery and Data Mining, 2003, pp. 426-435. ISBN 1-58113-737-0.
-
(2003)
International conference on Knowledge discovery and data mining
, pp. 426-435
-
-
Sahoo, R.K.1
Oliner, A.J.2
Rish, I.3
Gupta, M.4
Moreira, J.E.5
Ma, S.6
-
15
-
-
8344232253
-
Adaptive incremental checkpointing on massively parallel systems
-
June 26, July 1
-
S. Agarwal, R. Garg, M. S. Gupta, J. Moreira, "Adaptive incremental checkpointing on massively parallel systems". In Proc. of the 18th Annual ACM International Conference on Supercomputing (ICS'04), June 26 - July 1, 2004, pp. 277-286.
-
(2004)
Proc. of 18th Annual ACM International Conference of Supercomputing (ICS'04)
, pp. 277-286
-
-
Agarwal, S.1
Garg, R.2
Gupta, M.S.3
Moreira, J.4
-
16
-
-
0003778293
-
-
Wiley; 2nd edition January, ISBN-10: 0471120626
-
S. M. Ross, "Stochastic Processes". Wiley, 2nd edition (January 1995). ISBN-10: 0471120626.
-
(1995)
Stochastic Processes
-
-
Ross, S.M.1
-
17
-
-
20444444457
-
The LAM/MPI checkpoint/restart framework: System-initiated checkpoint
-
Santa Fe, NM. October
-
S. Sankaran, J. M. Squyres, B. Barrett, A. Lumsdaine, J. Duell, P. Hargrove, and E. Roman. "The LAM/MPI checkpoint/restart framework: system-initiated checkpoint". The 2003 Los Alamos Computer Science Institute Symposium, Santa Fe, NM. October 2003.
-
(2003)
The 2003 Los Alamos Computer Science Institute Symposium
-
-
Sankaran, S.1
Squyres, J.M.2
Barrett, B.3
Lumsdaine, A.4
Duell, J.5
Hargrove, P.6
Roman, E.7
-
18
-
-
50649103655
-
-
Louisiana Tech University, Ruston, LA, USA, May
-
Y. Liu, "Reliability-Aware Optimal Checkpoint/Restart Model in High Performance Computing". PhD thesis, Louisiana Tech University, Ruston, LA, USA, May 2007.
-
(2007)
Reliability-Aware Optimal Checkpoint/Restart Model in High Performance Computing
-
-
Liu, Y.1
-
19
-
-
50649087193
-
A Reliability-aware Approach for an Optimal Checkpoint/Restart Model in HPC Environments
-
Austin, Texas
-
Y. Liu, R. Nassar, C. Leangsuksun, N. Naksinehaboon, M. Paun, S. Scott, "A Reliability-aware Approach for an Optimal Checkpoint/Restart Model in HPC Environments". Refereed proceedings of the IEEE Cluster Conference, Austin, Texas, 2007, pp. 452-457.
-
(2007)
Refereed proceedings of the IEEE Cluster Conference
, pp. 452-457
-
-
Liu, Y.1
Nassar, R.2
Leangsuksun, C.3
Naksinehaboon, N.4
Paun, M.5
Scott, S.6