-
1
-
-
85060036181
-
The validity of the single processor approach to achieving large scale computing capabilities
-
AFIPS Press
-
G. Amdahl. The validity of the single processor approach to achieving large scale computing capabilities. In AFIPS Conference Proceedings, volume 30, pages 483-485. AFIPS Press, 1967.
-
(1967)
AFIPS Conference Proceedings
, vol.30
, pp. 483-485
-
-
Amdahl, G.1
-
3
-
-
0003615167
-
-
SIAM
-
L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley. ScaLAPACK Users' Guide. SIAM, 1997.
-
(1997)
ScaLAPACK Users' Guide
-
-
Blackford, L.S.1
Choi, J.2
Cleary, A.3
D'Azevedo, E.4
Demmel, J.5
Dhillon, I.6
Dongarra, J.7
Hammarling, S.8
Henry, G.9
Petitet, A.10
Stanley, K.11
Walker, D.12
Whaley, R.C.13
-
4
-
-
83155171268
-
Jaguar: The world's most powerful computer
-
A. Bland, R. Kendall, D. Kothe, J. Rogers, and G. Shipman. Jaguar: The World's Most Powerful Computer. In GUC'2009, 2009.
-
(2009)
GUC'2009
-
-
Bland, A.1
Kendall, R.2
Kothe, D.3
Rogers, J.4
Shipman, G.5
-
5
-
-
83155195316
-
Checkpointing strategies for parallel jobs
-
France, Jan. Available at
-
M. Bougeret, H. Casanova, M. Rabie, Y. Robert, and F. Vivien. Checkpointing strategies for parallel jobs. Research Report 7520, INRIA, France, Jan. 2011. Available at http://graal.ens-lyon.fr/~fvivien/.
-
(2011)
Research Report 7520, INRIA
-
-
Bougeret, M.1
Casanova, H.2
Rabie, M.3
Robert, Y.4
Vivien, F.5
-
6
-
-
77955097389
-
A flexible checkpoint/restart model in distributed systems
-
volume 6067 of LNCS
-
M.-S. Bouguerra, T. Gautier, D. Trystram, and J.-M. Vincent. A flexible checkpoint/restart model in distributed systems. In PPAM, volume 6067 of LNCS, pages 206-215, 2010.
-
(2010)
PPAM
, pp. 206-215
-
-
Bouguerra, M.-S.1
Gautier, T.2
Trystram, D.3
Vincent, J.-M.4
-
8
-
-
78649559128
-
Checkpointing vs. Migration for post-petascale supercomputers
-
IEEE Computer Society Press
-
F. Cappello, H. Casanova, and Y. Robert. Checkpointing vs. migration for post-petascale supercomputers. In ICPP'2010. IEEE Computer Society Press, 2010.
-
(2010)
ICPP'2010
-
-
Cappello, F.1
Casanova, H.2
Robert, Y.3
-
9
-
-
0035266102
-
Proactive management of software aging
-
V. Castelli, R. E. Harper, P. Heidelberger, S. W. Hunter, K. S. Trivedi, K. Vaidyanathan, and W. P. Zeggert. Proactive management of software aging. IBM J. Res. Dev., 45(2):311-332, 2001. (Pubitemid 32736915)
-
(2001)
IBM Journal of Research and Development
, vol.45
, Issue.2
, pp. 311-332
-
-
Castelli, V.1
Harper, R.E.2
Heidelberger, P.3
Hunter, S.W.4
Trivedi, K.S.5
Vaidyanathan, K.6
Zeggert, W.P.7
-
10
-
-
28044460018
-
A higher order estimate of the optimum checkpoint interval for restart dumps
-
DOI 10.1016/j.future.2004.11.016, PII S0167739X04002213
-
J. T. Daly. A higher order estimate of the optimum checkpoint interval for restart dumps. Future Generation Computer Systems, 22(3):303-312, 2004. (Pubitemid 41689812)
-
(2006)
Future Generation Computer Systems
, vol.22
, Issue.3
, pp. 303-312
-
-
Daly, J.T.1
-
11
-
-
70450159193
-
The international exascale software project: A call to cooperative action by the global high-performance community
-
J. Dongarra, P. Beckman, P. Aerts, F. Cappello, T. Lippert, S. Matsuoka, P. Messina, T. Moore, R. Stevens, A. Trefethen, and M. Valero. The international exascale software project: a call to cooperative action by the global high-performance community. Int. J. High Perform. Comput. Appl., 23(4):309-322, 2009.
-
(2009)
Int. J. High Perform. Comput. Appl.
, vol.23
, Issue.4
, pp. 309-322
-
-
Dongarra, J.1
Beckman, P.2
Aerts, P.3
Cappello, F.4
Lippert, T.5
Matsuoka, S.6
Messina, P.7
Moore, T.8
Stevens, R.9
Trefethen, A.10
Valero, M.11
-
12
-
-
0042078549
-
A survey of rollback-recovery protocols in message-passing systems
-
E. N. M. Elnozahy, L. Alvisi, Y.-M. Wang, and D. B. Johnson. A survey of rollback-recovery protocols in message-passing systems. ACM Computing Surveys, 34:375-408, 2002.
-
(2002)
ACM Computing Surveys
, vol.34
, pp. 375-408
-
-
Elnozahy, E.N.M.1
Alvisi, L.2
Wang, Y.-M.3
Johnson, D.B.4
-
13
-
-
0036041277
-
Improving cluster availability using workstation validation
-
T. Heath, R. P. Martin, and T. D. Nguyen. Improving cluster availability using workstation validation. SIGMETRICS Perf. Eval. Rev., 30(1):217-227, 2002. (Pubitemid 35009524)
-
(2002)
Performance Evaluation Review
, vol.30
, Issue.1
, pp. 217-227
-
-
Heath, T.1
Martin, R.P.2
Nguyen, T.D.3
-
14
-
-
51049086184
-
Scalable group-based checkpoint/restart for large-scale message-passing systems
-
IEEE
-
J. Ho, C. Wang, and F. Lau. Scalable group-based checkpoint/restart for large-scale message-passing systems. In Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on, pages 1-12. IEEE, 2008.
-
(2008)
Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on
, pp. 1-12
-
-
Ho, J.1
Wang, C.2
Lau, F.3
-
15
-
-
78650009816
-
Impact of sub-optimal checkpoint intervals on application efficiency in computational clusters
-
ACM
-
W. Jones, J. Daly, and N. DeBardeleben. Impact of sub-optimal checkpoint intervals on application efficiency in computational clusters. In HPDC'10, pages 276-279. ACM, 2010.
-
(2010)
HPDC'10
, pp. 276-279
-
-
Jones, W.1
Daly, J.2
DeBardeleben, N.3
-
16
-
-
0028994247
-
Software rejuvenation: Analysis, module and applications
-
Washington, DC, USA, IEEE CS
-
N. Kolettis and N. D. Fulton. Software rejuvenation: Analysis, module and applications. In FTCS'95, page 381, Washington, DC, USA, 1995. IEEE CS.
-
(1995)
FTCS'95
, pp. 381
-
-
Kolettis, N.1
Fulton, N.D.2
-
17
-
-
77954903245
-
The failure trace archive: Enabling comparative analysis of failures in diverse distributed systems
-
0
-
D. Kondo, B. Javadi, A. Iosup, and D. Epema. The failure trace archive: Enabling comparative analysis of failures in diverse distributed systems. Cluster Computing and the Grid, IEEE International Symposium on, 0:398-407, 2010.
-
(2010)
Cluster Computing and the Grid, IEEE International Symposium on
, pp. 398-407
-
-
Kondo, D.1
Javadi, B.2
Iosup, A.3
Epema, D.4
-
18
-
-
0023995854
-
Computing optimal checkpointing strategies for rollback and recovery systems
-
P. L'Ecuyer and J. Malenfant. Computing optimal checkpointing strategies for rollback and recovery systems. IEEE Transactions on Computers, 37(4):491-496, 1988.
-
(1988)
IEEE Transactions on Computers
, vol.37
, Issue.4
, pp. 491-496
-
-
L'Ecuyer, P.1
Malenfant, J.2
-
19
-
-
0035390088
-
A variational calculus approach to optimal checkpoint placement
-
DOI 10.1109/12.936236
-
Y. Ling, J. Mi, and X. Lin. A variational calculus approach to optimal checkpoint placement. IEEE Transactions on computers, pages 699-708, 2001. (Pubitemid 32720123)
-
(2001)
IEEE Transactions on Computers
, vol.50
, Issue.7
, pp. 699-708
-
-
Ling, Y.1
Mi, J.2
Lin, X.3
-
20
-
-
51049108820
-
An optimal checkpoint/restart model for a large scale high performance computing system
-
IEEE
-
Y. Liu, R. Nassar, C. Leangsuksun, N. Naksinehaboon, M. Paun, and S. Scott. An optimal checkpoint/restart model for a large scale high performance computing system. In IPDPS 2008, pages 1-9. IEEE, 2008.
-
(2008)
IPDPS 2008
, pp. 1-9
-
-
Liu, Y.1
Nassar, R.2
Leangsuksun, C.3
Naksinehaboon, N.4
Paun, M.5
Scott, S.6
-
22
-
-
78650831692
-
Design, modeling, and evaluation of a scalable multi-level checkpointing system
-
A. Moody, G. Bronevetsky, K. Mohror, and B. R. de Supinski. Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System. In Proceedings of the ACM/IEEE SC Conference, pages 1-11, 2010.
-
(2010)
Proceedings of the ACM/IEEE SC Conference
, pp. 1-11
-
-
Moody, A.1
Bronevetsky, G.2
Mohror, K.3
Supinski, B.R.D.4
-
23
-
-
33646721605
-
Distribution-free checkpoint placement algorithms based on min-max principle
-
T. Ozaki, T. Dohi, H. Okamura, and N. Kaio. Distribution-free checkpoint placement algorithms based on min-max principle. IEEE TDSC, pages 130-140, 2006.
-
(2006)
IEEE TDSC
, pp. 130-140
-
-
Ozaki, T.1
Dohi, T.2
Okamura, H.3
Kaio, N.4
-
27
-
-
84976696875
-
Performance analysis of checkpointing strategies
-
A. Tantawi and M. Ruschitzka. Performance analysis of checkpointing strategies. ACM TOCS, 2(2):123-144, 1984.
-
(1984)
ACM TOCS
, vol.2
, Issue.2
, pp. 123-144
-
-
Tantawi, A.1
Ruschitzka, M.2
-
28
-
-
0021473687
-
On the optimum checkpoint selection problem
-
S. Toueg and O. Babaoglu. On the optimum checkpoint selection problem. SIAM J. Computing, 13(3):630-649, 1984.
-
(1984)
SIAM J. Computing
, vol.13
, Issue.3
, pp. 630-649
-
-
Toueg, S.1
Babaoglu, O.2
-
29
-
-
83155195315
-
Analysis of dependencies of checkpoint cost and checkpoint interval of fault tolerant MPI applications
-
K. Venkatesh. Analysis of Dependencies of Checkpoint Cost and Checkpoint Interval of Fault Tolerant MPI Applications. Analysis, 2(08):2690-2697, 2010.
-
(2010)
Analysis
, vol.2
, Issue.8
, pp. 2690-2697
-
-
Venkatesh, K.1
-
30
-
-
27544513113
-
Modeling coordinated checkpointing for large-scale supercomputers
-
Proceedings - 2005 International Conference on Dependable Systems and Networks
-
L. Wang, K. Pattabiraman, Z. Kalbarczyk, R. Iyer, L. Votta, C. Vick, and A. Wood. Modeling Coordinated Checkpointing for Large-Scale Supercomputers. In Proc. of the International Conference on Dependable Systems and Networks, pages 812-821, June 2005. (Pubitemid 41538294)
-
(2005)
Proceedings of the International Conference on Dependable Systems and Networks
, pp. 812-821
-
-
Wang, L.1
Pattabiraman, K.2
Kalbarczyk, Z.3
Iyer, R.K.4
Votta, L.5
Vick, C.6
Wood, A.7
-
31
-
-
84976846528
-
A first order approximation to the optimum checkpoint interval
-
J. W. Young. A first order approximation to the optimum checkpoint interval. Communications of the ACM, 17(9):530-531, 1974.
-
(1974)
Communications of the ACM
, vol.17
, Issue.9
, pp. 530-531
-
-
Young, J.W.1