-
2
-
-
0003473816
-
-
2nd Edition. SIAM, Philadelphia, PA
-
R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd Edition. SIAM, Philadelphia, PA, 1994.
-
(1994)
Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods
-
-
Barrett, R.1
Berry, M.2
Chan, T.F.3
Demmel, J.4
Donato, J.5
Dongarra, J.6
Eijkhout, V.7
Pozo, R.8
Romine, C.9
van der Vorst, H.10
-
3
-
-
31844451082
-
Fault Tolerant High Performance Computing by a Coding Approach
-
Chicago, Illinois, USA, June 15-17
-
Z. Chen, G. E. Fagg, E. Gabriel, J. Langou, T. Angskun, G. Bosilca, and J. Dongarra. Fault Tolerant High Performance Computing by a Coding Approach. Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'05), Chicago, Illinois, USA, June 15-17, 2005.
-
(2005)
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'05)
-
-
Chen, Z.1
Fagg, G.E.2
Gabriel, E.3
Langou, J.4
Angskun, T.5
Bosilca, G.6
Dongarra, J.7
-
4
-
-
0029715009
-
Evaluation of checkpoint mechanisms for massively parallel machines
-
T.-C. Chiueh and P. Deng. Evaluation of checkpoint mechanisms for massively parallel machines. In FTCS, pages 370-379, 1996.
-
(1996)
FTCS
, pp. 370-379
-
-
Chiueh, T.-C.1
Deng, P.2
-
5
-
-
58449099052
-
-
J. Dongarra, H. Meuer, and E. Strohmaier. TOP500 Supercomputer Sites, 24th edition. In Proceedings of the Supercomputing Conference (SC'2004), Pittsburgh PA, USA. ACM, 2004.
-
-
-
-
6
-
-
84940567900
-
FT-MPI: Fault tolerant MPI, supporting dynamic applications in a dynamic world
-
G. E. Fagg and J. Dongarra. FT-MPI: Fault tolerant MPI, supporting dynamic applications in a dynamic world. In PVM/MPI 2000, pages 346-353, 2000.
-
(2000)
PVM/MPI 2000
, pp. 346-353
-
-
Fagg, G.E.1
Dongarra, J.2
-
7
-
-
33646110228
-
Extending the MPI specification for process fault tolerance on high performance computing systems
-
Germany
-
G. E. Fagg, E. Gabriel, G. Bosilca, T. Angskun, Z. Chen, J. Pjesivac-Grbovic, K. London, and J. J. Dongarra. Extending the MPI specification for process fault tolerance on high performance computing systems. In Proceedings of the International Supercomputer Conference. Heidelberg, Germany, 2004.
-
(2004)
Proceedings of the International Supercomputer Conference. Heidelberg
-
-
Fagg, G.E.1
Gabriel, E.2
Bosilca, G.3
Angskun, T.4
Chen, Z.5
Pjesivac-Grbovic, J.6
London, K.7
Dongarra, J.J.8
-
8
-
-
27844508605
-
Process fault-tolerance: Semantics, design and applications for high performance computing
-
Winter
-
G. E. Fagg, E. Gabriel, Z. Chen, T. Angskun, G. Bosilca, J. Pjesivac-Grbovic, and J. J. Dongarra. Process fault-tolerance: Semantics, design and applications for high performance computing. International Journal of High Performance Computing Applications, Volume 19, Number 4, Pages 465-477, Winter, 2005.
-
(2005)
International Journal of High Performance Computing Applications
, vol.19
, Issue.4
, pp. 465-477
-
-
Fagg, G.E.1
Gabriel, E.2
Chen, Z.3
Angskun, T.4
Bosilca, G.5
Pjesivac-Grbovic, J.6
Dongarra, J.J.7
-
9
-
-
0018454850
-
On the optimum checkpoint interval
-
E. Gelenbe. On the optimum checkpoint interval. J. ACM, 26(2):259-270, 1979.
-
(1979)
J. ACM
, vol.26
, Issue.2
, pp. 259-270
-
-
Gelenbe, E.1
-
12
-
-
0031223146
-
A tutorial on Reed-Solomon coding for fault-tolerance in RAID-like systems
-
September
-
J. S. Plank. A tutorial on Reed-Solomon coding for fault-tolerance in RAID-like systems. Software - Practice & Experience, 27(9):995-1012, September 1997.
-
(1997)
Software - Practice & Experience
, vol.27
, Issue.9
, pp. 995-1012
-
-
Plank, J.S.1
-
13
-
-
0031570636
-
Fault-tolerant matrix operations for networks of workstations using diskless checkpointing
-
J. S. Plank, Y. Kim, and J. Dongarra. Fault-tolerant matrix operations for networks of workstations using diskless checkpointing. J. Parallel Distrib. Comput., 43(2):125-138, 1997.
-
(1997)
J. Parallel Distrib. Comput
, vol.43
, Issue.2
, pp. 125-138
-
-
Plank, J.S.1
Kim, Y.2
Dongarra, J.3
-
14
-
-
0028060943
-
Faster checkpointing with n+1 parity
-
J. S. Plank and K. Li. Faster checkpointing with n+1 parity. In FTCS, pages 288-297, 1994.
-
(1994)
FTCS
, pp. 288-297
-
-
Plank, J.S.1
Li, K.2
-
15
-
-
0032179680
-
Diskless checkpointing
-
J. S. Plank, K. Li, and M. A. Puening. Diskless checkpointing. IEEE Trans. Parallel Distrib. Syst., 9(10):972-986, 1998.
-
(1998)
IEEE Trans. Parallel Distrib. Syst
, vol.9
, Issue.10
, pp. 972-986
-
-
Plank, J.S.1
Li, K.2
Puening, M.A.3
-
16
-
-
0035201417
-
Processor allocation and checkpoint interval selection in cluster computing systems
-
November
-
J. S. Plank and M. G. Thomason. Processor allocation and checkpoint interval selection in cluster computing systems. J. Parallel Distrib. Comput., 61(11):1570-1590, November 2001.
-
(2001)
J. Parallel Distrib. Comput
, vol.61
, Issue.11
, pp. 1570-1590
-
-
Plank, J.S.1
Thomason, M.G.2
-
17
-
-
84864756973
-
An experimental study about diskless checkpointing
-
L. M. Silva and J. G. Silva. An experimental study about diskless checkpointing. In EUROMICRO'98, pages 395-402, 1998.
-
(1998)
EUROMICRO'98
, pp. 395-402
-
-
Silva, L.M.1
Silva, J.G.2
-
18
-
-
84976846528
-
A first order approximation to the optimal checkpoint interval
-
J. W. Young. A first order approximation to the optimal checkpoint interval. Commun. ACM, 17(9):530-531, 1974.
-
(1974)
Commun. ACM
, vol.17
, Issue.9
, pp. 530-531
-
-
Young, J.W.1