-
2
-
-
31844451082
-
Fault tolerant high performance computing by a coding approach
-
ACM
-
Z. Chen, G. E. Fagg, E. Gabriel, J. Langou, T. Angskun, G. Bosilca, and J. Dongarra. Fault tolerant high performance computing by a coding approach. In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 2005, June 14-17, 2005, Chicago, IL, USA. ACM, 2005.
-
(2005)
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 2005, June 14-17, 2005, Chicago, IL, USA
-
-
Chen, Z.1
Fagg, G.E.2
Gabriel, E.3
Langou, J.4
Angskun, T.5
Bosilca, G.6
Dongarra, J.7
-
3
-
-
0242658775
-
Self-adapting software for numerical linear algebra and LAPACK for clusters
-
November-December
-
Z. Chen, J. Dongarra, P. Luszczek, and K. Roche. Self-adapting software for numerical linear algebra and LAPACK for clusters. Parallel Computing, 29(11-12): 1723-1743, November-December 2003.
-
(2003)
Parallel Computing
, vol.29
, Issue.11-12
, pp. 1723-1743
-
-
Chen, Z.1
Dongarra, J.2
Luszczek, P.3
Roche, K.4
-
4
-
-
0029715009
-
Evaluation of checkpoint mechanisms for massively parallel machines
-
T. Chiueh and P. Deng. Evaluation of checkpoint mechanisms for massively parallel machines. In FTCS, pages 370-379, 1996.
-
(1996)
FTCS
, pp. 370-379
-
-
Chiueh, T.1
Deng, P.2
-
5
-
-
34548728989
-
-
J. Dongarra, H. Meuer, and E. Strohmaier. TOP500 Supercomputer Sites, 28th edition. In Proceedings of the Supercomputing Conference (SC'2006), Pittsburgh PA, USA. ACM, 2006.
-
-
-
-
-
6
-
-
84940567900
-
FT-MPI: Fault tolerant MPI, supporting dynamic applications in a dynamic world
-
G. E. Fagg and J. Dongarra. FT-MPI: Fault tolerant MPI, supporting dynamic applications in a dynamic world. In PVM/MPI 2000, pages 346-353, 2000.
-
(2000)
PVM/MPI 2000
, pp. 346-353
-
-
Fagg, G.E.1
Dongarra, J.2
-
7
-
-
33646110228
-
Extending the MPI specification for process fault tolerance on high performance computing systems
-
G. E. Fagg, E. Gabriel, G. Bosilca, T. Angskun, Z. Chen, J. Pjesivac-Grbovic, K. London, and J. J. Dongarra. Extending the MPI specification for process fault tolerance on high performance computing systems. In Proceedings of the International Supercomputer Conference, Heidelberg, Germany, 2004.
-
(2004)
Proceedings of the International Supercomputer Conference, Heidelberg, Germany
-
-
Fagg, G.E.1
Gabriel, E.2
Bosilca, G.3
Angskun, T.4
Chen, Z.5
Pjesivac-Grbovic, J.6
London, K.7
Dongarra, J.J.8
-
8
-
-
33847184145
-
Process fault-tolerance: Semantics, design and applications for high performance computing
-
G. E. Fagg, E. Gabriel, Z. Chen, T. Angskun, G. Bosilca, J. Pjesivac-Grbovic, and J. J. Dongarra. Process fault-tolerance: Semantics, design and applications for high performance computing. Submitted to International Journal of High Performance Computing Applications, 2004.
-
(2004)
Submitted to International Journal of High Performance Computing Applications
-
-
Fagg, G.E.1
Gabriel, E.2
Chen, Z.3
Angskun, T.4
Bosilca, G.5
Pjesivac-Grbovic, J.6
Dongarra, J.J.7
-
11
-
-
0028060943
-
Faster checkpointing with n+1 parity
-
J. S. Plank and K. Li. Faster checkpointing with n+1 parity. In FTCS, pages 288-297, 1994.
-
(1994)
FTCS
, pp. 288-297
-
-
Plank, J.S.1
Li, K.2
-
12
-
-
0030392072
-
Improving the Performance of Coordinated Checkpointers on Networks of Workstations using RAID Techniques
-
J. S. Plank. Improving the Performance of Coordinated Checkpointers on Networks of Workstations using RAID Techniques. In 15th Symposium on Reliable Distributed Systems, pages 76-85, 1996.
-
(1996)
15th Symposium on Reliable Distributed Systems
, pp. 76-85
-
-
Plank, J.S.1
-
13
-
-
0032179680
-
Diskless checkpointing
-
J. S. Plank, K. Li, and M. A. Puening. Diskless checkpointing. IEEE Trans. Parallel Distrib. Syst., 9(10):972-986, 1998.
-
(1998)
IEEE Trans. Parallel Distrib. Syst.
, vol.9
, Issue.10
, pp. 972-986
-
-
Plank, J.S.1
Li, K.2
Puening, M.A.3
-
14
-
-
84864756973
-
An experimental study about diskless checkpointing
-
L. M. Silva and J. G. Silva. An experimental study about diskless checkpointing. In EUROMICRO'98, pages 395-402, 1998.
-
(1998)
EUROMICRO'98
, pp. 395-402
-
-
Silva, L.M.1
Silva, J.G.2
-
15
-
-
0345442370
-
A case for two-level recovery schemes
-
N. H. Vaidya. A case for two-level recovery schemes. IEEE Trans. Computers, 47(6):656-666, 1998.
-
(1998)
IEEE Trans. Computers
, vol.47
, Issue.6
, pp. 656-666
-
-
Vaidya, N.H.1