-
1
-
-
33746779994
-
MPICH-V: A multiprotocol fault tolerant MPI
-
A. Bouteiller, T. Herault, G. Krawezik, P. Lemarinier, and F. Cappello. MPICH-V: A multiprotocol fault tolerant MPI. International Journal of High Performance Computing and Applications, 20(3):319-333, 2006.
-
(2006)
International Journal of High Performance Computing and Applications
, vol.20
, Issue.3
, pp. 319-333
-
-
Bouteiller, A.1
Herault, T.2
Krawezik, G.3
Lemarinier, P.4
Cappello, F.5
-
4
-
-
74549218304
-
-
Cray Inc, Seattle, documentation. URL
-
Cray Inc., Seattle, WA, USA. Cray XT4 documentation. URL http://www.cray.com/products/xt5.
-
Cray XT4
-
-
WA, U.S.A.1
-
5
-
-
50649107313
-
Application MTTFE vs. platform MTTF: A fresh perspective on system reliability and application throughput for computations at scale
-
Lyon, France, May
-
J. T. Daly, L. A. Pritchett-Sheats, and S. E. Michalak. Application MTTFE vs. platform MTTF: A fresh perspective on system reliability and application throughput for computations at scale. In Proceedings of the IEEE International Symposium on Cluster Computing and the Grid, Lyon, France, May 2008.
-
(2008)
Proceedings of the IEEE International Symposium on Cluster Computing and the Grid
-
-
Daly, J.T.1
Pritchett-Sheats, L.A.2
Michalak, S.E.3
-
7
-
-
70349089035
-
Proactive fault tolerance using preemptive migration
-
Weimar, Germany, Feb
-
C. Engelmann, G. R. Vallée, T. Naughton, and S. L. Scott. Proactive fault tolerance using preemptive migration. In Proceedings of the Euromicro International Conference on Parallel, Distributed, and network-based Processing, Weimar, Germany, Feb. 2009.
-
(2009)
Proceedings of the Euromicro International Conference on Parallel, Distributed, and network-based Processing
-
-
Engelmann, C.1
Vallée, G.R.2
Naughton, T.3
Scott, S.L.4
-
8
-
-
33845434226
-
Transparent, incremental checkpointing at kernel level: A foundation for fault tolerance for parallel computers
-
Seattle, WA, USA, Nov
-
R. Gioiosa, J. C. Sancho, S. Jiang, and F. Petrini. Transparent, incremental checkpointing at kernel level: A foundation for fault tolerance for parallel computers. In Proceedings of the IEEE/ACM International Conference on High Performance Computing and Networking, Seattle, WA, USA, Nov. 2005.
-
(2005)
Proceedings of the IEEE/ACM International Conference on High Performance Computing and Networking
-
-
Gioiosa, R.1
Sancho, J.C.2
Jiang, S.3
Petrini, F.4
-
9
-
-
74549208634
-
-
Hewlett-Packard Development Company, L.P., Palo Alto, CA, USA. HP Integrity NonStop Computing. URL http://h20223.www2.hp.com/nonstopcomputing/cache/76385-0-0-0-121.aspx.
-
-
-
-
-
10
-
-
40749160036
-
Overview of the IBM Blue Gene/P project
-
IBM Blue Gene team
-
IBM Blue Gene team. Overview of the IBM Blue Gene/P project. IBM Journal of Research and Development, 52(1/2): 199-220, 2008.
-
(2008)
IBM Journal of Research and Development
, vol.52
, Issue.1-2
, pp. 199-220
-
-
-
11
-
-
85013703470
-
-
Morgan Kaufmann Publishers, Burlington, MA, USA, July
-
I. Koren and C. M. Krishna. Fault-Tolerant Systems. Morgan Kaufmann Publishers, Burlington, MA, USA, July 2007.
-
(2007)
Fault-Tolerant Systems
-
-
Koren, I.1
Krishna, C.M.2
-
12
-
-
74549200447
-
-
National Center for Computational Sciences, Oak Ridge, TN, USA. Leadership science. URL http://www.nccs.gov/leadership-science.
-
-
-
-
-
15
-
-
36148941068
-
Understanding failures in petascale computers
-
Boston, MA, USA, June
-
B. Schroeder and G. A. Gibson. Understanding failures in petascale computers. In Journal of Physics: Proceedings of the Scientific Discovery through Advanced Computing Program Conference, volume 78, pages 2022-2032, Boston, MA, USA, June 2007.
-
(2007)
Journal of Physics: Proceedings of the Scientific Discovery through Advanced Computing Program Conference
, vol.78
, pp. 2022-2032
-
-
Schroeder, B.1
Gibson, G.A.2
-
16
-
-
0004085631
-
-
A K Peters, Ltd, Wellesley, MA, USA, Oct
-
D. P. Siewiorek and R. S. Swarz. Reliable Computer Systems: Design and Evaluation. A K Peters, Ltd., Wellesley, MA, USA, Oct. 1998.
-
(1998)
Reliable Computer Systems: Design and Evaluation
-
-
Siewiorek, D.P.1
Swarz, R.S.2