-
1
-
-
0033359224
-
-
A. Agbaria, R. Friedman, Starfish: fault-tolerant dynamic MPI programs on clusters of workstations, in: HPDC '99: Proceedings of the Eighth IEEE International Symposium on High Performance Distributed Computing, Washington, DC, USA, IEEE Computer Society, Silver Spring, MD, 1999, p. 31.
-
-
-
-
-
2
-
-
77954003885
-
-
R. Batchu, A. Skjellum, Z. Cui, M. Beddhu, J.P. Neelamegam, Y. Dandass, M. Apte, MPI/FT™: architecture and taxonomies for fault-tolerant, message-passing middleware for performance-portable parallel computing, in: CCGRID '01: Proceedings of the 1st International Symposium on Cluster Computing and the Grid, Washington, DC, USA, IEEE Computer Society, Silver Spring, MD, 2001, p. 26.
-
-
-
-
3
-
-
41449106385
-
J.V. Beck, The mollification method and the numerical solution of ill-posed problems (Diego A. Murio), SIAM Rev. 36 (3) (1994) 502-503.
-
4
-
-
0032592492
-
M. Beck, J.J. Dongarra, G.E. Fagg, G. Al Geist, P. Gray, J. Kohl, M. Migliardi, K. Moore, T. Moore, P. Papadopoulous, S.L. Scott, V. Sunderam, HARNESS: a next generation distributed virtual machine, Future Generation Computer Systems 15 (5-6) (1999) 571-582.
-
5
-
-
41449112246
-
-
G. Bosilca, A. Bouteiller, F. Cappello, S. Djilali, G. Fedak, C. Germain, T. Herault, P. Lemarinier, O. Lodygensky, F. Magniette, V. Neri, A. Selikhov, MPICH-V: toward a scalable fault tolerant MPI for volatile nodes, in: Supercomputing '02: Proceedings of the 2002 ACM/IEEE Conference on Supercomputing, Los Alamitos, CA, USA, IEEE Computer Society Press, Silver Spring, MD, 2002, pp. 1-18.
-
-
-
-
-
6
-
-
85140868634
-
-
G. Bosilca, Z. Chen, J. Dongarra, J. Langou, Recovery patterns for iterative methods in a parallel unstable environment, SIAM J. Sci. Comput., May, 2007.
-
-
-
-
-
7
-
-
31844451082
-
-
Z. Chen, G.E. Fagg, E. Gabriel, J. Langou, T. Angskun, G. Bosilca, J. Dongarra, Fault tolerant high performance computing by a coding approach, in: PPoPP '05: Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, New York, NY, USA, ACM Press, New York, 2005, pp. 213-223.
-
-
-
-
-
8
-
-
41449103072
-
-
J.J. Dongarra, Z. Chen, G. Bosilca, J. Langou, Disaster survival guide in petascale computing: an algorithmic approach, in: Petascale Computing: Algorithms and Applications, Chapman & Hall, CRC Press, London, Boca Raton, FL, 2007.
-
-
-
-
-
9
-
-
41449113981
-
-
J. Duell, The design and implementation of Berkeley Lab's Linux checkpoint/restart, Berkeley Lab Technical Report (Publication LBNL-54941), 〈http://www.osti.gov/servlets/purl/891617-2L2UJc/〉, September 25, 2006.
-
-
-
-
11
-
-
0004767479
-
W. Eckhaus, M. Garbey, Asymptotic analysis on large timescales for singular perturbations of hyperbolic type, SIAM J. Math. Anal. 21 (4) (1990) 867-883.
-
12
-
-
25144486687
-
-
C. Engelmann, A. Geist, Super-scalable algorithms for computing on 100,000 processors, in: V.S. Sunderam, G. Dick van Albada, P.M.A. Sloot, J. Dongarra (Eds.), Proceedings of the International Conference on Computational Science (ICCS) 2005, Part I, Lecture Notes in Computer Science, vol. 3514, Springer, Berlin, 2005, pp. 313-321.
-
-
-
-
-
13
-
-
27844508605
-
G.E. Fagg, E. Gabriel, Z. Chen, T. Angskun, G. Bosilca, J. Pjesivac-Grbovic, J.J. Dongarra, Process fault-tolerance: semantics, design and applications for high performance computing, Internat. J. High Performance Comput. Appl. 19 (2005) 465-477.
-
14
-
-
34548773868
-
-
E. Gabriel, S. Huang, Runtime optimization of application level communication patterns, in: Proceedings of the 21st IEEE International Parallel and Distributed Processing Symposium, 12th International Workshop on High-Level Parallel Programming Models and Supportive Environments, Long Beach, CA, March 26, IEEE Computer Society, Silver Spring, MD, 2007, p. 185.
-
-
-
-
-
15
-
-
41449112852
-
-
M. Garbey, H. Ltaief, Fault tolerant domain decomposition for parabolic problems, New York University, Lecture Notes in Computational Science and Engineering, Springer, Berlin, January 2005, pp. 565-572.
-
-
-
-
-
16
-
-
41449110416
-
M. Garbey, C. Picard, A least square extrapolation method for the a priori error estimate of CFD and heat transfer problem, Structural Dynamic Eurodyn (2005) 871-876.
-
17
-
-
0030243005
-
W. Gropp, E. Lusk, N. Doss, A. Skjellum, A high-performance, portable implementation of the MPI message passing interface standard, Parallel Comput. 22 (6) (1996) 789-828.
-
18
-
-
41449087256
-
-
P. Hough, V. Howle, Fault tolerance in large scale scientific computing, in: M.A. Heroux, P. Raghavan, H.D. Simon (Eds.), Parallel Processing for Scientific Computing, SIAM Press, Philadelphia, PA, 2006.
-
-
-
-
-
19
-
-
0035007397
-
-
K. Ingols, I. Keidar, Availability study of dynamic voting algorithms, in: Proceedings of the 21st IEEE International Conference on Distributed Computing Systems (ICDCS), 2001, pp. 247-254.
-
-
-
-
-
20
-
-
41449096456
-
-
MPI Forum, Special Issue: MPI2: A Message-Passing Interface Standard. Internat. J. Supercomputer Appl. and High Performance Comput. 12(1-2) (1998) 1-299.
-
-
-
-
-
21
-
-
41449111112
-
-
I.R. Philp, Software failures and the road to a petaflop machine, in: First Workshop on High Performance Computing Reliability Issues (HPCRI), Los Alamos National Laboratory, February 2005.
-
-
-
-
-
22
-
-
27844542760
-
S. Sankaran, J.M. Squyres, B. Barrett, A. Lumsdaine, J. Duell, P. Hargrove, E. Roman, The LAM/MPI checkpoint/restart framework: system-initiated checkpointing, Internat. J. High Performance Comput. Appl. 19 (4) (2005) 479-493.
-
23
-
-
22144498897
-
A. Srinivasan, N. Chandra, Latency tolerance through parallelization of time in scientific applications, Parallel Comput. 31 (7) (2005) 777-796.
-
24
-
-
0029713612
-
-
G. Stellner, CoCheck: checkpointing and process migration for MPI, in: IPPS '96: Proceedings of the 10th International Parallel Processing Symposium, Washington, DC, USA, IEEE Computer Society, Silver Spring, MD, 1996, pp. 526-531.
-
-
-
-
-
26
-
-
85143037324
-
-
The MPI Forum, MPI: a message passing interface, in: Supercomputing '93: Proceedings of the 1993 ACM/IEEE Conference on Supercomputing, New York, NY, USA, ACM Press, New York, 1993, pp. 878-883.
-
-
-
-
-
27
-
-
34548768671
-
-
C. Wang, F. Mueller, C. Engelmann, S.L. Scott, A job pause service under LAM/MPI+BLCR for transparent fault tolerance, in: Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS), Long Beach, CA, USA, 2007.
-
-
-
-
28
-
-
0028994273
-
-
Y.-M. Wang, Y. Huang, K.-P. Vo, P.-Y. Chung, C. Kintala, Checkpointing and its applications, in: Proceedings of the International Symposium on Fault-Tolerant Computing, June 1995, pp. 22-31.
-
-
-
-
29
-
-
41449097278
-
-
Y. Zhuang, X.-H. Sun, Stable, globally non-iterative, non-overlapping domain decomposition parallel solvers for parabolic problems, in: Supercomputing '01: Proceedings of the 2001 ACM/IEEE Conference on Supercomputing (CDROM), New York, NY, USA, ACM Press, New York, 2001, p. 19.
-
-
-
-