-
1
-
-
0027757472
-
MPI: A message passing interface
-
IEEE Computer Society Press, November
-
Message Passing Interface Forum. MPI: A Message Passing Interface. In Proc. of Supercomputing '93, pages 878-883. IEEE Computer Society Press, November 1993.
-
(1993)
Proc. of Supercomputing '93
, pp. 878-883
-
-
-
2
-
-
0004096191
-
A survey of rollback recovery protocols in message-passing systems
-
Pittsburgh, PA: CMU-CS-96-181. Carnegie Mellon University, Oct
-
Elnozahy E N, Johnson D B, Wang Y M. A Survey of Rollback Recovery Protocols in Message-Passing Systems. Technical Report CMU-CS-96-181. Pittsburgh, PA: Carnegie Mellon University, Oct 1996.
-
(1996)
Technical Report
-
-
Elnozahy, E.N.1
Johnson, D.B.2
Wang, Y.M.3
-
5
-
-
3042633020
-
Design and implementation of a low-overhead file checkpointing approach
-
Beijing, May
-
Dan Pei, Dongsheng Wang, Weimin Zheng, "Design and Implementation of a Low-Overhead File Checkpointing Approach". IEEE International Conference on High Performance Computing in the Asia-Pacific Region. Beijing, May, 2000, pp.439-441.
-
(2000)
IEEE International Conference on High Performance Computing in the Asia-Pacific Region
, pp. 439-441
-
-
Pei, D.1
Wang, D.2
Zheng, W.3
-
6
-
-
0002067202
-
Libckpt: Transparent checkpointing under Unix
-
New Orleans, LA, January
-
James S. Plank, Micah Beck, Gerry Kingsley, and Kai Li. Libckpt: Transparent Checkpointing under Unix. In Conference Proceedings, Usenix Winter 1995 Technical Conference, New Orleans, LA, January 1995, pp. 213-223.
-
(1995)
Usenix Winter 1995 Technical Conference
, pp. 213-223
-
-
Plank, J.S.1
Beck, M.2
Kingsley, G.3
Li, K.4
-
8
-
-
0022020346
-
Distributed snapshots: Determining global states of distributed systems
-
Chandy K M, Lamport L. Distributed snapshots: Determining global states of distributed systems. ACM Trans on Computer Systems. 1985, 3(1): 63-75.
-
(1985)
ACM Trans on Computer Systems
, vol.3
, Issue.1
, pp. 63-75
-
-
Chandy, K.M.1
Lamport, L.2
-
10
-
-
8344283205
-
CRAK: Linux checkpoint / restart as a kernel module
-
Department of Computer Science, Columbia University
-
H. Zhong and J. Nieh. CRAK: Linux checkpoint / restart as a kernel module. Technical Report CUCS-014-01, Department of Computer Science, Columbia University, 2001.
-
(2001)
Technical Report
, vol.CUCS-014-01
-
-
Zhong, H.1
Nieh, J.2
-
14
-
-
20444444457
-
The LAM/MPI Checkpoint/Restart framework: System-initiated checkpointing
-
October
-
Sriram Sankaran, Jeffrey M. Squyres, Brian Barrett, Andrew Lumsdaine, Jason Duell, Paul Hargrove, and Eric Roman. The LAM/MPI Checkpoint/Restart Framework: System-Initiated Checkpointing. In LACSI Symposium, October 2003.
-
(2003)
LACSI Symposium
-
-
Sankaran, S.1
Squyres, J.M.2
Barrett, B.3
Lumsdaine, A.4
Duell, J.5
Hargrove, P.6
Roman, E.7
-
15
-
-
84884662651
-
MPICH-V: Toward a scalable fault tolerant MPI for volatile nodes
-
IEEE
-
George Bosilca, Aurelien Bouteiller, Franck Cappello, Samir Djilali, Gilles Fedak, Cecile Germain, Thomas Herault, Pierre Lemarinier, Oleg Lodygensky, Frederic Magniette, Vincent Neri, and Anton Selikhov. MPICH-V: Toward a scalable fault tolerant MPI for volatile nodes. In Proceedings of SC 2002. IEEE, 2002.
-
(2002)
Proceedings of SC 2002
-
-
Bosilca, G.1
Bouteiller, A.2
Cappello, F.3
Djilali, S.4
Fedak, G.5
Germain, C.6
Herault, T.7
Lemarinier, P.8
Lodygensky, O.9
Magniette, F.10
Neri, V.11
Selikhov, A.12
-
16
-
-
0034439137
-
MPI-FT: Portable fault tolerance scheme for MPI
-
Soulla Louca, Neophytos Neophytou, Arianos Lachanas, and Paraskevas Evripidou. MPI-FT: Portable fault tolerance scheme for MPI. Parallel Processing Letters, 10(4):371-382, 2000.
-
(2000)
Parallel Processing Letters
, vol.10
, Issue.4
, pp. 371-382
-
-
Louca, S.1
Neophytou, N.2
Lachanas, A.3
Evripidou, P.4
-
17
-
-
84940567900
-
Fault-tolerant MPI: Supporting dynamic applications in a dynamic world
-
Jack Dongarra, Peter Kacsuk, and Norbert Podhorszki, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface. 7th European PVM/MPI Users' Group Meeting
-
Graham Fagg and Jack Dongarra. Fault-tolerant MPI: Supporting dynamic applications in a dynamic world. In Jack Dongarra, Peter Kacsuk, and Norbert Podhorszki, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, number 1908 in Springer Lecture Notes in Computer Science, pages 346-353, 2000. 7th European PVM/MPI Users' Group Meeting.
-
(2000)
Springer Lecture Notes in Computer Science
, vol.1908
, pp. 346-353
-
-
Fagg, G.1
Dongarra, J.2
-
19
-
-
0042625580
-
Monitors, messages, and clusters: The p4 parallel programming system
-
Argonne National Laboratory
-
Ralph Butler and Ewing Lusk. "Monitors, Messages, and Clusters: the p4 Parallel Programming System". Technical report, Argonne National Laboratory, 1993.
-
(1993)
Technical Report
-
-
Butler, R.1
Lusk, E.2