-
1
-
-
0038335808
-
Compiler-assisted checkpointing
-
Dept. of Computer Science, University of Tennessee
-
Micah Beck, James S. Plank, and Gerry Kingsley. Compiler-assisted checkpointing. Technical Report UT-CS-94-269, Dept. of Computer Science, University of Tennessee, 1994.
-
(1994)
Technical Report
, vol.UT-CS-94-269
-
-
Beck, M.1
Plank, J.S.2
Kingsley, G.3
-
2
-
-
0031570635
-
Application level fault tolerance in heterogeneous networks of workstations
-
Adam Beguelin, Erik Seligman, and Peter Stephan. Application level fault tolerance in heterogeneous networks of workstations. Journal of Parallel and Distributed Computing, 43(2):147-155, 1997.
-
(1997)
Journal of Parallel and Distributed Computing
, vol.43
, Issue.2
, pp. 147-155
-
-
Beguelin, A.1
Seligman, E.2
Stephan, P.3
-
3
-
-
0038040085
-
Automated application-level checkpointing of MPI programs
-
San Diego, CA, June
-
Greg Bronevetsky, Daniel Marques, Keshav Pingali, and Paul Stodghill. Automated application-level checkpointing of MPI programs. In Principles and Practice of Parallel Programming, San Diego, CA, June 2003.
-
(2003)
Principles and Practice of Parallel Programming
-
-
Bronevetsky, G.1
Marques, D.2
Pingali, K.3
Stodghill, P.4
-
4
-
-
1142268808
-
Collective operations in an application-level fault tolerant MPI system
-
San Francisco, CA, June 23-26
-
Greg Bronevetsky, Daniel Marques, Keshav Pingali, and Paul Stodghill. Collective operations in an application-level fault tolerant MPI system. In International Conference on Supercomputing (ICS) 2003, San Francisco, CA, June 23-26 2003.
-
(2003)
International Conference on Supercomputing (ICS) 2003
-
-
Bronevetsky, G.1
Marques, D.2
Pingali, K.3
Stodghill, P.4
-
5
-
-
12844286028
-
Application-level checkpointing for shared memory programs
-
Boston, MA, October 9-13
-
Greg Bronevetsky, Martin Schulz, Peter Szwed, Daniel Marques, and Keshav Pingali. Application-level checkpointing for shared memory programs. In Symposium on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2004, Boston, MA, October 9-13 2004.
-
(2004)
Symposium on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2004
-
-
Bronevetsky, G.1
Schulz, M.2
Szwed, P.3
Marques, D.4
Pingali, K.5
-
6
-
-
0022020346
-
Distributed snapshots: Determining global states of distributed systems
-
M. Chandy and L. Lamport. Distributed snapshots: Determining global states of distributed systems. ACM Transactions on Computer Systems, 3(1):63-75, 1985.
-
(1985)
ACM Transactions on Computer Systems
, vol.3
, Issue.1
, pp. 63-75
-
-
Chandy, M.1
Lamport, L.2
-
7
-
-
84944337657
-
CLIP: A checkpointing tool for message-passing parallel programs
-
Yuqun Chen, Kai Li, and James S. Plank. CLIP: A checkpointing tool for message-passing parallel programs. In The IEEE Supercomputing '97, 1997.
-
(1997)
The IEEE Supercomputing '97
-
-
Chen, Y.1
Li, K.2
Plank, J.S.3
-
9
-
-
84858915102
-
-
Condor, http://www.cs.wisc.edu/condor/manual.
-
-
-
-
11
-
-
12844258075
-
Finding your cronies: Static analysis for dynamic object colocation
-
Samuel Z. Guyer and Kathryn S. McKinley. Finding your cronies: Static analysis for dynamic object colocation. In OOPSLA 2004, 2004.
-
(2004)
OOPSLA 2004
-
-
Guyer, S.Z.1
McKinley, K.S.2
-
13
-
-
0004215089
-
-
Morgan Kaufmann, San Francisco, California, first edition
-
Nancy Lynch. Distributed Algorithms. Morgan Kaufmann, San Francisco, California, first edition, 1996.
-
(1996)
Distributed Algorithms
-
-
Lynch, N.1
-
14
-
-
0003912256
-
Checkpoint and migration of UNIX processes in the condor distributed processing system
-
University of Wisconsin-Madison
-
M. Litzkow, T. Tannenbaum, J. Basney, and M. Livny. Checkpoint and migration of UNIX processes in the Condor distributed processing system. Technical Report 1346, University of Wisconsin-Madison, 1997.
-
(1997)
Technical Report
, vol.1346
-
-
Litzkow, M.1
Tannenbaum, T.2
Basney, J.3
Livny, M.4
-
15
-
-
84858925015
-
-
NAS web page. http://www.nas.nasa.gov/Software/NPB/.
-
NAS web page
-
-
-
17
-
-
0141599174
-
Libckpt: Transparent checkpointing under UNIX
-
Dept. of Computer Science, University of Tennessee
-
James S. Plank, Micah Beck, Gerry Kingsley, and Kai Li. Libckpt: Transparent checkpointing under UNIX. Technical Report UT-CS-94-242, Dept. of Computer Science, University of Tennessee, 1994.
-
(1994)
Technical Report
, vol.UT-CS-94-242
-
-
Plank, J.S.1
Beck, M.2
Kingsley, G.3
Li, K.4
-
18
-
-
0033077475
-
Memory exclusion: Optimizing the performance of checkpointing systems
-
James S. Plank, Yuqun Chen, Kai Li, Micah Beck, and Gerry Kingsley. Memory exclusion: Optimizing the performance of checkpointing systems. Software: Practice and Experience, 29(2):125-142, 1999.
-
(1999)
Software: Practice and Experience
, vol.29
, Issue.2
, pp. 125-142
-
-
Plank, J.S.1
Chen, Y.2
Li, K.3
Beck, M.4
Kingsley, G.5
-
20
-
-
84934312471
-
Implementation and evaluation of a scalable application-level checkpoint-recovery scheme for MPI programs
-
Pittsburgh, PA, November 6-12
-
Martin Schulz, Greg Bronevetsky, Rohit Fernandes, Daniel Marques, Keshav Pingali, and Paul Stodghill. Implementation and evaluation of a scalable application-level checkpoint-recovery scheme for MPI programs. In Supercomputing 2004, Pittsburgh, PA, November 6-12 2004.
-
(2004)
Supercomputing 2004
-
-
Schulz, M.1
Bronevetsky, G.2
Fernandes, R.3
Marques, D.4
Pingali, K.5
Stodghill, P.6
-
21
-
-
84858919212
-
-
SPEC web page. http://www.spec.org/.
-
SPEC web page
-
-
-
23
-
-
33645423303
-
A checkpoint and recovery system for the Pittsburgh supercomputing center terascale computing system
-
Nathan Stone, John Kochmar, Raghurama Reddy, J. Ray Scott, Jason Sommerfield, and Chad Vizino. A checkpoint and recovery system for the Pittsburgh Supercomputing Center Terascale Computing System. In Supercomputing, 2001. Available at http://www.psc.edu/publications/tech_reports/chkpt.rcvry/checkpoint-recovery-1.0.html.
-
(2001)
Supercomputing
-
-
Stone, N.1
Kochmar, J.2
Reddy, R.3
Scott, J.R.4
Sommerfield, J.5
Vizino, C.6
-
24
-
-
0141682129
-
SRS - A framework for developing malleable and migratable parallel software
-
June
-
S. Vadhiyar and J. Dongarra. SRS - a framework for developing malleable and migratable parallel software. Parallel Processing Letters, 13(2):291-312, June 2003.
-
(2003)
Parallel Processing Letters
, vol.13
, Issue.2
, pp. 291-312
-
-
Vadhiyar, S.1
Dongarra, J.2