



Volume , Issue , 2004, Pages

Implementation and Evaluation of a Scalable Application-Level Checkpoint-Recovery Scheme for MPI Programs

Author keywords

[No Author keywords available]

Indexed keywords

DISTRIBUTED COMPUTER SYSTEMS; FAULT TOLERANCE; HARDWARE;

EID: 84934312471     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/SC.2004.29     Document Type: Conference Paper
Times cited : (45)

References (25)
  • 2
  • 7
    • 84934278304
    • September 19
    • B. Carnes. The smg2000 benchmark code. Available at http://www.llnl.gov/asci/purple/benchmarks/limited/smg/, September 19 2001.
    • (2001) The smg2000 Benchmark Code
    • Carnes, B.1
  • 8
    • 0022020346
    • Distributed snapshots: Determining global states of distributed systems
    • M. Chandy and L. Lamport. Distributed snapshots: Determining global states of distributed systems. ACM Transactions on Computer Systems, 3(1):63-75, 1985.
    • (1985) ACM Transactions on Computer Systems , vol.3 , Issue.1 , pp. 63-75
    • Chandy, M.1    Lamport, L.2
  • 9
    • 84934282404
    • Condor. http://www.cs.wisc.edu/condor/manual.
    • Condor
  • 10
    • 0026867749
    • Manetho: Transparent rollback-recovery with low overhead, limited rollback and fast output
    • May
    • E. N. Elnozahy and W. Zwaenepoel. Manetho: Transparent rollback-recovery with low overhead, limited rollback and fast output. IEEE Transactions on Computers, 41(5), May 1992.
    • (1992) IEEE Transactions on Computers , vol.41 , Issue.5
    • Elnozahy, E.N.1    Zwaenepoel, W.2
  • 12
    • 84940567900
    • FT-MPI: Fault tolerant MPI, supporting dynamic applications in a dynamic world
    • Springer-Verlag
    • G. Fagg and J. J. Dongarra. FT-MPI: Fault tolerant MPI, supporting dynamic applications in a dynamic world. In EuroPVM/MPI User's Group Meeting, pages 346-353. Springer-Verlag, 2000.
    • (2000) EuroPVM/MPI User's Group Meeting , pp. 346-353
    • Fagg, G.1    Dongarra, J.J.2
  • 16
    • 0004215089
    • Morgan Kaufmann, San Francisco, California, first edition
    • N. Lynch. Distributed Algorithms. Morgan Kaufmann, San Francisco, California, first edition, 1996.
    • (1996) Distributed Algorithms
    • Lynch, N.1
  • 17
    • 0038335808
    • Technical Report, University of Tennessee, Dec
    • J. P. M. Beck and G. Kingsley. Compiler-Assisted Checkpointing. Technical Report CS-94-269, University of Tennessee, Dec. 1994.
    • (1994) Compiler-assisted Checkpointing
    • Beck, J.P.M.1    Kingsley, G.2
  • 22
  • 24
    • 33645423303
    • A checkpoint and recovery system for the Pittsburgh Supercomputing Center Terascale Computing System
    • N. Stone, J. Kochmar, R. Reddy, J. R. Scott, J. Sommerfield, and C. Vizino. A checkpoint and recovery system for the Pittsburgh Supercomputing Center Terascale Computing System. In Supercomputing, 2001. Available at http://www.psc.edu/publications/tech-reports/chkpt-rcvry/checkpoint-recovery-1.0.html.
    • (2001) Supercomputing
    • Stone, N.1    Kochmar, J.2    Reddy, R.3    Scott, J.R.4    Sommerfield, J.5    Vizino, C.6
  • 25
    • 0141682129
    • Srs - A framework for developing malleable and migratable parallel software
    • June
    • S. Vadhiyar and J. Dongarra. Srs - a framework for developing malleable and migratable parallel software. Parallel Processing Letters, 13(2):291-312, June 2003.
    • (2003) Parallel Processing Letters , vol.13 , Issue.2 , pp. 291-312
    • Vadhiyar, S.1    Dongarra, J.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.