Volume , Issue , 2007, Pages

A fault tolerance protocol with fast fault recovery

Author keywords

[No Author keywords available]

Indexed keywords

ADAPTIVE SYSTEMS; COMPUTATIONAL METHODS; COMPUTER SYSTEM RECOVERY; LARGE SCALE SYSTEMS; MESSAGE PASSING; PROGRAM PROCESSORS;

EID: 34548782109     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/IPDPS.2007.370310     Document Type: Conference Paper
Times cited: 45

References (26)
  • 1
    • 33646420251
    • Starfish: Fault-tolerant dynamic MPI programs on clusters of workstations
    • July
    • A. Agbaria and R. Friedman. Starfish: Fault-tolerant dynamic MPI programs on clusters of workstations. Cluster Computing, 6(3):227-236, July 2003.
    • (2003) Cluster Computing , vol.6 , Issue.3 , pp. 227-236
    • Agbaria, A.1    Friedman, R.2
  • 5
    • 79961061539
    • MPICH-V2: A fault tolerant MPI for volatile nodes based on the pessimistic sender based message logging
    • November
    • A. Bouteiller, F. Cappello, T. Hérault, G. Krawezik, P. Lemarinier, and F. Magniette. MPICH-V2: A fault tolerant MPI for volatile nodes based on the pessimistic sender based message logging. In Proceedings of SC'03, November 2003.
    • (2003) Proceedings of SC'03
    • Bouteiller, A.1    Cappello, F.2    Hérault, T.3    Krawezik, G.4    Lemarinier, P.5    Magniette, F.6
  • 6
    • 33746310123
    • Impact of event logger on causal message logging protocols for fault tolerant MPI
    • A. Bouteiller, B. Collin, T. Hérault, P. Lemarinier, and F. Cappello. Impact of event logger on causal message logging protocols for fault tolerant MPI. In IPDPS'05, page 97, 2005.
    • (2005) IPDPS'05 , pp. 97
    • Bouteiller, A.1    Collin, B.2    Hérault, T.3    Lemarinier, P.4    Cappello, F.5
  • 9
    • 12444281734
    • A fault tolerant protocol for massively parallel machines
    • Santa Fe, NM, April, IEEE Press
    • S. Chakravorty and L. V. Kalé. A fault tolerant protocol for massively parallel machines. In FTPDS Workshop at IPDPS'2004, Santa Fe, NM, April 2004. IEEE Press.
    • (2004) FTPDS Workshop at IPDPS'2004
    • Chakravorty, S.1    Kalé, L.V.2
  • 10
    • 84900298636
    • Clip: A checkpointing tool for message-passing parallel programs
    • Y. Chen, J. S. Plank, and K. Li. Clip: A checkpointing tool for message-passing parallel programs. In Proc. of the 1997 ACM/IEEE conference on Supercomputing, pages 1-11, 1997.
  • 11
    • 34548773972
    • Interconnection network independent characterization of communication traffic in the NAS benchmarks via processor performance monitoring hardware
    • W. E. Cohen, R. K. Gaede, and W. D. Garrett. Interconnection network independent characterization of communication traffic in the NAS benchmarks via processor performance monitoring hardware.
  • 12
    • 0004096191
    • A survey of rollback-recovery protocols in message passing systems
    • Technical Report CMU-CS-96-181, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA, Oct
    • E. N. Elnozahy, L. Alvisi, Y. M. Wang, and D. B. Johnson. A survey of rollback-recovery protocols in message passing systems. Technical Report CMU-CS-96-181, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA, Oct. 1996.
    • (1996)
    • Elnozahy, E.N.1    Alvisi, L.2    Wang, Y.M.3    Johnson, D.B.4
  • 13
    • 0026867749
    • Manetho: Transparent rollback-recovery with low overhead, limited rollback, and fast output commit
    • E. N. Elnozahy and W. Zwaenepoel. Manetho: Transparent rollback-recovery with low overhead, limited rollback, and fast output commit. IEEE Transactions on Computers, 41(5):526-531, 1992.
    • (1992) IEEE Transactions on Computers , vol.41 , Issue.5 , pp. 526-531
    • Elnozahy, E.N.1    Zwaenepoel, W.2
  • 19
    • 0002479236
    • Charm++: Parallel programming with message-driven objects
    • G. V. Wilson and P. Lu, editors, MIT Press
    • L. V. Kalé and S. Krishnan. Charm++: Parallel programming with message-driven objects. In G. V. Wilson and P. Lu, editors, Parallel Programming using C++, pages 175-213. MIT Press, 1996.
    • (1996) Parallel Programming using C++ , pp. 175-213
    • Kalé, L.V.1    Krishnan, S.2
  • 20
    • 35048847069
    • A lightweight message logging scheme for fault tolerant MPI
    • I. Lee, H. Y. Yeom, T. Park, and H.-W. Park. A lightweight message logging scheme for fault tolerant MPI. In PPAM, pages 397-404, 2003.
    • (2003) PPAM , pp. 397-404
    • Lee, I.1    Yeom, H.Y.2    Park, T.3    Park, H.-W.4
  • 21
    • 85114705648
    • NAMD: Biomolecular simulation on thousands of processors
    • Baltimore, MD, September
    • J. C. Phillips, G. Zheng, S. Kumar, and L. V. Kalé. NAMD: Biomolecular simulation on thousands of processors. In Proceedings of SC 2002, Baltimore, MD, September 2002.
    • (2002) Proceedings of SC 2002
    • Phillips, J.C.1    Zheng, G.2    Kumar, S.3    Kalé, L.V.4
  • 22
    • 84976815497
    • Fail-stop processors: An approach to designing fault-tolerant computing systems
    • R. D. Schlichting and F. B. Schneider. Fail-stop processors: An approach to designing fault-tolerant computing systems. ACM Transactions on Computer Systems, 1(3):222-238, 1983.
    • (1983) ACM Transactions on Computer Systems , vol.1 , Issue.3 , pp. 222-238
    • Schlichting, R.D.1    Schneider, F.B.2
  • 26


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.