



2011, Pages 1533-1540

Evaluation of simple causal message logging for large-scale fault tolerant HPC systems

Author keywords

Causal message logging; Migratable objects; Parallel applications; Pessimistic message logging

Indexed keywords

CHECKPOINT/RESTART; COLLECTIVE COMMUNICATION OPERATIONS; FAULT TOLERANCE TECHNIQUES; FAULT-TOLERANT; HIGH PRODUCTIVITY; LARGE MACHINES; LOW LATENCY; MEAN TIME BETWEEN FAILURES; MESSAGE LOGGING; MESSAGE LOGGING PROTOCOLS; MIGRATABLE OBJECTS; NAS PARALLEL BENCHMARKS; PARALLEL APPLICATION; PERFORMANCE PENALTIES; PETA-SCALE COMPUTING; ROLL BACK

EID: 83455181657     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/IPDPS.2011.307     Document Type: Conference Paper
Times cited: 15

References (20)
  • 1
    • 84855235304
    • "Top500 supercomputing sites," http://top500.org.
  • 6
    • 0022112420
    • Optimistic recovery in distributed systems
    • R. Strom and S. Yemini, "Optimistic recovery in distributed systems," ACM Trans. Comput. Syst., vol. 3, no. 3, pp. 204-226, 1985.
    • (1985) ACM Trans. Comput. Syst. , vol.3 , Issue.3 , pp. 204-226
    • Strom, R.1    Yemini, S.2
  • 8
    • 0026867749
    • Manetho: Transparent rollback-recovery with low overhead, limited rollback, and fast output commit
    • E. N. Elnozahy and W. Zwaenepoel, "Manetho: Transparent rollback-recovery with low overhead, limited rollback, and fast output commit," IEEE Trans. Comput., vol. 41, no. 5, pp. 526-531, 1992.
    • (1992) IEEE Trans. Comput. , vol.41 , Issue.5 , pp. 526-531
    • Elnozahy, E.N.1    Zwaenepoel, W.2
  • 9
    • 0029237761
    • Message logging: Pessimistic, optimistic, and causal
    • L. Alvisi and K. Marzullo, "Message logging: pessimistic, optimistic, and causal," Distributed Computing Systems, International Conference on, vol. 0, p. 0229, 1995.
    • (1995) Distributed Computing Systems , pp. 0229
    • Alvisi, L.1    Marzullo, K.2
  • 10
    • 33746310123
    • Impact of event logger on causal message logging protocols for fault tolerant MPI
    • A. Bouteiller, B. Collin, T. Herault, P. Lemarinier, and F. Cappello, "Impact of event logger on causal message logging protocols for fault tolerant MPI," in IPDPS'05, 2005, p. 97.
    • (2005) IPDPS'05 , pp. 97
    • Bouteiller, A.1    Collin, B.2    Herault, T.3    Lemarinier, P.4    Cappello, F.5
  • 12
    • 0033721199
    • The cost of recovery in message logging protocols
    • S. Rao, L. Alvisi, and H. M. Vin, "The cost of recovery in message logging protocols," IEEE Trans. Knowl. Data Eng., vol. 12, no. 2, pp. 160-173, 2000.
    • (2000) IEEE Trans. Knowl. Data Eng. , vol.12 , Issue.2 , pp. 160-173
    • Rao, S.1    Alvisi, L.2    Vin, H.M.3
  • 13
    • 20444435911
    • Improved message logging versus improved coordinated checkpointing for fault tolerant MPI
    • P. Lemarinier, A. Bouteiller, T. Herault, G. Krawezik, and F. Cappello, "Improved message logging versus improved coordinated checkpointing for fault tolerant MPI," Cluster Computing, IEEE International Conference on, vol. 0, pp. 115-124, 2004.
    • (2004) Cluster Computing , pp. 115-124
    • Lemarinier, P.1    Bouteiller, A.2    Herault, T.3    Krawezik, G.4    Cappello, F.5
  • 14
    • 0022020346
    • Distributed snapshots: Determining global states of distributed systems
    • K. M. Chandy and L. Lamport, "Distributed snapshots: Determining global states of distributed systems," ACM Transactions on Computer Systems, Feb. 1985.
    • (1985) ACM Transactions on Computer Systems
    • Chandy, K.M.1    Lamport, L.2
  • 15
    • 72149132074
    • Reasons for a pessimistic or optimistic message logging protocol in MPI uncoordinated failure recovery
    • A. Bouteiller, T. Ropars, G. Bosilca, C. Morin, and J. Dongarra, "Reasons for a pessimistic or optimistic message logging protocol in MPI uncoordinated failure recovery," in CLUSTER, 2009, pp. 1-9.
    • (2009) CLUSTER , pp. 1-9
    • Bouteiller, A.1    Ropars, T.2    Bosilca, G.3    Morin, C.4    Dongarra, J.5
  • 17
    • 84976817516
    • CHARM++: A portable concurrent object oriented system based on C++
    • L. Kalé and S. Krishnan, "CHARM++: A Portable Concurrent Object Oriented System Based on C++," in Proceedings of OOPSLA'93, A. Paepcke, Ed. ACM Press, September 1993, pp. 91-108.
    • (1993) Proceedings of OOPSLA'93 , pp. 91-108
    • Kalé, L.1    Krishnan, S.2
  • 19
    • 79961061539
    • MPICH-V2: A fault tolerant MPI for volatile nodes based on the pessimistic sender based message logging
    • A. Bouteiller, F. Cappello, T. Hérault, G. Krawezik, P. Lemarinier, and F. Magniette, "MPICH-V2: A fault tolerant MPI for volatile nodes based on the pessimistic sender based message logging," in Proceedings of SC'03, November 2003.
    • (2003) Proceedings of SC'03
    • Bouteiller, A.1    Cappello, F.2    Hérault, T.3    Krawezik, G.4    Lemarinier, P.5    Magniette, F.6


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.