2012, Pages 1216-1227

HydEE: Failure containment without event logging for large scale send-deterministic MPI applications

Author keywords

failure containment; fault tolerance; High performance computing; MPI; send determinism

Indexed keywords

CHECK POINTING; CONCURRENT FAILURES; DIFFERENT PROTOCOLS; EVENT LOGGING; EXECUTION MODEL; EXPERIMENTAL EVALUATION; FAULT TOLERANT PROTOCOLS; HIGH PERFORMANCE COMPUTING; HYBRID PROTOCOLS; MEAN TIME BETWEEN FAILURES; MESSAGE LOGGING; MPI; MPI APPLICATIONS; ROLLBACK RECOVERY; SEND-DETERMINISM

EID: 84866852589     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/IPDPS.2012.111     Document Type: Conference Paper
Times cited: 28

References (32)
  • 1
    • 0032000230
    • Message Logging: Pessimistic, Optimistic, Causal, and Optimal
    • L. Alvisi and K. Marzullo. Message Logging: Pessimistic, Optimistic, Causal, and Optimal. IEEE Transactions on Software Engineering, 24(2):149-159, 1998.
    • (1998) IEEE Transactions on Software Engineering , vol.24 , Issue.2 , pp. 149-159
    • Alvisi, L.1    Marzullo, K.2
  • 5
    • 0024123530
    • Independent Checkpointing and Concurrent Rollback for Recovery in Distributed Systems-an Optimistic Approach
    • Columbus, OH, USA
    • B. Bhargava and L. Shu-Renn. Independent Checkpointing and Concurrent Rollback for Recovery in Distributed Systems-an Optimistic Approach. In Seventh Symposium on Reliable Distributed Systems, pages 3-12, Columbus, OH, USA, 1988.
    • (1988) Seventh Symposium on Reliable Distributed Systems , pp. 3-12
    • Bhargava, B.1    Shu-Renn, L.2
  • 8
    • 80052306159
    • Correlated Set Coordination in Fault Tolerant Message Logging Protocols
    • Euro-Par 2011, Springer Berlin / Heidelberg
    • A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra. Correlated Set Coordination in Fault Tolerant Message Logging Protocols. In Euro-Par 2011, volume 6853 of Lecture Notes in Computer Science, pages 51-64. Springer Berlin / Heidelberg, 2011.
    • (2011) Lecture Notes in Computer Science , vol.6853 , pp. 51-64
    • Bouteiller, A.1    Herault, T.2    Bosilca, G.3    Dongarra, J.4
  • 11
    • 0022020346
    • Distributed Snapshots: Determining Global States of Distributed Systems
    • K. Chandy and L. Lamport. Distributed Snapshots: Determining Global States of Distributed Systems. ACM Transactions on Computer Systems, 3(1):63-75, 1985.
    • (1985) ACM Transactions on Computer Systems , vol.3 , Issue.1 , pp. 63-75
    • Chandy, K.1    Lamport, L.2
  • 14
    • 0042078549
    • A Survey of Rollback-Recovery Protocols in Message-Passing Systems
    • E. N. M. Elnozahy, L. Alvisi, Y.-M. Wang, and D. B. Johnson. A Survey of Rollback-Recovery Protocols in Message-Passing Systems. ACM Computing Surveys, 34(3):375-408, 2002.
    • (2002) ACM Computing Surveys , vol.34 , Issue.3 , pp. 375-408
    • Elnozahy, E.N.M.1    Alvisi, L.2    Wang, Y.-M.3    Johnson, D.B.4
  • 21
    • 0017996760
    • Time, Clocks, and the Ordering of Events in a Distributed System
    • L. Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. Communications of the ACM, 21(7):558-565, 1978.
    • (1978) Communications of the ACM , vol.21 , Issue.7 , pp. 558-565
    • Lamport, L.1
  • 28
    • 80052380100
    • On the Use of Cluster-Based Partial Message Logging to Improve Fault Tolerance for MPI HPC Applications
    • T. Ropars, A. Guermouche, B. Uçar, E. Meneses, L. V. Kalé, and F. Cappello. On the Use of Cluster-Based Partial Message Logging to Improve Fault Tolerance for MPI HPC Applications. In Euro-Par 2011, pages 567-578, 2011.
    • (2011) Euro-Par 2011 , pp. 567-578
    • Ropars, T.1    Guermouche, A.2    Uçar, B.3    Meneses, E.4    Kalé, L.V.5    Cappello, F.6
  • 29
    • 80054900610
    • Active optimistic and distributed message logging for message-passing applications
    • T. Ropars and C. Morin. Active optimistic and distributed message logging for message-passing applications. Concurrency and Computation: Practice and Experience, 23(17):2167-2178, 2011.
    • (2011) Concurrency and Computation: Practice and Experience , vol.23 , Issue.17 , pp. 2167-2178
    • Ropars, T.1    Morin, C.2
  • 32


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.