



Volume 20, Issue 3, 2006, Pages 319-333

MPICH-V project: A multiprotocol automatic fault-tolerant MPI

Author keywords

Coordinated checkpoint; Fault tolerant MPI; Message logging; Performance evaluation

Indexed keywords

COMPUTER SELECTION AND EVALUATION; INFORMATION RETRIEVAL SYSTEMS; NETWORK PROTOCOLS; PERFORMANCE;

EID: 33746779994     PISSN: 10943420     EISSN: 17412846     Source Type: Journal    
DOI: 10.1177/1094342006067469     Document Type: Article
Times cited : (98)

References (38)
  • 6
    • 0032313590
    • The relative overhead of piggybacking in causal message logging protocols
    • Los Alamitos, CA: IEEE CS Press
    • Bhatia, K., Marzullo, K., and Alvisi, L. 1998. The relative overhead of piggybacking in causal message logging protocols. 17th Symposium on Reliable Distributed Systems (SRDS'98), pp. 348-353. Los Alamitos, CA: IEEE CS Press.
    • (1998) 17th Symposium on Reliable Distributed Systems (SRDS'98) , pp. 348-353
    • Bhatia, K.1    Marzullo, K.2    Alvisi, L.3
  • 12
    • 0022020346
    • Distributed snapshots: Determining global states of distributed systems
    • ACM
    • Chandy, K. M. and Lamport, L. 1985. Distributed snapshots: Determining global states of distributed systems. Transactions on Computer Systems 3(1):63-75. ACM.
    • (1985) Transactions on Computer Systems , vol.3 , Issue.1 , pp. 63-75
    • Chandy, K.M.1    Lamport, L.2
  • 15
    • 0026867749
    • Manetho: Transparent rollback-recovery with low overhead, limited rollback and fast output
    • Elnozahy, E. N. and Zwaenepoel, W. 1992b. Manetho: Transparent rollback-recovery with low overhead, limited rollback and fast output. IEEE Transactions on Computers 41(5).
    • (1992) IEEE Transactions on Computers , vol.41 , Issue.5
    • Elnozahy, E.N.1    Zwaenepoel, W.2
  • 16
    • 0042078549
    • A survey of rollback-recovery protocols in message-passing systems
    • Elnozahy, M., Alvisi, L., Wang, Y. M., and Johnson, D. B. 2002. A survey of rollback-recovery protocols in message-passing systems. ACM Computing Surveys (CSUR) 34(3):375-408.
    • (2002) ACM Computing Surveys (CSUR) , vol.34 , Issue.3 , pp. 375-408
    • Elnozahy, M.1    Alvisi, L.2    Wang, Y.M.3    Johnson, D.B.4
  • 17
    • 84940567900
    • FT-MPI: Fault tolerant MPI, supporting dynamic applications in a dynamic world
    • Balatonfüred, Hungary. Heidelberg: Springer-Verlag
    • Fagg, G. and Dongarra, J. 2000. FT-MPI: Fault tolerant MPI, supporting dynamic applications in a dynamic world. 7th Euro PVM/MPI User's Group Meeting 2000, vol. 1908, Balatonfüred, Hungary. Heidelberg: Springer-Verlag.
    • (2000) 7th Euro PVM/MPI User's Group Meeting 2000 , vol.1908
    • Fagg, G.1    Dongarra, J.2
  • 20
    • 0030243005
    • High-performance, portable implementation of the MPI message passing interface standard
    • Gropp, W., Lusk, E., Doss, N., and Skjellum, A. 1996. High-performance, portable implementation of the MPI message passing interface standard. Parallel Computing 22(6): 789-828.
    • (1996) Parallel Computing , vol.22 , Issue.6 , pp. 789-828
    • Gropp, W.1    Lusk, E.2    Doss, N.3    Skjellum, A.4
  • 25
    • 0003912256
    • Checkpoint and migration of UNIX processes in the Condor distributed processing system
    • Technical Report 1346, University of Wisconsin-Madison
    • Litzkow, M., Tannenbaum, T., Basney, J., and Livny, M. 1997. Checkpoint and migration of UNIX processes in the Condor distributed processing system. Technical Report 1346, University of Wisconsin-Madison.
    • (1997)
    • Litzkow, M.1    Tannenbaum, T.2    Basney, J.3    Livny, M.4
  • 27
    • 0035201417
    • Processor allocation and checkpoint interval selection in cluster computing systems
    • Plank, J. S. and Thomason, M. G. 2001. Processor allocation and checkpoint interval selection in cluster computing systems. Journal of Parallel and Distributed Computing 61(11): 1570-1590.
    • (2001) Journal of Parallel and Distributed Computing , vol.61 , Issue.11 , pp. 1570-1590
    • Plank, J.S.1    Thomason, M.G.2
  • 28
    • 85014175705
    • Experimental assessment of workstation failures and their impact on checkpointing systems
    • Los Alamitos, CA: IEEE CS Press
    • Plank, J. S. and Elwasif, W. R. 1998. Experimental assessment of workstation failures and their impact on checkpointing systems. 28th Symposium on Fault-Tolerant Computing (FTCS'98), pp. 48-57. Los Alamitos, CA: IEEE CS Press.
    • (1998) 28th Symposium on Fault-Tolerant Computing (FTCS'98) , pp. 48-57
    • Plank, J.S.1    Elwasif, W.R.2
  • 31
    • 0032597696
    • Egida: An extensible toolkit for low-overhead fault-tolerance
    • Los Alamitos, CA: IEEE CS Press
    • Rao, S., Alvisi, L., and Vin, H. M. 1999. Egida: An extensible toolkit for low-overhead fault-tolerance. In 29th Symposium on Fault-Tolerant Computing (FTCS'99), pp. 48-55. Los Alamitos, CA: IEEE CS Press.
    • (1999) 29th Symposium on Fault-Tolerant Computing (FTCS'99) , pp. 48-55
    • Rao, S.1    Alvisi, L.2    Vin, H.M.3
  • 36
    • 0022112420
    • Optimistic recovery in distributed systems
    • ACM
    • Strom, R. and Yemini, S. 1985. Optimistic recovery in distributed systems. Transactions on Computer Systems 3(3):204-226. ACM.
    • (1985) Transactions on Computer Systems , vol.3 , Issue.3 , pp. 204-226
    • Strom, R.1    Yemini, S.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.