



Volume 34, Issue 3, 2002, Pages 375-408

A Survey of Rollback-Recovery Protocols in Message-Passing Systems

Author keywords

Message logging; Rollback recovery

Indexed keywords

MESSAGE LOGGING; ROLLBACK-RECOVERY PROTOCOLS

EID: 0042078549     PISSN: 03600300     EISSN: None     Source Type: Journal    
DOI: 10.1145/568522.568525     Document Type: Review
Times cited: 1175

References (63)
  • 2
    • 0032000230
    • Message logging: Pessimistic, optimistic, causal and optimal
    • ALVISI, L. AND MARZULLO, K. 1998. Message logging: pessimistic, optimistic, causal and optimal. IEEE Trans. Softw. Eng. 24, 2, 149-159.
    • (1998) IEEE Trans. Softw. Eng. , vol.24 , Issue.2 , pp. 149-159
    • Alvisi, L.1    Marzullo, K.2
  • 3
    • 0032597670
    • An analysis of communication-induced checkpointing
    • FTCS-29, The Twenty Ninth Annual International Symposium on Fault-Tolerant Computing (Madison, Wisconsin)
    • ALVISI, L., ELNOZAHY, E. N., RAO, S., HUSAIN, S. A., AND MEL, A. D. 1999. An analysis of communication-induced checkpointing. In Digest of Papers, FTCS-29, The Twenty Ninth Annual International Symposium on Fault-Tolerant Computing (Madison, Wisconsin), 242-249.
    • (1999) Digest of Papers , pp. 242-249
    • Alvisi, L.1    Elnozahy, E.N.2    Rao, S.3    Husain, S.A.4    Mel, A.D.5
  • 4
    • 25544434088
    • A runtime system
    • Department of Computer Science, Princeton University
    • APPEL, A. W. 1989. A runtime system. Technical Report CS-TR220-89, Department of Computer Science, Princeton University.
    • (1989) Technical Report , vol.CS-TR220-89
    • Appel, A.W.1
  • 9
    • 0031570635
    • Application-level fault tolerance in heterogeneous networks of workstations
    • BEGUELIN, A., SELIGMAN, E., AND STEPHAN, P. 1997. Application-level fault tolerance in heterogeneous networks of workstations. J. Parallel and Distributed Comput. 43, 2, 147-155.
    • (1997) J. Parallel and Distributed Comput. , vol.43 , Issue.2 , pp. 147-155
    • Beguelin, A.1    Seligman, E.2    Stephan, P.3
  • 15
    • 84941514592
    • Rollback and recovery strategies for computer programs
    • CHANDY, M. AND RAMAMOORTHY, C. V. 1972. Rollback and recovery strategies for computer programs. IEEE Trans. Comput. 21, 6, 546-556.
    • (1972) IEEE Trans. Comput. , vol.21 , Issue.6 , pp. 546-556
    • Chandy, M.1    Ramamoorthy, C.V.2
  • 16
    • 0022020346
    • Distributed snapshots: Determining global states of distributed systems
    • CHANDY, M. AND LAMPORT, L. 1985. Distributed snapshots: Determining global states of distributed systems. ACM Trans. Comput. Syst. 3, 1, 63-75.
    • (1985) ACM Trans. Comput. Syst. , vol.3 , Issue.1 , pp. 63-75
    • Chandy, M.1    Lamport, L.2
  • 19
    • 85043362389
    • How safe is probabilistic checkpointing?
    • FTCS-28, the Twenty Eighth Annual International Symposium on Fault-Tolerant Computing
    • ELNOZAHY, E. N. 1998. How safe is probabilistic checkpointing? In Digest of Papers, FTCS-28, the Twenty Eighth Annual International Symposium on Fault-Tolerant Computing, 358-363.
    • (1998) Digest of Papers , pp. 358-363
    • Elnozahy, E.N.1
  • 20
    • 0027988876
    • On the use and implementation of message logging
    • FTCS-24, The Twenty Fourth International Symposium on Fault-Tolerant Computing
    • ELNOZAHY, E. N. AND ZWAENEPOEL, W. 1994. On the use and implementation of message logging. In Digest of Papers, FTCS-24, The Twenty Fourth International Symposium on Fault-Tolerant Computing, 298-307.
    • (1994) Digest of Papers , pp. 298-307
    • Elnozahy, E.N.1    Zwaenepoel, W.2
  • 26
    • 0027796299
    • Software implemented fault tolerance: Technologies and experience
    • FTCS-23, the Twenty Third Annual International Symposium on Fault-Tolerant Computing
    • HUANG, Y. AND KINTALA, C. 1993. Software implemented fault tolerance: Technologies and experience. In Digest of Papers, FTCS-23, the Twenty Third Annual International Symposium on Fault-Tolerant Computing, 2-9.
    • (1993) Digest of Papers , pp. 2-9
    • Huang, Y.1    Kintala, C.2
  • 27
    • 0028994239
    • Why optimistic message logging has not been used in telecommunication systems
    • FTCS-25, the Twenty Fifth Annual International Symposium on Fault-Tolerant Computing
    • HUANG, Y. AND WANG, Y.-M. 1995. Why optimistic message logging has not been used in telecommunication systems. In Digest of Papers, FTCS-25, the Twenty Fifth Annual International Symposium on Fault-Tolerant Computing, 459-463.
    • (1995) Digest of Papers , pp. 459-463
    • Huang, Y.1    Wang, Y.-M.2
  • 29
    • 0023207654
    • Sender-based message logging
    • FTCS-17, The Seventeenth Annual International Symposium on Fault-Tolerant Computing
    • JOHNSON, D. B. AND ZWAENEPOEL, W. 1987. Sender-based message logging. In Digest of Papers, FTCS-17, The Seventeenth Annual International Symposium on Fault-Tolerant Computing, 14-19.
    • (1987) Digest of Papers , pp. 14-19
    • Johnson, D.B.1    Zwaenepoel, W.2
  • 30
    • 38249017422
    • Recovery in distributed systems using optimistic message logging and checkpointing
    • JOHNSON, D. B. AND ZWAENEPOEL, W. 1990. Recovery in distributed systems using optimistic message logging and checkpointing. J. Algorithms 11, 3, 462-491.
    • (1990) J. Algorithms , vol.11 , Issue.3 , pp. 462-491
    • Johnson, D.B.1    Zwaenepoel, W.2
  • 32
    • 0023090161
    • Checkpointing and rollback-recovery for distributed systems
    • KOO, R. AND TOUEG, S. 1987. Checkpointing and rollback-recovery for distributed systems. IEEE Trans. Softw. Engin. 13, 1, 23-31.
    • (1987) IEEE Trans. Softw. Engin. , vol.13 , Issue.1 , pp. 23-31
    • Koo, R.1    Toueg, S.2
  • 34
    • 0017996760
    • Time, clocks, and the ordering of events in a distributed system
    • LAMPORT, L. 1978. Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21, 7, 558-565.
    • (1978) Commun. ACM , vol.21 , Issue.7 , pp. 558-565
    • Lamport, L.1
  • 35
    • 0005668895
    • Crash recovery in a distributed data storage system
    • Xerox Palo Alto Research Center
    • LAMPSON, B. W. AND STURGIS, H. E. 1979. Crash recovery in a distributed data storage system. Technical Report, Xerox Palo Alto Research Center.
    • (1979) Technical Report
    • Lampson, B.W.1    Sturgis, H.E.2
  • 36
    • 0025625415
    • CATCH: Compiler-assisted techniques for checkpointing
    • FTCS-20, The Twentieth Annual International Symposium on Fault-Tolerant Computing
    • LI, C. C. AND FUCHS, W. K. 1990. CATCH: Compiler-assisted techniques for checkpointing. In Digest of Papers, FTCS-20, The Twentieth Annual International Symposium on Fault-Tolerant Computing, 74-81.
    • (1990) Digest of Papers , pp. 74-81
    • Li, C.C.1    Fuchs, W.K.2
  • 38
    • 0031236741
    • A survey of recoverable distributed shared memory systems
    • MORIN, C. AND PUAUT, T. 1997. A survey of recoverable distributed shared memory systems. IEEE Trans. Parallel and Distributed Syst. 8, 9, 959-969.
    • (1997) IEEE Trans. Parallel and Distributed Syst. , vol.8 , Issue.9 , pp. 959-969
    • Morin, C.1    Puaut, T.2
  • 39
    • 21844494806
    • Performance of consistent checkpointing in a modular operating system: Results of the FTM experiment
    • MULLER, G., HUE, M., AND PEYROUZ, N. 1994. Performance of consistent checkpointing in a modular operating system: Results of the FTM experiment. In Lecture Notes in Computer Science: Dependable Computing, EDCC-1, 491-508.
    • (1994) Lecture Notes in Computer Science: Dependable Computing , vol.EDCC-1 , pp. 491-508
    • Muller, G.1    Hue, M.2    Peyrouz, N.3
  • 40
    • 84950134799
    • Probabilistic checkpointing
    • FTCS-27, The Twenty Seventh Annual International Symposium on Fault-Tolerant Computing
    • NAM, H.-C., KIM, J., HONG, S. J., AND LEE, S. 1997. Probabilistic checkpointing. In Digest of Papers, FTCS-27, The Twenty Seventh Annual International Symposium on Fault-Tolerant Computing, 48-57.
    • (1997) Digest of Papers , pp. 48-57
    • Nam, H.-C.1    Kim, J.2    Hong, S.J.3    Lee, S.4
  • 41
    • 0029255243
    • Necessary and sufficient conditions for consistent global snapshots
    • NETZER, R. H. AND XU, J. 1995. Necessary and sufficient conditions for consistent global snapshots. IEEE Trans. Parallel and Distributed Syst. 6, 2, 165-169.
    • (1995) IEEE Trans. Parallel and Distributed Syst. , vol.6 , Issue.2 , pp. 165-169
    • Netzer, R.H.1    Xu, J.2
  • 44
    • 0028060943
    • Faster checkpointing with N+1 parity
    • FTCS-24, The Twenty Fourth Annual International Symposium on Fault-Tolerant Computing
    • PLANK, J. S. AND LI, K. 1994. Faster checkpointing with N+1 parity. In Digest of Papers, FTCS-24, The Twenty Fourth Annual International Symposium on Fault-Tolerant Computing, 288-297.
    • (1994) Digest of Papers , pp. 288-297
    • Plank, J.S.1    Li, K.2
  • 45
    • 0004097019
    • Compressed differences: An algorithm for fast incremental checkpointing
    • University of Tennessee at Knoxville
    • PLANK, J. S., XU, J., AND NETZER, R. H. 1995a. Compressed differences: An algorithm for fast incremental checkpointing. Technical Report CS-95-302, University of Tennessee at Knoxville.
    • (1995) Technical Report , vol.CS-95-302
    • Plank, J.S.1    Xu, J.2    Netzer, R.H.3
  • 47
    • 0030262195
    • Low-cost checkpointing and failure recovery in mobile computing systems
    • PRAKASH, R. AND SINGHAL, M. 1996. Low-cost checkpointing and failure recovery in mobile computing systems. IEEE Trans. Parallel and Distributed Syst. 7, 10, 1035-1048.
    • (1996) IEEE Trans. Parallel and Distributed Syst. , vol.7 , Issue.10 , pp. 1035-1048
    • Prakash, R.1    Singhal, M.2
  • 48
    • 0016522101
    • System structure for software fault tolerance
    • RANDELL, B. 1975. System structure for software fault tolerance. IEEE Trans. Softw. Engin. 1, 2, 220-232.
    • (1975) IEEE Trans. Softw. Engin. , vol.1 , Issue.2 , pp. 220-232
    • Randell, B.1
  • 51
    • 0000494607
    • State restoration in systems of communicating processes
    • RUSSELL, D. L. 1980. State restoration in systems of communicating processes. IEEE Trans. Softw. Engin. 6, 2, 183-194.
    • (1980) IEEE Trans. Softw. Engin. , vol.6 , Issue.2 , pp. 183-194
    • Russell, D.L.1
  • 52
    • 84976815497
    • Fail-stop processors: An approach to designing fault-tolerant computing systems
    • SCHLICHTING, R. D. AND SCHNEIDER, F. B. 1983. Fail-stop processors: An approach to designing fault-tolerant computing systems. ACM Trans. Comput. Syst. 1, 3, 222-238.
    • (1983) ACM Trans. Comput. Syst. , vol.1 , Issue.3 , pp. 222-238
    • Schlichting, R.D.1    Schneider, F.B.2
  • 55
    • 0032182041
    • Support for software interrupts in log-based rollback-recovery
    • SLYE, J. H. AND ELNOZAHY, E. N. 1998. Support for software interrupts in log-based rollback-recovery. IEEE Trans. Comput. 47, 10, 1113-1123.
    • (1998) IEEE Trans. Comput. , vol.47 , Issue.10 , pp. 1113-1123
    • Slye, J.H.1    Elnozahy, E.N.2
  • 57
    • 0022112420
    • Optimistic recovery in distributed systems
    • STROM, R. AND YEMINI, S. 1985. Optimistic recovery in distributed systems. ACM Trans. Comput. Syst. 3, 3, 204-226.
    • (1985) ACM Trans. Comput. Syst. , vol.3 , Issue.3 , pp. 204-226
    • Strom, R.1    Yemini, S.2
  • 59
    • 0026825917
    • Rollback-recovery in distributed systems using loosely synchronized clocks
    • TONG, Z., KAIN, R. Y., AND TSAI, W. T. 1992. Rollback-recovery in distributed systems using loosely synchronized clocks. IEEE Trans. Parallel and Distributed Syst. 3, 2, 246-251.
    • (1992) IEEE Trans. Parallel and Distributed Syst. , vol.3 , Issue.2 , pp. 246-251
    • Tong, Z.1    Kain, R.Y.2    Tsai, W.T.3
  • 61
    • 0031124071
    • Consistent global checkpoints that contain a set of local checkpoints
    • WANG, Y.-M. 1997. Consistent global checkpoints that contain a set of local checkpoints. IEEE Trans. Comput. 46, 4, 456-468.
    • (1997) IEEE Trans. Comput. , vol.46 , Issue.4 , pp. 456-468
    • Wang, Y.-M.1
  • 62
    • 0344476633
    • Tight upper bound on useful distributed system checkpoints
    • University of Illinois
    • WANG, Y.-M., CHUNG, P. Y., AND FUCHS, W. K. 1995a. Tight upper bound on useful distributed system checkpoints. Technical Report, University of Illinois.
    • (1995) Technical Report
    • Wang, Y.-M.1    Chung, P.Y.2    Fuchs, W.K.3
  • 63
    • 0029305383
    • Checkpoint space reclamation for uncoordinated checkpointing in message-passing systems
    • WANG, Y.-M., CHUNG, P. Y., LIN, I. J., AND FUCHS, W. K. 1995b. Checkpoint space reclamation for uncoordinated checkpointing in message-passing systems. IEEE Trans. Parallel and Distributed Syst. 6, 5, 546-554.
    • (1995) IEEE Trans. Parallel and Distributed Syst. , vol.6 , Issue.5 , pp. 546-554
    • Wang, Y.-M.1    Chung, P.Y.2    Lin, I.J.3    Fuchs, W.K.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.