Volume 1, Issue 2, 2004, Pages 97-108

Checkpointing for Peta-scale systems: A look into the future of practical rollback-recovery

Author keywords

Availability; Distributed applications; Distributed systems; Evaluation; Fault tolerance; Measurement; Modeling; Modeling techniques; Performance of systems; Reliability; Serviceability; Simulation of multiple processor systems

Indexed keywords

AVAILABILITY; COMMUNICATION SYSTEMS; COMPUTER SIMULATION; EVALUATION; FAULT TOLERANT COMPUTER SYSTEMS; MEASUREMENT THEORY; PERFORMANCE; PROGRAM PROCESSORS; RELIABILITY

EID: 9144223280     PISSN: 15455971     EISSN: None     Source Type: Journal    
DOI: 10.1109/TDSC.2004.15     Document Type: Article
Times cited: 155

References (26)
  • 2
    • 85020592954
    • Converting a swap-based system to do paging in an architecture lacking page-reference bits
    • O. Babaoglu and W. Joy, "Converting a Swap-Based System to Do Paging in an Architecture Lacking Page-Reference Bits," Proc. Symp. Operating Systems Principles, pp. 78-86, 1981.
    • (1981) Proc. Symp. Operating Systems Principles , pp. 78-86
    • Babaoglu, O.1    Joy, W.2
  • 3
    • 0031570635
    • Application level fault tolerance in heterogeneous networks of workstations
    • June
    • A. Beguelin, E. Seligman, and P. Stephan, "Application Level Fault Tolerance in Heterogeneous Networks of Workstations," J. Parallel and Distributed Computing, vol. 43, no. 2, pp. 147-155, June 1997.
    • (1997) J. Parallel and Distributed Computing , vol.43 , Issue.2 , pp. 147-155
    • Beguelin, A.1    Seligman, E.2    Stephan, P.3
  • 4
    • 0024123530
    • Independent checkpointing and concurrent rollback for recovery - An optimistic approach
    • B. Bhargava and S.R. Lian, "Independent Checkpointing and Concurrent Rollback for Recovery - An Optimistic Approach," Proc. Symp. Reliable Distributed Systems, pp. 3-12, 1988.
    • (1988) Proc. Symp. Reliable Distributed Systems , pp. 3-12
    • Bhargava, B.1    Lian, S.R.2
  • 6
    • 84941514592
    • Rollback and recovery strategies for computer programs
    • June
    • M. Chandy and C.V. Ramamoorthy, "Rollback and Recovery Strategies for Computer Programs," IEEE Trans. Computers, vol. 21, no. 6, pp. 546-556, June 1972.
    • (1972) IEEE Trans. Computers , vol.21 , Issue.6 , pp. 546-556
    • Chandy, M.1    Ramamoorthy, C.V.2
  • 7
    • 0022020346
    • Distributed snapshots: Determining global states of distributed systems
    • Feb.
    • M. Chandy and L. Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems," ACM Trans. Computer Systems, vol. 3, no. 1, pp. 63-75, Feb. 1985.
    • (1985) ACM Trans. Computer Systems , vol.3 , Issue.1 , pp. 63-75
    • Chandy, M.1    Lamport, L.2
  • 8
    • 0042078549
    • A survey of rollback-recovery protocols in message passing systems
    • Sept.
    • E.N. Elnozahy, L. Alvisi, Y.-M. Wang, and D.B. Johnson, "A Survey of Rollback-Recovery Protocols in Message Passing Systems," ACM Computing Surveys, vol. 34, no. 3, Sept. 2002.
    • (2002) ACM Computing Surveys , vol.34 , Issue.3
    • Elnozahy, E.N.1    Alvisi, L.2    Wang, Y.-M.3    Johnson, D.B.4
  • 13
    • 9144219503
    • Intel hastily redraws road maps
    • May
    • M. Kanellos, "Intel Hastily Redraws Road Maps," CNET News, May 2004.
    • (2004) CNET News
    • Kanellos, M.1
  • 14
    • 0023090161
    • Checkpointing and rollback-recovery for distributed systems
    • Jan.
    • R. Koo and S. Toueg, "Checkpointing and Rollback-Recovery for Distributed Systems," IEEE Trans. Software Eng., vol. 13, no. 1, pp. 23-31, Jan. 1987.
    • (1987) IEEE Trans. Software Eng. , vol.13 , Issue.1 , pp. 23-31
    • Koo, R.1    Toueg, S.2
  • 16
    • 0000793139
    • Cramming more components onto integrated circuits
    • Apr.
    • G. Moore, "Cramming More Components onto Integrated Circuits," Electronics, Apr. 1965.
    • (1965) Electronics
    • Moore, G.1
  • 20
    • 0004097019
    • Compressed differences: An algorithm for fast incremental checkpointing
    • Univ. of Tennessee at Knoxville, Aug.
    • J.S. Plank, J. Xu, and R.B. Netzer, "Compressed Differences: An Algorithm for Fast Incremental Checkpointing," Technical Report CS-95-302, Univ. of Tennessee at Knoxville, Aug. 1995.
    • (1995) Technical Report , vol.CS-95-302
    • Plank, J.S.1    Xu, J.2    Netzer, R.B.3
  • 21
    • 0016522101
    • System structure for software fault-tolerance
    • June
    • B. Randell, "System Structure for Software Fault-Tolerance," IEEE Trans. Software Eng., vol. 1, no. 2, pp. 220-232, June 1975.
    • (1975) IEEE Trans. Software Eng. , vol.1 , Issue.2 , pp. 220-232
    • Randell, B.1
  • 22
    • 84976815497
    • Fail-stop processors: An approach to designing fault-tolerant computing systems
    • Aug.
    • R.D. Schlichting and F.B. Schneider, "Fail-Stop Processors: An Approach to Designing Fault-Tolerant Computing Systems," ACM Trans. Computer Systems, vol. 1, no. 3, pp. 222-238, Aug. 1983.
    • (1983) ACM Trans. Computer Systems , vol.1 , Issue.3 , pp. 222-238
    • Schlichting, R.D.1    Schneider, F.B.2
  • 24
    • 0022112420
    • Optimistic recovery in distributed systems
    • Aug.
    • R. Strom and S. Yemini, "Optimistic Recovery in Distributed Systems," ACM Trans. Computer Systems, vol. 3, no. 3, pp. 204-226, Aug. 1985.
    • (1985) ACM Trans. Computer Systems , vol.3 , Issue.3 , pp. 204-226
    • Strom, R.1    Yemini, S.2
  • 25
    • 0031388399
    • Impact of checkpoint latency on overhead ratio of a checkpointing scheme
    • Aug.
    • N. Vaidya, "Impact of Checkpoint Latency on Overhead Ratio of a Checkpointing Scheme," IEEE Trans. Computers, vol. 46, no. 8, pp. 942-947, Aug. 1997.
    • (1997) IEEE Trans. Computers , vol.46 , Issue.8 , pp. 942-947
    • Vaidya, N.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.