



Volume 20, Issue 2, 2009, Pages 180-190

Adaptive Task Checkpointing and Replication: Toward Efficient Fault-Tolerant Grids

Author keywords

availability; Distributed systems; fault tolerance; performance of systems


EID: 85008006673     PISSN: 10459219     EISSN: None     Source Type: Journal    
DOI: 10.1109/TPDS.2008.93     Document Type: Article
Times cited: 78

References (36)
  • 2
    • 0343644421
    • http://www.cs.huji.ac.il/labs/parallel/workload/
    • D. Feitelson, Parallel Workloads Archive, http://www.cs.huji.ac.il/labs/parallel/workload/, 2008
    • (2008) Parallel Workloads Archive
    • Feitelson, D.1
  • 4
    • 33750624416
    • A Flexible Framework for Fault Tolerance in the Grid
    • Sept
    • S. Hwang and C. Kesselman, “A Flexible Framework for Fault Tolerance in the Grid,” J. Grid Computing, vol. 1, no. 3, pp. 251–272, Sept. 2003
    • (2003) J. Grid Computing , vol.1 , Issue.3 , pp. 251-272
    • Hwang, S.1    Kesselman, C.2
  • 5
    • 3042583502
    • Distributed Diagnosis in Dynamic Fault Environments
    • A. Subbiah and D. Blough, “Distributed Diagnosis in Dynamic Fault Environments,” Parallel and Distributed Systems, vol. 15, no. 5, pp. 453–467, 2004
    • (2004) Parallel and Distributed Systems , vol.15 , Issue.5 , pp. 453-467
    • Subbiah, A.1    Blough, D.2
  • 6
    • 85006105225
    • A New Fault-Tolerance Framework for Grid Computing
    • Y. Derbal, “A New Fault-Tolerance Framework for Grid Computing,” Multiagent and Grid Systems, vol. 2, no. 2, pp. 115–133, 2006
    • (2006) Multiagent and Grid Systems , vol.2 , Issue.2 , pp. 115-133
    • Derbal, Y.1
  • 9
    • 0036504529
    • Matching and Scheduling Algorithms for Minimizing Execution Time and Failure Probability of Applications in Heterogeneous Computing
    • A. Dogan and F. Ozguner, “Matching and Scheduling Algorithms for Minimizing Execution Time and Failure Probability of Applications in Heterogeneous Computing,” Parallel and Distributed Systems, vol. 13, no. 3, pp. 308–323, 2002
    • (2002) Parallel and Distributed Systems , vol.13 , Issue.3 , pp. 308-323
    • Dogan, A.1    Ozguner, F.2
  • 14
    • 33750297342
    • On Checkpointing and Heavy-Tails in Unreliable Computing Environments
    • C. Bossie and P. Fiorini, “On Checkpointing and Heavy-Tails in Unreliable Computing Environments,” SIGMETRICS Performance Evaluation Rev., vol. 34, no. 2, pp. 13–15, 2006
    • (2006) SIGMETRICS Performance Evaluation Rev , vol.34 , Issue.2 , pp. 13-15
    • Bossie, C.1    Fiorini, P.2
  • 17
    • 84976846528
    • A First Order Approximation to the Optimum Checkpoint Interval
    • Sept
    • J. Young, “A First Order Approximation to the Optimum Checkpoint Interval,” Comm. ACM, vol. 17, no. 9, pp. 530–531, Sept. 1974
    • (1974) Comm. ACM , vol.17 , Issue.9 , pp. 530-531
    • Young, J.1
  • 18
    • 0018454850
    • On the Optimum Checkpoint Interval
    • Apr
    • E. Gelenbe, “On the Optimum Checkpoint Interval,” J. ACM, vol. 26, no. 2, pp. 259–270, Apr. 1979
    • (1979) J. ACM , vol.26 , Issue.2 , pp. 259-270
    • Gelenbe, E.1
  • 19
    • 84976696875
    • Performance Analysis of Checkpointing Strategies
    • May
    • A. Tantawi and M. Ruschitzka, “Performance Analysis of Checkpointing Strategies,” ACM Trans. Computer Systems, vol. 2, no. 2, pp. 123–144, May 1984
    • (1984) ACM Trans. Computer Systems , vol.2 , Issue.2 , pp. 123-144
    • Tantawi, A.1    Ruschitzka, M.2
  • 24
    • 33845777089
    • Performance and Effectiveness Trade-Off for Checkpointing in Fault-Tolerant Distributed Systems
    • P. Katsaros, L. Angelis, and C. Lazos, “Performance and Effectiveness Trade-Off for Checkpointing in Fault-Tolerant Distributed Systems,” Concurrency and Computation: Practice and Experience, vol. 19, no. 1, pp. 37–63, 2007
    • (2007) Concurrency and Computation: Practice and Experience , vol.19 , Issue.1 , pp. 37-63
    • Katsaros, P.1    Angelis, L.2    Lazos, C.3
  • 25
    • 79952168926
    • Using Adaptive Fault Tolerance to Improve Application Robustness on the TeraGrid
    • June
    • Y. Li and Z. Lan, “Using Adaptive Fault Tolerance to Improve Application Robustness on the TeraGrid,” Proc. TeraGrid Conf., June 2007
    • (2007) Proc. TeraGrid Conf
    • Li, Y.1    Lan, Z.2
  • 28
    • 0031383052
    • Performance Optimization of Checkpointing Schemes with Task Duplication
    • Dec
    • A. Ziv and J. Bruck, “Performance Optimization of Checkpointing Schemes with Task Duplication,” IEEE Trans. Computers, vol. 46, no. 12, pp. 1381–1386, Dec. 1997
    • (1997) IEEE Trans. Computers , vol.46 , Issue.12 , pp. 1381-1386
    • Ziv, A.1    Bruck, J.2
  • 29
    • 0028518423
    • Roll-Forward Checkpointing Scheme: A Novel Fault-Tolerant Architecture
    • Oct
    • D. Pradhan and N. Vaidya, “Roll-Forward Checkpointing Scheme: A Novel Fault-Tolerant Architecture,” IEEE Trans. Computers, vol. 43, no. 10, pp. 1163–1174, Oct. 1994
    • (1994) IEEE Trans. Computers , vol.43 , Issue.10 , pp. 1163-1174
    • Pradhan, D.1    Vaidya, N.2
  • 30
    • 38049155259
    • A Problem of Program Execution Time Measurement
    • M. Hajdukovic, Z. Suvajdzin, Z. Zivanov, and E. Hodzic, “A Problem of Program Execution Time Measurement,” Novi Sad J. Math., vol. 33, no. 1, pp. 67–73, 2003
    • (2003) Novi Sad J. Math , vol.33 , Issue.1 , pp. 67-73
    • Hajdukovic, M.1    Suvajdzin, Z.2    Zivanov, Z.3    Hodzic, E.4
  • 31
    • 0036871428
    • GridSim: A Toolkit for the Modeling and Simulation of Distributed Resource Management and Scheduling for Grid Computing
    • Wiley, Nov.-Dec
    • R. Buyya and M. Murshed, “GridSim: A Toolkit for the Modeling and Simulation of Distributed Resource Management and Scheduling for Grid Computing,” J. Concurrency and Computation: Practice and Experience, vol. 14, nos. 13–15, Wiley, Nov.-Dec. 2002
    • (2002) J. Concurrency and Computation: Practice and Experience , vol.14 , Issue.13-15
    • Buyya, R.1    Murshed, M.2
  • 34
    • 0346613481
    • Charging and Rate Control for Elastic Traffic
    • F. Kelly, “Charging and Rate Control for Elastic Traffic,” European Trans. Telecomm., vol. 8, pp. 33–37, 1997
    • (1997) European Trans. Telecomm , vol.8 , pp. 33-37
    • Kelly, F.1
  • 35
    • 12444323034
    • A Network Model for Simulation of Grid Application
    • Ecole Normale Superieure de Lyon, Laboratoire de l’Informatique du Parallélisme
    • H. Casanova and L. Marchal, “A Network Model for Simulation of Grid Application,” technical report, Ecole Normale Superieure de Lyon, Laboratoire de l’Informatique du Parallélisme, 2002
    • (2002) technical report
    • Casanova, H.1    Marchal, L.2
  • 36
    • 0345446547
    • The Workload on Parallel Supercomputers: Modeling the Characteristics of Rigid Jobs
    • Nov
    • U. Lublin and D. Feitelson, “The Workload on Parallel Supercomputers: Modeling the Characteristics of Rigid Jobs,” Parallel and Distributed Computing, vol. 63, no. 11, pp. 1105–1122, Nov. 2003
    • (2003) Parallel and Distributed Computing , vol.63 , Issue.11 , pp. 1105-1122
    • Lublin, U.1    Feitelson, D.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.