2007, Pages 303-311

Evaluation of fault-tolerant policies using simulation

Author keywords

[No Author keywords available]

Indexed keywords

APPLICATION EXECUTION; CHECK-POINTING; CHECKPOINT/RESTART; CLUSTER COMPUTING; DEFAULT BEHAVIOR; FAULT-TOLERANT; FT MECHANISMS; INTERNATIONAL CONFERENCES; LAWRENCE LIVERMORE NATIONAL LABORATORY; SIMULATED RESULTS; SIMULATION FRAMEWORK; SYSTEM FAILURES;

EID: 53349098075     PISSN: 15525244     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/CLUSTR.2007.4629244     Document Type: Conference Paper
Times cited: 22

References (24)
  • 1
    • K. Yelick, "The software challenges of petascale computing," HPCwire interview, 2006.
  • 2
    • R. Oldfield, "Investigating lightweight storage and overlay networks for fault tolerance," in HAPCW'06: High Availability and Performance Computing Workshop, held in conjunction with LACSI 2006, Santa Fe, New Mexico, USA, Oct. 2006.
  • 3
    • H. D. Karatza, "Gang scheduling performance on a cluster of non-dedicated workstations," in SS '02: Proceedings of the 35th Annual Simulation Symposium. Washington, DC, USA: IEEE Computer Society, 2002, p. 235.
  • 5
    • H. Rajaei, M. Dadfar, and P. Joshi, "Simulation of job scheduling for small scale clusters," in WSC '06: Proceedings of the 38th Conference on Winter Simulation. Winter Simulation Conference, 2006, pp. 1195-1201.
  • 7
    • K. K. Goswami and R. K. Iyer, "Simulation of software behavior under hardware faults," in Digest of Papers, FTCS-23: The Twenty-Third International Symposium on Fault-Tolerant Computing, 22-24 Jun. 1993, pp. 218-227.
  • 9
    • A. J. Oliner, R. K. Sahoo, J. E. Moreira, and M. Gupta, "Performance implications of periodic checkpointing on large-scale cluster systems," in IPDPS '05: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05), Workshop 18. Washington, DC, USA: IEEE Computer Society, 2005, p. 299.2.
  • 13
    • "SLURM: Simple Linux Utility for Resource Management." [Online]. Available: http://www.llnl.gov/linux/slurm/slurm.html
  • 14
    • "Deja vu software." [Online]. Available: http://www.californiadigital.com/sw.html
  • 15
    • J. S. Plank, M. Beck, G. Kingsley, and K. Li, "Libckpt: Transparent checkpointing under Unix," in Proceedings of the USENIX Winter 1995 Technical Conference, New Orleans, USA, Jan. 1995, pp. 213-224. [Online]. Available: citeseer.ist.psu.edu/plank95libckpt.html
  • 16
    • G. Stellner, "CoCheck: Checkpointing and process migration for MPI," in Proceedings of the 10th International Parallel Processing Symposium (IPPS '96), Honolulu, 1996.
  • 18
    • G. Hamerly and C. Elkan, "Bayesian approaches to failure prediction for disk drives," in Proceedings of the Eighteenth International Conference on Machine Learning, 2001.
  • 19
    • N. Talagala and D. Patterson, "An analysis of error behavior in a large storage system," Technical Report UCB/CSD-99-1042, Computer Science Division, University of California, Berkeley, 1999.
  • 22
    • Y. Li and Z. Lan, "Exploit failure prediction for adaptive fault-tolerance in cluster computing," CCGrid, vol. 0, pp. 531-538, 2006.
  • 23
    • P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, "Xen and the art of virtualization," in Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), 2003.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.