Volume , Issue , 2007, Pages 23-32

Proactive fault tolerance for HPC with Xen virtualization

Author keywords

High performance computing; Proactive fault tolerance; Virtualization

Indexed keywords

CLUSTER ANALYSIS; COMPUTER OPERATING SYSTEMS; COMPUTER SYSTEM RECOVERY; FAULT TOLERANCE; HEALTH RISKS;

EID: 34548046749     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1274971.1274978     Document Type: Conference Paper
Times cited : (291)

References (43)
  • 1
    • 34548025750
    • Ganglia, http://ganglia.sourceforge.net/.
    • Ganglia
  • 2
    • 34548052823
    • OpenIPMI
    • OpenIPMI. http://openipmi.sourceforge.net/.
  • 5
    • 0344867889
    • MOSIX: An integrated multiprocessor UNIX
    • A. Barak and R. Wheeler. MOSIX: An integrated multiprocessor UNIX. In USENIX Association, editor, Proceedings of the Winter 1989 USENIX Conference: January 30-February 3, 1989, San Diego, California, USA, pages 101-112, Berkeley, CA, USA, Winter 1989. USENIX.
    • (1989) Proceedings of the Winter 1989 USENIX Conference: January 30-February 3 , pp. 101-112
    • Barak, A.1    Wheeler, R.2
  • 7
    • 0038194608
    • MPICH-V: Toward a scalable fault tolerant MPI for volatile nodes
    • G. Bosilca, A. Bouteiller, and F. Cappello. MPICH-V: Toward a scalable fault tolerant MPI for volatile nodes. In Supercomputing, Nov. 2002.
    • (2002) Supercomputing
    • Bosilca, G.1    Bouteiller, A.2    Cappello, F.3
  • 8
    • 84957017252
    • A scalable process-management environment for parallel programs
    • R. Butler, W. Gropp, and E. L. Lusk. A scalable process-management environment for parallel programs. In Euro PVM/MPI, pages 168-175, 2000.
    • (2000) Euro PVM/MPI , pp. 168-175
    • Butler, R.1    Gropp, W.2    Lusk, E.L.3
  • 13
    • 0026205353
    • Transparent process migration: Design alternatives and the sprite implementation
    • F. Douglis and J. K. Ousterhout. Transparent process migration: Design alternatives and the sprite implementation. Softw., Pract. Exper., 21(8):757-785, 1991.
    • (1991) Softw., Pract. Exper , vol.21 , Issue.8 , pp. 757-785
    • Douglis, F.1    Ousterhout, J.K.2
  • 14
    • 34548049440
    • J. Duell. The design and implementation of berkeley lab's linux checkpoint/restart. Tr, Lawrence Berkeley National Laboratory, 2000.
  • 15
    • 0026867749
    • Manetho: Transparent roll back-recovery with low overhead, limited rollback, and fast output commit
    • E. N. Elnozahy and W. Zwaenepoel. Manetho: Transparent roll back-recovery with low overhead, limited rollback, and fast output commit. IEEE Trans. Comput., 41(5):526-531, 1992.
    • (1992) IEEE Trans. Comput , vol.41 , Issue.5 , pp. 526-531
    • Elnozahy, E.N.1    Zwaenepoel, W.2
  • 22
    • 34548033627
    • Personal communications. Ruud Haring
    • IBM T.J. Watson. Personal communications. Ruud Haring, July 2005.
    • (2005)
    • Watson, I.T.J.1
  • 25
    • 57349155964
    • High performance vmm-bypass i/o in virtual machines
    • J. Liu, W. Huang, B. Abali, and D. Panda. High performance vmm-bypass i/o in virtual machines. In USENIX Conference, June 2006.
    • (2006) USENIX Conference
    • Liu, J.1    Huang, W.2    Abali, B.3    Panda, D.4
  • 29
    • 2642552074
    • The design and implementation of zap: A system for migrating computing environments
    • S. Osman, D. Subhraveti, G. Su, and J. Nieh. The design and implementation of zap: A system for migrating computing environments. In OSDI, 2002.
    • (2002) OSDI
    • Osman, S.1    Subhraveti, D.2    Su, G.3    Nieh, J.4
  • 32
    • 34548051006
    • S. Rani, C. Leangsuksun, A. Tikotekar, V. Rampure, and S. Scott. Toward efficient failure detection and recovery in hpc. In High Availability and Performance Computing Workshop, page (accepted), 2006.
  • 38
    • 0029713612
    • G. Stellner. CoCheck: checkpointing and process migration for MPI. In IEEE, editor, Proceedings of IPPS '96. The 10th International Parallel Processing Symposium: Honolulu, HI, USA, 15-19 April 1996, pages 526-531, 1109 Spring Street, Suite 300, Silver Spring, MD 20910, USA, 1996. IEEE Computer Society Press.
  • 39
    • 0022302038
    • Preemptable remote execution facilities for the v-system
    • M. Theimer, K. A. Lantz, and D. R. Cheriton. Preemptable remote execution facilities for the v-system. In SOSP, pages 2-12, 1985.
    • (1985) SOSP , pp. 2-12
    • Theimer, M.1    Lantz, K.A.2    Cheriton, D.R.3
  • 40
    • 34548768671
    • C. Wang, F. Mueller, C. Engelmann, and S. Scott. A job pause service under lam/mpi+blcr for transparent fault tolerance. In International Parallel and Distributed Processing Symposium, page (accepted), Apr. 2007.
  • 42
    • 85014969248
    • Architectural requirements and scalability of the NAS parallel benchmarks
    • F. Wong, R. Martin, R. Arpaci-Dusseau, and D. Culler. Architectural requirements and scalability of the NAS parallel benchmarks. In Supercomputing, 1999.
    • (1999) Supercomputing
    • Wong, F.1    Martin, R.2    Arpaci-Dusseau, R.3    Culler, D.4
  • 43
    • 84987240026
    • Attacking the process migration bottleneck
    • E. R. Zayas. Attacking the process migration bottleneck. In SOSP, pages 13-24, 1987.
    • (1987) SOSP , pp. 13-24
    • Zayas, E.R.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.