Volume , Issue , 2007, Pages

A job pause service under LAM/MPI+BLCR for transparent fault tolerance

Author keywords

[No Author keywords available]

Indexed keywords

BENCHMARKING; LARGE SCALE SYSTEMS; PARALLEL PROCESSING SYSTEMS; REQUIREMENTS ENGINEERING

EID: 34548768671     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/IPDPS.2007.370307     Document Type: Conference Paper
Times cited : (61)

References (32)
  • 3
    • EID: 34548797982
    • T. Angskun, G. Fagg, G. Bosilca, J. Pjesivac-Grbovic, and J. Dongarra. Self-healing network for scalable fault tolerant runtime environments. In Austrian-Hungarian Workshop on Distributed and Parallel Systems, 2006.
  • 5
    • EID: 0032203011
    • Coyote: A system for constructing fine-grain configurable communication services
    • N. T. Bhatti, M. A. Hiltunen, R. D. Schlichting, and W. Chiu. Coyote: A system for constructing fine-grain configurable communication services. ACM Trans. Comput. Syst., 16(4):321-366, 1998.
    • (1998) ACM Trans. Comput. Syst., vol. 16, issue 4, pp. 321-366
    • Bhatti, N.T.; Hiltunen, M.A.; Schlichting, R.D.; Chiu, W.
  • 6
    • EID: 0038194608
    • MPICH-V: Toward a scalable fault tolerant MPI for volatile nodes
    • G. Bosilca, A. Bouteiller, and F. Cappello. MPICH-V: Toward a scalable fault tolerant MPI for volatile nodes. In Supercomputing, Nov. 2002.
    • (2002) Supercomputing
    • Bosilca, G.; Bouteiller, A.; Cappello, F.
  • 9
    • EID: 84957017252
    • A scalable process management environment for parallel programs
    • R. Butler, W. Gropp, and E. L. Lusk. A scalable process management environment for parallel programs. In Euro PVM/MPI, pages 168-175, 2000.
    • (2000) Euro PVM/MPI, pp. 168-175
    • Butler, R.; Gropp, W.; Lusk, E.L.
  • 12
    • EID: 34548769215
    • J. Duell. The design and implementation of Berkeley Lab's Linux checkpoint/restart. Technical report, Lawrence Berkeley National Laboratory, 2000.
  • 14
    • EID: 0003487248
    • Strong and weak virtual synchrony in Horus
    • Technical Report TR95-1537, Cornell University, Computer Science Department, Aug. 24, 1995
    • R. Friedman and R. van Renesse. Strong and weak virtual synchrony in Horus. Technical Report TR95-1537, Cornell University, Computer Science Department, Aug. 24, 1995.
    • (1995)
    • Friedman, R.; van Renesse, R.
  • 15
    • EID: 34548755483
    • A checkpoint and restart service specification for Open MPI
    • Technical report, Indiana University, Computer Science Department
    • J. Hursey, J. M. Squyres, and A. Lumsdaine. A checkpoint and restart service specification for Open MPI. Technical report, Indiana University, Computer Science Department, 2006.
    • (2006)
    • Hursey, J.; Squyres, J.M.; Lumsdaine, A.
  • 16
    • EID: 34548033627
    • Personal communications
    • IBM T.J. Watson. Personal communications. Ruud Haring, July 2005.
    • (2005)
    • IBM T.J. Watson
  • 18
    • EID: 0002695959
    • Remote Unix - turning idle workstations into cycle servers
    • M. Litzkow. Remote Unix - turning idle workstations into cycle servers. In Usenix Summer Conference, pages 381-384, 1987.
    • (1987) Usenix Summer Conference, pp. 381-384
    • Litzkow, M.
  • 19
    • EID: 0003912256
    • Checkpoint and migration of UNIX processes in the Condor distributed processing system
    • Technical Report UW-CS-TR-1346, University of Wisconsin - Madison Computer Sciences Department, April 1997
    • M. Litzkow, T. Tannenbaum, J. Basney, and M. Livny. Checkpoint and migration of UNIX processes in the Condor distributed processing system. Technical Report UW-CS-TR-1346, University of Wisconsin - Madison Computer Sciences Department, April 1997.
    • (1997)
    • Litzkow, M.; Tannenbaum, T.; Basney, J.; Livny, M.
  • 22
    • EID: 85084159983
    • J. S. Plank, M. Beck, G. Kingsley, and K. Li. Libckpt: Transparent checkpointing under Unix. In Usenix Winter Technical Conference, pages 213-223, January 1995.
  • 25
    • EID: 27844564536
    • Request progression interface (RPI) system services interface (SSI) modules for LAM/MPI
    • Technical Report TR579, Indiana University, Computer Science Department
    • J. M. Squyres, B. Barrett, and A. Lumsdaine. Request progression interface (RPI) system services interface (SSI) modules for LAM/MPI. Technical Report TR579, Indiana University, Computer Science Department, 2003.
    • (2003)
    • Squyres, J.M.; Barrett, B.; Lumsdaine, A.
  • 26
    • EID: 35248827046
    • A component architecture for LAM/MPI
    • European PVM/MPI Users' Group Meeting, number 2840 in Lecture Notes in Computer Science, Springer-Verlag, Sep/Oct 2003
    • J. M. Squyres and A. Lumsdaine. A component architecture for LAM/MPI. In European PVM/MPI Users' Group Meeting, number 2840 in Lecture Notes in Computer Science, pages 379-387. Springer-Verlag, Sep/Oct 2003.
    • (2003) Lecture Notes in Computer Science, vol. 2840, pp. 379-387
    • Squyres, J.M.; Lumsdaine, A.
  • 27
    • EID: 0029713612
    • CoCheck: Checkpointing and process migration for MPI
    • G. Stellner. CoCheck: Checkpointing and process migration for MPI. In IEEE, editor, International Parallel Processing Symposium, pages 526-531, 1996.
    • (1996) International Parallel Processing Symposium, pp. 526-531
    • Stellner, G.
  • 30
    • EID: 80052332150
    • Large scale parallel structured AMR calculations using the SAMRAI framework
    • A. Wissink, R. Hornung, S. Kohn, and S. Smith. Large scale parallel structured AMR calculations using the SAMRAI framework. In Supercomputing, Nov. 2001.
    • (2001) Supercomputing
    • Wissink, A.; Hornung, R.; Kohn, S.; Smith, S.
  • 31
    • EID: 85014969248
    • Architectural requirements and scalability of the NAS parallel benchmarks
    • F. Wong, R. Martin, R. Arpaci-Dusseau, and D. Culler. Architectural requirements and scalability of the NAS parallel benchmarks. In Supercomputing, 1999.
    • (1999) Supercomputing
    • Wong, F.; Martin, R.; Arpaci-Dusseau, R.; Culler, D.
  • 32
    • EID: 84976846528
    • A first order approximation to the optimum checkpoint interval
    • J. W. Young. A first order approximation to the optimum checkpoint interval. Commun. ACM, 17(9):530-531, 1974.
    • (1974) Commun. ACM, vol. 17, issue 9, pp. 530-531
    • Young, J.W.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.