Volume 7203 LNCS, Issue PART 1, 2012, Pages 700-709

HADAB: Enabling fault tolerance in parallel applications running in distributed environments

Author keywords

checkpointing; distributed environments; Fault tolerance; HPC; PETSc library

Indexed keywords

CHECK POINTING; COMPUTING NODES; COMPUTING RESOURCE; CONJUGATE GRADIENT; DISTRIBUTED COMPUTING ENVIRONMENT; DISTRIBUTED ENVIRONMENTS; DISTRIBUTED INFRASTRUCTURE; HIGH-PERFORMANCE COMPUTING; HPC; PARALLEL APPLICATION; RUNNING-IN; SCIENTIFIC SOFTWARES; TESTING PHASE; UNEXPECTED EVENTS;

EID: 84865251301     PISSN: 03029743     EISSN: 16113349     Source Type: Book Series    
DOI: 10.1007/978-3-642-31464-3_71     Document Type: Conference Paper
Times cited: 22

References (16)
  • [1] Balay, S., et al.: PETSc Users Manual. ANL-95/11 - Revision 3.1, Argonne National Laboratory (2010)
  • [6] Engelmann, C., Geist, A.: Super-Scalable Algorithms for Computing on 100,000 Processors. In: Sunderam, V.S., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds.) ICCS 2005. LNCS, vol. 3514, pp. 313-321. Springer, Heidelberg (2005)
  • [8] Kofahi, N.A., Al-Bokhitan, S., Journal, A.A.: On Disk-based and Diskless Check-pointing for Parallel and Distributed Systems: An Empirical Analysis. Information Technology Journal 4, 367-376 (2005)
  • [9] Lee, K., Sha, L.: Process resurrection: A fast recovery mechanism for real-time embedded systems. In: Real-Time and Embedded Technology and Applications Symposium, pp. 292-301. IEEE (2005)
  • [10] Murli, A., Boccia, V., Carracciuolo, L., D'Amore, L., Lapegna, M.: Monitoring and Migration of a PETSc-based Parallel Application for Medical Imaging in a Grid computing PSE. In: Proceedings of IFIP 2.5 WoCo9, vol. 239, pp. 421-432. Springer (2007)
  • [11]
  • [15] Vadhiyar, S.S., Dongarra, J.: SRS - A Framework for Developing Malleable and Migratable Parallel Applications for Distributed Systems. In: Parallel Processing Letters, pp. 291-312 (2002)


* 이 정보는 Elsevier사의 SCOPUS DB에서 KISTI가 분석하여 추출한 것입니다.