2005, Pages 213-223

Fault tolerant high performance computing by a coding approach

Author keywords

Fault Tolerance; Floating Point Arithmetic Coding; High Performance Computing; Message Passing Interface

Indexed keywords

COMPUTATIONAL COMPLEXITY; ENCODING (SYMBOLS); INTERFACES (COMPUTER); PERFORMANCE

EID: 31844451082     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1065944.1065973     Document Type: Conference Paper
Times cited : (87)

References (25)
  • 3
    • 31844452364
    • Recovery patterns for iterative methods in a parallel unstable environment
    • University of Tennessee, Knoxville, Tennessee, USA
    • G. Bosilca, Z. Chen, J. Dongarra, and J. Langou. Recovery patterns for iterative methods in a parallel unstable environment. Technical Report ut-cs-04-538, University of Tennessee, Knoxville, Tennessee, USA, 2004.
    • (2004) Technical Report , vol.UT-CS-04-538
    • Bosilca, G.1    Chen, Z.2    Dongarra, J.3    Langou, J.4
  • 4
    • 31844450567
    • Condition numbers of Gaussian random matrices
    • University of Tennessee, Knoxville, Tennessee, USA
    • Z. Chen and J. Dongarra. Condition numbers of Gaussian random matrices. Technical Report ut-cs-04-539, University of Tennessee, Knoxville, Tennessee, USA, 2004.
    • (2004) Technical Report , vol.UT-CS-04-539
    • Chen, Z.1    Dongarra, J.2
  • 5
    • 0242658775
    • Self-adapting software for numerical linear algebra and LAPACK for clusters
    • November-December
    • Z. Chen, J. Dongarra, P. Luszczek, and K. Roche. Self-adapting software for numerical linear algebra and LAPACK for clusters. Parallel Computing, 29(11-12):1723-1743, November-December 2003.
    • (2003) Parallel Computing , vol.29 , Issue.11-12 , pp. 1723-1743
    • Chen, Z.1    Dongarra, J.2    Luszczek, P.3    Roche, K.4
  • 6
    • 0029715009
    • Evaluation of checkpoint mechanisms for massively parallel machines
    • T.-c. Chiueh and P. Deng. Evaluation of checkpoint mechanisms for massively parallel machines. In FTCS, pages 370-379, 1996.
    • (1996) FTCS , pp. 370-379
    • Chiueh, T.C.1    Deng, P.2
  • 8
    • 0000324960
    • Eigenvalues and condition numbers of random matrices
    • A. Edelman. Eigenvalues and condition numbers of random matrices. SIAM J. Matrix Anal. Appl., 9(4):543-560, 1988.
    • (1988) SIAM J. Matrix Anal. Appl. , vol.9 , Issue.4 , pp. 543-560
    • Edelman, A.1
  • 9
    • 84940567900
    • FT-MPI: Fault tolerant MPI, supporting dynamic applications in a dynamic world
    • G. E. Fagg and J. Dongarra. FT-MPI: Fault tolerant MPI, supporting dynamic applications in a dynamic world. In PVM/MPI 2000, pages 346-353, 2000.
    • (2000) PVM/MPI 2000 , pp. 346-353
    • Fagg, G.E.1    Dongarra, J.2
  • 12
    • 12444258147
    • Development of naturally fault tolerant algorithms for computing on 100,000 processors
    • Submitted to
    • A. Geist and C. Engelmann. Development of naturally fault tolerant algorithms for computing on 100,000 processors. Submitted to J. Parallel Distrib. Comput., 2002.
    • (2002) J. Parallel Distrib. Comput.
    • Geist, A.1    Engelmann, C.2
  • 13
    • 0018454850
    • On the optimum checkpoint interval
    • E. Gelenbe. On the optimum checkpoint interval. J. ACM, 26(2):259-270, 1979.
    • (1979) J. ACM , vol.26 , Issue.2 , pp. 259-270
    • Gelenbe, E.1
  • 14
    • 0030243005
    • A high-performance, portable implementation of the MPI message passing interface standard
    • September
    • W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A high-performance, portable implementation of the MPI message passing interface standard. Parallel Computing, 22(6):789-828, September 1996.
    • (1996) Parallel Computing , vol.22 , Issue.6 , pp. 789-828
    • Gropp, W.1    Lusk, E.2    Doss, N.3    Skjellum, A.4
  • 17
    • 0003413671
    • Message passing interface forum. MPI: A message passing interface standard
    • University of Tennessee, Knoxville, Tennessee, USA
    • Message Passing Interface Forum. MPI: A Message Passing Interface Standard. Technical Report ut-cs-94-230, University of Tennessee, Knoxville, Tennessee, USA, 1994.
    • (1994) Technical Report , vol.UT-CS-94-230
  • 18
    • 0031223146
    • A tutorial on Reed-Solomon coding for fault-tolerance in RAID-like systems
    • September
    • J. S. Plank. A tutorial on Reed-Solomon coding for fault-tolerance in RAID-like systems. Software - Practice & Experience, 27(9):995-1012, September 1997.
    • (1997) Software - Practice & Experience , vol.27 , Issue.9 , pp. 995-1012
    • Plank, J.S.1
  • 19
    • 0031570636
    • Fault-tolerant matrix operations for networks of workstations using diskless checkpointing
    • J. S. Plank, Y. Kim, and J. Dongarra. Fault-tolerant matrix operations for networks of workstations using diskless checkpointing. J. Parallel Distrib. Comput., 43(2):125-138, 1997.
    • (1997) J. Parallel Distrib. Comput. , vol.43 , Issue.2 , pp. 125-138
    • Plank, J.S.1    Kim, Y.2    Dongarra, J.3
  • 20
    • 0028060943
    • Faster checkpointing with n+1 parity
    • J. S. Plank and K. Li. Faster checkpointing with n+1 parity. In FTCS, pages 288-297, 1994.
    • (1994) FTCS , pp. 288-297
    • Plank, J.S.1    Li, K.2
  • 22
    • 0035201417
    • Processor allocation and checkpoint interval selection in cluster computing systems
    • November
    • J. S. Plank and M. G. Thomason. Processor allocation and checkpoint interval selection in cluster computing systems. J. Parallel Distrib. Comput., 61(11):1570-1590, November 2001.
    • (2001) J. Parallel Distrib. Comput. , vol.61 , Issue.11 , pp. 1570-1590
    • Plank, J.S.1    Thomason, M.G.2
  • 23
    • 84864756973
    • An experimental study about diskless checkpointing
    • L. M. Silva and J. G. Silva. An experimental study about diskless checkpointing. In EUROMICRO'98, pages 395-402, 1998.
    • (1998) EUROMICRO'98 , pp. 395-402
    • Silva, L.M.1    Silva, J.G.2
  • 24
    • 0345442370
    • A case for two-level recovery schemes
    • N. H. Vaidya. A case for two-level recovery schemes. IEEE Trans. Computers, 47(6):656-666, 1998.
    • (1998) IEEE Trans. Computers , vol.47 , Issue.6 , pp. 656-666
    • Vaidya, N.H.1
  • 25
    • 84976846528
    • A first order approximation to the optimal checkpoint interval
    • J. W. Young. A first order approximation to the optimal checkpoint interval. Commun. ACM, 17(9):530-531, 1974.
    • (1974) Commun. ACM , vol.17 , Issue.9 , pp. 530-531
    • Young, J.W.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.