2011, Pages 162-171

High performance linpack benchmark: A fault tolerant implementation without checkpointing

Author keywords

algorithm based recovery; fault tolerance; high performance linpack benchmark; lu factorization

Indexed keywords

ALGORITHM-BASED RECOVERY; CHECK POINTING; CHECKSUM; EXTREME SCALE; FAIL-STOP FAILURES; FAULT-TOLERANT; HIGH PERFORMANCE LINPACK; HIGH PERFORMANCE LINPACK BENCHMARKS; HIGH-PERFORMANCE COMPUTING APPLICATIONS; LONG-RUNNING APPLICATIONS; LU FACTORIZATION; MATRIX; PROCESS FAILURE; RECOVERY SCHEME; ROLL BACK; TOTAL LOSS;

EID: 79959586938     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1995896.1995923     Document Type: Conference Paper
Times cited: 84

References (29)
  • 1
    • 0024142081
    • A linear algebraic model of algorithm-based fault tolerance
    • December
    • C. J. Anfinson and F. T. Luk. A linear algebraic model of algorithm-based fault tolerance. IEEE Transactions on Computers, 37(12), December 1988.
    • (1988) IEEE Transactions on Computers , vol.37 , Issue.12
    • Anfinson, C.J.1    Luk, F.T.2
  • 2
    • 79959599577
    • Bounds on algorithm-based fault tolerance in multiple processor systems
    • P. Banerjee and J. Abraham. Bounds on algorithm-based fault tolerance in multiple processor systems. IEEE Transactions on Computers, 2006.
    • (2006) IEEE Transactions on Computers
    • Banerjee, P.1    Abraham, J.2
  • 4
    • 68249127079
    • Fault tolerance in petascale/ exascale systems: Current knowledge, challenges and research opportunities
    • August
    • F. Cappello. Fault tolerance in petascale/ exascale systems: Current knowledge, challenges and research opportunities. International Journal of High Performance Computing Applications, 23(3), August 2009.
    • (2009) International Journal of High Performance Computing Applications , vol.23 , Issue.3
    • Cappello, F.1
  • 9
  • 12
    • 36949009638
    • PhD thesis, Univ. of Illinois Computer Science in the Graduate College of the University of Illinois at Urbana-Champaign
    • C. da Lu. Scalable diskless checkpointing for large parallel systems. PhD thesis, Univ. of Illinois Computer Science in the Graduate College of the University of Illinois at Urbana-Champaign, 2005.
    • (2005) Scalable Diskless Checkpointing for Large Parallel Systems
    • Da Lu, C.1
  • 13
    • 28044460018
    • A higher order estimate of the optimum checkpoint interval for restart dumps
    • DOI 10.1016/j.future.2004.11.016, PII S0167739X04002213
    • J. Daly. A higher order estimate of the optimum checkpoint interval for restart dumps. Future Generation Computer Systems, 22(3):303-312, 2006. (Pubitemid 41689812)
    • (2006) Future Generation Computer Systems , vol.22 , Issue.3 , pp. 303-312
    • Daly, J.T.1
  • 14
    • 58349092078
    • Failure tolerance in petascale computers
    • November
    • G. A. Gibson, B. Schroeder, and J. Digney. Failure tolerance in petascale computers. CTWatchQuarterly, 3(4), November 2007.
    • (2007) CTWatchQuarterly , vol.3 , Issue.4
    • Gibson, G.A.1    Schroeder, B.2    Digney, J.3
  • 18
    • 0021439162
    • Algorithm-based fault tolerance for matrix operations
    • K.-H. Huang and J. A. Abraham. Algorithm-based fault tolerance for matrix operations. IEEE Transactions on Computers, C-33:518-528, 1984.
    • (1984) IEEE Transactions on Computers , vol.C-33 , pp. 518-528
    • Huang, K.-H.1    Abraham, J.A.2
  • 19
    • 0022721936
    • Fault-tolerant matrix arithmetic and signal processing on highly concurrent computing structures
    • May
    • J. Jou and J. Abraham. Fault-tolerant matrix arithmetic and signal processing on highly concurrent computing structures. In Proceedings of the IEEE, volume 74, May 1986.
    • (1986) Proceedings of the IEEE , vol.74
    • Jou, J.1    Abraham, J.2
  • 21
    • 0023995880
    • An analysis of algorithm-based fault tolerance techniques
    • DOI 10.1016/0743-7315(88)90027-5
    • F. T. Luk and H. Park. An analysis of algorithm-based fault tolerance techniques. Journal of Parallel and Distributed Computing, 5(2):172-184, 1988. (Pubitemid 18589858)
    • (1988) Journal of Parallel and Distributed Computing , vol.5 , Issue.2 , pp. 172-184
    • Luk, F.T.1    Park, H.2
  • 24
    • 0031570636
    • Fault-tolerant matrix operations for networks of workstations using diskless checkpointing
    • DOI 10.1006/jpdc.1997.1336, PII S0743731597913368
    • J. S. Plank, Y. Kim, and J. Dongarra. Fault tolerant matrix operations for networks of workstations using diskless checkpointing. IEEE Journal of Parallel and Distributed Computing, 43:125-138, 1997. (Pubitemid 127171409)
    • (1997) Journal of Parallel and Distributed Computing , vol.43 , Issue.2 , pp. 125-138
    • Plank, J.S.1    Kim, Y.2    Dongarra, J.J.3
  • 29
    • 84976846528
    • A first order approximation to the optimum checkpoint interval
    • September
    • J. W. Young. A first order approximation to the optimum checkpoint interval. Commun. ACM, 17:530-531, September 1974.
    • (1974) Commun. ACM , vol.17 , pp. 530-531
    • Young, J.W.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.