Year: 2012 (volume, issue, and page numbers not listed)

Detection and correction of silent data corruption for large-scale high-performance computing

Author keywords

[No Author keywords available]

Indexed keywords

APPLICATION DATA; CONSISTENCY PROTOCOL; FAULT INJECTOR; HIGH-END COMPUTING; HIGH-PERFORMANCE COMPUTING; MPI APPLICATIONS; SILENT DATA CORRUPTIONS; SOFT ERROR

EID: 84877705582     PISSN: 21674329     EISSN: 21674337     Source Type: Conference Proceeding    
DOI: 10.1109/SC.2012.49     Document Type: Conference Paper
Times cited: 192

References (37)
  • 3
    • 84858781341
    • Cosmic rays don't strike twice: understanding the nature of DRAM errors and the implications for system design
    • A. A. Hwang, I. A. Stefanovici, and B. Schroeder, "Cosmic rays don't strike twice: understanding the nature of DRAM errors and the implications for system design," in Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS '12, 2012, pp. 111-122.
    • (2012) ASPLOS '12, pp. 111-122
    • Hwang, A.A.; Stefanovici, I.A.; Schroeder, B.
  • 10
    • 0016874205
    • Redundancy management technique for space shuttle computers
    • J. R. Sklaroff, "Redundancy management technique for space shuttle computers," IBM Journal of Research and Development, vol. 20, no. 1, pp. 20-28, 1976.
    • (1976) IBM Journal of Research and Development, vol. 20, no. 1, pp. 20-28
    • Sklaroff, J.R.
  • 11
    • 15044363155
    • Robust system design with built-in soft-error resilience
    • DOI 10.1109/MC.2005.70
    • S. Mitra, N. Seifert, M. Zhang, Q. Shi, and K. S. Kim, "Robust system design with built-in soft-error resilience," Computer, vol. 38, no. 2, pp. 43-52, 2005.
    • (2005) Computer, vol. 38, no. 2, pp. 43-52
    • Mitra, S.; Seifert, N.; Zhang, M.; Shi, Q.; Kim, K.S.
  • 16
    • 84877712705
    • HPC landscape - Application accelerators: Deus ex machina?
    • J. Vetter, "HPC landscape - application accelerators: Deus ex machina?" Sep. 2009, invited talk at High Performance Embedded Computing Workshop.
    • (2009) High Performance Embedded Computing Workshop
    • Vetter, J.
  • 17
    • 84877697134
    • Simulation challenge: Exascale planning overview
    • J. Shalf, "Simulation challenge: Exascale planning overview," Aug. 2010, invited talk at HEC FSIO R&D Workshop.
    • (2010) HEC FSIO R&D Workshop
    • Shalf, J.
  • 22
    • 33749067567
    • Berkeley Lab Checkpoint/Restart (BLCR) for Linux clusters
    • P. H. Hargrove and J. C. Duell, "Berkeley Lab Checkpoint/Restart (BLCR) for Linux clusters," in Journal of Physics: Proceedings of the Scientific Discovery through Advanced Computing Program (SciDAC) Conference 2006, vol. 46. Denver, CO, USA: Institute of Physics Publishing, Bristol, UK, Jun. 25-29, 2006, pp. 494-499. [Online]. Available: http://www.iop.org/EJ/article/1742-6596/46/1/067/jpconf6-46-067.pdf
    • (2006) Journal of Physics: Proceedings of the Scientific Discovery through Advanced Computing Program (SciDAC) Conference 2006, vol. 46, pp. 494-499
    • Hargrove, P.H.; Duell, J.C.
  • 26
    • 29344473319
    • Predicting the number of fatal soft errors in Los Alamos National Laboratory's ASC Q supercomputer
    • S. E. Michalak, K. W. Harris, N. W. Hengartner, B. E. Takala, and S. A. Wender, "Predicting the number of fatal soft errors in Los Alamos National Laboratory's ASC Q supercomputer," IEEE Transactions on Device and Materials Reliability (TDMR), vol. 5, no. 3, pp. 329-335, 2005. [Online]. Available: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1545893
    • (2005) IEEE Transactions on Device and Materials Reliability (TDMR), vol. 5, no. 3, pp. 329-335
    • Michalak, S.E.; Harris, K.W.; Hengartner, N.W.; Takala, B.E.; Wender, S.A.
  • 28
    • 0026404704
    • Architecture of fault-tolerant computers: An historical perspective
    • D. P. Siewiorek, "Architecture of fault-tolerant computers: An historical perspective," Proceedings of the IEEE, vol. 79, no. 12, pp. 1710-1734, 1991. [Online]. Available: http://dx.doi.org/10.1109/5.119549
    • (1991) Proceedings of the IEEE, vol. 79, no. 12, pp. 1710-1734
    • Siewiorek, D.P.
  • 29
    • 58149131807
    • DDMR: Dynamic and scalable dual modular redundancy with short validation intervals
    • A. Golander, S. Weiss, and R. Ronen, "DDMR: Dynamic and scalable dual modular redundancy with short validation intervals," IEEE Computer Architecture Letters, vol. 7, no. 2, pp. 65-68, 2008. [Online]. Available: http://doi.ieeecomputersociety.org/10.1109/L-CA.2008.12
    • (2008) IEEE Computer Architecture Letters, vol. 7, no. 2, pp. 65-68
    • Golander, A.; Weiss, S.; Ronen, R.
  • 30
    • 67649255075
    • PLR: A software approach to transient fault tolerance for multicore architectures
    • A. Shye, J. Blomstedt, T. Moseley, V. J. Reddi, and D. A. Connors, "PLR: A software approach to transient fault tolerance for multicore architectures," IEEE Transactions on Dependable and Secure Computing (TDSC), vol. 6, no. 2, pp. 135-148, 2009. [Online]. Available: http://doi.ieeecomputersociety.org/10.1109/TDSC.2008.62
    • (2009) IEEE Transactions on Dependable and Secure Computing (TDSC), vol. 6, no. 2, pp. 135-148
    • Shye, A.; Blomstedt, J.; Moseley, T.; Reddi, V.J.; Connors, D.A.
  • 33
    • 78149257903
    • Transparent redundant computing with MPI
    • R. Brightwell, K. B. Ferreira, and R. Riesen, "Transparent redundant computing with MPI," in EuroMPI, ser. Lecture Notes in Computer Science, R. Keller, E. Gabriel, M. M. Resch, and J. Dongarra, Eds., vol. 6305. Springer, 2010, pp. 208-218.
    • (2010) Lecture Notes in Computer Science, vol. 6305, pp. 208-218
    • Brightwell, R.; Ferreira, K.B.; Riesen, R.
  • 35
    • 70350469329
    • VolpexMPI: An MPI library for execution of parallel applications on volatile nodes
    • th European PVM/MPI Users' Group Meeting (EuroPVM/MPI) 2009, vol. 5759. Espoo, Finland: Springer Verlag, Berlin, Germany, Sep. 7-10, 2009, pp. 124-133. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-03770-2_19
    • (2009) Lecture Notes in Computer Science, vol. 5759, pp. 124-133
    • LeBlanc, T.; Anand, R.; Gabriel, E.; Subhlok, J.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.