-
2
-
-
85080637457
-
-
NVIDIA CUDA-GDB Documentation Online; accessed Apr. 2016
-
NVIDIA CUDA-GDB Documentation. http://docs.nvidia.com/cuda/cuda-gdb/#axzz45I7ljdgy. [Online; accessed Apr. 2016].
-
-
-
-
3
-
-
85080762939
-
-
NVIDIA Multi-GPU Programming Online; accessed Apr. 2016
-
NVIDIA Multi-GPU Programming. http://www.nvidia.com/docs/IO/116711/sc11-multi-gpu.pdf. [Online; accessed Apr. 2016].
-
-
-
-
4
-
-
85080712082
-
-
NVIDIA SDK Samples Online; accessed Apr. 2016
-
NVIDIA SDK Samples. http://docs.nvidia.com/gameworks/content/artisttools/hairworks/HairWorks sdkSamples.html. [Online; accessed Apr. 2016].
-
-
-
-
5
-
-
85080761546
-
-
Online; accessed Apr. 2016
-
Parallel circuit solver. https://github.com/glaswep/hpc. [Online; accessed Apr. 2016].
-
Parallel Circuit Solver
-
-
-
6
-
-
84966600338
-
Understanding the propagation of transient errors in HPC applications
-
IEEE
-
R. Ashraf, R. Gioiosa, G. Kestor, R. DeMara, C.-Y. Cher, P. Bose. Understanding the propagation of transient errors in HPC applications. In International Conference for High Performance Computing, Networking, Storage and Analysis (SC). IEEE, 2015.
-
(2015)
International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
-
-
Ashraf, R.1
Gioiosa, R.2
Kestor, G.3
DeMara, R.4
Cher, C.-Y.5
Bose, P.6
-
7
-
-
84886379889
-
A study of the impact of single bit-flip and double bit-flip errors on program execution
-
Springer
-
F. Ayatolahi, B. Sangchoolie, R. Johansson, J. Karlsson. A study of the impact of single bit-flip and double bit-flip errors on program execution. In Computer Safety, Reliability, and Security, pages 265-276. Springer, 2013.
-
(2013)
Computer Safety, Reliability, and Security
, pp. 265-276
-
-
Ayatolahi, F.1
Sangchoolie, B.2
Johansson, R.3
Karlsson, J.4
-
11
-
-
70649092154
-
Rodinia: A benchmark suite for heterogeneous computing
-
IEEE
-
S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S.-H. Lee, K. Skadron. Rodinia: A benchmark suite for heterogeneous computing. In International Symposium on Workload Characterization (IISWC 2009), pages 44-54. IEEE, 2009.
-
(2009)
International Symposium On Workload Characterization (IISWC 2009)
, pp. 44-54
-
-
Che, S.1
Boyer, M.2
Meng, J.3
Tarjan, D.4
Sheaffer, J.W.5
Lee, S.-H.6
Skadron, K.7
-
12
-
-
84983113593
-
Understanding soft error resiliency of Blue Gene/Q compute chip through hardware proton irradiation and software fault injection
-
IEEE, November
-
C.-Y. Cher, M. S. Gupta, P. Bose, K. P. Muller. Understanding soft error resiliency of Blue Gene/Q compute chip through hardware proton irradiation and software fault injection. In Proceedings of the 2014 International Conference for High Performance Computing, Networking, Storage and Analysis (SC). IEEE, November 2014.
-
(2014)
Proceedings of the 2014 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
-
-
Cher, C.-Y.1
Gupta, M.S.2
Bose, P.3
Muller, K.P.4
-
13
-
-
84879873377
-
Quantitative evaluation of soft error injection techniques for robust system design
-
IEEE
-
H. Cho, S. Mirkhani, C.-Y. Cher, J. A. Abraham, S. Mitra. Quantitative evaluation of soft error injection techniques for robust system design. In ACM/EDAC/IEEE Design Automation Conference (DAC), pages 1-10. IEEE, 2013.
-
(2013)
ACM/EDAC/IEEE Design Automation Conference (DAC)
, pp. 1-10
-
-
Cho, H.1
Mirkhani, S.2
Cher, C.-Y.3
Abraham, J.A.4
Mitra, S.5
-
14
-
-
67650305371
-
Intermittent faults and effects on reliability of integrated circuits
-
C. Constantinescu IEEE
-
C. Constantinescu. Intermittent faults and effects on reliability of integrated circuits. In Reliability and Maintainability Symposium, page 370. IEEE, 2008.
-
(2008)
Reliability and Maintainability Symposium
, pp. 370
-
-
-
15
-
-
53349095714
-
A characterization of instruction-level error derating and its implications for error detection
-
IEEE
-
J. J. Cook and C. Zilles. A characterization of instruction-level error derating and its implications for error detection. In International Conference on Dependable Systems and Networks (DSN), pages 482-491. IEEE, 2008.
-
(2008)
International Conference On Dependable Systems and Networks (DSN)
, pp. 482-491
-
-
Cook, J.J.1
Zilles, C.2
-
16
-
-
84936944941
-
A system software approach to proactive memory-error avoidance
-
IEEE
-
C. H. A. Costa, Y. Park, B. S. Rosenburg, C.-Y. Cher, K. D. Ryu. A system software approach to proactive memory-error avoidance. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC). IEEE, 2014.
-
(2014)
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
-
-
Costa, C.H.A.1
Park, Y.2
Rosenburg, B.S.3
Cher, C.-Y.4
Ryu, K.D.5
-
18
-
-
84950112790
-
Measuring and understanding extreme-scale application resilience: A field study of 5,000,000 HPC application runs
-
IEEE
-
C. Di Martino, W. Kramer, Z. Kalbarczyk, R. Iyer. Measuring and understanding extreme-scale application resilience: A field study of 5,000,000 HPC application runs. In IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pages 25-36. IEEE, 2015.
-
(2015)
IEEE/IFIP International Conference On Dependable Systems and Networks (DSN)
, pp. 25-36
-
-
Di Martino, C.1
Kramer, W.2
Kalbarczyk, Z.3
Iyer, R.4
-
20
-
-
84904465883
-
GPU-Qin: A methodology for evaluating the error resilience of GPGPU applications
-
IEEE
-
B. Fang, K. Pattabiraman, M. Ripeanu, S. Gurumurthi. GPU-Qin: A methodology for evaluating the error resilience of GPGPU applications. In International Symposium on Performance Analysis of Systems and Software (ISPASS), pages 221-230. IEEE, 2014.
-
(2014)
International Symposium On Performance Analysis of Systems and Software (ISPASS)
, pp. 221-230
-
-
Fang, B.1
Pattabiraman, K.2
Ripeanu, M.3
Gurumurthi, S.4
-
21
-
-
77949759608
-
Shoestring: Probabilistic soft error reliability on the cheap
-
ACM
-
S. Feng, S. Gupta, A. Ansari, S. Mahlke. Shoestring: Probabilistic soft error reliability on the cheap. In ACM SIGARCH Computer Architecture News, volume 38, page 385. ACM, 2010.
-
(2010)
ACM SIGARCH Computer Architecture News
, vol.38
, pp. 385
-
-
Feng, S.1
Gupta, S.2
Ansari, A.3
Mahlke, S.4
-
22
-
-
85040192040
-
How to kill a supercomputer: Dirty power, cosmic rays, bad solder
-
A. Geist. How to kill a supercomputer: Dirty power, cosmic rays, bad solder. IEEE Spectrum, 2016.
-
(2016)
IEEE Spectrum
-
-
Geist, A.1
-
23
-
-
33845434226
-
Transparent, incremental checkpointing at kernel level: A foundation for fault tolerance for parallel computers
-
ACM/IEEE, Nov 2005
-
R. Gioiosa, J. C. Sancho, S. Jiang, F. Petrini. Transparent, incremental checkpointing at kernel level: A foundation for fault tolerance for parallel computers. In Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005, page 9. ACM/IEEE, Nov 2005.
-
(2005)
Proceedings of the ACM/IEEE SC2005 Conference On High Performance Networking and Computing
, pp. 9
-
-
Gioiosa, R.1
Sancho, J.C.2
Jiang, S.3
Petrini, F.4
-
24
-
-
1542359963
-
Characterization of Linux kernel behavior under errors
-
IEEE
-
W. Gu, Z. Kalbarczyk, R. K. Iyer, Z. Yang. Characterization of Linux kernel behavior under errors. In International Conference on Dependable Systems and Networks (DSN), page 459. IEEE, 2003.
-
(2003)
International Conference On Dependable Systems and Networks (DSN)
, pp. 459
-
-
Gu, W.1
Kalbarczyk, Z.2
Iyer, R.K.3
Yang, Z.4
-
26
-
-
84960137667
-
SASSIFI: Evaluating resilience of GPU applications
-
IEEE
-
S. Hari, T. Tsai, M. Stephenson, S. Keckler, J. Emer. SASSIFI: Evaluating resilience of GPU applications. In SELSE: IEEE Workshop of Silicon Errors in Logic. IEEE, 2015.
-
(2015)
SELSE: IEEE Workshop of Silicon Errors in Logic
-
-
Hari, S.1
Tsai, T.2
Stephenson, M.3
Keckler, S.4
Emer, J.5
-
27
-
-
84858759524
-
Relyzer: Exploiting application-level fault equivalence to analyze application resiliency to transient faults
-
ACM
-
S. K. S. Hari, S. V. Adve, H. Naeimi, P. Ramachandran. Relyzer: Exploiting application-level fault equivalence to analyze application resiliency to transient faults. In ACM SIGARCH Computer Architecture News, volume 40, page 123. ACM, 2012.
-
(2012)
ACM SIGARCH Computer Architecture News
, vol.40
, pp. 123
-
-
Hari, S.K.S.1
Adve, S.V.2
Naeimi, H.3
Ramachandran, P.4
-
28
-
-
0036980151
-
Propane: An environment for examining the propagation of errors in software
-
ACM
-
M. Hiller, A. Jhumka, N. Suri. Propane: An environment for examining the propagation of errors in software. In ACM SIGSOFT Software Engineering Notes, volume 27, pages 81-85. ACM, 2002.
-
(2002)
ACM SIGSOFT Software Engineering Notes
, vol.27
, pp. 81-85
-
-
Hiller, M.1
Jhumka, A.2
Suri, N.3
-
29
-
-
0021439162
-
Algorithm-based fault tolerance for matrix operations
-
K.-H. Huang and J. A. Abraham. Algorithm-based fault tolerance for matrix operations. Computers, IEEE Transactions on, C-33(6):518-528, 1984.
-
(1984)
Computers, IEEE Transactions On
, vol.C-33
, Issue.6
, pp. 518-528
-
-
Huang, K.-H.1
Abraham, J.A.2
-
33
-
-
84899697732
-
-
Lawrence Livermore National Laboratory (LLNL), Livermore, CA, Tech. Rep
-
I. Karlin, A. Bhatele, B. L. Chamberlain, J. Cohen, Z. Devito, M. Gokhale, R. Haque, R. Hornung, J. Keasler, D. Laney, et al. LULESH programming model and performance ports overview. Lawrence Livermore National Laboratory (LLNL), Livermore, CA, Tech. Rep, 2012.
-
(2012)
LULESH Programming Model and Performance Ports Overview
-
-
Karlin, I.1
Bhatele, A.2
Chamberlain, B.L.3
Cohen, J.4
Devito, Z.5
Gokhale, M.6
Haque, R.7
Hornung, R.8
Keasler, J.9
Laney, D.10
-
37
-
-
84962094360
-
LLFI: An intermediate code level fault injector for hardware faults
-
IEEE
-
Q. Lu, M. Farahani, J. Wei, A. Thomas, K. Pattabiraman. LLFI: An intermediate code level fault injector for hardware faults. In International Conference on Quality, Reliability and Security (QRS). IEEE, 2015.
-
(2015)
International Conference On Quality, Reliability and Security (QRS)
-
-
Lu, Q.1
Farahani, M.2
Wei, J.3
Thomas, A.4
Pattabiraman, K.5
-
38
-
-
85116177305
-
SDCTune: A model for predicting the SDC proneness of an application for configurable protection
-
ACM
-
Q. Lu, K. Pattabiraman, M. S. Gupta, J. A. Rivers. SDCTune: A model for predicting the SDC proneness of an application for configurable protection. In International Conference on Compilers, Architecture and Synthesis for Embedded Systems, page 23. ACM, 2014.
-
(2014)
International Conference On Compilers, Architecture and Synthesis for Embedded Systems
, pp. 23
-
-
Lu, Q.1
Pattabiraman, K.2
Gupta, M.S.3
Rivers, J.A.4
-
39
-
-
78650831692
-
Design, modeling, and evaluation of a scalable multi-level checkpointing system
-
IEEE
-
A. Moody, G. Bronevetsky, K. Mohror, B. R. de Supinski. Design, modeling, and evaluation of a scalable multi-level checkpointing system. In Proceedings of the 2010 International Conference for High Performance Computing, Networking, Storage and Analysis (SC). IEEE, 2010.
-
(2010)
Proceedings of the 2010 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
-
-
Moody, A.1
Bronevetsky, G.2
Mohror, K.3
De Supinski, B.R.4
-
40
-
-
84872039034
-
-
IEEE
-
R. Natella, D. Cotroneo, J. Duraes, H. S. Madeira, et al. On fault representativeness of software fault injection. volume 39, pages 80-96. IEEE, 2013.
-
(2013)
On Fault Representativeness of Software Fault Injection
, vol.39
, pp. 80-96
-
-
Natella, R.1
Cotroneo, D.2
Duraes, J.3
Madeira, H.S.4
-
41
-
-
0036507790
-
Error detection by duplicated instructions in super-scalar processors
-
N. Oh, P. P. Shirvani, E. J. McCluskey. Error detection by duplicated instructions in super-scalar processors. IEEE Transactions on Reliability, 51(1):63-75, 2002.
-
(2002)
IEEE Transactions On Reliability
, vol.51
, Issue.1
, pp. 63-75
-
-
Oh, N.1
Shirvani, P.P.2
McCluskey, E.J.3
-
42
-
-
79960509775
-
Automated derivation of application-specific error detectors using dynamic analysis
-
K. Pattabiraman, G. P. Saggese, D. Chen, Z. Kalbarczyk, R. K. Iyer. Automated derivation of application-specific error detectors using dynamic analysis. IEEE Transactions on Dependable and Secure Computing, 8(5):640-655, 2011.
-
(2011)
IEEE Transactions On Dependable and Secure Computing
, vol.8
, Issue.5
, pp. 640-655
-
-
Pattabiraman, K.1
Saggese, G.P.2
Chen, D.3
Kalbarczyk, Z.4
Iyer, R.K.5
-
43
-
-
84966487041
-
VOCL-FT: Introducing techniques for efficient soft error coprocessor recovery
-
ACM
-
A. J. Peña, W. Bland, P. Balaji. VOCL-FT: Introducing techniques for efficient soft error coprocessor recovery. In International Conference for High Performance Computing, Networking, Storage and Analysis (SC), page 71. ACM, 2015.
-
(2015)
International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
, pp. 71
-
-
Peña, A.J.1
Bland, W.2
Balaji, P.3
-
46
-
-
84905657146
-
Towards formal approaches to system resilience
-
IEEE
-
V. C. Sharma, A. Haran, Z. Rakamaric, G. Gopalakrishnan. Towards formal approaches to system resilience. In Pacific Rim International Symposium on Dependable Computing (PRDC), pages 41-50. IEEE, 2013.
-
(2013)
Pacific Rim International Symposium On Dependable Computing (PRDC)
, pp. 41-50
-
-
Sharma, V.C.1
Haran, A.2
Rakamaric, Z.3
Gopalakrishnan, G.4
-
47
-
-
84873470137
-
-
Center for Reliable and High-Performance Computing
-
J. A. Stratton, C. Rodrigues, I.-J. Sung, N. Obeid, L.-W. Chang, N. Anssari, G. D. Liu, W.-m. W. Hwu. Parboil: A revised benchmark suite for scientific and commercial throughput computing. Center for Reliable and High-Performance Computing, 2012.
-
(2012)
Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing
-
-
Stratton, J.A.1
Rodrigues, C.2
Sung, I.-J.3
Obeid, N.4
Chang, L.-W.5
Anssari, N.6
Liu, G.D.7
Hwu, W.-M.W.8
-
48
-
-
84862974517
-
Analyzing soft-error vulnerability on GPGPU microarchitecture
-
IEEE
-
J. Tan, N. Goswami, T. Li, X. Fu. Analyzing soft-error vulnerability on GPGPU microarchitecture. In IEEE International Symposium on Workload Characterization (IISWC), pages 226-235. IEEE, 2011.
-
(2011)
IEEE International Symposium On Workload Characterization (IISWC)
, pp. 226-235
-
-
Tan, J.1
Goswami, N.2
Li, T.3
Fu, X.4
-
49
-
-
84905509992
-
Enabling preemptive multiprogramming on GPUs
-
IEEE Press
-
I. Tanasic, I. Gelado, J. Cabezas, A. Ramirez, N. Navarro, M. Valero. Enabling preemptive multiprogramming on GPUs. In ACM SIGARCH Computer Architecture News, volume 42, pages 193-204. IEEE Press, 2014.
-
(2014)
ACM SIGARCH Computer Architecture News
, vol.42
, pp. 193-204
-
-
Tanasic, I.1
Gelado, I.2
Cabezas, J.3
Ramirez, A.4
Navarro, N.5
Valero, M.6
-
51
-
-
84934300860
-
Understanding GPU errors on large-scale HPC systems and the implications for system design and operation
-
IEEE
-
D. Tiwari, S. Gupta, J. Rogers, D. Maxwell, P. Rech, S. Vazhkudai, D. Oliveira, D. Londo, N. DeBardeleben, P. Navaux, et al. Understanding GPU errors on large-scale HPC systems and the implications for system design and operation. In International Symposium on High Performance Computer Architecture (HPCA), pages 331-342. IEEE, 2015.
-
(2015)
International Symposium On High Performance Computer Architecture (HPCA)
, pp. 331-342
-
-
Tiwari, D.1
Gupta, S.2
Rogers, J.3
Maxwell, D.4
Rech, P.5
Vazhkudai, S.6
Oliveira, D.7
Londo, D.8
DeBardeleben, N.9
Navaux, P.10
-
54
-
-
80053254113
-
Hauberk: Lightweight silent data corruption error detector for GPGPU
-
IEEE
-
K. S. Yim, C. Pham, M. Saleheen, Z. Kalbarczyk, R. Iyer. Hauberk: Lightweight silent data corruption error detector for GPGPU. In International Parallel and Distributed Processing Symposium (IPDPS), page 287. IEEE, 2011.
-
(2011)
International Parallel and Distributed Processing Symposium (IPDPS)
, pp. 287
-
-
Yim, K.S.1
Pham, C.2
Saleheen, M.3
Kalbarczyk, Z.4
Iyer, R.5
-
55
-
-
84938823095
-
High performance computing of fiber scattering simulation
-
ACM
-
L. Yu, Y. Zhang, X. Gong, N. Roy, L. Makowski, D. Kaeli. High performance computing of fiber scattering simulation. In Proceedings of the 8th Workshop on General Purpose Processing using GPUs, pages 90-98. ACM, 2015.
-
(2015)
Proceedings of the 8th Workshop On General Purpose Processing Using GPUs
, pp. 90-98
-
-
Yu, L.1
Zhang, Y.2
Gong, X.3
Roy, N.4
Makowski, L.5
Kaeli, D.6