-
1
-
-
79951595196
-
The international exascale software project roadmap
-
February
-
J. Dongarra, P. Beckman, T. Moore, et al., "The International Exascale Software Project Roadmap," International Journal of High Performance Computing Applications, pp. 3-60, February 2011.
-
(2011)
International Journal of High Performance Computing Applications
, pp. 3-60
-
-
Dongarra, J.1
Beckman, P.2
Moore, T.3
-
2
-
-
70450206305
-
Toward exascale resilience
-
F. Cappello, A. Geist, B. Gropp, L. Kale, B. Kramer, and M. Snir, "Toward Exascale Resilience," International Journal of High Performance Computing Applications, vol. 23, no. 4, pp. 374-388, 2009.
-
(2009)
International Journal of High Performance Computing Applications
, vol.23
, Issue.4
, pp. 374-388
-
-
Cappello, F.1
Geist, A.2
Gropp, B.3
Kale, L.4
Kramer, B.5
Snir, M.6
-
3
-
-
0021439162
-
Algorithm-based fault tolerance for matrix operations
-
June
-
K.-H. Huang and J. Abraham, "Algorithm-based fault tolerance for matrix operations," IEEE Transactions on Computers, vol. C-33, no. 6, pp. 518-528, June 1984.
-
(1984)
IEEE Transactions on Computers
, vol.C-33
, Issue.6
, pp. 518-528
-
-
Huang, K.-H.1
Abraham, J.2
-
4
-
-
84924433394
-
-
CoRR
-
G. Bosilca, R. Delmas, J. Dongarra, and J. Langou, "Algorithmic Based Fault Tolerance Applied to High Performance Computing," CoRR, 2008.
-
(2008)
Algorithmic Based Fault Tolerance Applied to High Performance Computing
-
-
Bosilca, G.1
Delmas, R.2
Dongarra, J.3
Langou, J.4
-
5
-
-
84877705582
-
Detection and correction of silent data corruption for large-scale high-performance computing
-
D. Fiala, F. Mueller, C. Engelmann, et al., "Detection and Correction of Silent Data Corruption for Large-scale High-Performance Computing," in The International Conference on High Performance Computing, Networking, Storage and Analysis, 2012, pp. 78:1-78:12.
-
(2012)
The International Conference on High Performance Computing, Networking, Storage and Analysis
, pp. 78:1-78:12
-
-
Fiala, D.1
Mueller, F.2
Engelmann, C.3
-
6
-
-
84958521395
-
A case for adaptive redundancy for HPC resilience
-
S. Hukerikar, P. Diniz, and R. Lucas, "A Case for Adaptive Redundancy for HPC Resilience," in Euro-Par 2013: Parallel Processing Workshops, ser. Lecture Notes in Computer Science, 2014, pp. 690-697.
-
(2014)
Euro-Par 2013: Parallel Processing Workshops, Ser. Lecture Notes in Computer Science
, pp. 690-697
-
-
Hukerikar, S.1
Diniz, P.2
Lucas, R.3
-
7
-
-
84908614184
-
Opportunistic Application-level Fault Detection through Adaptive Redundant Multithreading
-
July
-
S. Hukerikar, K. Teranishi, P. C. Diniz, and R. F. Lucas, "Opportunistic Application-level Fault Detection through Adaptive Redundant Multithreading," in Proceedings of the International Conference on High Performance Computing & Simulation, July 2014.
-
(2014)
Proceedings of the International Conference on High Performance Computing & Simulation
-
-
Hukerikar, S.1
Teranishi, K.2
Diniz, P.C.3
Lucas, R.F.4
-
8
-
-
0036287327
-
Detailed design and evaluation of redundant multithreading alternatives
-
May
-
S. S. Mukherjee, M. Kontz, and S. K. Reinhardt, "Detailed Design and Evaluation of Redundant Multithreading Alternatives," SIGARCH Computer Architecture News, pp. 99-110, May 2002.
-
(2002)
SIGARCH Computer Architecture News
, pp. 99-110
-
-
Mukherjee, S.S.1
Kontz, M.2
Reinhardt, S.K.3
-
10
-
-
84885606910
-
-
"Rose Compiler," http://www.rosecompiler.org.
-
Rose Compiler
-
-
-
12
-
-
0032667728
-
IBM's S/390 G5 microprocessor design
-
T. Slegel, R. M. Averill III, M. Check, et al., "IBM's S/390 G5 Microprocessor Design," Micro, IEEE, pp. 12-23, 1999.
-
(1999)
Micro, IEEE
, pp. 12-23
-
-
Slegel, T.1
Averill, R.M.I.2
Check, M.3
-
13
-
-
0036290674
-
Transient-fault recovery using simultaneous multithreading
-
T. Vijaykumar, I. Pomeranz, and K. Cheng, "Transient-Fault Recovery using Simultaneous Multithreading," in 29th Annual International Symposium on Computer Architecture, 2002, 2002, pp. 87-98.
-
(2002)
29th Annual International Symposium on Computer Architecture, 2002
, pp. 87-98
-
-
Vijaykumar, T.1
Pomeranz, I.2
Cheng, K.3
-
15
-
-
34547434242
-
Slick: Slice-based locality exploitation for efficient redundant multithreading
-
A. Parashar, A. Sivasubramaniam, and S. Gurumurthi, "Slick: Slice-based locality exploitation for efficient redundant multithreading," in Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006, pp. 95-105.
-
(2006)
Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems
, pp. 95-105
-
-
Parashar, A.1
Sivasubramaniam, A.2
Gurumurthi, S.3
-
16
-
-
78149269828
-
DAFT: Decoupled acyclic fault tolerance
-
Y. Zhang, J. W. Lee, N. P. Johnson, and D. I. August, "DAFT: Decoupled Acyclic Fault Tolerance," in Proceedings of the 19th international conference on Parallel architectures and compilation techniques, ser. PACT '10, 2010, pp. 87-98.
-
(2010)
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, Ser. PACT '10
, pp. 87-98
-
-
Zhang, Y.1
Lee, J.W.2
Johnson, N.P.3
August, D.I.4
-
17
-
-
33646829087
-
SWIFT: Software implemented fault tolerance
-
G. Reis, J. Chang, N. Vachharajani, et al., "SWIFT: Software Implemented Fault Tolerance," in International Symposium on Code Generation and Optimization, 2005, 2005, pp. 243-254.
-
(2005)
International Symposium on Code Generation and Optimization, 2005
, pp. 243-254
-
-
Reis, G.1
Chang, J.2
Vachharajani, N.3
-
18
-
-
0036507790
-
Error Detection by Duplicated Instructions in Superscalar processors
-
March
-
N. Oh, P. Shirvani, and E. McCluskey, "Error Detection by Duplicated Instructions in Superscalar processors," IEEE Transactions on Reliability, pp. 63-75, March 2002.
-
(2002)
IEEE Transactions on Reliability
, pp. 63-75
-
-
Oh, N.1
Shirvani, P.2
McCluskey, E.3
-
19
-
-
84880877655
-
ROSE::FTTransform-a source-to-source translation framework for exascale fault-tolerance research
-
June
-
J. Lidman, D. Quinlan, C. Liao, and S. McKee, "ROSE::FTTransform-a Source-to-Source Translation Framework for Exascale Fault-tolerance Research," in Dependable Systems and Networks Workshops (DSN-W), 2012 IEEE/IFIP 42nd International Conference on, June 2012, pp. 1-6.
-
(2012)
Dependable Systems and Networks Workshops (DSN-W), 2012 IEEE/IFIP 42nd International Conference on
, pp. 1-6
-
-
Lidman, J.1
Quinlan, D.2
Liao, C.3
McKee, S.4