-
1
-
-
84973836157
-
The NAS parallel benchmarks
-
-
D. H. Bailey, E. Barszcz, J. T. Barton, D. S. Browning, R. L. Carter, L. Dagum, R. A. Fatoohi, P. O. Frederickson, T. A. Lasinski, R. S. Schreiber, H. D. Simon, V. Venkatakrishnan, and S. K. Weeratunga. The NAS Parallel Benchmarks. The International Journal of Supercomputer Applications, 5(3):63-73, Fall 1991. URL citeseer.ist.psu.edu/article/bailey94nas.html.
-
(1991)
The International Journal of Supercomputer Applications
, vol.5
, Issue.3
, pp. 63-73
-
-
Bailey, D.H.1
Barszcz, E.2
Barton, J.T.3
Browning, D.S.4
Carter, R.L.5
Dagum, L.6
Fatoohi, R.A.7
Frederickson, P.O.8
Lasinski, T.A.9
Schreiber, R.S.10
Simon, H.D.11
Venkatakrishnan, V.12
Weeratunga, S.K.13
-
2
-
-
78650807026
-
Scalable I/O systems via node-local storage: Approaching 1 TB/sec file I/O
-
Livermore, CA, USA, Aug. URL http://dx.doi.org/10.2172/964079
-
G. Bronevetsky and A. Moody. Scalable I/O systems via node-local storage: Approaching 1 TB/sec file I/O. Technical Report TR-JLPC-09-01, Lawrence Livermore National Laboratory, Livermore, CA, USA, Aug. 2009. URL http://dx.doi.org/10.2172/964079.
-
(2009)
Technical Report TR-JLPC-09-01, Lawrence Livermore National Laboratory
-
-
Bronevetsky, G.1
Moody, A.2
-
3
-
-
77955737995
-
-
-
N. DeBardeleben, J. Laros, J. T. Daly, S. L. Scott, C. Engelmann, and B. Harrod. High-end computing resilience: Analysis of issues facing the HEC community and path-forward for research and development. Whitepaper, Dec. 2009. URL http://www.csm.ornl.gov/~engelman/publications/debardeleben09high-end.pdf.
-
(2009)
High-end Computing Resilience: Analysis of Issues Facing the HEC Community and Path-forward for Research and Development
-
-
DeBardeleben, N.1
Laros, J.2
Daly, J.T.3
Scott, S.L.4
Engelmann, C.5
Harrod, B.6
-
4
-
-
0030129232
-
The Transis approach to high availability cluster communication
-
D. Dolev and D. Malki. The Transis approach to high availability cluster communication. Communications of the ACM, 39(4):64-70, 1996. ISSN 0001-0782. URL http://doi.acm.org/10.1145/227210.227227. (Pubitemid 126428118)
-
(1996)
Communications of the ACM
, vol.39
, Issue.4
, pp. 64-70
-
-
Dolev, D.1
Malki, D.2
-
5
-
-
77954574789
-
System resilience at extreme scale
-
-
E. N. M. Elnozahy, R. Bianchini, T. El-Ghazawi, A. Fox, F. Godfrey, A. Hoisie, K. McKinley, R. Melhem, J. S. Plank, P. Ranganathan, and J. Simons. System resilience at extreme scale. Technical report, Defense Advanced Research Project Agency (DARPA), 2008. URL http://institutes.lanl.gov/resilience/docs/Toward%20Exascale%20Resilience.pdf.
-
(2008)
Technical Report, Defense Advanced Research Project Agency (DARPA)
-
-
Elnozahy, E.N.M.1
Bianchini, R.2
El-Ghazawi, T.3
Fox, A.4
Godfrey, F.5
Hoisie, A.6
McKinley, K.7
Melhem, R.8
Plank, J.S.9
Ranganathan, P.10
Simons, J.11
-
6
-
-
74549140832
-
The case for modular redundancy in large-scale high performance computing systems
-
Innsbruck, Austria, Feb. 16-18 ACTA Press, Calgary, AB, Canada. ISBN 978-0-88986-784-0. URL
-
th IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN) 2009, pages 189-194, Innsbruck, Austria, Feb. 16-18, 2009. ACTA Press, Calgary, AB, Canada. ISBN 978-0-88986-784-0. URL http://www.csm.ornl.gov/~engelman/publications/engelmann09case.pdf.
-
(2009)
th IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN) 2009
, pp. 189-194
-
-
Engelmann, C.1
Ong, H.H.2
Scott, S.L.3
-
7
-
-
78149270835
-
Increasing fault resiliency in a message-passing environment
-
-
K. Ferreira, R. Riesen, R. Oldfield, J. Stearley, J. Laros, K. Pedretti, R. Brightwell, and T. Kordenbrock. Increasing fault resiliency in a message-passing environment. Technical report SAND2009-6753, Sandia National Laboratories, Oct. 2009.
-
(2009)
Technical Report SAND2009-6753, Sandia National Laboratories
-
-
Ferreira, K.1
Riesen, R.2
Oldfield, R.3
Stearley, J.4
Laros, J.5
Pedretti, K.6
Brightwell, R.7
Kordenbrock, T.8
-
8
-
-
58149131807
-
DDMR: Dynamic and scalable dual modular redundancy with short validation intervals
-
URL http://doi.ieeecomputersociety.org/10.1109/L-CA.2008.12
-
A. Golander, S. Weiss, and R. Ronen. DDMR: Dynamic and scalable dual modular redundancy with short validation intervals. IEEE Computer Architecture Letters, 7(2):65-68, 2008. URL http://doi.ieeecomputersociety.org/10.1109/L-CA.2008.12.
-
(2008)
IEEE Computer Architecture Letters
, vol.7
, Issue.2
, pp. 65-68
-
-
Golander, A.1
Weiss, S.2
Ronen, R.3
-
9
-
-
33749067567
-
Berkeley lab checkpoint/restart (BLCR) for Linux clusters
-
DOI 10.1088/1742-6596/46/1/067, 067
-
P. H. Hargrove and J. C. Duell. Berkeley Lab Checkpoint/Restart (BLCR) for Linux clusters. In Journal of Physics: Proceedings of the Scientific Discovery through Advanced Computing Program (SciDAC) Conference 2006, volume 46, pages 494-499, Denver, CO, USA, June 25-29, 2006. Institute of Physics Publishing, Bristol, UK. URL http://www.iop.org/EJ/article/1742-6596/46/1/067/jpconf646067.pdf. (Pubitemid 44461038)
-
(2006)
Journal of Physics: Conference Series
, vol.46
, Issue.1
, pp. 494-499
-
-
Hargrove, P.H.1
Duell, J.C.2
-
10
-
-
11244269589
-
Configurable Fault-Tolerant Processor (CFTP) for spacecraft onboard processing
-
1097, 2004 IEEE Aerospace Conference Proceedings
-
C. A. Hulme, H. H. Loomis, A. A. Ross, and R. Yuan. Configurable fault-tolerant processor (CFTP) for spacecraft onboard processing. In Proceedings of the IEEE Aerospace Conference 2004, volume 4, pages 2269-2276, Big Sky, MT, USA, Mar. 6-13, 2004. IEEE Computer Society. ISBN 0-7803-8155-6. URL http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1368020. (Pubitemid 40057225)
-
(2004)
IEEE Aerospace Conference Proceedings
, vol.4
, pp. 2269-2276
-
-
Hulme, C.A.1
Loomis, H.H.2
Ross, A.A.3
Yuan, R.4
-
11
-
-
66749092384
-
ExaScale computing study: Technology challenges in achieving exascale systems
-
-
P. Kogge, K. Bergman, S. Borkar, D. Campbell, W. Carlson, W. Dally, M. Denneau, P. Franzon, W. Harrod, K. Hill, J. Hiller, S. Karp, S. Keckler, D. Klein, R. Lucas, M. Richards, A. Scarpelli, S. Scott, A. Snavely, T. Sterling, R. S. Williams, and K. Yelick. ExaScale computing study: Technology challenges in achieving exascale systems. Technical report, Defense Advanced Research Project Agency (DARPA) Information Processing Techniques Office (IPTO), 2008. URL http://users.ece.gatech.edu/~mrichard/ExascaleComputingStudyReports/exascalefinalreport100208.pdf.
-
(2008)
Technical Report, Defense Advanced Research Project Agency (DARPA) Information Processing Techniques Office (IPTO)
-
-
Kogge, P.1
Bergman, K.2
Borkar, S.3
Campbell, D.4
Carlson, W.5
Dally, W.6
Denneau, M.7
Franzon, P.8
Harrod, W.9
Hill, K.10
Hiller, J.11
Karp, S.12
Keckler, S.13
Klein, D.14
Lucas, R.15
Richards, M.16
Scarpelli, A.17
Scott, S.18
Snavely, A.19
Sterling, T.20
Williams, R.S.21
Yelick, K.22
-
12
-
-
70350469329
-
VolpexMPI: An MPI library for execution of parallel applications on volatile nodes
-
Espoo, Finland, Sept. 7-10 Springer Verlag, Berlin, Germany. ISBN 978-3-540-75415-2. URL http://dx.doi.org/10.1007/978-3-642-03770-2_19
-
th European PVM/MPI Users' Group Meeting (EuroPVM/MPI) 2009, volume 5759, pages 124-133, Espoo, Finland, Sept. 7-10, 2009. Springer Verlag, Berlin, Germany. ISBN 978-3-540-75415-2. URL http://dx.doi.org/10.1007/978-3-642-03770-2_19.
-
(2009)
th European PVM/MPI Users' Group Meeting (EuroPVM/MPI) 2009
, vol.5759
, pp. 124-133
-
-
LeBlanc, T.1
Anand, R.2
Gabriel, E.3
Subhlok, J.4
-
13
-
-
34548212768
-
Power efficient approaches to redundant multithreading
-
DOI 10.1109/TPDS.2007.1090
-
N. Madan and R. Balasubramonian. Power efficient approaches to redundant multithreading. IEEE Transactions on Parallel and Distributed Systems (TPDS), 18(8):1066-1079, 2007. ISSN 1045-9219. URL http://doi.ieeecomputersociety.org/10.1109/TPDS.2007.1090. (Pubitemid 47315989)
-
(2007)
IEEE Transactions on Parallel and Distributed Systems
, vol.18
, Issue.8
, pp. 1066-1079
-
-
Madan, N.1
Balasubramonian, R.2
-
14
-
-
0036287327
-
Detailed design and evaluation of redundant multithreading alternatives
-
th Annual International Symposium on Computer Architecture (ISCA) 2002, pages 99-110, Anchorage, AK, USA, May 25-29, 2002. IEEE Computer Society. ISBN 0-7695-1605-X. URL http://doi.ieeecomputersociety.org/10.1109/ISCA.2002.1003566. (Pubitemid 34691854)
-
(2002)
Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA
, pp. 99-110
-
-
Mukherjee, S.S.1
Kontz, M.2
Reinhardt, S.K.3
-
15
-
-
67649255075
-
PLR: A software approach to transient fault tolerance for multicore architectures
-
ISSN 1545-5971. URL http://doi.ieeecomputersociety.org/10.1109/TDSC.2008.62
-
A. Shye, J. Blomstedt, T. Moseley, V. J. Reddi, and D. A. Connors. PLR: A software approach to transient fault tolerance for multicore architectures. IEEE Transactions on Dependable and Secure Computing (TDSC), 6(2):135-148, 2009. ISSN 1545-5971. URL http://doi.ieeecomputersociety.org/10.1109/TDSC.2008.62.
-
(2009)
IEEE Transactions on Dependable and Secure Computing (TDSC)
, vol.6
, Issue.2
, pp. 135-148
-
-
Shye, A.1
Blomstedt, J.2
Moseley, T.3
Reddi, V.J.4
Connors, D.A.5
-
16
-
-
0026404704
-
Architecture of fault-tolerant computers: An historical perspective
-
ISSN 0018-9219. URL http://dx.doi.org/10.1109/5.119549
-
D. P. Siewiorek. Architecture of fault-tolerant computers: An historical perspective. Proceedings of the IEEE, 79(12):1710-1734, 1991. ISSN 0018-9219. URL http://dx.doi.org/10.1109/5.119549.
-
(1991)
Proceedings of the IEEE
, vol.79
, Issue.12
, pp. 1710-1734
-
-
Siewiorek, D.P.1
-
17
-
-
84885601112
-
Data integrity in HP NonStop servers
-
Urbana-Champaign, IL, USA, Apr. 11-12 URL
-
nd Workshop on System Effects of Logic Soft Errors (SELSE) 2006, Urbana-Champaign, IL, USA, Apr. 11-12, 2006. URL http://selse2.selse.org/papers/wood.pdf.
-
(2006)
nd Workshop on System Effects of Logic Soft Errors (SELSE) 2006
-
-
Wood, A.1
Jardine, R.2
Bartlett, W.3
-
18
-
-
78249259344
-
MMPI: A scalable fault tolerance mechanism for MPI large scale parallel computing
-
Bradford, UK, June 29 - July 1 IEEE Computer Society. ISBN 978-0-7695-4108-2. URL http://doi.ieeecomputersociety.org/10.1109/CIT.2010.226
-
th IEEE International Conference on Computer and Information Technology (CIT) 2010, pages 1251-1256, Bradford, UK, June 29 - July 1, 2010. IEEE Computer Society. ISBN 978-0-7695-4108-2. URL http://doi.ieeecomputersociety.org/10.1109/CIT.2010.226.
-
(2010)
th IEEE International Conference on Computer and Information Technology (CIT) 2010
, pp. 1251-1256
-
-
Yang, X.1
Wang, Z.2
Zhou, Y.3