[2] Y. Liang, Y. Zhang, H. Xiong, R. Sahoo, "An Adaptive Semantic Filter for Blue Gene/L Failure Log Analysis", IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007), 26-30 March 2007, pp. 1-8.
[3] Y. Liang, Y. Zhang, A. Sivasubramaniam, R. Sahoo, J. Moreira, M. Gupta, "Filtering failure logs for a Bluegene/L prototype", Proceedings of the International Conference on Dependable Systems and Networks (DSN 2005), 28 June-1 July 2005, pp. 476-485.
[4] P. Gujrati, Y. Li, Z. Lan, R. Thakur, J. White, "A Meta-Learning Failure Predictor for Blue Gene/L Systems", International Conference on Parallel Processing (ICPP 2007), 2007, pp. 40-47.
[5] G. Almasi, R. Bellofatto, J. Brunheroto, C. Cascaval, J. G. Castaños, L. Ceze, P. Crumley, C. C. Erway, J. Gagliano, D. Lieber, X. Martorell, J. E. Moreira, A. Sanomiya, K. Strauss, "An Overview of the Blue Gene/L System Software Organization", Euro-Par 2003, Parallel Processing, 2003, pp. 543-555.
[6] Secure Computing Facility, High Performance Computing at Lawrence Livermore National Laboratory, https://computing.llnl.gov/?set=resources&page=SCF-resources#bluegenel
[7] B. Schroeder, G. Gibson, "A large-scale study of failures in high-performance computing systems", Proceedings of the 2006 International Conference on Dependable Systems and Networks, June 2006.
[8] R. Sahoo, A. Sivasubramaniam, M. Squillante, Y. Zhang, "Failure Data Analysis of a Large-Scale Heterogeneous Server Environment", Proceedings of the 2004 International Conference on Dependable Systems and Networks, 2004, pp. 389-398.
[10] J. Stearley, "Defining and measuring supercomputer Reliability, Availability, and Serviceability (RAS)", Proceedings of the Linux Clusters Institute Conference, 2005. See http://www.cs.sandia.gov/~jrstear/ras.
[11] Y. Liu, R. Nassar, C. B. Leangsuksun, N. Naksinehaboon, M. Paun, S. L. Scott, "An optimal checkpoint/restart model for a large scale high performance computing system", International Parallel and Distributed Processing Symposium, April 2008.
[14] http://www.latech.edu/~nta008/patterns.tgz