-
3
-
-
11144287593
-
An overview of the BlueGene/L supercomputer
-
and The BlueGene/L Team
-
N. R. Adiga and The BlueGene/L Team. An overview of the BlueGene/L supercomputer. In Proceedings of ACM Supercomputing, 2002.
-
(2002)
Proceedings of ACM Supercomputing
-
-
Adiga, N.R.1
-
8
-
-
0027797402
-
Faults, symptoms, and software fault tolerance in the Tandem GUARDIAN90 operating system
-
I. Lee and R. K. Iyer. Faults, symptoms, and software fault tolerance in the Tandem GUARDIAN90 operating system. In Fault-Tolerant Computing. FTCS-23. Digest of Papers., The Twenty-Third International Symposium on, pages 20-29, 1993.
-
(1993)
Fault-Tolerant Computing. FTCS-23. Digest of Papers., The Twenty-Third International Symposium on
, pp. 20-29
-
-
Lee, I.1
Iyer, R.K.2
-
9
-
-
33845589803
-
Blue Gene/L failure analysis and prediction models
-
Y. Liang, Y. Zhang, M. Jette, A. Sivasubramaniam, and R. K. Sahoo. Blue Gene/L failure analysis and prediction models. In Proceedings of the Intl. Conf. on Dependable Systems and Networks (DSN), pages 425-434, 2006.
-
(2006)
Proceedings of the Intl. Conf. on Dependable Systems and Networks (DSN)
, pp. 425-434
-
-
Liang, Y.1
Zhang, Y.2
Jette, M.3
Sivasubramaniam, A.4
Sahoo, R.K.5
-
10
-
-
27544497222
-
Filtering failure logs for a BlueGene/L prototype
-
June
-
Y. Liang, Y. Zhang, A. Sivasubramaniam, R. K. Sahoo, J. Moreira, and M. Gupta. Filtering failure logs for a BlueGene/L prototype. In Proceedings of the Intl. Conf. on Dependable Systems and Networks (DSN), pages 476-485, June 2005.
-
(2005)
Proceedings of the Intl. Conf. on Dependable Systems and Networks (DSN)
, pp. 476-485
-
-
Liang, Y.1
Zhang, Y.2
Sivasubramaniam, A.3
Sahoo, R.K.4
Moreira, J.5
Gupta, M.6
-
11
-
-
0025502686
-
Error log analysis: Statistical modeling and heuristic trend analysis
-
T. T. Y. Lin and D. P. Siewiorek. Error log analysis: Statistical modeling and heuristic trend analysis. Reliability, IEEE Transactions on, 39(4):419-432, 1990.
-
(1990)
Reliability, IEEE Transactions on
, vol.39
, Issue.4
, pp. 419-432
-
-
Lin, T.T.Y.1
Siewiorek, D.P.2
-
13
-
-
0022324829
-
A methodology for analysis of failure prediction data
-
December
-
F. A. Nassar and D. M. Andrews. A methodology for analysis of failure prediction data. In Real-Time Systems Symposium, pages 160-166, December 1985.
-
(1985)
Real-Time Systems Symposium
, pp. 160-166
-
-
Nassar, F.A.1
Andrews, D.M.2
-
15
-
-
34547424386
-
Cooperative checkpointing: A robust approach to large-scale systems reliability
-
Cairns, Australia, June
-
A. Oliner, L. Rudolph, and R. K. Sahoo. Cooperative checkpointing: A robust approach to large-scale systems reliability. In Proceedings of the 20th Intl. Conf. on Supercomputing (ICS), Cairns, Australia, June 2006.
-
(2006)
Proceedings of the 20th Intl. Conf. on Supercomputing (ICS)
-
-
Oliner, A.1
Rudolph, L.2
Sahoo, R.K.3
-
16
-
-
27544438709
-
Probabilistic QoS guarantees for supercomputing systems
-
A. J. Oliner, L. Rudolph, R. K. Sahoo, J. E. Moreira, and M. Gupta. Probabilistic QoS guarantees for supercomputing systems. In Proceedings of the Intl. Conf. on Dependable Systems and Networks (DSN), pages 634-643, 2005.
-
(2005)
Proceedings of the Intl. Conf. on Dependable Systems and Networks (DSN)
, pp. 634-643
-
-
Oliner, A.J.1
Rudolph, L.2
Sahoo, R.K.3
Moreira, J.E.4
Gupta, M.5
-
17
-
-
12444257746
-
Fault-aware job scheduling for BlueGene/L systems
-
A. J. Oliner, R. K. Sahoo, J. E. Moreira, M. Gupta, and A. Sivasubramaniam. Fault-aware job scheduling for BlueGene/L systems. In Proceedings of the 18th Intl. Parallel and Distributed Processing Symposium (IPDPS), pages 64+, 2004.
-
(2004)
Proceedings of the 18th Intl. Parallel and Distributed Processing Symposium (IPDPS)
-
-
Oliner, A.J.1
Sahoo, R.K.2
Moreira, J.E.3
Gupta, M.4
Sivasubramaniam, A.5
-
19
-
-
77952378080
-
Critical event prediction for proactive management in large-scale computer clusters
-
ACM Press
-
R. K. Sahoo, A. J. Oliner, I. Rish, M. Gupta, J. E. Moreira, S. Ma, R. Vilalta, and A. Sivasubramaniam. Critical event prediction for proactive management in large-scale computer clusters. In Proceedings of the 9th ACM SIGKDD, International Conference on Knowledge Discovery and Data Mining, pages 426-435. ACM Press, 2003.
-
(2003)
Proceedings of the 9th ACM SIGKDD, International Conference on Knowledge Discovery and Data Mining
, pp. 426-435
-
-
Sahoo, R.K.1
Oliner, A.J.2
Rish, I.3
Gupta, M.4
Moreira, J.E.5
Ma, S.6
Vilalta, R.7
Sivasubramaniam, A.8
-
20
-
-
4544382099
-
Failure data analysis of a large-scale heterogeneous server environment
-
June
-
R. K. Sahoo, A. Sivasubramaniam, M. S. Squillante, and Y. Zhang. Failure data analysis of a large-scale heterogeneous server environment. In Proceedings of the Intl. Conf. on Dependable Systems and Networks (DSN), pages 772-781, June 2004.
-
(2004)
Proceedings of the Intl. Conf. on Dependable Systems and Networks (DSN)
, pp. 772-781
-
-
Sahoo, R.K.1
Sivasubramaniam, A.2
Squillante, M.S.3
Zhang, Y.4
-
24
-
-
36049028957
-
Defining and measuring supercomputer Reliability, Availability, and Serviceability (RAS)
-
J. Stearley. Defining and measuring supercomputer Reliability, Availability, and Serviceability (RAS). In Proceedings of the Linux Clusters Institute Conference, 2005. See http://www.cs.sandia.gov/~jrstear/ras.
-
(2005)
Proceedings of the Linux Clusters Institute Conference
-
-
Stearley, J.1
-
25
-
-
0026869241
-
Analysis and modeling of correlated failures in multicomputer systems
-
D. Tang and R. K. Iyer. Analysis and modeling of correlated failures in multicomputer systems. Computers, IEEE Transactions on, 41(5):567-577, 1992.
-
(1992)
Computers, IEEE Transactions on
, vol.41
, Issue.5
, pp. 567-577
-
-
Tang, D.1
Iyer, R.K.2
-
28
-
-
32444434211
-
Dynamic syslog mining for network failure monitoring
-
New York, NY, USA, ACM Press
-
K. Yamanishi and Y. Maruyama. Dynamic syslog mining for network failure monitoring. In Proceedings of the 11th ACM SIGKDD, International Conference on Knowledge Discovery and Data Mining, pages 499-508, New York, NY, USA, 2005. ACM Press.
-
(2005)
Proceedings of the 11th ACM SIGKDD, International Conference on Knowledge Discovery and Data Mining
, pp. 499-508
-
-
Yamanishi, K.1
Maruyama, Y.2