-
1
-
-
53349092000
-
BlueGene/L Failure Analysis and Models
-
Y. Liang, Y. Zhang, et al., "BlueGene/L Failure Analysis and Models", Proc. of DSN06, 2006.
-
(2006)
Proc. of DSN06
-
-
Liang, Y.1
Zhang, Y.2
-
2
-
-
47249157799
-
Advanced Failure Prediction in Complex Software Systems
-
G. Hoffmann, F. Salfner, M. Malek, "Advanced Failure Prediction in Complex Software Systems", Proc. of SRDS, 2004.
-
(2004)
Proc. of SRDS
-
-
Hoffmann, G.1
Salfner, F.2
Malek, M.3
-
3
-
-
77952378080
-
Critical Event Prediction for Proactive Management in Large-scale Computer Clusters
-
R. Sahoo, A. Oliner, et al., "Critical Event Prediction for Proactive Management in Large-scale Computer Clusters", Proc. of KDD 2003, pp. 426-435, 2003.
-
(2003)
Proc. of KDD
, pp. 426-435
-
-
Sahoo, R.1
Oliner, A.2
-
5
-
-
4544299163
-
Failure Diagnosis Using Decision Trees
-
M. Chen, A. X. Zheng, J. Lloyd, M. I. Jordan, E. Brewer, "Failure Diagnosis Using Decision Trees", International Conference on Autonomic Computing (ICAC-04), 2004.
-
(2004)
International Conference on Autonomic Computing (ICAC-04)
-
-
Chen, M.1
Zheng, A.X.2
Lloyd, J.3
Jordan, M.I.4
Brewer, E.5
-
6
-
-
78149354391
-
Predicting Rare Events in Temporal Domains
-
R. Vilalta and S. Ma, "Predicting Rare Events in Temporal Domains", Proc. of IEEE ICDM, 2002.
-
(2002)
Proc. of IEEE ICDM
-
-
Vilalta, R.1
Ma, S.2
-
7
-
-
47249153592
-
A meta-learning failure predictor for BlueGene/L systems
-
P. Gujrati, Y. Li, Z. Lan, R. Thakur, and J. White, "A meta-learning failure predictor for BlueGene/L systems," Proc. of ICPP'07, 2007.
-
(2007)
Proc. of ICPP07
-
-
Gujrati, P.1
Li, Y.2
Lan, Z.3
Thakur, R.4
White, J.5
-
8
-
-
4544337911
-
Automatic Methods for Predicting Machine Availability in Desktop Grid and Peer-to-Peer Systems
-
J. Brevik, D. Nurmi, and R. Wolski, "Automatic Methods for Predicting Machine Availability in Desktop Grid and Peer-to-Peer Systems", Proc. of IEEE CCGrid, 2004.
-
(2004)
Proc. of IEEE CCGrid
-
-
Brevik, J.1
Nurmi, D.2
Wolski, R.3
-
9
-
-
33847100080
-
Application classification through monitoring and learning of resource consumption patterns
-
Rhodes Island, Greece, Apr. 25-29
-
J. Zhang and R. Figueiredo, "Application classification through monitoring and learning of resource consumption patterns", 10th IEEE International Parallel & Distributed Processing Symposium, Rhodes Island, Greece, Apr. 25-29, 2006.
-
(2006)
10th IEEE International Parallel & Distributed Processing Symposium
-
-
Zhang, J.1
Figueiredo, R.2
-
10
-
-
33751082401
-
Exploit Failure Prediction for Adaptive Fault-Tolerance in Cluster Computing
-
Yawei Li and Zhiling Lan, "Exploit Failure Prediction for Adaptive Fault-Tolerance in Cluster Computing", Proc. of IEEE CCGrid'06, 2006.
-
(2006)
Proc. of IEEE CCGrid'06
-
-
Li, Y.1
Lan, Z.2
-
11
-
-
0003922190
-
-
Wiley-Interscience, New York, NY, 2nd edition
-
R. Duda, P. Hart, and D. Stork. Pattern Classification. Wiley-Interscience, New York, NY, 2001. 2nd edition.
-
(2001)
Pattern Classification
-
-
Duda, R.1
Hart, P.2
Stork, D.3
-
12
-
-
0034133513
-
Distance-based outliers: Algorithms and applications
-
Edwin M. Knorr, Raymond T. Ng, Vladimir Tucakov, "Distance-based outliers: algorithms and applications", The VLDB Journal, (2000) 8: 237-253.
-
(2000)
The VLDB Journal
, vol.8
, pp. 237-253
-
-
Knorr, E.M.1
Ng, R.T.2
Tucakov, V.3
-
14
-
-
53349171714
-
-
Hardware monitoring by lm_sensors, http://secure.netroedge.com/lm78/info.html
-
Hardware monitoring by lm_sensors, http://secure.netroedge.com/lm78/info.html
-
-
-
-
15
-
-
28044457320
-
Monitoring Hard Disk with SMART
-
January
-
B. Allen, "Monitoring Hard Disk with SMART", Linux Journal, January, 2004.
-
(2004)
Linux Journal
-
-
Allen, B.1
-
16
-
-
0012253727
-
Bayesian approaches to failure prediction for disk drives
-
Greg Hamerly and Charles Elkan, "Bayesian approaches to failure prediction for disk drives", ICML 2001, pp. 1-9, 2001.
-
(2001)
ICML 2001
, pp. 1-9
-
-
Hamerly, G.1
Elkan, C.2
-
17
-
-
85077332099
-
I/O System Performance Debugging Using Model-driven Anomaly Characterization
-
Kai Shen, Ming Zhong, and Chuanpeng Li, "I/O System Performance Debugging Using Model-driven Anomaly Characterization", 4th USENIX Conference on File and Storage Technologies, pp. 309-322, 2005.
-
(2005)
4th USENIX Conference on File and Storage Technologies
, pp. 309-322
-
-
Shen, K.1
Zhong, M.2
Li, C.3
-
18
-
-
34548294322
-
Problem Diagnosis in Large-Scale Computing Environments
-
A. V. Mirgorodskiy, N. Maruyama, B.P. Miller, "Problem Diagnosis in Large-Scale Computing Environments", Supercomputing, 2006, pp. 11-24, 2006.
-
(2006)
Supercomputing, 2006
, pp. 11-24
-
-
Mirgorodskiy, A.V.1
Maruyama, N.2
Miller, B.P.3
-
20
-
-
0003445361
-
Safeware: System safety and computers
-
Addison-Wesley
-
Nancy Leveson, "Safeware: System safety and computers", New York: Addison-Wesley, 1995.
-
(1995)
New York
-
-
Leveson, N.1
-
21
-
-
53349141565
-
-
A. Lakhina, M. Crovella, C. Diot, "Mining anomalies using traffic feature distributions", Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications, 2005.
-
A. Lakhina, M. Crovella, C. Diot, "Mining anomalies using traffic feature distributions", Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications, 2005.
-
-
-
-
24
-
-
84887590348
-
The AppLeS Parameter Sweep Template: User-Level Middleware for the Grid
-
H. Casanova, G. Obertelli, F. Berman and R. Wolski, "The AppLeS Parameter Sweep Template: User-Level Middleware for the Grid", Supercomputing, 2000, pp. 60-69, 2000.
-
(2000)
Supercomputing, 2000
, pp. 60-69
-
-
Casanova, H.1
Obertelli, G.2
Berman, F.3
Wolski, R.4
-
25
-
-
0032003182
-
Multicommodity flow models, failure propagation, and reliable loss network design
-
Girard, A., Brunilde, S., "Multicommodity flow models, failure propagation, and reliable loss network design", IEEE/ACM Transactions on Networking (TON), Volume 6, Issue 1, pp. 82-93, 1998.
-
(1998)
IEEE/ACM Transactions on Networking (TON)
, vol.6
, Issue.1
, pp. 82-93
-
-
Girard, A.1
Brunilde, S.2
-
27
-
-
33748083430
-
Supermon: A High-Speed Cluster Monitoring System
-
M. Sottile and R. Minnich, "Supermon: A High-Speed Cluster Monitoring System", Proc. IEEE Cluster, 2002.
-
(2002)
Proc. IEEE Cluster
-
-
Sottile, M.1
Minnich, R.2
-
28
-
-
53349160560
-
The Inca test harness and reporting Framework
-
S. Smallen, C. Olschanowski, K. Ericson, P. Bechman, and J. Schopf, "The Inca test harness and reporting Framework", Proc. of SC04, 2004.
-
(2004)
Proc. of SC04
-
-
Smallen, S.1
Olschanowski, C.2
Ericson, K.3
Bechman, P.4
Schopf, J.5
-
29
-
-
33746634918
-
Diagnosing Network-Wide Traffic Anomalies
-
A. Lakhina, M. Crovella, C. Diot, "Diagnosing Network-Wide Traffic Anomalies", Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications, 2004.
-
(2004)
Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications
-
-
Lakhina, A.1
Crovella, M.2
Diot, C.3
-
30
-
-
0034503741
-
A fast, On-Line Algorithm for PCA and Its Convergence Characteristics
-
Rao, Y. N. and Principe, J. C., "A fast, On-Line Algorithm for PCA and Its Convergence Characteristics", Proceedings of the 2000 IEEE Signal Processing Society Workshop, vol.1, pp. 299-307, 2000.
-
(2000)
Proceedings of the 2000 IEEE Signal Processing Society Workshop
, vol.1
, pp. 299-307
-
-
Rao, Y.N.1
Principe, J.C.2
-
31
-
-
85029955743
-
Introduction to Parallel Computing (2nd Edition)
-
Ananth Grama, Vipin Kumar, Anshul Gupta, George Karypis, "Introduction to Parallel Computing (2nd Edition)", Addison-Wesley, 2003.
-
(2003)
Addison-Wesley
-
-
Grama, A.1
Kumar, V.2
Gupta, A.3
Karypis, G.4
-
32
-
-
33745488052
-
Combining Visualization and Statistical Analysis to Improve Operator Confidence and Efficiency for Failure Detection and Localization
-
P. Bodik, G. Friedman, L. Biewald, H. Levine, G. Candea, K. Patel, G. Tolle, J. Hui, A. Fox, M. I. Jordan and D. Patterson, "Combining Visualization and Statistical Analysis to Improve Operator Confidence and Efficiency for Failure Detection and Localization", The 2nd IEEE International Conference on Autonomic Computing (ICAC '05), 2005.
-
(2005)
The 2nd IEEE International Conference on Autonomic Computing (ICAC '05)
-
-
Bodik, P.1
Friedman, G.2
Biewald, L.3
Levine, H.4
Candea, G.5
Patel, K.6
Tolle, G.7
Hui, J.8
Fox, A.9
Jordan, M.I.10
Patterson, D.11
-
33
-
-
0034704222
-
Nonlinear dimensionality reduction by locally linear embedding
-
S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 2000, Vol. 290: 2323-2326
-
(2000)
Science
, vol.290
, pp. 2323-2326
-
-
Roweis, S.T.1
Saul, L.K.2
-
35
-
-
0012253727
-
Bayesian approaches to failure prediction for disk drives
-
G. Hamerly and C. Elkan, "Bayesian approaches to failure prediction for disk drives," in Proc. of ICML, 2001.
-
(2001)
Proc. of ICML
-
-
Hamerly, G.1
Elkan, C.2
-
37
-
-
33644804204
-
MSET Performance Optimization for Detection of Software Aging
-
K. Vaidyanathan and K. Gross, "MSET Performance Optimization for Detection of Software Aging", Proc. of ISSRE, 2003.
-
(2003)
Proc. of ISSRE
-
-
Vaidyanathan, K.1
Gross, K.2
-
38
-
-
12444268355
-
On the feasibility of incremental checkpointing for scientific computing
-
J. Sancho, F. Petrini, G. Johnson, J. Fernandez, and E. Frachtenberg, "On the feasibility of incremental checkpointing for scientific computing," in Proc. of IPDPS'04, 2004.
-
(2004)
Proc. of IPDPS'04
-
-
Sancho, J.1
Petrini, F.2
Johnson, G.3
Fernandez, J.4
Frachtenberg, E.5