SCOPUS Information Search Platform

17th International Conference on High Performance Computing, HiPC 2010

Volume , Issue , 2010, Pages

Anomaly detection in large-scale coalition clusters for dependability assurance

(3) Guan, Qiang a; Smith, Derek b; Fu, Song a

a Center for Advanced Research and Technology (United States)

b NEW MEXICO INSTITUTE OF MINING AND TECHNOLOGY (United States)

Author keywords

Anomaly detection; Autonomic systems; Coalition clusters; Compute grids; System dependability

Indexed keywords

HEALTH; METADATA;

ANOMALY DETECTION; AUTONOMIC SYSTEMS; COALITION CLUSTER; COMPUTE GRID; DEPENDABILITY ASSURANCE; HEALTH DATA; HIGH PERFORMANCE COMPUTING SYSTEMS; LARGE-SCALES; SYSTEM COMPONENTS; SYSTEM DEPENDABILITY;

CLUSTERING ALGORITHMS;

EID: 79952786041 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/HIPC.2010.5713169 Document Type: Conference Paper

Times cited : (16)

References (37)

1
- 85164992771
- Available at
- sysstat. Available at: http://pagesperso-orange.fr/sebastien.godard/.
- Sysstat

2
- 0037860971
- Inductive learning for fault diagnosis
- H. Berenji, J. Ametha, and D. Vengerov. Inductive learning for fault diagnosis. In Proceedings of IEEE International Conference on Fuzzy Systems, 2003.
- Proceedings of IEEE International Conference on Fuzzy Systems, 2003
- Berenji, H.¹ Ametha, J.² Vengerov, D.³

3
- 68049121093
- Anomaly detection: A survey
- V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey. ACM Computing Surveys, 41(3):1-58, 2009.
- (2009) ACM Computing Surveys , vol.41 , Issue.3 , pp. 1-58
- Chandola, V.¹ Banerjee, A.² Kumar, V.³

4
- 33745116461
- Technical Report IRB-TR-03-040, Intel Research Berkeley, November
- B. Chun and A. Vahdat. Workload and failure characterization on a large-scale federated testbed. Technical Report IRB-TR-03-040, Intel Research Berkeley, November 2003.
- (2003) Workload and Failure Characterization on A Large-scale Federated Testbed
- Chun, B.¹ Vahdat, A.²

5
- 77954752832
- Correlating instrumentation data to system states: A building block for automated diagnosis and control
- I. Cohen, M. Goldszmidt, T. Kelly, J. Symons, and J. S. Chase. Correlating instrumentation data to system states: a building block for automated diagnosis and control. In Proceedings of USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2004.
- Proceedings of USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2004
- Cohen, I.¹ Goldszmidt, M.² Kelly, T.³ Symons, J.⁴ Chase, J.S.⁵

6
- 84889281816
- John Wiley & Sons
- T. Cover and J. Thomas. Elements of Information Theory. John Wiley & Sons, 1991.
- (1991) Elements of Information Theory
- Cover, T.¹ Thomas, J.²

7
- 0031276011
- Bayesian Network Classifiers
- N. Friedman, D. Geiger, and M. Goldszmidt. Bayesian network classifiers. Machine Learning, 29(2-3):131-163, 1997. (Pubitemid 127510036)
- (1997) Machine Learning , vol.29 , Issue.2-3 , pp. 131-163
- Friedman, N.¹ Geiger, D.² Goldszmidt, M.³

8
- 70349735985
- Failure-aware construction and reconfiguration of distributed virtual machines for high availability computing
- S. Fu. Failure-aware construction and reconfiguration of distributed virtual machines for high availability computing. In Proceedings of IEEE/ACM International Symposium on Cluster Computing and the Grid (CC-Grid), 2009.
- Proceedings of IEEE/ACM International Symposium on Cluster Computing and the Grid (CC-Grid), 2009
- Fu, S.¹

9
- 77956500955
- Dependability enhancement for coalition clusters with autonomic failure management
- S. Fu. Dependability enhancement for coalition clusters with autonomic failure management. In Proceedings of the 15th IEEE International Symposium on Computers and Communications (ISCC), 2010.
- Proceedings of the 15th IEEE International Symposium on Computers and Communications (ISCC), 2010
- Fu, S.¹

10
- 76849100508
- Failure-aware resource management for high-availability computing clusters with distributed virtual machines
- S. Fu. Failure-aware resource management for high-availability computing clusters with distributed virtual machines. Journal of Parallel and Distributed Computing, 70(4):384-393, 2010.
- (2010) Journal of Parallel and Distributed Computing , vol.70 , Issue.4 , pp. 384-393
- Fu, S.¹

11
- 56749178938
- Exploring event correlation for failure prediction in coalitions of clusters
- S. Fu and C.-Z. Xu. Exploring event correlation for failure prediction in coalitions of clusters. In Proceedings of ACM/IEEE Supercomputing Conference (SC), 2007.
- Proceedings of ACM/IEEE Supercomputing Conference (SC), 2007
- Fu, S.¹ Xu, C.-Z.²

12
- 47249124464
- Quantifying temporal and spatial correlation of failure events for proactive management
- S. Fu and C.-Z. Xu. Quantifying temporal and spatial correlation of failure events for proactive management. In Proceedings of IEEE International Symposium on Reliable Distributed Systems (SRDS), 2007.
- Proceedings of IEEE International Symposium on Reliable Distributed Systems (SRDS), 2007
- Fu, S.¹ Xu, C.-Z.²

13
- 70349679287
- Proactive resource management for failure resilient high performance computing clusters
- S. Fu and C.-Z. Xu. Proactive resource management for failure resilient high performance computing clusters. In Proceedings of IEEE International Conference on Availability, Reliability and Security (ARES), March 2009.
- Proceedings of IEEE International Conference on Availability, Reliability and Security (ARES), March 2009
- Fu, S.¹ Xu, C.-Z.²

14
- 77956227790
- Quantifying event correlations for proactive failure management in networked computing systems
- doi:10.1016/j.jpdc.2010.06.010.
- S. Fu and C.-Z. Xu. Quantifying event correlations for proactive failure management in networked computing systems. Journal of Parallel and Distributed Computing, 2010. doi:10.1016/j.jpdc.2010.06.010.
- (2010) Journal of Parallel and Distributed Computing
- Fu, S.¹ Xu, C.-Z.²

15
- 0037236308
- The dawning of the autonomic computing era
- A. G. Ganek and T. A. Corbi. The dawning of the autonomic computing era. IBM Systems Journal, 42(1):5-18, 2003.
- (2003) IBM Systems Journal , vol.42 , Issue.1 , pp. 5-18
- Ganek, A.G.¹ Corbi, T.A.²

16
- 0003585297
- Morgan Kaufmann Publishers Inc.
- J. Han. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers Inc., 2005.
- (2005) Data Mining: Concepts and Techniques
- Han, J.¹

17
- 85117220234
- A power-aware run-time system for high-performance computing
- C.-H. Hsu and W.-C. Feng. A power-aware run-time system for high-performance computing. In Proceedings of ACM/IEEE Supercomputing Conference (SC), 2005.
- Proceedings of ACM/IEEE Supercomputing Conference (SC), 2005
- Hsu, C.-H.¹ Feng, W.-C.²

18
- 0037253062
- The vision of autonomic computing
- J. O. Kephart and D. M. Chess. The vision of autonomic computing. IEEE Computer, 36(1):41-50, 2003.
- (2003) IEEE Computer , vol.36 , Issue.1 , pp. 41-50
- Kephart, J.O.¹ Chess, D.M.²

19
- 33845589803
- BlueGene/L failure analysis and prediction models
- Y. Liang, Y. Zhang, A. Sivasubramaniam, M. Jette, and R. K. Sahoo. BlueGene/L failure analysis and prediction models. In Proceedings of IEEE International Conference on Dependable Systems and Networks (DSN), 2006.
- Proceedings of IEEE International Conference on Dependable Systems and Networks (DSN), 2006
- Liang, Y.¹ Zhang, Y.² Sivasubramaniam, A.³ Jette, M.⁴ Sahoo, R.K.⁵

20
- 27544497222
- Filtering failure logs for a BlueGene/L prototype
- Y. Liang, Y. Zhang, A. Sivasubramaniam, R. Sahoo, J. Moreira, and M. Gupta. Filtering failure logs for a BlueGene/L prototype. In Proceedings of IEEE International Conference on Dependable Systems and Networks (DSN), 2005.
- Proceedings of IEEE International Conference on Dependable Systems and Networks (DSN), 2005
- Liang, Y.¹ Zhang, Y.² Sivasubramaniam, A.³ Sahoo, R.⁴ Moreira, J.⁵ Gupta, M.⁶

21
- 53349174366
- A log mining approach to failure analysis of enterprise telephony systems
- C. Lim, N. Singh, and S. Yajnik. A log mining approach to failure analysis of enterprise telephony systems. In Proceedings of IEEE International Conference on Dependable Systems and Networks (DSN), 2008.
- Proceedings of IEEE International Conference on Dependable Systems and Networks (DSN), 2008
- Lim, C.¹ Singh, N.² Yajnik, S.³

22
- 85164990746
- Mining partially periodic event patterns with unknown periods
- S. Ma and J. L. Hellerstein. Mining partially periodic event patterns with unknown periods. In Proceedings of IEEE International Conference on Data Engineering (ICDE), 2001.
- Proceedings of IEEE International Conference on Data Engineering (ICDE), 2001
- Ma, S.¹ Hellerstein, J.L.²

23
- 0024132220
- Analysis of workload influence on dependability
- J. Meyer and L. Wei. Analysis of workload influence on dependability. In Proceedings of Symposium on Fault-Tolerant Computing (FTCS), 1988.
- Proceedings of Symposium on Fault-Tolerant Computing (FTCS), 1988
- Meyer, J.¹ Wei, L.²

24
- 36049013419
- What supercomputers say: A study of five system logs
- A. J. Oliner and J. Stearley. What supercomputers say: A study of five system logs. In Proceedings of IEEE International Conference on Dependable Systems and Networks (DSN), 2007.
- Proceedings of IEEE International Conference on Dependable Systems and Networks (DSN), 2007
- Oliner, A.J.¹ Stearley, J.²

25
- 84875570984
- Why do Internet services fail, and what can be done about it
- D. Oppenheimer, A. Ganapathi, and D. Patterson. Why do Internet services fail, and what can be done about it. In Proceedings of USENIX Symposium on Internet Technologies and Systems (USITS), 2003.
- Proceedings of USENIX Symposium on Internet Technologies and Systems (USITS), 2003
- Oppenheimer, D.¹ Ganapathi, A.² Patterson, D.³

26
- 33745508913
- Mining logs files for computing system management
- W. Peng, T. Li, and S. Ma. Mining logs files for computing system management. In Proceedings of IEEE International Conference on Autonomic Computing (ICAC), 2005.
- Proceedings of IEEE International Conference on Autonomic Computing (ICAC), 2005
- Peng, W.¹ Li, T.² Ma, S.³

27
- 34548010919
- Software failures and the road to a petaflop machine
- I. Philp. Software failures and the road to a petaflop machine. In Proceedings of International Symposium on High Performance Computer Architecture Workshop, 2005.
- Proceedings of International Symposium on High Performance Computer Architecture Workshop, 2005
- Philp, I.¹

28
- 4544382099
- Failure data analysis of a large-scale heterogeneous server environment
- R. K. Sahoo, A. Sivasubramaniam, M. S. Squillante, and Y. Zhang. Failure data analysis of a large-scale heterogeneous server environment. In Proceedings of IEEE Conference on Dependable Systems and Networks (DSN), 2004.
- Proceedings of IEEE Conference on Dependable Systems and Networks (DSN), 2004
- Sahoo, R.K.¹ Sivasubramaniam, A.² Squillante, M.S.³ Zhang, Y.⁴

29
- 33845593340
- A large-scale study of failures in high-performance-computing systems
- B. Schroeder and G. Gibson. A large-scale study of failures in high-performance-computing systems. In Proceedings of IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), June 2006.
- Proceedings of IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), June 2006
- Schroeder, B.¹ Gibson, G.²

30
- 50649093917
- Triage: Diagnosing production run failures at the user's site
- J. Tucek, S. Lu, C. Huang, S. Xanthos, and Y. Zhou. Triage: diagnosing production run failures at the user's site. In Proceedings of ACM Symposium on Operating Systems Principles (SOSP), 2007.
- Proceedings of ACM Symposium on Operating Systems Principles (SOSP), 2007
- Tucek, J.¹ Lu, S.² Huang, C.³ Xanthos, S.⁴ Zhou, Y.⁵

31
- 78149354391
- Predicting rare events in temporal domains
- R. Vilalta and S. Ma. Predicting rare events in temporal domains. In Proceedings of IEEE International Conference on Data Mining (ICDM), 2002.
- Proceedings of IEEE International Conference on Data Mining (ICDM), 2002
- Vilalta, R.¹ Ma, S.²

32
- 67650672322
- Beyond availability: Towards a deeper understanding of machine failure characteristics in large distributed systems
- P. Yalagandula, S. Nath, H. Yu, P. B. Gibbons, and S. Sesha. Beyond availability: Towards a deeper understanding of machine failure characteristics in large distributed systems. In Proceedings of USENIX Workshop on Real, Large Distributed Systems (WORLDS), 2004.
- Proceedings of USENIX Workshop on Real, Large Distributed Systems (WORLDS), 2004
- Yalagandula, P.¹ Nath, S.² Yu, H.³ Gibbons, P.B.⁴ Sesha, S.⁵

33
- 32444434211
- Dynamic syslog mining for network failure monitoring
- K. Yamanishi and Y. Maruyama. Dynamic syslog mining for network failure monitoring. In Proceedings of ACM International Conference on Knowledge Discovery in Data Mining (KDD), 2005.
- Proceedings of ACM International Conference on Knowledge Discovery in Data Mining (KDD), 2005
- Yamanishi, K.¹ Maruyama, Y.²

34
- 33845595513
- Performance implications of failures in large-scale cluster scheduling
- Y. Zhang, M. S. Squillante, A. Sivasubramaniam, and R. K. Sahoo. Performance implications of failures in large-scale cluster scheduling. In Proceedings of the 10th Workshop on Job Scheduling Strategies for Parallel Processing, 2004.
- Proceedings of the 10th Workshop on Job Scheduling Strategies for Parallel Processing, 2004
- Zhang, Y.¹ Squillante, M.S.² Sivasubramaniam, A.³ Sahoo, R.K.⁴

35
- 77954054232
- Failure prediction for autonomic management of networked computer systems with availability assurance
- Z. Zhang and S. Fu. Failure prediction for autonomic management of networked computer systems with availability assurance. In Proceedings of IEEE Workshop on Dependable Parallel, Distributed and Network-Centric Systems in conjunction with IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2010.
- Proceedings of IEEE Workshop on Dependable Parallel, Distributed and Network-Centric Systems in Conjunction with IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2010
- Zhang, Z.¹ Fu, S.²

36
- 77956424071
- Proactive failure management for high availability computing in computer clusters
- Z. Zhang and S. Fu. Proactive failure management for high availability computing in computer clusters. In Proceedings of IEEE International Conference on Computational Sciences and Optimization, 2010.
- Proceedings of IEEE International Conference on Computational Sciences and Optimization, 2010
- Zhang, Z.¹ Fu, S.²

37
- 4544299163
- Failure diagnosis using decision trees
- A. Zheng, J. Lloyd, and E. Brewer. Failure diagnosis using decision trees. In Proceedings of IEEE International Conference on Autonomic Computing (ICAC), 2004.
- Proceedings of IEEE International Conference on Autonomic Computing (ICAC), 2004
- Zheng, A.¹ Lloyd, J.² Brewer, E.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.