



2012, Pages 181-190

3-Dimensional root cause diagnosis via co-analysis

Author keywords

Co analysis; Diagnosis; Large scale system

Indexed keywords

3-DIMENSIONAL; BLUE GENE; CO-ANALYSIS; DIAGNOSIS INFORMATION; ERROR PRONES; LOCATION INFORMATION; MANUAL PROCESSING; OAK RIDGE NATIONAL LABORATORY; ROOT CAUSE; ROOT CAUSE ANALYSIS; SYSTEM ADMINISTRATORS; SYSTEM SIZE

EID: 84867695274     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/2371536.2371571     Document Type: Conference Paper
Times cited: 27

References (35)
  • 8
    • 77955737995
    • High-end computing resilience: Analysis of issues facing the HEC community and path-forward for research and development
    • N. DeBardeleben, J. Laros, J. Daly, S. Scott, C. Engelmann, and B. Harrod. High-end computing resilience: Analysis of issues facing the HEC community and path-forward for research and development. White Paper, 2009.
    • (2009) White Paper
    • Debardeleben, N.1    Laros, J.2    Daly, J.3    Scott, S.4    Engelmann, C.5    Harrod, B.6
  • 10
    • 81055139569
    • Adaptive event prediction strategy with dynamic time window for large-scale HPC systems
    • A. Gainaru, F. Cappello, F. J., and S. Trausan. Adaptive event prediction strategy with dynamic time window for large-scale HPC systems. In Proceedings of SLAML, 2011.
    • (2011) Proceedings of SLAML
    • Gainaru, A.1    Cappello, F.J.F.2    Trausan, S.3
  • 11
    • 84867730282
    • DMTracker: Finding bugs in large-scale parallel programs by detecting anomaly in data movements
    • Q. Gao, F. Qin, and D. Panda. DMTracker: Finding bugs in large-scale parallel programs by detecting anomaly in data movements. In Proceedings of Supercomputing, 2006.
    • (2006) Proceedings of Supercomputing
    • Gao, Q.1    Qin, F.2    Panda, D.3
  • 15
    • 26844568000
    • Detecting application-level failures in component-based internet services
    • E. Kiciman and A. Fox. Detecting application-level failures in component-based internet services. IEEE Trans. Neural Networks, 16(5):1027-1041, 2005.
    • (2005) IEEE Trans. Neural Networks , vol.16 , Issue.5 , pp. 1027-1041
    • Kiciman, E.1    Fox, A.2
  • 16
    • 84867719410
    • Exascale computing study: Technology challenges in achieving exascale systems
    • P. Kogge et al. Exascale computing study: Technology challenges in achieving exascale systems. White Paper, 2008.
    • (2008) White Paper
    • Kogge, P.1
  • 17
    • 84867725181
    • IBM BlueGene solution: System administration
    • G. Lakner and G. Mullen-Schultz. IBM BlueGene solution: System administration. IBM Redbook, 2007.
    • (2007) IBM Redbook
    • Lakner, G.1    Mullen-Schultz, G.2
  • 18
    • 75449097851
    • Toward automated anomaly identification in large-scale systems
    • Z. Lan, Z. Zheng, and Y. Li. Toward automated anomaly identification in large-scale systems. IEEE Trans. on Parallel and Distributed Systems, 21(2):174-187, 2010.
    • (2010) IEEE Trans. on Parallel and Distributed Systems , vol.21 , Issue.2 , pp. 174-187
    • Lan, Z.1    Zheng, Z.2    Li, Y.3
  • 21
    • 78650101944
    • Model-based fault localization: Finding behavioral outliers in large-scale computing systems
    • N. Maruyama and S. Matsuoka. Model-based fault localization: Finding behavioral outliers in large-scale computing systems. New Generation Comput, 28:237-255, 2010.
    • (2010) New Generation Comput , vol.28 , pp. 237-255
    • Maruyama, N.1    Matsuoka, S.2
  • 24
    • 36049013419
    • What supercomputers say: A study of five system logs
    • A. Oliner and J. Stearley. What supercomputers say: A study of five system logs. In Proceedings of DSN, 2007.
    • (2007) Proceedings of DSN
    • Oliner, A.1    Stearley, J.2
  • 27
    • 80051915968
    • Improving log-based field failure data analysis of multi-node computing systems
    • A. Pecchia, D. Cotroneo, Z. Kalbarczyk, and R. Iyer. Improving log-based field failure data analysis of multi-node computing systems. In Proceedings of DSN, 2011.
    • (2011) Proceedings of DSN
    • Pecchia, A.1    Cotroneo, D.2    Kalbarczyk, Z.3    Iyer, R.4
  • 28
    • 80052147473
    • Identifying faults in large-scale distributed systems by filtering noisy error logs
    • X. Rao, H. Wang, D. Shi, Z. Chen, H. Cai, and Q. Zhou. Identifying faults in large-scale distributed systems by filtering noisy error logs. In Proceedings of DSNW, 2011.
    • (2011) Proceedings of DSNW
    • Rao, X.1    Wang, H.2    Shi, D.3    Chen, Z.4    Cai, H.5    Zhou, Q.6
  • 31
    • 33845593340
    • A large-scale study of failures in high-performance computing systems
    • B. Schroeder and G. Gibson. A large-scale study of failures in high-performance computing systems. In Proceedings of DSN, 2006.
    • (2006) Proceedings of DSN
    • Schroeder, B.1    Gibson, G.2
  • 34


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.