-
1
-
-
0025502686
-
Error log analysis: Statistical modeling and heuristic trend analysis
-
T.-T. Y. Lin and D. P. Siewiorek, "Error log analysis: Statistical modeling and heuristic trend analysis," IEEE Transactions on Reliability, vol. 39, no. 4, 1990.
-
(1990)
IEEE Transactions on Reliability
, vol.39
, Issue.4
-
-
Lin, T.-T.Y.1
Siewiorek, D.P.2
-
3
-
-
33847328785
-
Availability assessment of SunOS/Solaris Unix systems based on syslogd and wtmpx log files: A case study
-
C. Simache and M. Kaaniche, "Availability assessment of SunOS/Solaris Unix systems based on syslogd and wtmpx log files: A case study," in Proceedings of IEEE PRDC, Dec 2005.
-
Proceedings of IEEE PRDC, Dec 2005
-
-
Simache, C.1
Kaaniche, M.2
-
5
-
-
33845593340
-
A large-scale study of failures in high-performance computing systems
-
B. Schroeder and G. Gibson, "A large-scale study of failures in high-performance computing systems," in Proceedings of IEEE/IFIP DSN, 2006, pp. 249-258.
-
Proceedings of IEEE/IFIP DSN, 2006
, pp. 249-258
-
-
Schroeder, B.1
Gibson, G.2
-
7
-
-
67349271621
-
An analysis of clustered failures on large supercomputing systems
-
T. J. Hacker, F. Romero, and C. D. Carothers, "An analysis of clustered failures on large supercomputing systems," Journal of Parallel and Distributed Computing, vol. 69, no. 7, 2009.
-
(2009)
Journal of Parallel and Distributed Computing
, vol.69
, Issue.7
-
-
Hacker, T.J.1
Romero, F.2
Carothers, C.D.3
-
8
-
-
85092792131
-
Analyzing system logs: A new view of what's important
-
S. Sabato, E. Yom-Tov, A. Tsherniak, and S. Rosset, "Analyzing system logs: A new view of what's important," in 2nd USENIX Workshop on Tackling Computer Systems Problems with Machine Learning Techniques, 2007.
-
2nd USENIX Workshop on Tackling Computer Systems Problems with Machine Learning Techniques, 2007
-
-
Sabato, S.1
Yom-Tov, E.2
Tsherniak, A.3
Rosset, S.4
-
9
-
-
84856079819
-
One graph is worth a thousand logs: Uncovering hidden structures in massive system event logs
-
M. Aharon, G. Barash, I. Cohen, and E. Mordechai, "One graph is worth a thousand logs: Uncovering hidden structures in massive system event logs," in Proceedings of ECML PKDD, 2009.
-
Proceedings of ECML PKDD, 2009
-
-
Aharon, M.1
Barash, G.2
Cohen, I.3
Mordechai, E.4
-
11
-
-
75449097851
-
Toward automated anomaly identification in large-scale systems
-
Z. Lan, Z. Zheng, and Y. Li, "Toward automated anomaly identification in large-scale systems," IEEE Transactions on Parallel and Distributed Systems, vol. 21, no. 2, 2010.
-
(2010)
IEEE Transactions on Parallel and Distributed Systems
, vol.21
, Issue.2
-
-
Lan, Z.1
Zheng, Z.2
Li, Y.3
-
12
-
-
4243934975
-
-
PhD Thesis, Department of Electrical and Computer Engineering, Carnegie Mellon University
-
M. M. Tsao, "Trend analysis and fault prediction," PhD Thesis, Department of Electrical and Computer Engineering, Carnegie Mellon University, 1983.
-
(1983)
Trend Analysis and Fault Prediction
-
-
Tsao, M.M.1
-
13
-
-
33845589803
-
BlueGene/L failure analysis and prediction models
-
Y. Liang, Y. Zhang, M. Jette, A. Sivasubramaniam, and R. Sahoo, "BlueGene/L failure analysis and prediction models," in Proceedings of IEEE/IFIP DSN, 2006.
-
Proceedings of IEEE/IFIP DSN, 2006
-
-
Liang, Y.1
Zhang, Y.2
Jette, M.3
Sivasubramaniam, A.4
Sahoo, R.5
-
14
-
-
49749107565
-
Failure prediction in IBM BlueGene/L event logs
-
Y. Liang, Y. Zhang, H. Xiong, and R. Sahoo, "Failure prediction in IBM BlueGene/L event logs," in Proceedings of IEEE ICDM, 2007.
-
Proceedings of IEEE ICDM, 2007
-
-
Liang, Y.1
Zhang, Y.2
Xiong, H.3
Sahoo, R.4
-
15
-
-
56749178938
-
Exploring event correlation for failure prediction in coalitions of clusters
-
S. Fu and C.-Z. Xu, "Exploring event correlation for failure prediction in coalitions of clusters," in Proceedings of ACM/IEEE Supercomputing, no. 41, 2007.
-
(2007)
Proceedings of ACM/IEEE Supercomputing
, Issue.41
-
-
Fu, S.1
Xu, C.-Z.2
-
16
-
-
84856108300
-
A practical failure prediction with location and lead time for Blue Gene/P
-
Z. Zheng, Z. Lan, R. Gupta, S. Coghlan, and P. Beckman, "A practical failure prediction with location and lead time for Blue Gene/P," in 1st Workshop on Fault-Tolerance for HPC at Extreme Scale (in conjunction with IEEE/IFIP DSN 2010), 2010.
-
1st Workshop on Fault-Tolerance for HPC at Extreme Scale (In Conjunction with IEEE/IFIP DSN 2010), 2010
-
-
Zheng, Z.1
Lan, Z.2
Gupta, R.3
Coghlan, S.4
Beckman, P.5
-
17
-
-
77951205449
-
A study of dynamic meta-learning for failure prediction in large-scale systems
-
Z. Lan, J. Gu, Z. Zheng, R. Thakur, and S. Coghlan, "A study of dynamic meta-learning for failure prediction in large-scale systems," Journal of Parallel and Distributed Computing (JPDC), vol. 70, no. 6, 2010.
-
(2010)
Journal of Parallel and Distributed Computing (JPDC)
, vol.70
, Issue.6
-
-
Lan, Z.1
Gu, J.2
Zheng, Z.3
Thakur, R.4
Coghlan, S.5
-
18
-
-
70449794134
-
System log pre-processing to improve failure prediction
-
Z. Zheng, Z. Lan, B. Park, and A. Geist, "System log pre-processing to improve failure prediction," in Proceedings of IEEE/IFIP DSN, 2009.
-
Proceedings of IEEE/IFIP DSN, 2009
-
-
Zheng, Z.1
Lan, Z.2
Park, B.3
Geist, A.4
-
19
-
-
84856115820
-
A fault diagnosis and prognosis service for TeraGrid clusters
-
Z. Lan, P. Gujrati, Y. Li, Z. Zheng, R. Thakur, and J. White, "A fault diagnosis and prognosis service for TeraGrid clusters," in Proceedings of ACM TeraGrid, 2007.
-
Proceedings of ACM TeraGrid, 2007
-
-
Lan, Z.1
Gujrati, P.2
Li, Y.3
Zheng, Z.4
Thakur, R.5
White, J.6
-
20
-
-
77956291503
-
End-to-end framework for fault management for open source clusters: Ranger
-
J. L. Hammond, T. Minyard, and J. Browne, "End-to-end framework for fault management for open source clusters: Ranger," in Proceedings of ACM TeraGrid, no. 9, 2010.
-
(2010)
Proceedings of ACM TeraGrid
, Issue.9
-
-
Hammond, J.L.1
Minyard, T.2
Browne, J.3
-
21
-
-
79952790201
-
Diagnosing the root-causes of failures from cluster log files
-
E. Chuah, S.-H. Kuo, P. Hiew, W.-C. Tjhi, G. Lee, J. Hammond, M. T. Michalewicz, T. Hung, and J. C. Browne, "Diagnosing the root-causes of failures from cluster log files," in Proceedings of IEEE HiPC, Dec 19-22 2010.
-
Proceedings of IEEE HiPC, Dec 19-22 2010
-
-
Chuah, E.1
Kuo, S.-H.2
Hiew, P.3
Tjhi, W.-C.4
Lee, G.5
Hammond, J.6
Michalewicz, M.T.7
Hung, T.8
Browne, J.C.9
-
24
-
-
49049104267
-
Automated system monitoring and notification with Swatch
-
S. E. Hansen and E. T. Atkins, "Automated system monitoring and notification with Swatch," in USENIX LISA, 1993.
-
(1993)
USENIX LISA
-
-
Hansen, S.E.1
Atkins, E.T.2
-
26
-
-
26844519400
-
Path-based failure and evolution management
-
M. Y. Chen, A. Accardi, E. Kiciman, J. Lloyd, D. Patterson, A. Fox, and E. Brewer, "Path-based failure and evolution management," in Proceedings of NSDI, 2004.
-
Proceedings of NSDI, 2004
-
-
Chen, M.Y.1
Accardi, A.2
Kiciman, E.3
Lloyd, J.4
Patterson, D.5
Fox, A.6
Brewer, E.7
-
27
-
-
0036930823
-
Pinpoint: Problem determination in large, dynamic internet services
-
M. Y. Chen, E. Kiciman, E. Fratkin, A. Fox, and E. Brewer, "Pinpoint: Problem determination in large, dynamic internet services," in Proceedings of IEEE/IFIP DSN, 2002.
-
Proceedings of IEEE/IFIP DSN, 2002
-
-
Chen, M.Y.1
Kiciman, E.2
Fratkin, E.3
Fox, A.4
Brewer, E.5
-
28
-
-
77957761115
-
Problem diagnosis for MapReduce-based cloud computing environments
-
J. Tan, X. Pan, E. Marinelli, S. Kavulya, R. Gandhi, and P. Narasimhan, "Problem diagnosis for MapReduce-based cloud computing environments," in Proceedings of IEEE/IFIP NOMS, 2010.
-
Proceedings of IEEE/IFIP NOMS, 2010
-
-
Tan, J.1
Pan, X.2
Marinelli, E.3
Kavulya, S.4
Gandhi, R.5
Narasimhan, P.6
-
29
-
-
77955941295
-
Visual, log-based causal tracing for performance debugging of MapReduce systems
-
J. Tan, S. Kavulya, R. Gandhi, and P. Narasimhan, "Visual, log-based causal tracing for performance debugging of MapReduce systems," in Proceedings of IEEE ICDCS, 2010.
-
Proceedings of IEEE ICDCS, 2010
-
-
Tan, J.1
Kavulya, S.2
Gandhi, R.3
Narasimhan, P.4