-
1
-
-
70349657128
-
Blue-Gene/L Log Analysis and Time to Interrupt Estimation
-
N. Taerat, Y. Zhang, A. Sivasubramaniam, M. Jette, R. Sahoo: Blue-Gene/L Log Analysis and Time to Interrupt Estimation. International Conference on Availability, Reliability and Security, pp.173-180, 2009
-
(2009)
International Conference on Availability, Reliability and Security
, pp. 173-180
-
-
Taerat, N.1
Zhang, Y.2
Sivasubramaniam, A.3
Jette, M.4
Sahoo, R.5
-
3
-
-
78650462147
-
PGP-mc: Towards a Multicore Parallel Approach for Mining Gradual Patterns
-
A. Laurent, B. Negrevergne, N. Sicard, and A. Termier: PGP-mc: Towards a Multicore Parallel Approach for Mining Gradual Patterns. Database Systems for Advanced Applications, volume 5981, pp 78-84, 2010
-
(2010)
Database Systems for Advanced Applications
, vol.5981
, pp. 78-84
-
-
Laurent, A.1
Negrevergne, B.2
Sicard, N.3
Termier, A.4
-
6
-
-
77952378080
-
Critical Event Prediction for Proactive Management In Large-scale Computer Clusters
-
R. K. Sahoo et al: Critical Event Prediction for Proactive Management In Large-scale Computer Clusters. International conference on Knowledge discovery and data mining, pp 426-435, 2003
-
(2003)
International Conference on Knowledge Discovery and Data Mining
, pp. 426-435
-
-
Sahoo, R.K.1
-
7
-
-
79951644113
-
Analysis and Modeling of Time-Correlated Failures in Large-Scale Distributed Systems
-
N. Yigitbasi et al: Analysis and Modeling of Time-Correlated Failures in Large-Scale Distributed Systems. IEEE/ACM International Conference on Grid Computing, pp 65-72, 2010
-
(2010)
IEEE/ACM International Conference on Grid Computing
, pp. 65-72
-
-
Yigitbasi, N.1
-
8
-
-
77958132122
-
Mining Dependency in Distributed Systems through Unstructured Logs Analysis
-
January
-
J. G. Lou et al: Mining Dependency in Distributed Systems through Unstructured Logs Analysis. ACM SIGOPS, Volume 44, Issue 1, January 2010
-
(2010)
ACM SIGOPS
, vol.44
, Issue.1
-
-
Lou, J.G.1
-
9
-
-
77951145583
-
Online System Problem Detection by Mining Patterns of Console Logs
-
W. Xu et al: Online System Problem Detection by Mining Patterns of Console Logs. IEEE International Conference on Data Mining, pp 588-597, 2009
-
(2009)
IEEE International Conference on Data Mining
, pp. 588-597
-
-
Xu, W.1
-
11
-
-
55849147399
-
Dynamic Meta-Learning for Failure Prediction in Large-Scale Systems: A Case Study
-
J. Gu et al: Dynamic Meta-Learning for Failure Prediction in Large-Scale Systems: A Case Study. International Conference on Parallel Processing, pp 157-164, 2008
-
(2008)
International Conference on Parallel Processing
, pp. 157-164
-
-
Gu, J.1
-
12
-
-
81055139569
-
Adaptive Event Prediction Strategy with Dynamic Time Window for Large-Scale HPC Systems
-
A. Gainaru, F. Cappello, S. Trausan-Matu, W. Kramer: Adaptive Event Prediction Strategy with Dynamic Time Window for Large-Scale HPC Systems. System Log Analysis with Machine Learning Workshop, SLAML, 2011
-
System Log Analysis with Machine Learning Workshop, SLAML, 2011
-
-
Gainaru, A.1
Cappello, F.2
Trausan-Matu, S.3
Kramer, W.4
-
13
-
-
84877719832
-
LogMaster: Mining Event Correlations in Logs of Large-scale Cluster Systems
-
abs/1003.0951
-
R. Ren et al: LogMaster: Mining Event Correlations in Logs of Large-scale Cluster Systems. CoRR abs/1003.0951, 2010
-
(2010)
CoRR
-
-
Ren, R.1
-
15
-
-
80052352428
-
Event log mining tool for large scale HPC systems
-
A. Gainaru, F. Cappello, S. Trausan-Matu, W. Kramer: Event log mining tool for large scale HPC systems. International Conference on Parallel Processing Euro-Par, volume 1, pp 52-64, 2011.
-
(2011)
International Conference on Parallel Processing Euro-Par
, vol.1
, pp. 52-64
-
-
Gainaru, A.1
Cappello, F.2
Trausan-Matu, S.3
Kramer, W.4
-
16
-
-
0025416073
-
Automatic recognition of intermittent failures: An experimental study of field data
-
R. Iyer et al: Automatic recognition of intermittent failures: An experimental study of field data. IEEE Transactions on Computers, 39:525-537, 1990.
-
(1990)
IEEE Transactions on Computers
, vol.39
, pp. 525-537
-
-
Iyer, R.1
-
17
-
-
80051915968
-
Improving Log-Based Field Failure Data Analysis of Multi-Node Computing Systems
-
A. Pecchia, D. Cotroneo, Z. Kalbarczyk, R. Iyer: Improving Log-Based Field Failure Data Analysis of Multi-Node Computing Systems. International Conference on Dependable Systems and Networks (DSN), pp 97-108, 2011
-
(2011)
International Conference on Dependable Systems and Networks (DSN)
, pp. 97-108
-
-
Pecchia, A.1
Cotroneo, D.2
Kalbarczyk, Z.3
Iyer, R.4
-
18
-
-
70449794134
-
System log preprocessing to improve failure prediction
-
Z. Zheng, Z. Lan, B. Park, and A. Geist: System log preprocessing to improve failure prediction. International Conference on Dependable Systems and Networks, pp 572-577, 2009.
-
(2009)
International Conference on Dependable Systems and Networks
, pp. 572-577
-
-
Zheng, Z.1
Lan, Z.2
Park, B.3
Geist, A.4
-
20
-
-
83155160934
-
Modeling and Tolerating Heterogeneous Failures in Large Parallel Systems
-
E. Heien, D. Kondo, A. Gainaru, D. LaPine, W. Kramer, F. Cappello: Modeling and Tolerating Heterogeneous Failures in Large Parallel Systems. International Conference for High Performance Computing, Networking, Storage and Analysis, 2011
-
International Conference for High Performance Computing, Networking, Storage and Analysis, 2011
-
-
Heien, E.1
Kondo, D.2
Gainaru, A.3
LaPine, D.4
Kramer, W.5
Cappello, F.6
-
22
-
-
0001265595
-
An Extended Table of Critical Values for the Mann-Whitney (Wilcoxon) Two-Sample Statistic
-
R. C. Milton: An Extended Table of Critical Values for the Mann-Whitney (Wilcoxon) Two-Sample Statistic. Journal of the American Statistical Association, Volume 59, Issue 3, 1978
-
(1978)
Journal of the American Statistical Association
, vol.59
, Issue.3
-
-
Milton, R.C.1
-
23
-
-
84877687209
-
-
Accessed on 2010
-
National Center for Supercomputing Applications at the University of Illinois. www.ncsa.illinois.edu. Accessed on 2010.
-
-
-
-
26
-
-
20444463494
-
FTC-Charm++: An In-Memory Checkpoint-Based Fault Tolerant Runtime for Charm++ and MPI
-
G. Zheng et al: FTC-Charm++: An In-Memory Checkpoint-Based Fault Tolerant Runtime for Charm++ and MPI. International Conference on Cluster Computing CLUSTER, pp 93-103, 2004
-
(2004)
International Conference on Cluster Computing CLUSTER
, pp. 93-103
-
-
Zheng, G.1
-
27
-
-
85032796232
-
Rebound: Scalable Checkpointing for Coherent Shared Memory
-
R. Agarwal et al: Rebound: Scalable Checkpointing for Coherent Shared Memory. ACM SIGARCH Computer Architecture News, Volume 39 Issue 3, 2011
-
(2011)
ACM SIGARCH Computer Architecture News
, vol.39
, Issue.3
-
-
Agarwal, R.1
-
28
-
-
78650831692
-
Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System
-
A. Moody, G. Bronevetsky, K. Mohror, B. R. de Supinski: Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System. ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-11, 2010
-
(2010)
ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
, pp. 1-11
-
-
Moody, A.1
Bronevetsky, G.2
Mohror, K.3
De Supinski, B.R.4
-
29
-
-
77956589566
-
A Practical Failure Prediction with Location and Lead Time for Blue Gene/P
-
Z. Zheng, Z. Lan, R. Gupta, S. Coghlan, P. Beckman: A Practical Failure Prediction with Location and Lead Time for Blue Gene/P. Proceedings of the 2010 International Conference on Dependable Systems and Networks Workshops, pp 15-22, 2010
-
(2010)
Proceedings of the 2010 International Conference on Dependable Systems and Networks Workshops
, pp. 15-22
-
-
Zheng, Z.1
Lan, Z.2
Gupta, R.3
Coghlan, S.4
Beckman, P.5
-
30
-
-
70350755748
-
Proactive process-level live migration in HPC environments
-
C. Wang, F. Mueller, C. Engelmann, and S. Scott: Proactive process-level live migration in HPC environments. International Conference for High Performance Computing, Networking, Storage and Analysis, 2008.
-
International Conference for High Performance Computing, Networking, Storage and Analysis, 2008
-
-
Wang, C.1
Mueller, F.2
Engelmann, C.3
Scott, S.4
-
31
-
-
55849147399
-
Dynamic meta-learning for failure prediction in large-scale systems: A case study
-
J. Gu, Z. Zheng, Z. Lan, J. White, and B. Park: Dynamic meta-learning for failure prediction in large-scale systems: A case study. International Conference on Parallel Processing, 2008.
-
International Conference on Parallel Processing, 2008
-
-
Gu, J.1
Zheng, Z.2
Lan, Z.3
White, J.4
Park, B.5
-
33
-
-
4444380999
-
A survey of fault localization techniques in computer networks
-
M. Steinder and A. Sethi: A survey of fault localization techniques in computer networks. Science of Computer Programming, volume 53, issue 2, 2004
-
(2004)
Science of Computer Programming
, vol.53
, Issue.2
-
-
Steinder, M.1
Sethi, A.2