-
1
-
-
72049094135
-
-
Cobalt project web site, http://trac.mcs.anl.gov/projects/cobalt.
-
-
-
-
2
-
-
72049133036
-
-
Blue Gene/P Red Books, http://www.redbooks.ibm.com/abstracts/sg247417.html.
-
Blue Gene/P Red Books
-
-
-
3
-
-
28044457320
-
Monitoring hard disk with SMART
-
B. Allen, "Monitoring Hard Disk with SMART," Linux Journal, 2004.
-
(2004)
Linux Journal
-
-
Allen, B.1
-
5
-
-
72049095187
-
-
DOE INCITE project web site, http://www.sc.doe.gov/ascr/incite.
-
-
-
-
6
-
-
55749104919
-
Simulating failures on large-scale systems
-
N. Desai, E. Lusk, D. Buettner, A. Cherry, and T. Voran, "Simulating Failures on Large-Scale Systems," International Conference on Parallel Processing - Workshops (ICPPW'08), 2008.
-
(2008)
International Conference on Parallel Processing - Workshops (ICPPW'08)
-
-
Desai, N.1
Lusk, E.2
Buettner, D.3
Cherry, A.4
Voran, T.5
-
8
-
-
72049099479
-
-
Health API, http://www.renci.org.
-
Health API
-
-
-
9
-
-
55849147399
-
Dynamic meta-learning for failure prediction in large-scale systems: A case study
-
J. Gu, Z. Zheng, Z. Lan, J. White, E. Hocks, and B. Park, "Dynamic Meta-Learning for Failure Prediction in Large-Scale Systems: A Case Study," Proc. of International Conference on Parallel Processing (ICPP'08), 2008.
-
(2008)
Proc. of International Conference on Parallel Processing (ICPP'08)
-
-
Gu, J.1
Zheng, Z.2
Lan, Z.3
White, J.4
Hocks, E.5
Park, B.6
-
10
-
-
47249153592
-
A meta-learning failure predictor for Blue Gene/L systems
-
P. Gujrati, Y. Li, Z. Lan, R. Thakur, and J. White, "A Meta-Learning Failure Predictor for Blue Gene/L Systems," Proc. of International Conference on Parallel Processing (ICPP'07), 2007.
-
(2007)
Proc. of International Conference on Parallel Processing (ICPP'07)
-
-
Gujrati, P.1
Li, Y.2
Lan, Z.3
Thakur, R.4
White, J.5
-
13
-
-
67649883517
-
Fault-aware runtime strategies for high performance computing
-
Y. Li, Z. Lan, P. Gujrati, and X. Sun, "Fault-Aware Runtime Strategies for High Performance Computing," IEEE Transactions on Parallel and Distributed Systems, vol.20, no.4, pp. 460-473, 2009.
-
(2009)
IEEE Transactions on Parallel and Distributed Systems
, vol.20
, Issue.4
, pp. 460-473
-
-
Li, Y.1
Lan, Z.2
Gujrati, P.3
Sun, X.4
-
14
-
-
53349092000
-
Blue Gene/L failure analysis and models
-
Y. Liang, Y. Zhang, A. Sivasubramaniam, M. Jette, and R. Sahoo, "Blue Gene/L Failure Analysis and Models," Proc. of DSN'06, 2006.
-
(2006)
Proc. of DSN'06
-
-
Liang, Y.1
Zhang, Y.2
Sivasubramaniam, A.3
Jette, M.4
Sahoo, R.5
-
16
-
-
72049096753
-
-
Maui project web site, http://mauischeduler.sourceforge.net.
-
-
-
-
17
-
-
12444257746
-
Fault-aware job scheduling for BlueGene/L systems
-
A. Oliner, R. Sahoo, J. Moreira, and M. Gupta, "Fault-aware Job Scheduling for BlueGene/L Systems," Proc. of the 18th International Parallel and Distributed Processing Symposium (IPDPS'04), 2004.
-
(2004)
Proc. of the 18th International Parallel and Distributed Processing Symposium (IPDPS'04)
-
-
Oliner, A.1
Sahoo, R.2
Moreira, J.3
Gupta, M.4
-
18
-
-
36049013419
-
What supercomputers say: A study of five system logs
-
A. Oliner and J. Stearley, "What Supercomputers Say: A Study of Five System Logs," Proc. of DSN'07, 2007.
-
(2007)
Proc. of DSN'07
-
-
Oliner, A.1
Stearley, J.2
-
19
-
-
72049103536
-
-
TOP500 Super Computing Web Site, http://www.top500.org.
-
-
-
-
20
-
-
77952378080
-
Critical event prediction for proactive management in large-scale computer clusters
-
R. Sahoo, A. Oliner, I. Rish, M. Gupta, J. Moreira, S. Ma, R. Vilalta, and A. Sivasubramaniam, "Critical Event Prediction for Proactive Management in Large-scale Computer Clusters," Proc. of International Conference on Knowledge Discovery and Data Mining, 2003.
-
(2003)
Proc. of International Conference on Knowledge Discovery and Data Mining
-
-
Sahoo, R.1
Oliner, A.2
Rish, I.3
Gupta, M.4
Moreira, J.5
Ma, S.6
Vilalta, R.7
Sivasubramaniam, A.8
-
21
-
-
33845593340
-
A large-scale study of failures in high performance computing systems
-
B. Schroeder and G. A. Gibson, "A Large-scale Study of Failures in High Performance Computing Systems," Proc. of DSN'06, 2006.
-
(2006)
Proc. of DSN'06
-
-
Schroeder, B.1
Gibson, G.A.2
-
22
-
-
53349098075
-
Evaluation of fault-tolerant policies using simulation
-
A. Tikotekar, G. Vallee, T. Naughton, S. Scott, and C. Leangsuksun, "Evaluation of Fault-Tolerant Policies Using Simulation," Proc. of IEEE Cluster'07, 2007.
-
(2007)
Proc. of IEEE Cluster'07
-
-
Tikotekar, A.1
Vallee, G.2
Naughton, T.3
Scott, S.4
Leangsuksun, C.5
-
23
-
-
0033355546
-
A measurement-based model for estimation of resource exhaustion in operational software systems
-
K. Trivedi and K. Vaidyanathan, "A Measurement-based Model for Estimation of Resource Exhaustion in Operational Software Systems," Proc. of ISSRE'99, 1999.
-
(1999)
Proc. of ISSRE'99
-
-
Trivedi, K.1
Vaidyanathan, K.2
-
24
-
-
72049089964
-
Adaptive data-aware utility-based scheduling in resource-constrained systems
-
D. Vengerov, L. Mastroleon, D. Murphy, and N. Bambos, "Adaptive Data-Aware Utility-Based Scheduling in Resource-Constrained Systems," Research Disclosure, No.513, pp. 38-39, 2007.
-
(2007)
Research Disclosure
, vol.513
, pp. 38-39
-
-
Vengerov, D.1
Mastroleon, L.2
Murphy, D.3
Bambos, N.4
-
25
-
-
78149354391
-
Predicting rare events in temporal domains
-
R. Vilalta and S. Ma, "Predicting Rare Events in Temporal Domains," Proc. of ICDM'02, 2002.
-
(2002)
Proc. of ICDM'02
-
-
Vilalta, R.1
Ma, S.2
-
26
-
-
85166352696
-
Learning to predict rare events in event sequences
-
G. Weiss and H. Hirsh, "Learning to Predict Rare Events in Event Sequences," Proc. of SIGKDD, 1998.
-
(1998)
Proc. of SIGKDD
-
-
Weiss, G.1
Hirsh, H.2