-
1
-
-
70350774594
-
-
S. Alam, R. Barrett, M. Bast, M. Fahey, J. Kuehn, C. McCurdy, J. Rogers, P. Roth, R. Sankaran, J. Vetter, P. Worley, and W. Yu. Early evaluation of IBM BlueGene/P. Proc. of Supercomputing, 2008.
-
(2008)
Early Evaluation of IBM BlueGene/P. Proc. of Supercomputing
-
-
Alam, S.1
Barrett, R.2
Bast, M.3
Fahey, M.4
Kuehn, J.5
McCurdy, C.6
Rogers, J.7
Roth, P.8
Sankaran, R.9
Vetter, J.10
Worley, P.11
Yu, W.12
-
3
-
-
84867725176
-
-
SAND 2010-7109, Sandia National Laboratories, October
-
J. Brandt, A. Gentile, C. Houf, J. Mayo, P. Pebay, D. Roe, D. Thompson, and M. Wong. OVIS 3.2 user's guide. SAND 2010-7109, Sandia National Laboratories, October 2010.
-
(2010)
OVIS 3.2 User's Guide
-
-
Brandt, J.1
Gentile, A.2
Houf, C.3
Mayo, J.4
Pebay, P.5
Roe, D.6
Thompson, D.7
Wong, M.8
-
4
-
-
77956600750
-
AutomaDeD: Automata-based debugging for dissimilar parallel tasks
-
G. Bronevetsky, I. Laguna, S. Bagchi, B. de Supinski, D. Ahn, and M. Schulz. AutomaDeD: Automata-based debugging for dissimilar parallel tasks. In Proceedings of DSN, 2010.
-
(2010)
Proceedings of DSN
-
-
Bronevetsky, G.1
Laguna, I.2
Bagchi, S.3
de Supinski, B.4
Ahn, D.5
Schulz, M.6
-
5
-
-
0036930823
-
Pinpoint: Problem determination in large, dynamic Internet services
-
M. Chen, E. Kiciman, E. Fratkin, A. Fox, and E. Brewer. Pinpoint: problem determination in large, dynamic Internet services. In Proceedings of DSN, 2002.
-
(2002)
Proceedings of DSN
-
-
Chen, M.1
Kiciman, E.2
Fratkin, E.3
Fox, A.4
Brewer, E.5
-
6
-
-
73149122840
-
Holmes: Effective statistical debugging via efficient path profiling
-
T. Chilimbi, B. Liblit, K. Mehra, A. Nori, and K. Vaswani. Holmes: Effective statistical debugging via efficient path profiling. In Proceedings of ICSE, 2009.
-
(2009)
Proceedings of ICSE
-
-
Chilimbi, T.1
Liblit, B.2
Mehra, K.3
Nori, A.4
Vaswani, K.5
-
7
-
-
79952790201
-
Diagnosing the root-causes of failures from cluster log files
-
E. Chuah, S. Kuo, P. Hiew, W. Tjhi, G. Lee, J. Hammond, M. Michalewicz, T. Hung, and J. Browne. Diagnosing the root-causes of failures from cluster log files. In Proceedings of HiPC, 2010.
-
(2010)
Proceedings of HiPC
-
-
Chuah, E.1
Kuo, S.2
Hiew, P.3
Tjhi, W.4
Lee, G.5
Hammond, J.6
Michalewicz, M.7
Hung, T.8
Browne, J.9
-
8
-
-
77955737995
-
High-end computing resilience: Analysis of issues facing the HEC community and path-forward for research and development
-
N. DeBardeleben, J. Laros, J. Daly, S. Scott, C. Engelmann, and B. Harrod. High-end computing resilience: Analysis of issues facing the HEC community and path-forward for research and development. White Paper, 2009.
-
(2009)
White Paper
-
-
DeBardeleben, N.1
Laros, J.2
Daly, J.3
Scott, S.4
Engelmann, C.5
Harrod, B.6
-
9
-
-
83455240695
-
Petascale system management experiences
-
N. Desai, R. Bradshaw, C. Lueninghoener, A. Cherry, S. Coghlan, and W. Scullin. Petascale system management experiences. In Proceedings of LISA, 2008.
-
(2008)
Proceedings of LISA
-
-
Desai, N.1
Bradshaw, R.2
Lueninghoener, C.3
Cherry, A.4
Coghlan, S.5
Scullin, W.6
-
10
-
-
81055139569
-
Adaptive event prediction strategy with dynamic time window for large-scale HPC systems
-
A. Gainaru, F. Cappello, F. J., and S. Trausan. Adaptive event prediction strategy with dynamic time window for large-scale HPC systems. In Proceedings of SLAML, 2011.
-
(2011)
Proceedings of SLAML
-
-
Gainaru, A.1
Cappello, F.J.F.2
Trausan, S.3
-
11
-
-
84867730282
-
DMTracker: Finding bugs in large-scale parallel programs by detecting anomaly in data movements
-
Q. Gao, F. Qin, and D. Panda. DMTracker: Finding bugs in large-scale parallel programs by detecting anomaly in data movements. In Proceedings of Supercomputing, 2006.
-
(2006)
Proceedings of Supercomputing
-
-
Gao, Q.1
Qin, F.2
Panda, D.3
-
14
-
-
84866493206
-
Dustminer: Troubleshooting interactive complexity bugs in sensor networks
-
M. Khan, H. Le, H. Ahmadi, T. Abdelzaher, and J. Han. Dustminer: troubleshooting interactive complexity bugs in sensor networks. In Proceedings of SenSys, 2008.
-
(2008)
Proceedings of SenSys
-
-
Khan, M.1
Le, H.2
Ahmadi, H.3
Abdelzaher, T.4
Han, J.5
-
15
-
-
26844568000
-
Detecting application-level failures in component-based internet services
-
E. Kiciman and A. Fox. Detecting application-level failures in component-based internet services. IEEE Trans. Neural Networks, 16(5):1027-1041, 2005.
-
(2005)
IEEE Trans. Neural Networks
, vol.16
, Issue.5
, pp. 1027-1041
-
-
Kiciman, E.1
Fox, A.2
-
16
-
-
84867719410
-
Exascale computing study: Technology challenges in achieving exascale systems
-
P. Kogge et al. Exascale computing study: Technology challenges in achieving exascale systems. White Paper, 2008.
-
(2008)
White Paper
-
-
Kogge, P.1
-
17
-
-
84867725181
-
IBM BlueGene solution: System administration
-
G. Lakner and G. Mullen-Schultz. IBM BlueGene solution: System administration. IBM Redbook, 2007.
-
(2007)
IBM Redbook
-
-
Lakner, G.1
Mullen-Schultz, G.2
-
18
-
-
75449097851
-
Toward automated anomaly identification in large-scale systems
-
Z. Lan, Z. Zheng, and Y. Li. Toward automated anomaly identification in large-scale systems. IEEE Trans. on Parallel and Distributed Systems, 21(2):174-187, 2010.
-
(2010)
IEEE Trans. on Parallel and Distributed Systems
, vol.21
, Issue.2
, pp. 174-187
-
-
Lan, Z.1
Zheng, Z.2
Li, Y.3
-
19
-
-
27544497222
-
Filtering failure logs for a BlueGene/L prototype
-
Y. Liang, Y. Zhang, A. Sivasubramaniam, R. Sahoo, J. Moreira, and M. Gupta. Filtering failure logs for a BlueGene/L prototype. In Proceedings of DSN, 2005.
-
(2005)
Proceedings of DSN
-
-
Liang, Y.1
Zhang, Y.2
Sivasubramaniam, A.3
Sahoo, R.4
Moreira, J.5
Gupta, M.6
-
20
-
-
78650904433
-
Towards automated performance diagnosis in a large IPTV network
-
A. Mahimkar, Z. Ge, A. Shaikh, J. Wang, J. Yates, Y. Zhang, and Q. Zhao. Towards automated performance diagnosis in a large IPTV network. In Proceedings of SIGCOMM, 2009.
-
(2009)
Proceedings of SIGCOMM
-
-
Mahimkar, A.1
Ge, Z.2
Shaikh, A.3
Wang, J.4
Yates, J.5
Zhang, Y.6
Zhao, Q.7
-
21
-
-
78650101944
-
Model-based fault localization: Finding behavioral outliers in large-scale computing systems
-
N. Maruyama and S. Matsuoka. Model-based fault localization: Finding behavioral outliers in large-scale computing systems. New Generation Comput, 28:237-255, 2010.
-
(2010)
New Generation Comput
, vol.28
, pp. 237-255
-
-
Maruyama, N.1
Matsuoka, S.2
-
24
-
-
36049013419
-
What supercomputers say: A study of five system logs
-
A. Oliner and J. Stearley. What supercomputers say: A study of five system logs. In Proceedings of DSN, 2007.
-
(2007)
Proceedings of DSN
-
-
Oliner, A.1
Stearley, J.2
-
25
-
-
78650427379
-
Blind men and the elephant: Piecing together hadoop for diagnosis
-
X. Pan, J. Tan, S. Kavulya, R. Gandhi, and P. Narasimhan. Blind men and the elephant: Piecing together Hadoop for diagnosis. In Proceedings of ISSRE, 2009.
-
(2009)
Proceedings of ISSRE
-
-
Pan, X.1
Tan, J.2
Kavulya, S.3
Gandhi, R.4
Narasimhan, P.5
-
28
-
-
80052147473
-
Identifying faults in large-scale distributed systems by filtering noisy error logs
-
X. Rao, H. Wang, D. Shi, Z. Chen, H. Cai, and Q. Zhou. Identifying faults in large-scale distributed systems by filtering noisy error logs. In Proceedings of DSNW, 2011.
-
(2011)
Proceedings of DSNW
-
-
Rao, X.1
Wang, H.2
Shi, D.3
Chen, Z.4
Cai, H.5
Zhou, Q.6
-
30
-
-
84889668312
-
Diagnosing performance changes by comparing request flows
-
R. Sambasivan, A. Zheng, M. Rosa, E. Krevat, S. Whitman, M. Stroucken, W. Wang, L. Xu, and G. Ganger. Diagnosing performance changes by comparing request flows. In Proceedings of NSDI, 2011.
-
(2011)
Proceedings of NSDI
-
-
Sambasivan, R.1
Zheng, A.2
Rosa, M.3
Krevat, E.4
Whitman, S.5
Stroucken, M.6
Wang, W.7
Xu, L.8
Ganger, G.9
-
31
-
-
33845593340
-
A large-scale study of failures in high-performance computing systems
-
B. Schroeder and G. Gibson. A large-scale study of failures in high-performance computing systems. In Proceedings of DSN, 2006.
-
(2006)
Proceedings of DSN
-
-
Schroeder, B.1
Gibson, G.2
-
33
-
-
84866637169
-
Filtering log data: Finding the needles in the haystack
-
L. Yu, Z. Zheng, Z. Lan, T. Jones, J. Brandt, and A. Gentile. Filtering log data: Finding the needles in the haystack. In Proceedings of DSN, 2012.
-
(2012)
Proceedings of DSN
-
-
Yu, L.1
Zheng, Z.2
Lan, Z.3
Jones, T.4
Brandt, J.5
Gentile, A.6
-
35
-
-
80053278089
-
Co-analysis of RAS log and job log on Blue Gene/P
-
Z. Zheng, L. Yu, W. Tang, Z. Lan, R. Gupta, N. Desai, S. Coghlan, and D. Buettner. Co-analysis of RAS log and job log on Blue Gene/P. In Proceedings of IPDPS, 2011.
-
(2011)
Proceedings of IPDPS
-
-
Zheng, Z.1
Yu, L.2
Tang, W.3
Lan, Z.4
Gupta, R.5
Desai, N.6
Coghlan, S.7
Buettner, D.8