[2] A. Gara, M. A. Blumrich, D. Chen, G. Chiu, P. Coteus, M. E. Giampapa, R. A. Haring, P. Heidelberger, D. Hoenicke, G. V. Kopcsay, T. A. Liebsch, M. Ohmacht, B. D. Steinmacher-Burow, T. Takken, P. Vranas, "Overview of the BlueGene/L architecture," IBM J. Res. & Dev., vol. 49, IBM Corp., New York, pp. 195-212, May 2005.

[3] J. E. Moreira, G. Almasi, C. Archer, R. Bellofatto, P. Bergner, J. R. Brunheroto, M. Brutman, J. G. Castanos, P. G. Crumley, M. Gupta, T. Inglett, D. Lieber, D. Limpert, P. McCarthy, M. Megerian, M. Mendell, M. Mundy, D. Reed, R. K. Sahoo, A. Sanomiya, R. Shok, B. Smith, G. G. Stewart, "BlueGene/L programming and operating environment," IBM J. Res. & Dev., vol. 49, IBM Corp., New York, pp. 367-376, May 2005.

[4] P. Shivakumar, M. Kistler, S. Keckler, D. Burger, L. Alvisi, "Modeling the effect of technology trends on soft error rate of combinational logic," in Proceedings of the 2002 International Conference on Dependable Systems and Networks, 2002, pp. 389-398.

[5] J. Srinivasan, S. Adve, P. Bose, J. Rivers, "The Impact of Technology Scaling on Processor Lifetime Reliability," in Proceedings of the International Conference on Dependable Systems and Networks (DSN-2004), June 2004.

[6] R. Sahoo, A. Oliner, I. Rish, M. Gupta, J. Moreira, S. Ma, R. Vilalta, "Critical Event Prediction for Proactive Management in Large-scale Computer Clusters," in Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2003.

[7] R. Sahoo, M. Bae, R. Vilalta, J. Moreira, S. Ma, M. Gupta, "Providing Persistent and Consistent Resources through Event Log Analysis and Predictions for Large-scale Computing Systems," in Workshop on Self-Healing, Adaptive and SelfMANaged Systems (SHAMAN), 2002.

[8] M. L. Fair, C. R. Conklin, S. B. Swaney, P. J. Meaney, W. J. Clarke, L. C. Alves, I. N. Modi, F. Freier, W. Fischer, N. E. Weber, "Reliability, Availability and Serviceability (RAS) of the IBM Server z990," IBM J. Res. & Dev., vol. 48, IBM Corp., New York, pp. 519-534, 2004.

[9] A. Oliner, R. Sahoo, J. Moreira, M. Gupta, "Performance Implications of Periodic Checkpointing on Large-scale Cluster Systems," in Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05), 2005.

[10] C. Krishna, K. Shin, Y. Lee, "Optimization Criteria for Checkpoint Placement," Communications of the ACM, vol. 27, no. 10, October 1984.

[11] Y. Ling, J. Mi, X. Lin, "A Variational Calculus Approach to Optimal Checkpoint Placement," IEEE Transactions on Computers, vol. 50, no. 7, July 2001.

[15] C. Domeniconi, C. Perng, R. Vilalta, S. Ma, "A Classification Approach for Prediction of Target Events in Temporal Sequences," in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery, 2002.

[16] Y. Liang, Y. Zhang, A. Sivasubramaniam, R. Sahoo, J. Moreira, M. Gupta, "Filtering Failure Logs for a BlueGene/L Prototype," in Proceedings of the 2005 International Conference on Dependable Systems and Networks (DSN'05), 2005, pp. 476-485.

[17] Top 500 supercomputer sites [online]. Available: http://www.top500.org/list/2007/11

[19] Y. Liang, Y. Zhang, A. Sivasubramaniam, M. Jette, R. Sahoo, "BlueGene/L Failure Analysis and Prediction Models," in International Conference on Dependable Systems and Networks (DSN'06), 2006, pp. 425-434.

[20] S. Chakravorty, C. Mendes, L. Kale, "Proactive Fault Tolerance in Large Systems," in HPCRI: 1st Workshop on High Performance Computing Reliability Issues, in Proceedings of the 11th International Symposium on High Performance Computer Architecture (HPCA-11), IEEE Computer Society, 2005.

[21] A. J. Oliner, R. K. Sahoo, J. E. Moreira, M. Gupta, A. Sivasubramaniam, "Fault-aware Job Scheduling for Blue Gene/L Systems," in Proceedings of the International Parallel and Distributed Processing Symposium, April 2004.