[1] N. Adiga and T. B. Team. An overview of the BlueGene/L supercomputer. In SC2002, Supercomputing, Technical Papers, Nov. 2002.
[2] S. Agarwal, R. Garg, M. S. Gupta, and J. E. Moreira. Adaptive incremental checkpointing for massively parallel systems. In ICS 2004, pages 277-286, 2004.
[3] P. Dinda. A prediction-based real-time scheduling advisor. In IPDPS, 2002.
[4] E. N. Elnozahy and J. S. Plank. Checkpointing for peta-scale systems: A look into the future of practical rollback-recovery. IEEE Trans. Dependable Secur. Comput., 1(2):97-108, 2004.
[5] Y. Liang, Y. Zhang, A. Sivasubramaniam, R. K. Sahoo, J. Moreira, and M. Gupta. Filtering failure logs for a BlueGene/L prototype. In Proceedings of the Intl. Conf. on Dependable Systems and Networks (DSN), June 2005.
[8] A. J. Oliner, L. Rudolph, R. K. Sahoo, J. Moreira, and M. Gupta. Probabilistic QoS guarantees for supercomputing systems. In Proceedings of the Intl. Conf. on Dependable Systems and Networks (DSN), June 2005.
[9] A. J. Oliner, R. K. Sahoo, J. E. Moreira, and M. Gupta. Performance implications of periodic checkpointing on large-scale cluster systems. In IEEE IPDPS, Workshop on System Management Tools for Large-scale Parallel Systems, Apr. 2005.
[10] A. J. Oliner, R. K. Sahoo, J. E. Moreira, M. Gupta, and A. Sivasubramaniam. Fault-aware job scheduling for BlueGene/L systems. In IEEE IPDPS, Intl. Parallel and Distributed Processing Symposium, Apr. 2004.
[12] J. S. Plank and M. G. Thomason. Processor allocation and checkpoint interval selection in cluster computing systems. Journal of Parallel and Distributed Computing, 61(11):1570-1590, November 2001.
[13] R. K. Sahoo, M. Bae, R. Vilalta, J. Moreira, S. Ma, and M. Gupta. Providing persistent and consistent resources through event log analysis and predictions for large-scale computing systems. In SHAMAN, Workshop, ICS'02, New York, June 2002.
[14] R. K. Sahoo, A. J. Oliner, I. Rish, M. Gupta, J. E. Moreira, S. Ma, R. Vilalta, and A. Sivasubramaniam. Critical event prediction for proactive management in large-scale computer clusters. In Proceedings of the ACM SIGKDD, Intl. Conf. on Knowledge Discovery and Data Mining, pages 426-435, August 2003.
[15] R. K. Sahoo, I. Rish, A. J. Oliner, M. Gupta, J. E. Moreira, S. Ma, R. Vilalta, and A. Sivasubramaniam. Autonomic computing features for large-scale server management and control. In AIAC Workshop, IJCAI 2003, Aug. 2003.
[16] R. K. Sahoo, A. Sivasubramaniam, M. S. Squillante, and Y. Zhang. Failure data analysis of a large-scale heterogeneous server environment. In Proceedings of the Intl. Conf. on Dependable Systems and Networks (DSN), pages 772-781, June 2004.
[17] S. Sankaran, J. M. Squyres, B. Barrett, A. Lumsdaine, J. Duell, P. Hargrove, and E. Roman. The LAM/MPI checkpoint/restart framework: System-initiated checkpointing. In Proceedings, LACSI Symposium, Santa Fe, New Mexico, USA, October 2003.