-
1
-
-
53349160576
-
The software challenges of petascale computing
-
K. Yelick, "The software challenges of petascale computing," HPCwire interview, 2006.
-
(2006)
HPCwire interview
-
-
Yelick, K.1
-
2
-
-
34548100442
-
Investigating lightweight storage and overlay networks for fault tolerance
-
Santa Fe, New Mexico, USA: Held in conjunction with LACSI 2006, Oct. 2006
-
R. Oldfield, "Investigating lightweight storage and overlay networks for fault tolerance," in HAPCW'06: High Availability and Performance Computing Workshop. Santa Fe, New Mexico, USA: Held in conjunction with LACSI 2006, Oct. 2006.
-
(2006)
HAPCW'06: High Availability and Performance Computing Workshop
-
-
Oldfield, R.1
-
3
-
-
53349146014
-
Gang scheduling performance on a cluster of non-dedicated workstations
-
Washington, DC, USA: IEEE Computer Society
-
H. D. Karatza, "Gang scheduling performance on a cluster of non-dedicated workstations," in SS '02: Proceedings of the 35th Annual Simulation Symposium. Washington, DC, USA: IEEE Computer Society, 2002, p. 235.
-
(2002)
SS '02: Proceedings of the 35th Annual Simulation Symposium
, pp. 235
-
-
Karatza, H.D.1
-
4
-
-
33947272883
-
An evaluation of parallel job scheduling for ASCI Blue-Pacific
-
New York, NY, USA: ACM Press
-
H. Franke, J. Jann, J. E. Moreira, P. Pattnaik, and M. A. Jette, "An evaluation of parallel job scheduling for ASCI Blue-Pacific," in Supercomputing '99: Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM). New York, NY, USA: ACM Press, 1999, p. 45.
-
(1999)
Supercomputing '99: Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM)
, pp. 45
-
-
Franke, H.1
Jann, J.2
Moreira, J.E.3
Pattnaik, P.4
Jette, M.A.5
-
5
-
-
46149124815
-
-
H. Rajaei, M. Dadfar, and P. Joshi, "Simulation of job scheduling for small scale clusters," in WSC '06: Proceedings of the 38th conference on Winter simulation. Winter Simulation Conference, 2006, pp. 1195-1201.
-
-
-
-
7
-
-
0027868954
-
-
K. K. Goswami and R. Iyer, "Simulation of software behavior under hardware faults," in Fault-Tolerant Computing, 1993. FTCS-23. Digest of Papers, The Twenty-Third International Symposium on, 22-24 Jun. 1993, pp. 218-227.
-
-
-
-
9
-
-
33746286070
-
-
A. J. Oliner, R. K. Sahoo, J. E. Moreira, and M. Gupta, "Performance implications of periodic checkpointing on large-scale cluster systems," in IPDPS '05: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 18. Washington, DC, USA: IEEE Computer Society, 2005, p. 299.2.
-
-
-
-
10
-
-
0033719985
-
A simulation-based study of scheduling mechanisms for a dynamic cluster environment
-
New York, NY, USA: ACM Press
-
Y. Zhang, A. Sivasubramaniam, J. Moreira, and H. Franke, "A simulation-based study of scheduling mechanisms for a dynamic cluster environment," in ICS '00: Proceedings of the 14th international conference on Supercomputing. New York, NY, USA: ACM Press, 2000, pp. 100-109.
-
(2000)
ICS '00: Proceedings of the 14th international conference on Supercomputing
, pp. 100-109
-
-
Zhang, Y.1
Sivasubramaniam, A.2
Moreira, J.3
Franke, H.4
-
13
-
-
84875366511
-
SLURM: Simple Linux utility for resource management
-
[Online]. Available: http://www.llnl.gov/linux/slurm/slurm.html
-
"SLURM: Simple Linux utility for resource management." [Online]. Available: http://www.llnl.gov/linux/slurm/slurm.html
-
-
-
-
14
-
-
53349151375
-
Deja vu software
-
[Online]. Available: http://www.californiadigital.com/sw.html
-
"Deja vu software." [Online]. Available: http://www.californiadigital.com/sw.html
-
-
-
-
15
-
-
85084159983
-
Libckpt: Transparent checkpointing under Unix
-
New Orleans, USA, Jan. 1995
-
J. S. Plank, M. Beck, G. Kingsley, and K. Li, "Libckpt: Transparent checkpointing under Unix," in Proceedings of USENIX Winter 1995 Technical Conference, New Orleans, USA, Jan. 1995, pp. 213-224. [Online]. Available: citeseer.ist.psu.edu/plank95libckpt.html
-
Proceedings of USENIX Winter 1995 Technical Conference
, pp. 213-224
-
-
Plank, J.S.1
Beck, M.2
Kingsley, G.3
Li, K.4
-
16
-
-
53349092011
-
-
G. Stellner, "CoCheck: Checkpointing and process migration for MPI," in Proceedings of the 10th International Parallel Processing Symposium (IPPS '96), Honolulu, 1996.
-
-
-
-
18
-
-
53349156961
-
-
G. Hamerly and C. Elkan, "Bayesian approaches to failure prediction for disk drives," in Proceedings of the Eighteenth International Conference on Machine Learning, 2001.
-
-
-
-
19
-
-
53349092010
-
Technical Report UCB/CSD-99-1042, University of California, Berkeley, Computer Science Division
-
N. Talagala and D. Patterson, "An analysis of error behavior in a large storage system," Technical Report UCB/CSD-99-1042, University of California, Berkeley, Computer Science Division, 1999.
-
(1999)
-
-
Talagala, N.1
Patterson, D.2
-
21
-
-
0012237782
-
Minimizing completion time of a program by checkpointing and rejuvenation
-
S. Garg, Y. Huang, C. Kintala, and K. S. Trivedi, "Minimizing completion time of a program by checkpointing and rejuvenation," in Proceedings of the 1996 ACM SIGMETRICS Conference, 1996.
-
(1996)
Proceedings of the 1996 ACM SIGMETRICS Conference
-
-
Garg, S.1
Huang, Y.2
Kintala, C.3
Trivedi, K.S.4
-
22
-
-
33751082401
-
Exploit failure prediction for adaptive fault-tolerance in cluster computing
-
Y. Li and Z. Lan, "Exploit failure prediction for adaptive fault-tolerance in cluster computing," in CCGrid, 2006, pp. 531-538.
-
(2006)
CCGrid
, pp. 531-538
-
-
Li, Y.1
Lan, Z.2
-
23
-
-
21644433634
-
-
B. Dragovic, P. Barham, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, "Xen and the art of virtualization," in Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), 2003.
-
-
-
-
24
-
-
85059766484
-
Live Migration of Virtual Machines
-
Boston, MA: USENIX, May 2-4
-
C. Clark, K. Fraser, S. Hand, J. G. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield, "Live Migration of Virtual Machines," in Proceedings of the 2nd Symposium on Networked Systems Design and Implementation (NSDI). Boston, MA: USENIX, May 2-4, 2005.
-
(2005)
Proceedings of the 2nd Symposium on Networked Systems Design and Implementation (NSDI)
-
-
Clark, C.1
Fraser, K.2
Hand, S.3
Hansen, J.G.4
Jul, E.5
Limpach, C.6
Pratt, I.7
Warfield, A.8