-
1
-
-
34548025750
-
-
Ganglia, http://ganglia.sourceforge.net/.
-
Ganglia
-
-
-
2
-
-
34548052823
-
-
OpenIPMI
-
OpenIPMI. http://openipmi.sourceforge.net/.
-
-
-
-
4
-
-
12444268370
-
Architecture of LA-MPI, a network-fault-tolerant MPI
-
R. T. Aulwes, D. J. Daniel, N. N. Desai, R. L. Graham, L. D. Risinger, M. A. Taylor, T. S. Woodall, and M. W. Sukalski. Architecture of LA-MPI, a network-fault-tolerant MPI. In International Parallel and Distributed Processing Symposium, 2004.
-
(2004)
International Parallel and Distributed Processing Symposium
-
-
Aulwes, R.T.1
Daniel, D.J.2
Desai, N.N.3
Graham, R.L.4
Risinger, L.D.5
Taylor, M.A.6
Woodall, T.S.7
Sukalski, M.W.8
-
5
-
-
0344867889
-
MOSIX: An integrated multiprocessor UNIX
-
San Diego, California, USA, Berkeley, CA, USA, Winter, USENIX
-
A. Barak and R. Wheeler. MOSIX: An integrated multiprocessor UNIX. In USENIX Association, editor, Proceedings of the Winter 1989 USENIX Conference: January 30-February 3, 1989, San Diego, California, USA, pages 101-112, Berkeley, CA, USA, Winter 1989. USENIX.
-
(1989)
Proceedings of the Winter 1989 USENIX Conference: January 30-February 3
, pp. 101-112
-
-
Barak, A.1
Wheeler, R.2
-
6
-
-
21644433634
-
Xen and the art of virtualization
-
P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield. Xen and the art of virtualization. In Symposium on Operating Systems Principles, pages 164-177, 2003.
-
(2003)
Symposium on Operating Systems Principles
, pp. 164-177
-
-
Barham, P.1
Dragovic, B.2
Fraser, K.3
Hand, S.4
Harris, T.5
Ho, A.6
Neugebauer, R.7
Pratt, I.8
Warfield, A.9
-
7
-
-
0038194608
-
MPICH-V: Toward a scalable fault tolerant MPI for volatile nodes
-
Nov
-
G. Bosilca, A. Bouteiller, and F. Cappello. MPICH-V: Toward a scalable fault tolerant MPI for volatile nodes. In Supercomputing, Nov. 2002.
-
(2002)
Supercomputing
-
-
Bosilca, G.1
Bouteiller, A.2
Cappello, F.3
-
8
-
-
84957017252
-
A scalable process-management environment for parallel programs
-
R. Butler, W. Gropp, and E. L. Lusk. A scalable process-management environment for parallel programs. In Euro PVM/MPI, pages 168-175, 2000.
-
(2000)
Euro PVM/MPI
, pp. 168-175
-
-
Butler, R.1
Gropp, W.2
Lusk, E.L.3
-
12
-
-
85059766484
-
Live migration of virtual machines
-
May
-
C. Clark, K. Fraser, S. Hand, J. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield. Live migration of virtual machines. In 2nd Symposium on Networked Systems Design and Implementation, May 2005.
-
(2005)
2nd Symposium on Networked Systems Design and Implementation
-
-
Clark, C.1
Fraser, K.2
Hand, S.3
Hansen, J.4
Jul, E.5
Limpach, C.6
Pratt, I.7
Warfield, A.8
-
13
-
-
0026205353
-
Transparent process migration: Design alternatives and the Sprite implementation
-
F. Douglis and J. K. Ousterhout. Transparent process migration: Design alternatives and the Sprite implementation. Softw., Pract. Exper., 21(8):757-785, 1991.
-
(1991)
Softw., Pract. Exper
, vol.21
, Issue.8
, pp. 757-785
-
-
Douglis, F.1
Ousterhout, J.K.2
-
14
-
-
34548049440
-
-
The design and implementation of Berkeley Lab's Linux checkpoint/restart
-
J. Duell. The design and implementation of Berkeley Lab's Linux checkpoint/restart. Technical report, Lawrence Berkeley National Laboratory, 2000.
-
-
-
-
15
-
-
0026867749
-
Manetho: Transparent rollback-recovery with low overhead, limited rollback, and fast output commit
-
E. N. Elnozahy and W. Zwaenepoel. Manetho: Transparent rollback-recovery with low overhead, limited rollback, and fast output commit. IEEE Trans. Comput., 41(5):526-531, 1992.
-
(1992)
IEEE Trans. Comput
, vol.41
, Issue.5
, pp. 526-531
-
-
Elnozahy, E.N.1
Zwaenepoel, W.2
-
16
-
-
84940567900
-
FT-MPI: Fault Tolerant MPI, supporting dynamic applications in a dynamic world
-
G. E. Fagg and J. J. Dongarra. FT-MPI: Fault Tolerant MPI, supporting dynamic applications in a dynamic world. In Euro PVM/MPI User's Group Meeting, Lecture Notes in Computer Science, volume 1908, pages 346-353, 2000.
-
(2000)
Euro PVM/MPI User's Group Meeting, Lecture Notes in Computer Science
, vol.1908
, pp. 346-353
-
-
Fagg, G.E.1
Dongarra, J.J.2
-
18
-
-
77951432293
-
Self-migration of operating systems
-
New York, NY, USA, ACM Press
-
J. G. Hansen and E. Jul. Self-migration of operating systems. In EW11: Proceedings of the 11th workshop on ACM SIGOPS European workshop: beyond the PC, page 23, New York, NY, USA, 2004. ACM Press.
-
(2004)
EW11: Proceedings of the 11th workshop on ACM SIGOPS European workshop: beyond the PC
, pp. 23
-
-
Hansen, J.G.1
Jul, E.2
-
19
-
-
0031540885
-
The performance of μ-Kernel-based systems
-
New York, Oct, ACM Press
-
H. Härtig, M. Hohmuth, J. Liedtke, S. Schönberg, and J. Wolter. The performance of μ-Kernel-based systems. In Proceedings of the 16th Symposium on Operating Systems Principles (SOSP-97), volume 31,5 of Operating Systems Review, pages 66-77, New York, Oct. 1997. ACM Press.
-
(1997)
Proceedings of the 16th Symposium on Operating Systems Principles (SOSP-97), volume 31,5 of Operating Systems Review
, pp. 66-77
-
-
Härtig, H.1
Hohmuth, M.2
Liedtke, J.3
Schönberg, S.4
Wolter, J.5
-
22
-
-
34548033627
-
Personal communications. Ruud Haring
-
July
-
IBM T.J. Watson. Personal communications. Ruud Haring, July 2005.
-
(2005)
-
-
IBM T.J. Watson1
-
23
-
-
0023960862
-
Fine-grained mobility in the Emerald system
-
E. Jul, H. M. Levy, N. C. Hutchinson, and A. P. Black. Fine-grained mobility in the Emerald system. ACM Trans. Comput. Syst., 6(1):109-133, 1988.
-
(1988)
ACM Trans. Comput. Syst
, vol.6
, Issue.1
, pp. 109-133
-
-
Jul, E.1
Levy, H.M.2
Hutchinson, N.C.3
Black, A.P.4
-
27
-
-
12444257746
-
Fault-aware job scheduling for BlueGene/L systems
-
A. Oliner, R. Sahoo, J. Moreira, M. Gupta, and A. Sivasubramaniam. Fault-aware job scheduling for BlueGene/L systems. In International Parallel and Distributed Processing Symposium, 2004.
-
(2004)
International Parallel and Distributed Processing Symposium
-
-
Oliner, A.1
Sahoo, R.2
Moreira, J.3
Gupta, M.4
Sivasubramaniam, A.5
-
29
-
-
2642552074
-
The design and implementation of Zap: A system for migrating computing environments
-
S. Osman, D. Subhraveti, G. Su, and J. Nieh. The design and implementation of Zap: A system for migrating computing environments. In OSDI, 2002.
-
(2002)
OSDI
-
-
Osman, S.1
Subhraveti, D.2
Su, G.3
Nieh, J.4
-
32
-
-
34548051006
-
-
Toward efficient failure detection and recovery in HPC
-
S. Rani, C. Leangsuksun, A. Tikotekar, V. Rampure, and S. Scott. Toward efficient failure detection and recovery in HPC. In High Availability and Performance Computing Workshop, page (accepted), 2006.
-
-
-
-
33
-
-
77952378080
-
Critical event prediction for proactive management in large-scale computer clusters
-
R. Sahoo, A. Oliner, I. Rish, M. Gupta, J. Moreira, S. Ma, R. Vilalta, and A. Sivasubramaniam. Critical event prediction for proactive management in large-scale computer clusters. In KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 426-435, 2003.
-
(2003)
KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
, pp. 426-435
-
-
Sahoo, R.1
Oliner, A.2
Rish, I.3
Gupta, M.4
Moreira, J.5
Ma, S.6
Vilalta, R.7
Sivasubramaniam, A.8
-
34
-
-
20444444457
-
The LAM/MPI checkpoint/restart framework: System-initiated checkpointing
-
Oct
-
S. Sankaran, J. M. Squyres, B. Barrett, A. Lumsdaine, J. Duell, P. Hargrove, and E. Roman. The LAM/MPI checkpoint/restart framework: System-initiated checkpointing. In Proceedings, LACSI Symposium, Oct. 2003.
-
(2003)
Proceedings, LACSI Symposium
-
-
Sankaran, S.1
Squyres, J.M.2
Barrett, B.3
Lumsdaine, A.4
Duell, J.5
Hargrove, P.6
Roman, E.7
-
35
-
-
3242754339
-
Optimizing the migration of virtual computers
-
C. P. Sapuntzakis, R. Chandra, B. Pfaff, J. Chow, M. S. Lam, and M. Rosenblum. Optimizing the migration of virtual computers. In OSDI, 2002.
-
(2002)
OSDI
-
-
Sapuntzakis, C.P.1
Chandra, R.2
Pfaff, B.3
Chow, J.4
Lam, M.S.5
Rosenblum, M.6
-
37
-
-
33750936415
-
Availability modeling and analysis on high performance cluster computing systems
-
H. Song, C. Leangsuksun, and R. Nassar. Availability modeling and analysis on high performance cluster computing systems. In First International Conference on Availability, Reliability and Security, pages 305-313, 2006.
-
(2006)
First International Conference on Availability, Reliability and Security
, pp. 305-313
-
-
Song, H.1
Leangsuksun, C.2
Nassar, R.3
-
38
-
-
0029713612
-
-
CoCheck: checkpointing and process migration for MPI
-
G. Stellner. CoCheck: checkpointing and process migration for MPI. In IEEE, editor, Proceedings of IPPS '96. The 10th International Parallel Processing Symposium: Honolulu, HI, USA, 15-19 April 1996, pages 526-531, 1109 Spring Street, Suite 300, Silver Spring, MD 20910, USA, 1996. IEEE Computer Society Press.
-
-
-
-
39
-
-
0022302038
-
Preemptable remote execution facilities for the V-System
-
M. Theimer, K. A. Lantz, and D. R. Cheriton. Preemptable remote execution facilities for the V-System. In SOSP, pages 2-12, 1985.
-
(1985)
SOSP
, pp. 2-12
-
-
Theimer, M.1
Lantz, K.A.2
Cheriton, D.R.3
-
40
-
-
34548768671
-
-
A job pause service under LAM/MPI+BLCR for transparent fault tolerance
-
C. Wang, F. Mueller, C. Engelmann, and S. Scott. A job pause service under LAM/MPI+BLCR for transparent fault tolerance. In International Parallel and Distributed Processing Symposium, page (accepted), Apr. 2007.
-
-
-
-
41
-
-
67650081621
-
Constructing services with interposable virtual hardware
-
A. Whitaker, R. S. Cox, M. Shaw, and S. D. Gribble. Constructing services with interposable virtual hardware. In Symposium on Networked Systems Design and Implementation, pages 169-182, 2004.
-
(2004)
Symposium on Networked Systems Design and Implementation
, pp. 169-182
-
-
Whitaker, A.1
Cox, R.S.2
Shaw, M.3
Gribble, S.D.4
-
43
-
-
84987240026
-
Attacking the process migration bottleneck
-
E. R. Zayas. Attacking the process migration bottleneck. In SOSP, pages 13-24, 1987.
-
(1987)
SOSP
, pp. 13-24
-
-
Zayas, E.R.1