-
1
-
-
80455174959
-
-
StarCluster (2010), http://web.mit.edu/stardev/cluster/
-
(2010)
StarCluster
-
-
-
4
-
-
85060036181
-
Validity of the single processor approach to achieving large scale computing capabilities
-
ACM, New York
-
Amdahl, G.M.: Validity of the single processor approach to achieving large scale computing capabilities. In: Proceedings of the Spring Joint Computer Conference, April 18-20, pp. 483-485. ACM, New York (1967)
-
(1967)
Proceedings of the Spring Joint Computer Conference, April 18-20
, pp. 483-485
-
-
Amdahl, G.M.1
-
5
-
-
78049508316
-
Decision model for cloud computing under SLA constraints
-
Andrzejak, A., Kondo, D., Yi, S.: Decision model for cloud computing under SLA constraints. In: Proc. IEEE Int Modeling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS) Symp., pp. 257-266 (2010)
-
(2010)
Proc. IEEE Int Modeling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS) Symp.
, pp. 257-266
-
-
Andrzejak, A.1
Kondo, D.2
Yi, S.3
-
6
-
-
21644433634
-
Xen and the art of virtualization
-
ACM, New York
-
Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A., Neugebauer, R., Pratt, I., Warfield, A.: Xen and the art of virtualization. In: Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, pp. 164-177. ACM, New York (2003)
-
(2003)
Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles
, pp. 164-177
-
-
Barham, P.1
Dragovic, B.2
Fraser, K.3
Hand, S.4
Harris, T.5
Ho, A.6
Neugebauer, R.7
Pratt, I.8
Warfield, A.9
-
7
-
-
0039877166
-
Timing models and local stopping criteria for asynchronous iterative algorithms
-
Blathras, K., Szyld, D.B., Shi, Y.: Timing models and local stopping criteria for asynchronous iterative algorithms. Journal of Parallel and Distributed Computing 58(3), 446-465 (1999)
-
(1999)
Journal of Parallel and Distributed Computing
, vol.58
, Issue.3
, pp. 446-465
-
-
Blathras, K.1
Szyld, D.B.2
Shi, Y.3
-
9
-
-
85088778522
-
See spot run: Using spot instances for MapReduce workflows
-
USENIX Association
-
Chohan, N., Castillo, C., Spreitzer, M., Steinder, M., Tantawi, A., Krintz, C.: See spot run: Using spot instances for MapReduce workflows. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, p. 7. USENIX Association (2010)
-
(2010)
Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing
, pp. 7
-
-
Chohan, N.1
Castillo, C.2
Spreitzer, M.3
Steinder, M.4
Tantawi, A.5
Krintz, C.6
-
10
-
-
28044460018
-
A higher order estimate of the optimum checkpoint interval for restart dumps
-
Daly, J.T.: A higher order estimate of the optimum checkpoint interval for restart dumps. Future Generation Computer Systems 22(3), 303-312 (2006)
-
(2006)
Future Generation Computer Systems
, vol.22
, Issue.3
, pp. 303-312
-
-
Daly, J.T.1
-
11
-
-
37549003336
-
MapReduce: Simplified data processing on large clusters
-
Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters. Communications of the ACM 51(1), 107-113 (2008)
-
(2008)
Communications of the ACM
, vol.51
, Issue.1
, pp. 107-113
-
-
Dean, J.1
Ghemawat, S.2
-
12
-
-
84940567900
-
FT-MPI: Fault tolerant MPI, supporting dynamic applications in a dynamic world
-
Dongarra, J., Kacsuk, P., Podhorszki, N. (eds.) PVM/MPI 2000. Springer, Heidelberg
-
Fagg, G., Dongarra, J.: FT-MPI: Fault tolerant MPI, supporting dynamic applications in a dynamic world. In: Dongarra, J., Kacsuk, P., Podhorszki, N. (eds.) PVM/MPI 2000. LNCS, vol. 1908, pp. 346-353. Springer, Heidelberg (2000)
-
(2000)
LNCS
, vol.1908
, pp. 346-353
-
-
Fagg, G.1
Dongarra, J.2
-
13
-
-
0347133226
-
A network-failure-tolerant message-passing system for terascale clusters
-
Graham, R.L., Choi, S.E., Daniel, D.J., Desai, N.N., Minnich, R.G., Rasmussen, C.E., Risinger, L.D., Sukalski, M.W.: A network-failure-tolerant message-passing system for terascale clusters. International Journal of Parallel Programming 31(4), 285-303 (2003)
-
(2003)
International Journal of Parallel Programming
, vol.31
, Issue.4
, pp. 285-303
-
-
Graham, R.L.1
Choi, S.E.2
Daniel, D.J.3
Desai, N.N.4
Minnich, R.G.5
Rasmussen, C.E.6
Risinger, L.D.7
Sukalski, M.W.8
-
14
-
-
33749067567
-
Berkeley Lab checkpoint/restart (BLCR) for Linux clusters
-
IOP Publishing
-
Hargrove, P.H., Duell, J.C.: Berkeley Lab checkpoint/restart (BLCR) for Linux clusters. In: Journal of Physics: Conference Series, vol. 46, p. 494. IOP Publishing (2006)
-
(2006)
Journal of Physics: Conference Series
, vol.46
, pp. 494
-
-
Hargrove, P.H.1
Duell, J.C.2
-
15
-
-
80455151190
-
-
PhD thesis, Indiana University, Bloomington, IN, USA July
-
Hursey, J.: Coordinated Checkpoint/Restart Process Fault Tolerance for MPI Applications on HPC Systems. PhD thesis, Indiana University, Bloomington, IN, USA (July 2010)
-
(2010)
Coordinated Checkpoint/Restart Process Fault Tolerance for MPI Applications on HPC Systems
-
-
Hursey, J.1
-
16
-
-
34548789748
-
The design and implementation of checkpoint/restart process fault tolerance for Open MPI
-
Hursey, J., Squyres, J.M., Mattox, T.I., Lumsdaine, A.: The design and implementation of checkpoint/restart process fault tolerance for Open MPI. In: Proc. IEEE Int. Parallel and Distributed Processing Symp. IPDPS 2007, pp. 1-8 (2007)
-
(2007)
Proc. IEEE Int. Parallel and Distributed Processing Symp. IPDPS 2007
, pp. 1-8
-
-
Hursey, J.1
Squyres, J.M.2
Mattox, T.I.3
Lumsdaine, A.4
-
17
-
-
85027938495
-
Performance analysis of cloud computing services for many-tasks scientific computing
-
Iosup, A., Ostermann, S., Yigitbasi, N., Prodan, R., Fahringer, T., Epema, D.: Performance analysis of cloud computing services for many-tasks scientific computing. IEEE Transactions on Parallel and Distributed Systems 22(6), 931-945 (2011)
-
(2011)
IEEE Transactions on Parallel and Distributed Systems
, vol.22
, Issue.6
, pp. 931-945
-
-
Iosup, A.1
Ostermann, S.2
Yigitbasi, N.3
Prodan, R.4
Fahringer, T.5
Epema, D.6
-
18
-
-
0003912256
-
-
Technical report
-
Litzkow, M., Tannenbaum, T., Basney, J., Livny, M.: Checkpoint and migration of Unix processes in the Condor distributed processing system. Technical report (1997)
-
(1997)
Checkpoint and Migration of Unix Processes in the Condor Distributed Processing System
-
-
Litzkow, M.1
Tannenbaum, T.2
Basney, J.3
Livny, M.4
-
20
-
-
78650831692
-
Design, modeling, and evaluation of a scalable multi-level checkpointing system
-
Moody, A., Bronevetsky, G., Mohror, K., de Supinski, B.R.: Design, modeling, and evaluation of a scalable multi-level checkpointing system. In: Proc. Int. Conf. for High Performance Computing, Networking, Storage and Analysis (SC), pp. 1-11 (2010)
-
(2010)
Proc. Int. Conf. for High Performance Computing, Networking, Storage and Analysis (SC)
, pp. 1-11
-
-
Moody, A.1
Bronevetsky, G.2
Mohror, K.3
De Supinski, B.R.4
-
21
-
-
0032597696
-
Egida: An extensible toolkit for low-overhead fault-tolerance
-
Digest of Papers, IEEE, Los Alamitos
-
Rao, S., Alvisi, L., Vin, H.M.: Egida: An extensible toolkit for low-overhead fault-tolerance. In: Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing, 1999. Digest of Papers, pp. 48-55. IEEE, Los Alamitos (1999)
-
(1999)
Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing, 1999
, pp. 48-55
-
-
Rao, S.1
Alvisi, L.2
Vin, H.M.3
-
23
-
-
81455155109
-
Sustainable GPU computing at scale
-
Shi, J.Y., Taifi, M., Khreishah, A., Wu, J.: Sustainable GPU computing at scale. In: 14th IEEE International Conference in Computational Science and Engineering 2011 (2011)
-
(2011)
14th IEEE International Conference in Computational Science and Engineering 2011
-
-
Shi, J.Y.1
Taifi, M.2
Khreishah, A.3
Wu, J.4
-
24
-
-
0029713612
-
CoCheck: Checkpointing and process migration for MPI
-
IEEE Computer Society, Washington, DC, USA
-
Stellner, G.: CoCheck: Checkpointing and process migration for MPI. In: Proceedings of the 10th International Parallel Processing Symposium, IPPS 1996, pp. 526-531. IEEE Computer Society, Washington, DC, USA (1996)
-
(1996)
Proceedings of the 10th International Parallel Processing Symposium, IPPS 1996
, pp. 526-531
-
-
Stellner, G.1
-
25
-
-
77949790526
-
High-performance cloud computing: A view of scientific applications
-
Vecchiola, C., Pandey, S., Buyya, R.: High-performance cloud computing: A view of scientific applications. In: Proc. 10th Int. Pervasive Systems, Algorithms, and Networks (ISPAN) Symp., pp. 4-16 (2009)
-
(2009)
Proc. 10th Int. Pervasive Systems, Algorithms, and Networks (ISPAN) Symp.
, pp. 4-16
-
-
Vecchiola, C.1
Pandey, S.2
Buyya, R.3
-
26
-
-
77957960970
-
Reducing costs of spot instances via checkpointing in the Amazon Elastic Compute Cloud
-
IEEE, Los Alamitos
-
Yi, S., Kondo, D., Andrzejak, A.: Reducing costs of spot instances via checkpointing in the Amazon Elastic Compute Cloud. In: 2010 IEEE 3rd International Conference on Cloud Computing, pp. 236-243. IEEE, Los Alamitos (2010)
-
(2010)
2010 IEEE 3rd International Conference on Cloud Computing
, pp. 236-243
-
-
Yi, S.1
Kondo, D.2
Andrzejak, A.3
-
27
-
-
84976846528
-
A first order approximation to the optimum checkpoint interval
-
Young, J.W.: A first order approximation to the optimum checkpoint interval. Communications of the ACM 17(9), 530-531 (1974)
-
(1974)
Communications of the ACM
, vol.17
, Issue.9
, pp. 530-531
-
-
Young, J.W.1
-
28
-
-
48249138490
-
Evaluating the performance impact of Xen on MPI and process execution for HPC systems
-
IEEE Computer Society, Los Alamitos
-
Youseff, L., Wolski, R., Gorda, B., Krintz, C.: Evaluating the performance impact of Xen on MPI and process execution for HPC systems. In: Proceedings of the 2nd International Workshop on Virtualization Technology in Distributed Computing, p. 1. IEEE Computer Society, Los Alamitos (2006)
-
(2006)
Proceedings of the 2nd International Workshop on Virtualization Technology in Distributed Computing
, pp. 1
-
-
Youseff, L.1
Wolski, R.2
Gorda, B.3
Krintz, C.4
-
29
-
-
84908019185
-
Dynamic resource allocation for spot markets in clouds
-
Zhang, Q., Gürses, E., Boutaba, R., Xiao, J.: Dynamic resource allocation for spot markets in clouds. In: Proceedings of the 11th USENIX Conference on Hot Topics in Management of Internet, Cloud, and Enterprise Networks and Services (2011)
-
Proceedings of the 11th USENIX Conference on Hot Topics in Management of Internet, Cloud, and Enterprise Networks and Services (2011)
-
-
Zhang, Q.1
Gürses, E.2
Boutaba, R.3
Xiao, J.4
-
30
-
-
20444463494
-
FTC-Charm++: An in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI
-
IEEE, Los Alamitos
-
Zheng, G., Shi, L., Kalé, L.V.: FTC-Charm++: An in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI. In: 2004 IEEE International Conference on Cluster Computing, pp. 93-103. IEEE, Los Alamitos (2004)
-
(2004)
2004 IEEE International Conference on Cluster Computing
, pp. 93-103
-
-
Zheng, G.1
Shi, L.2
Kalé, L.V.3