[1] R. Buyya, R. Ranjan, and R. N. Calheiros, "InterCloud: Utility-oriented federation of cloud computing environments for scaling of application services," in Proc. 10th Int. Conf. Algorithms Archit. Parallel Process., 2010, pp. 13-31.
[3] B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. H. Katz, S. Shenker, and I. Stoica, "Mesos: A platform for fine-grained resource sharing in the data center," in Proc. 8th USENIX Conf. Netw. Syst. Des. Implementation, 2011, pp. 295-308.
[4] V. K. Vavilapalli, A. C. Murthy, C. Douglas, S. Agarwal, M. Konar, R. Evans, et al., "Apache Hadoop YARN: Yet another resource negotiator," in Proc. 4th Annu. Symp. Cloud Comput., 2013, Art. no. 5.
[5] Z. Zhang, C. Li, Y. Tao, R. Yang, H. Tang, and J. Xu, "Fuxi: A fault-tolerant resource management and job scheduling system at Internet scale," in Proc. Int. Conf. Very Large Databases, 2014, pp. 1393-1404.
[6] L. A. Barroso, J. Clidaras, and U. Hölzle, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. San Rafael, CA, USA: Morgan & Claypool Publishers, 2013.
[7] J. Li, M. Humphrey, Y.-W. Cheah, Y. Ryu, D. Agarwal, K. Jackson, and C. van Ingen, "Fault tolerance and scaling in e-science cloud applications: Observations from the continuing development of MODISAzure," in Proc. IEEE 6th Int. Conf. e-Sci., 2010, pp. 246-253.
[8] R. Yang and J. Xu, "Computing at massive scale: Scalability and dependability challenges," presented at the IEEE 10th Int. Symp. Service Oriented System Engineering, Oxford, U.K., 2016.
[10] B. Randell and J. Xu, "The evolution of the recovery block concept," in Software Fault Tolerance, New York, NY, USA: Wiley, 1995.
[13] E. Boutin, J. Ekanayake, W. Lin, B. Shi, J. Zhou, Z. Qian, M. Wu, and L. Zhou, "Apollo: Scalable and coordinated scheduling for cloud-scale computing," in Proc. 11th USENIX Conf. Operating Syst. Des. Implementation, 2014, pp. 285-300.
[14] (2013). [Online]. Available: https://issues.apache.org/jira/browse/YARN-556
[15] (2013). [Online]. Available: https://issues.apache.org/jira/browse/YARN-1336
[16] A. Avizienis, J.-C. Laprie, B. Randell, and C. Landwehr, "Basic concepts and taxonomy of dependable and secure computing," IEEE Trans. Dependable Secure Comput., vol. 1, no. 1, pp. 11-33, Jan.-Mar. 2004.
[17] P. Garraghan, P. Townend, and J. Xu, "An empirical failure-analysis of a large-scale cloud computing environment," in Proc. IEEE 15th Int. Symp. High-Assurance Syst. Eng., 2014, pp. 113-120.
[18] C. Reiss, A. Tumanov, G. R. Ganger, R. H. Katz, and M. A. Kozuch, "Heterogeneity and dynamicity of clouds at scale: Google trace analysis," in Proc. 3rd ACM Symp. Cloud Comput., 2012, Art. no. 7.
[19] Q. Wei, B. Veeravalli, B. Gong, L. Zeng, and D. Feng, "CDRM: A cost-effective dynamic replication management scheme for cloud storage cluster," in Proc. IEEE Int. Conf. Cluster Comput., 2010, pp. 188-196.
[20] S. Malik and F. Huet, "Adaptive fault tolerance in real time cloud computing," in Proc. IEEE World Congr. Serv., 2011, pp. 280-287.
[21] R. Yang, T. Wo, C. Hu, J. Xu, and M. Zhang, "D2PS: A dependable data provisioning service in multi-tenants cloud environments," in Proc. IEEE 17th Int. Symp. High Assurance Syst. Eng., 2016, pp. 252-259.
[23] S. Fu, "Failure-aware resource management for high-availability computing clusters with distributed virtual machines," J. Parallel Distrib. Comput., vol. 70, no. 4, pp. 384-393, 2010.
[24] (2013). Amazon Web Services suffers outage. [Online]. Available: http://www.zdnet.com/article/amazon-web-services-suffers-outage-takes-down-vine-instagram-others-with-it/
[25] A. Avizienis, "The methodology of N-version programming," in Software Fault Tolerance, Hoboken, NJ, USA: Wiley, 1995.
[27] F. Longo, R. Ghosh, V. K. Naik, and K. S. Trivedi, "A scalable availability model for infrastructure-as-a-service cloud," in Proc. IEEE/IFIP 41st Int. Conf. Dependable Syst. Netw., 2011, pp. 335-346.
[28] R. Subramanian and V. Anantharaman, "Reliability analysis of a complex standby redundant systems," Rel. Eng. Syst. Safety, vol. 48, no. 1, pp. 57-70, 1995.
[29] P. Hunt, M. Konar, F. P. Junqueira, and B. Reed, "ZooKeeper: Wait-free coordination for Internet-scale systems," in Proc. USENIX Conf. USENIX Annu. Tech. Conf., 2010, p. 11.
[30] M. Burrows, "The Chubby lock service for loosely-coupled distributed systems," in Proc. 7th Symp. Operating Syst. Des. Implementation, 2006, pp. 335-350.
[31] P. Garraghan, I. S. Moreno, P. Townend, and J. Xu, "An analysis of failure-related energy waste in a large-scale cloud environment," IEEE Trans. Emerging Topics Comput., vol. 2, no. 2, pp. 166-180, Jun. 2014.
[32] R. Koo and S. Toueg, "Checkpointing and rollback-recovery for distributed systems," IEEE Trans. Softw. Eng., vol. SE-13, no. 1, pp. 23-31, Jan. 1987.
-
34
-
-
84946125131
-
Service-oriented computing: Concepts, characteristics and directions
-
M. P. Papazoglou, "Service-oriented computing: Concepts, characteristics and directions," in Proc. 4th Int. Conf. Web Inform. Syst. Eng., 2003, 3-12.
-
(2003)
Proc. 4th Int. Conf. Web Inform. Syst. Eng.
, pp. 3-12
-
-
Papazoglou, M.P.1
-
[35] I. Foster, C. Kesselman, J. M. Nick, and S. Tuecke, "Grid services for distributed system integration," Computer, vol. 35, no. 6, pp. 37-46, Jun. 2002.
[36] L. Cui, J. Li, T. Wo, B. Li, R. Yang, Y. Cao, and J. Huai, "HotRestore: A fast restore system for virtual machine cluster," in Proc. 28th USENIX Conf. Large Installation Syst. Admin., 2014, pp. 1-16.
[37] A. Verma, L. Pedrosa, M. Korupolu, D. Oppenheimer, E. Tune, and J. Wilkes, "Large-scale cluster management at Google with Borg," in Proc. 10th Eur. Conf. Comput. Syst., 2015, Art. no. 18.
[38] A. Moody, G. Bronevetsky, K. Mohror, and B. R. de Supinski, "Design, modeling, and evaluation of a scalable multi-level checkpointing system," in Proc. Int. Conf. High Perform. Comput. Netw. Storage Anal., 2010, pp. 1-11.
[39] Y. Huang, R. Yang, L. Cui, T. Wo, C. Hu, and B. Li, "VMCSnap: Taking snapshots of virtual machine cluster with memory deduplication," in Proc. IEEE 8th Int. Symp. Serv. Oriented Syst. Eng., 2014, pp. 314-319.
[40] J. Li, J. Zheng, L. Cui, and R. Yang, "ConSnap: Taking continuous snapshots for running state protection of virtual machines," in Proc. IEEE 20th Int. Conf. Parallel Distrib. Syst., 2014, pp. 677-684.
[41] Y. Chen, S. Alspaugh, and R. H. Katz, "Design insights for Map-Reduce from diverse production workloads," EECS Dept., Univ. California, Berkeley, CA, USA, Tech. Rep. UCB/EECS-2012-17, 2012.
[42] S. Kavulya, J. Tan, R. Gandhi, and P. Narasimhan, "An analysis of traces from a production MapReduce cluster," in Proc. IEEE/ACM 10th Int. Conf. Cluster, Cloud Grid Comput., 2010, pp. 94-103.
[43] A. K. Mishra, J. L. Hellerstein, W. Cirne, and C. R. Das, "Towards characterizing cloud backend workloads: Insights from Google compute clusters," ACM SIGMETRICS Perform. Eval. Rev., vol. 37, no. 4, pp. 34-41, 2010.
[44] I. Solis Moreno, P. Garraghan, P. Townend, and J. Xu, "Analysis, modeling and simulation of workload patterns in a large-scale utility cloud," IEEE Trans. Cloud Comput., vol. 2, no. 2, pp. 208-221, Apr.-Jun. 2014.
[45] P. Garraghan, P. Townend, and J. Xu, "An analysis of the server characteristics and resource utilization in Google cloud," in Proc. IEEE Int. Conf. Cloud Eng., 2013, pp. 124-131.
[46] K. Ousterhout, P. Wendell, M. Zaharia, and I. Stoica, "Sparrow: Distributed, low latency scheduling," in Proc. 24th ACM Symp. Operating Syst. Principles, 2013, pp. 69-84.
[47] J. Dean and L. A. Barroso, "The tail at scale," Commun. ACM, vol. 56, no. 2, pp. 74-80, 2013.
[48] P. Garraghan, X. Ouyang, P. Townend, and J. Xu, "Timely long tail identification through agent based monitoring and analytics," in Proc. IEEE 18th Int. Symp. Real-Time Distrib. Comput., 2015, pp. 19-26.