-
1
-
-
72049130706
-
-
Technical Report UTEP-CS-08-24
-
S. Arunagiri, J. Daly, P. Teller, S. Seelam, R. Oldfield, M. Varela, and R. Riesen, "Opportunistic Checkpoint Intervals to Improve System Performance," Technical Report UTEP-CS-08-24, 2008.
-
(2008)
Opportunistic Checkpoint Intervals to Improve System Performance
-
-
Arunagiri, S.1
Daly, J.2
Teller, P.3
Seelam, S.4
Oldfield, R.5
Varela, M.6
Riesen, R.7
-
2
-
-
84976789801
-
The recovery box: Using fast recovery to provide high availability in the UNIX environment
-
M. Baker and M. Sullivan, "The Recovery Box: Using Fast Recovery to Provide High Availability in the UNIX Environment," Proc. Summer USENIX Technical Conf., 1992.
-
(1992)
Proc. Summer USENIX Technical Conf.
-
-
Baker, M.1
Sullivan, M.2
-
3
-
-
85059766484
-
Live migration of virtual machines
-
C. Clark, K. Fraser, S. Hand, J. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield, "Live Migration of Virtual Machines," Proc. ACM/USENIX Symp. Networked Systems Design and Implementation, 2005.
-
(2005)
Proc. ACM/USENIX Symp. Networked Systems Design and Implementation
-
-
Clark, C.1
Fraser, K.2
Hand, S.3
Hansen, J.4
Jul, E.5
Limpach, C.6
Pratt, I.7
Warfield, A.8
-
4
-
-
27544461132
-
A model for predicting the optimum checkpoint interval for restart dumps
-
J. Daly, "A Model for Predicting the Optimum Checkpoint Interval for Restart Dumps," Proc. Int'l Conf. Computational Science, 2003.
-
(2003)
Proc. Int'l Conf. Computational Science
-
-
Daly, J.1
-
6
-
-
9144223280
-
Checkpointing for peta-scale systems: A look into the future of practical rollback-recovery
-
Apr.-June
-
E. Elnozahy and J. Plank, "Checkpointing for Peta-Scale Systems: A Look Into the Future of Practical Rollback-Recovery," IEEE Trans. Dependable and Secure Computing, vol. 1, no. 2, pp. 97-108, Apr.-June 2004.
-
(2004)
IEEE Trans. Dependable and Secure Computing
, vol.1
, Issue.2
, pp. 97-108
-
-
Elnozahy, E.1
Plank, J.2
-
9
-
-
31344436964
-
On designing direct dependency-based fast recovery algorithms for distributed systems
-
DOI 10.1145/974104.974110
-
B. Gupta, Z. Liu, and Z. Liang, "On Designing Direct Dependency-Based Fast Recovery Algorithms for Distributed Systems," ACM SIGOPS Operating Systems Rev., vol. 38, no. 1, pp. 58-73, 2004. (Pubitemid 46746979)
-
(2004)
Operating Systems Review (ACM)
, vol.38
, Issue.1
, pp. 58-73
-
-
Gupta, B.1
Liu, Z.2
Liang, Z.3
-
11
-
-
77950594233
-
SPEC CPU2000 memory footprint
-
J. Henning, "SPEC CPU2000 Memory Footprint," ACM SIGARCH Computer Architecture News, vol. 35, no. 1, pp. 84-89, 2007.
-
(2007)
ACM SIGARCH Computer Architecture News
, vol.35
, Issue.1
, pp. 84-89
-
-
Henning, J.1
-
12
-
-
0032095071
-
Virtual memory: Issues of implementation
-
B. Jacob and T. Mudge, "Virtual Memory: Issues of Implementation," Computer, vol. 31, no. 6, pp. 33-43, June 1998. (Pubitemid 128550816)
-
(1998)
Computer
, vol.31
, Issue.6
, pp. 33-43
-
-
Jacob, B.1
Mudge, T.2
-
13
-
-
85160681664
-
Transparent checkpoint-restart of multiple processes on commodity operating systems
-
O. Laadan and J. Nieh, "Transparent Checkpoint-Restart of Multiple Processes on Commodity Operating Systems," Proc. USENIX Ann. Technical Conf., 2007.
-
(2007)
Proc. USENIX Ann. Technical Conf.
-
-
Laadan, O.1
Nieh, J.2
-
14
-
-
57049111494
-
Adaptive fault management of parallel applications for high performance computing
-
Dec.
-
Z. Lan and Y. Li, "Adaptive Fault Management of Parallel Applications for High Performance Computing," IEEE Trans. Computers, vol. 57, no. 12, pp. 1647-1660, Dec. 2008.
-
(2008)
IEEE Trans. Computers
, vol.57
, Issue.12
, pp. 1647-1660
-
-
Lan, Z.1
Li, Y.2
-
16
-
-
67649883517
-
Fault-aware runtime strategies for high-performance computing
-
Apr.
-
Y. Li, Z. Lan, P. Gujrati, and X. Sun, "Fault-Aware Runtime Strategies for High-Performance Computing," IEEE Trans. Parallel and Distributed Systems, vol. 20, no. 4, pp. 460-473, Apr. 2009.
-
(2009)
IEEE Trans. Parallel and Distributed Systems
, vol.20
, Issue.4
, pp. 460-473
-
-
Li, Y.1
Lan, Z.2
Gujrati, P.3
Sun, X.4
-
17
-
-
0028485392
-
Low-latency, concurrent checkpointing for parallel programs
-
Aug.
-
K. Li, J. Naughton, and J.S. Plank, "Low-Latency, Concurrent Checkpointing for Parallel Programs," IEEE Trans. Parallel and Distributed Systems, vol. 5, no. 8, pp. 874-879, Aug. 1994.
-
(1994)
IEEE Trans. Parallel and Distributed Systems
, vol.5
, Issue.8
, pp. 874-879
-
-
Li, K.1
Naughton, J.2
Plank, J.S.3
-
18
-
-
0035390088
-
A variational calculus approach to optimal checkpoint placement
-
DOI 10.1109/12.936236
-
Y. Ling, J. Mi, and X. Lin, "A Variational Calculus Approach to Optimal Checkpoint Placement," IEEE Trans. Computers, vol. 50, no. 7, pp. 699-708, July 2001. (Pubitemid 32720123)
-
(2001)
IEEE Transactions on Computers
, vol.50
, Issue.7
, pp. 699-708
-
-
Ling, Y.1
Mi, J.2
Lin, X.3
-
20
-
-
0345044000
-
Process migration
-
D. Milojičić, F. Douglis, Y. Paindaveine, R. Wheeler, and S. Zhou, "Process Migration," ACM Computing Surveys, vol. 32, no. 3, pp. 241-299, 2000.
-
(2000)
ACM Computing Surveys
, vol.32
, Issue.3
, pp. 241-299
-
-
Milojičić, D.1
Douglis, F.2
Paindaveine, Y.3
Wheeler, R.4
Zhou, S.5
-
21
-
-
79953179921
-
-
NCSA web site
-
NCSA web site, http://teragrid.ncsa.uiuc.edu, 2009.
-
(2009)
-
-
-
22
-
-
34547424386
-
Cooperative checkpointing: A robust approach to large-scale systems reliability
-
A. Oliner, L. Rudolph, and R. Sahoo, "Cooperative Checkpointing: A Robust Approach to Large-Scale Systems Reliability," Proc. Int'l Conf. Supercomputing, 2006.
-
(2006)
Proc. Int'l Conf. Supercomputing
-
-
Oliner, A.1
Rudolph, L.2
Sahoo, R.3
-
23
-
-
79953221715
-
-
OpenSolaris
-
OpenSolaris, http://hub.opensolaris.org, 2010.
-
(2010)
-
-
-
24
-
-
79953192410
-
-
Oracle high availability document
-
Oracle high availability document, http://www.oracle.com/technology/deploy/availability/htdocs/fs-on-demand-rollback.htm, 2010.
-
(2010)
-
-
-
25
-
-
0004015896
-
-
Technical Report UCB//CSD-02-1175, UC Berkeley Computer Science
-
D. Patterson et al., "Recovery-Oriented Computing (ROC): Motivation, Definition, Techniques, and Case Studies," Technical Report UCB//CSD-02-1175, UC Berkeley Computer Science, 2002.
-
(2002)
Recovery-Oriented Computing (ROC): Motivation, Definition, Techniques, and Case Studies
-
-
Patterson, D.1
-
26
-
-
0033077475
-
Memory exclusion: Optimizing the performance of checkpointing systems
-
J. Plank, Y. Chen, K. Li, M. Beck, and G. Kingsley, "Memory Exclusion: Optimizing the Performance of Checkpointing Systems," Software-Practice and Experience, vol. 29, no. 2, pp. 125-142, 1999.
-
(1999)
Software-Practice and Experience
, vol.29
, Issue.2
, pp. 125-142
-
-
Plank, J.1
Chen, Y.2
Li, K.3
Beck, M.4
Kingsley, G.5
-
27
-
-
0032179680
-
Diskless checkpointing
-
J. Plank, K. Li, and M. Puening, "Diskless Checkpointing," IEEE Trans. Parallel and Distributed Systems, vol. 9, no. 10, pp. 972-986, Oct. 1998. (Pubitemid 128747893)
-
(1998)
IEEE Transactions on Parallel and Distributed Systems
, vol.9
, Issue.10
, pp. 972-986
-
-
Plank, J.S.1
Li, K.2
Puening, M.A.3
-
28
-
-
0035201417
-
Processor allocation and checkpoint interval selection in cluster computing systems
-
DOI 10.1006/jpdc.2001.1757
-
J. Plank and M.G. Thomason, "Processor Allocation and Checkpoint Interval Selection in Cluster Computing Systems," J. Parallel and Distributed Computing, vol. 61, no. 11, pp. 1570-1590, 2001. (Pubitemid 33119054)
-
(2001)
Journal of Parallel and Distributed Computing
, vol.61
, Issue.11
, pp. 1570-1590
-
-
Plank, J.S.1
Thomason, M.G.2
-
29
-
-
0033721199
-
The cost of recovery in message logging protocols
-
Mar./Apr.
-
S. Rao, L. Alvisi, and H. Vin, "The Cost of Recovery in Message Logging Protocols," IEEE Trans. Knowledge and Data Eng., vol. 12, no. 2, pp. 160-173, Mar./Apr. 2000.
-
(2000)
IEEE Trans. Knowledge and Data Eng.
, vol.12
, Issue.2
, pp. 160-173
-
-
Rao, S.1
Alvisi, L.2
Vin, H.3
-
31
-
-
79953216957
-
-
SPEC CPU benchmark
-
SPEC CPU benchmark, http://www.spec.org/cpu2006/, 2006.
-
(2006)
-
-
-
32
-
-
12444268355
-
On the feasibility of incremental checkpointing for scientific computing
-
J. Sancho, F. Petrini, G. Johnson, J. Fernández, and E. Frachtenberg, "On the Feasibility of Incremental Checkpointing for Scientific Computing," Proc. Int'l Parallel and Distributed Processing Symp., 2004.
-
(2004)
Proc. Int'l Parallel and Distributed Processing Symp.
-
-
Sancho, J.1
Petrini, F.2
Johnson, G.3
Fernández, J.4
Frachtenberg, E.5
-
35
-
-
39449084838
-
Managing disruptions to supply chains
-
L. Snyder and Z. Shen, "Managing Disruptions to Supply Chains," The Bridge, vol. 36, no. 4, pp. 39-45, 2006.
-
(2006)
The Bridge
, vol.36
, Issue.4
, pp. 39-45
-
-
Snyder, L.1
Shen, Z.2
-
36
-
-
0029251277
-
The Condor distributed processing system
-
T. Tannenbaum and M. Litzkow, "The Condor Distributed Processing System," Dr. Dobb's J., vol. 227, pp. 40-48, 1995.
-
(1995)
Dr. Dobb's J.
, vol.227
, pp. 40-48
-
-
Tannenbaum, T.1
Litzkow, M.2
-
38
-
-
79953200370
-
-
The FreeBSD Project
-
The FreeBSD Project, http://www.freebsd.org, 2010.
-
(2010)
-
-
-
39
-
-
0031388399
-
Impact of checkpoint latency on overhead ratio of a checkpointing scheme
-
N. Vaidya, "Impact of Checkpoint Latency on Overhead Ratio of a Checkpointing Scheme," IEEE Trans. Computers, vol. 46, no. 8, pp. 942-947, 1997. (Pubitemid 127760644)
-
(1997)
IEEE Transactions on Computers
, vol.46
, Issue.8
, pp. 942-947
-
-
Vaidya, N.H.1
-
40
-
-
77952260024
-
On the design of a new Linux readahead framework
-
F. Wu, H. Xi, and C. Xu, "On the Design of a New Linux Readahead Framework," ACM SIGOPS Operating Systems Rev., vol. 42, no. 5, pp. 75-84, 2008.
-
(2008)
ACM SIGOPS Operating Systems Rev.
, vol.42
, Issue.5
, pp. 75-84
-
-
Wu, F.1
Xi, H.2
Xu, C.3
-
41
-
-
85130634439
-
Dynamically forecasting network performance using the network weather service
-
R. Wolski, "Dynamically Forecasting Network Performance Using the Network Weather Service," J. Cluster Computing, vol. 1, no. 1, pp. 119-132, 1998.
-
(1998)
J. Cluster Computing
, vol.1
, Issue.1
, pp. 119-132
-
-
Wolski, R.1
-
42
-
-
84976846528
-
A first order approximation to the optimal checkpoint interval
-
J. Young, "A First Order Approximation to the Optimal Checkpoint Interval," Comm. ACM, vol. 17, no. 9, pp. 530-531, 1974.
-
(1974)
Comm. ACM
, vol.17
, Issue.9
, pp. 530-531
-
-
Young, J.1
-
43
-
-
20444463494
-
FTC-Charm++: An in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI
-
G. Zheng, L. Shi, and L. Kale, "FTC-Charm++: An In-Memory Checkpoint-Based Fault Tolerant Runtime for Charm++ and MPI," Proc. IEEE Cluster Computing, 2004.
-
(2004)
Proc. IEEE Cluster Computing
-
-
Zheng, G.1
Shi, L.2
Kale, L.3