[3] M. Becchi, K. Sajjapongse, I. Graves, A. Procter, V. Ravi, and S. Chakradhar. A virtual memory based runtime to support multi-tenancy in clusters with GPUs. In Proc. of the Intl. Symposium on High-Perf. Parallel and Distributed Computing, pages 97-108, June 2012.
[5] S. Cadambi, G. Coviello, C.-H. Li, R. Phull, K. Rao, M. Sankaradass, and S. Chakradhar. COSMIC: Middleware for high performance and reliable multiprocessing on Xeon Phi coprocessors. In Proc. of the Intl. Symposium on High-Perf. Parallel and Distributed Computing, pages 215-226, June 2013.
[6] F. Cappello, A. Geist, B. Gropp, L. Kale, B. Kramer, and M. Snir. Toward exascale resilience. Int. J. High Perform. Comput. Appl., 23(4):374-388, Nov. 2009.
[7] K. M. Chandy and L. Lamport. Distributed snapshots: Determining global states of distributed systems. ACM Trans. on Computer Syst., 3(1):63-75, Feb. 1985.
[8] C. Clark, K. Fraser, S. Hand, J. G. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield. Live migration of virtual machines. In Proc. of the USENIX Conf. on Networked Syst. Design and Implementation, pages 273-286, 2005.
[12] E. N. M. Elnozahy, L. Alvisi, Y.-M. Wang, and D. B. Johnson. A survey of rollback-recovery protocols in message-passing systems. ACM Comput. Surv., 34(3):375-408, Sept. 2002.
[13] http://www.criu.org/.
[14] http://www.top500.org/.
[19] G. J. Janakiraman, J. R. Santos, D. Subhraveti, and Y. Turner. Cruz: Application-transparent distributed checkpoint-restart on standard operating systems. In Proc. of the Intl. Conf. on Dependable Systems and Networks (DSN), pages 260-269, June 2005.
[21] S. Kannan, A. Gavrilovska, K. Schwan, and D. Milojicic. Optimizing checkpoints using NVM as virtual memory. In Proc. of Intl. Parallel and Distributed Processing Symposium, pages 29-40, 2013.
[22] R. Koo and S. Toueg. Checkpointing and rollback-recovery for distributed systems. IEEE Trans. on Software Engineering, SE-13(1):23-31, Jan. 1987.
[24] S. Osman, D. Subhraveti, G. Su, and J. Nieh. The design and implementation of Zap: A system for migrating computing environments. In Proc. of the USENIX Conf. on Oper. Syst. Design and Implementation, pages 361-376, Dec. 2002.
[25] R. Phull, C.-H. Li, K. Rao, H. Cadambi, and S. Chakradhar. Interference-driven resource management for GPU-based heterogeneous clusters. In Proc. of the Intl. Symposium on High-Perf. Parallel and Distributed Computing, pages 109-120, June 2012.
[28] J. F. Ruscio, M. A. Heffner, and S. Varadarajan. DejaVu: Transparent user-level checkpointing, migration, and recovery for distributed systems. In Proc. of Intl. Parallel and Distributed Processing Symposium, Mar. 2007.
[30] F. Salfner, M. Lenk, and M. Malek. A survey of online failure prediction methods. ACM Comput. Surv., 42(3), Mar. 2010.
[31] S. Sankaran, J. M. Squyres, B. Barrett, and A. Lumsdaine. The LAM/MPI checkpoint/restart framework: System-initiated checkpointing. In Proc. of the Symposium of Los Alamos Computer Science Institute, pages 479-493, 2003.
[32] K. Sato, N. Maruyama, K. Mohror, A. Moody, T. Gamblin, B. R. de Supinski, and S. Matsuoka. Design and modeling of a non-blocking checkpointing system. In Proc. of the ACM/IEEE Intl. Conf. for High Perf. Computing, Networking, Storage and Analysis, Nov. 2012.
[34] H. Takizawa, K. Koyama, K. Sato, K. Komatsu, and H. Kobayashi. CheCL: Transparent checkpointing and process migration of OpenCL applications. In Proc. of Intl. Parallel and Distributed Processing Symposium, pages 864-876, May 2011.
[35] H. Takizawa, K. Sato, K. Komatsu, and H. Kobayashi. CheCUDA: A checkpoint/restart tool for CUDA applications. In Proc. of the Intl. Conf. on Parallel and Distributed Computing, Applications and Technologies, pages 408-413, Dec. 2009.
[36] T. Tannenbaum and M. Litzkow. Checkpointing and migration of UNIX processes in the Condor distributed processing system. Dr. Dobb's Journal, Feb. 1995.
[37] C. Wang, F. Mueller, C. Engelmann, and S. L. Scott. Proactive process-level live migration in HPC environments. In Proc. of the ACM/IEEE Intl. Conf. for High Perf. Computing, Networking, Storage and Analysis, 2008.