-
1
-
-
0032597670
-
An analysis of communication induced checkpointing
-
L. Alvisi, E. Elnozahy, S. Rao, S. Husain, and A. Mel. An analysis of communication induced checkpointing. In Proceedings of the symposium on fault-tolerant computing, pages 242-249, 1999.
-
(1999)
Proceedings of the symposium on fault-tolerant computing
, pp. 242-249
-
-
Alvisi, L.1
Elnozahy, E.2
Rao, S.3
Husain, S.4
Mel, A.5
-
2
-
-
0010689456
-
Some related article I wrote
-
January
-
I. M. Author. Some related article I wrote. Some Fine Journal, 99(7):1-100, January 1999.
-
(1999)
Some Fine Journal
, vol.99
, Issue.7
, pp. 1-100
-
-
Author, I.M.1
-
3
-
-
27144556171
-
A hybrid message logging-CIC protocol for constrained checkpointability
-
EUROPAR: Parallel Processing, 11th International EUROPAR Conference
-
Baude, Caromel, Delbe, and Henrio. A hybrid message logging-CIC protocol for constrained checkpointability. In EUROPAR: Parallel Processing, 11th International EUROPAR Conference. LNCS, 2005.
-
(2005)
LNCS
-
-
Baude1
Caromel2
Delbe3
Henrio4
-
4
-
-
33746779994
-
MPICH-V: A multiprotocol fault tolerant MPI
-
fall
-
A. Bouteiller, T. Herault, G. Krawezik, P. Lemarinier, and F. Cappello. MPICH-V: a multiprotocol fault tolerant MPI. International Journal of High Performance Computing and Applications, 20(8):319-333, Fall 2006.
-
(2006)
International Journal of High Performance Computing and Applications
, vol.20
, Issue.8
, pp. 319-333
-
-
Bouteiller, A.1
Herault, T.2
Krawezik, G.3
Lemarinier, P.4
Cappello, F.5
-
7
-
-
50649094637
-
-
T. L.-S. L. S. C. Leangsuksun, V. K. Munganuru and C. Engelmann. Asymmetric active-active high availability for In Proceedings of the 2nd International Workshop on Operating Systems, Programming Environments and Computing on Clusters (COSET-2), in conjunction with the 19th ACM
-
-
-
-
8
-
-
34548782109
-
-
S. Chakravorty and L. V. Kalé. A fault tolerance protocol with fast fault recovery. In IPDPS, pages 1-10. IEEE, 2007.
-
-
-
-
9
-
-
50649108554
-
S. Chakravorty, C. L. Mendes, and L. V. Kalé. Proactive fault tolerance in MPI applications via task migration. In Y. Robert, M. Parashar, R. Badrinath, and V. K. Prasanna, editors
-
HiPC, Springer
-
S. Chakravorty, C. L. Mendes, and L. V. Kalé. Proactive fault tolerance in MPI applications via task migration. In Y. Robert, M. Parashar, R. Badrinath, and V. K. Prasanna, editors, HiPC, volume 4297 of Lecture Notes in Computer Science, pages 485-496. Springer, 2006.
-
(2006)
Lecture Notes in Computer Science
, vol.4297
, pp. 485-496
-
-
-
10
-
-
0030197368
-
The weakest failure detector for solving consensus
-
Chandra, Hadzilacos, and Toueg. The weakest failure detector for solving consensus. JACM: Journal of the ACM, 43, 1996.
-
(1996)
Journal of the ACM
, vol.43
-
-
Chandra1
Hadzilacos2
Toueg3
-
11
-
-
0022020346
-
Distributed snapshots: Determining global states of distributed systems
-
feb
-
K. M. Chandy and L. Lamport. Distributed snapshots: Determining global states of distributed systems. ACM Transactions on Computer Systems (TOCS), 3(1):63-75, Feb. 1985.
-
(1985)
ACM Transactions on Computer Systems (TOCS)
, vol.3
, Issue.1
, pp. 63-75
-
-
Chandy, K.M.1
Lamport, L.2
-
12
-
-
31844451082
-
Building fault survivable MPI programs with FT-MPI using diskless-checkpointing
-
PPoPP, Chicago, IL, USA, June
-
Z. Chen, G. E. Fagg, E. Gabriel, J. Langou, T. Angskun, G. Bosilca, and J. Dongarra. Building fault survivable MPI programs with FT-MPI using diskless-checkpointing. In Proceedings of the tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), pages 213-223, Chicago, IL, USA, June 2005.
-
(2005)
Proceedings of the tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
, pp. 213-223
-
-
Chen, Z.1
Fagg, G.E.2
Gabriel, E.3
Langou, J.4
Angskun, T.5
Bosilca, G.6
Dongarra, J.7
-
13
-
-
0042078549
-
A survey of rollback-recovery protocols in message-passing systems
-
Elnozahy, Alvisi, Wang, and Johnson. A survey of rollback-recovery protocols in message-passing systems. CSURV: Computing Surveys, 34, 2002.
-
(2002)
Computing Surveys
, vol.34
-
-
Elnozahy1
Alvisi2
Wang3
Johnson4
-
14
-
-
50649119868
-
-
A Book He Wrote. His Publisher, Erewhon, NC
-
A. N. Expert. A Book He Wrote. His Publisher, Erewhon, NC, 1999.
-
(1999)
A. N. Expert
-
-
-
15
-
-
0022045868
-
Impossibility of distributed consensus with one faulty process
-
Fischer, Lynch, and Paterson. Impossibility of distributed consensus with one faulty process. JACM: Journal of the ACM, 32, 1985.
-
(1985)
Journal of the ACM
, vol.32
-
-
Fischer1
Lynch2
Paterson3
-
16
-
-
0031122148
-
Software based replication for fault tolerance
-
apr
-
R. Guerraoui and A. Schiper. Software based replication for fault tolerance. IEEE Computer, 30(4):68-74, Apr. 1997.
-
(1997)
IEEE Computer
, vol.30
, Issue.4
, pp. 68-74
-
-
Guerraoui, R.1
Schiper, A.2
-
17
-
-
84964723149
-
-
E. D. Houda Lamehamedi, Boleslaw Szymanski and Z. Shentu. Data replication strategies in grid environments. In ICA3PP '02: Proceedings of the Fifth International Conference on Algorithms and Architectures for Parallel Processing, page 378, Washington, DC, USA, 2002. IEEE Computer Society.
-
-
-
-
18
-
-
35048812506
-
Adaptive MPI
-
L. Rauchwerger, editor, Languages and Compilers for Parallel Computing, 16th LCPC'03, Springer-Verlag New York, College Station, Texas, USA, Oct, Revised Papers 2004
-
C. Huang, O. S. Lawlor, and L. V. Kale. Adaptive MPI. In L. Rauchwerger, editor, Languages and Compilers for Parallel Computing, (16th LCPC'03), volume 2958 of Lecture Notes in Computer Science (LNCS), pages 306-322. Springer-Verlag (New York), College Station, Texas, USA, Oct. 2003, Revised Papers 2004.
-
(2003)
Lecture Notes in Computer Science
, vol.2958
, pp. 306-322
-
-
Huang, C.1
Lawlor, O.S.2
Kale, L.V.3
-
19
-
-
33751039336
-
-
C. Huang, G. Zheng, L. V. Kalé, and S. Kumar. Performance evaluation of adaptive MPI. In J. Torrellas and S. Chatterjee, editors, PPOPP, pages 12-21. ACM, 2006.
-
-
-
-
20
-
-
50649097161
-
-
INRIA
-
INRIA. Simgrid project. http://simgrid.gforge.inria.fr.
-
Simgrid project
-
-
-
21
-
-
50649092618
-
-
L. V. Kale. The virtualization approach to parallel programming: Runtime optimization and the state of art. In LACSI, Albuquerque, 2002.
-
-
-
-
22
-
-
50649090196
-
-
L. V. Kale and S. Krishnan. CHARM++. In G. V. Wilson and P. Lu, editors, Parallel Programming in C++, Scientific and Engineering Computation Series, chapter 5, pages 175-214. MIT Press, Cambridge, MA, 1996.
-
-
-
-
26
-
-
0028060943
-
-
J. S. Plank and K. Li. Faster checkpointing with N+1 parity. In FTCS, pages 288-297, 1994.
-
-
-
-
27
-
-
50649124646
-
-
L. Rilling and C. Morin. A practical transparent data sharing service for the grid. In Proc. Fifth International Workshop on Distributed Shared Memory (DSM 2005), Cardiff, UK, May 2005. Held in conjunction with CCGrid 2005.
-
-
-
-
29
-
-
33845399711
-
-
S. S. Vazhkudai, X. Ma, V. W. Freeh, J. W. Strickland, N. Tammineedi, and S. L. Scott. Freeloader: Scavenging desktop storage resources for scientific data. In SC '05: Proceedings of the 2005 ACM/IEEE conference on Supercomputing, page 56, Washington, DC, USA, 2005. IEEE Computer Society.
-
-
-
-
30
-
-
20444463494
-
-
G. Zheng, L. Shi, and L. V. Kale. FTC-Charm++: an in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI. In CLUSTER '04: Proceedings of the 2004 IEEE International Conference on Cluster Computing, pages 93-103, Washington, DC, USA, 2004. IEEE Computer Society.
-
-
-