-
2
-
-
80052306159
-
Correlated set coordination in fault tolerant message logging protocols
-
Bouteiller A, Hérault T, Bosilca G, Dongarra JJ. Correlated set coordination in fault tolerant message logging protocols. In Euro-Par 2011 Parallel Processing - 17th International Conference, Proceedings, Part II, Lecture Notes in Computer Science, Vol. 6853. Springer, September 2011; 51-64. DOI: http://dx.doi.org/10.1007/978-3-642-23397-5_6.
-
(2011)
Euro-Par 2011 Parallel Processing - 17th International Conference, Proceedings, Part II, Lecture Notes in Computer Science
, vol.6853
, pp. 51-64
-
-
Bouteiller, A.1
Hérault, T.2
Bosilca, G.3
Dongarra, J.J.4
-
3
-
-
0017996760
-
Time, clocks, and the ordering of events in a distributed system
-
DOI 10.1145/359545.359563
-
Lamport L. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM 1978; 21 (7): 558-565. DOI: http://doi.acm.org/10.1145/359545.359563. (Pubitemid 8615486)
-
(1978)
Communications of the ACM
, vol.21
, Issue.7
, pp. 558-565
-
-
Lamport Leslie1
-
6
-
-
0022020346
-
Distributed snapshots: Determining global states of distributed systems
-
DOI 10.1145/214451.214456
-
Chandy KM, Lamport L. Distributed snapshots: determining global states of distributed systems. ACM Transactions on Computer Systems February 1985; 3 (1): 63-75. (Pubitemid 15597765)
-
(1985)
ACM Transactions on Computer Systems
, vol.3
, Issue.1
, pp. 63-75
-
-
Chandy, K.M.1
Lamport, L.2
-
7
-
-
34548331689
-
Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI Protocols
-
DOI 10.1016/j.future.2007.02.002, PII S0167739X07000258
-
Buntinas D, Coti C, Herault T, Lemarinier P, Pilard L, Rezmerita A, Rodriguez E, Cappello F. Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI protocols. Future Generation Computer Systems 2008; 24 (1): 73-84. http://www.sciencedirect.com/science/article/B6V06-4N2KT6H-1/2/00e790651475028977cc3031d9ea3980. (Pubitemid 47337128)
-
(2008)
Future Generation Computer Systems
, vol.24
, Issue.1
, pp. 73-84
-
-
Buntinas, D.1
Coti, C.2
Herault, T.3
Lemarinier, P.4
Pilard, L.5
Rezmerita, A.6
Rodriguez, E.7
Cappello, F.8
-
8
-
-
0003922410
-
-
Plank JS. Efficient checkpointing on MIMD architectures. Ph.D. Thesis, Princeton University, June 1993. http://www.cs.utk.edu/~plank/plank/papers/thesis.html.
-
(1993)
Efficient Checkpointing on MIMD Architectures
-
-
Plank, J.S.1
-
9
-
-
20444435911
-
Improved message logging versus improved coordinated checkpointing for fault tolerant MPI
-
Lemarinier P, Bouteiller A, Herault T, Krawezik G, Cappello F. Improved message logging versus improved coordinated checkpointing for fault tolerant MPI. In IEEE International Conference on Cluster Computing. IEEE CS Press, 2004.
-
(2004)
IEEE International Conference on Cluster Computing
-
-
Lemarinier, P.1
Bouteiller, A.2
Herault, T.3
Krawezik, G.4
Cappello, F.5
-
11
-
-
78149231438
-
Dodging the cost of unavoidable memory copies in message logging protocols
-
Bosilca G, Bouteiller A, Herault T, Lemarinier P, Dongarra JJ. Dodging the cost of unavoidable memory copies in message logging protocols. In EuroMPI, Lecture Notes in Computer Science, Vol. 6305, Keller R, Gabriel E, Resch MM, Dongarra J (eds). Springer, 2010; 189-197.
-
(2010)
EuroMPI, Lecture Notes in Computer Science
, vol.6305
, pp. 189-197
-
-
Bosilca, G.1
Bouteiller, A.2
Herault, T.3
Lemarinier, P.4
Dongarra, J.J.5
-
12
-
-
35048884271
-
Open MPI: Goals, concept, and design of a next generation MPI implementation
-
Gabriel E, Fagg GE, Bosilca G, Angskun T, Dongarra JJ, Squyres JM, Sahay V, Kambadur P, Barrett B, Lumsdaine A, Castain RH, Daniel DJ, Graham RL, Woodall TS. Open MPI: Goals, concept, and design of a next generation MPI implementation. Proceedings, 11th European PVM/MPI Users' Group Meeting, Budapest, Hungary, September 2004; 97-104.
-
(2004)
Proceedings, 11th European PVM/MPI Users' Group Meeting, Budapest, Hungary, September
, pp. 97-104
-
-
Gabriel, E.1
Fagg, G.E.2
Bosilca, G.3
Angskun, T.4
Dongarra, J.J.5
Squyres, J.M.6
Sahay, V.7
Kambadur, P.8
Barrett, B.9
Lumsdaine, A.10
Castain, R.H.11
Daniel, D.J.12
Graham, R.L.13
Woodall, T.S.14
-
13
-
-
33847171498
-
MPI: A message passing interface
-
The MPI Forum. MPI: a message passing interface. In Supercomputing '93: Proceedings of the 1993 ACM/IEEE Conference on Supercomputing. ACM Press: New York, NY, USA, 1993; 878-883. DOI: http://doi.acm.org/10.1145/169627.169855.
-
(1993)
Supercomputing '93: Proceedings of the 1993 ACM/IEEE Conference on Supercomputing
, pp. 878-883
-
-
-
14
-
-
0042078549
-
A survey of rollback-recovery protocols in message-passing systems
-
Elnozahy EN, Alvisi L, Wang YM, Johnson DB. A survey of rollback-recovery protocols in message-passing systems. ACM Comput. Surv. 2002; 34 (3): 375-408.
-
(2002)
ACM Comput. Surv.
, vol.34
, Issue.3
, pp. 375-408
-
-
Elnozahy, E.N.1
Alvisi, L.2
Wang, Y.M.3
Johnson, D.B.4
-
15
-
-
84874105579
-
Reasons to be pessimist or optimist for failure recovery in high performance clusters
-
Bouteiller A, Ropars T, Bosilca G, Morin C, Dongarra J. Reasons to be pessimist or optimist for failure recovery in high performance clusters. In Proceedings of the 2009 IEEE Cluster Conference. IEEE, September 2009.
-
(2009)
Proceedings of the 2009 IEEE Cluster Conference
-
-
Bouteiller, A.1
Ropars, T.2
Bosilca, G.3
Morin, C.4
Dongarra, J.5
-
17
-
-
0032597670
-
An analysis of communication induced checkpointing
-
Alvisi L, Elnozahy E, Rao S, Husain SA, Mel AD. An analysis of communication induced checkpointing. In 29th Symposium on Fault-Tolerant Computing (FTCS'99). IEEE CS Press, June 1999.
-
(1999)
29th Symposium on Fault-Tolerant Computing (FTCS'99)
-
-
Alvisi, L.1
Elnozahy, E.2
Rao, S.3
Husain, S.A.4
Mel, A.D.5
-
18
-
-
47249116207
-
Group-based coordinated checkpointing for MPI: A case study on InfiniBand
-
Gao Q, Huang W, Koop MJ, Panda DK. Group-based coordinated checkpointing for MPI: a case study on InfiniBand. Parallel Processing, 2007. ICPP 2007. International Conference on, 2007.
-
(2007)
Parallel Processing, 2007. ICPP 2007. International Conference on
-
-
Gao, Q.1
Huang, W.2
Koop, M.J.3
Panda, D.K.4
-
21
-
-
80052320911
-
Automatic MPI to AMPI program transformation
-
Negara S, Pan KC, Zheng G, Negara N, Johnson RE, Kale LV, Ricker PM. Automatic MPI to AMPI program transformation. Technical Report 10-09, Parallel Programming Laboratory, March 2010.
-
(2010)
Technical Report 10-09, Parallel Programming Laboratory, March
-
-
Negara, S.1
Pan, K.C.2
Zheng, G.3
Negara, N.4
Johnson, R.E.5
Kale, L.V.6
Ricker, P.M.7
-
22
-
-
77954923590
-
Team-based message logging: Preliminary results
-
Meneses E, Mendes CL, Kalé LV. Team-based message logging: preliminary results. 3rd Workshop on Resiliency in High Performance Computing (Resilience) in Clusters, Clouds, and Grids (CCGRID 2010), 2010.
-
(2010)
3rd Workshop on Resiliency in High Performance Computing (Resilience) in Clusters, Clouds, and Grids (CCGRID 2010)
-
-
Meneses, E.1
Mendes, C.L.2
Kalé, L.V.3