Volume 3648, 2005, Pages 675-684

A checkpoint/recovery model for heterogeneous dataflow computations using work-stealing

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTATION THEORY; COSTS; DATA COMMUNICATION SYSTEMS; DISTRIBUTED COMPUTER SYSTEMS; GRAPH THEORY; PROGRAM PROCESSORS

EID: 27144432456; PISSN: 03029743; EISSN: None; Source Type: Conference Proceeding
DOI: 10.1007/11549468_74; Document Type: Conference Paper
Times cited: 12

References (21)
  • 4
    • A. Bouteiller, F. Cappello, T. Hérault, P. Lemarinier, G. Krawezik, and F. Magniette. MPICH-V2: a fault tolerant MPI for volatile nodes based on the pessimistic sender based message logging. In SuperComputing, Phoenix, USA, 2003.
  • 5
    • E. N. (Mootaz) Elnozahy, L. Alvisi, Y.-M. Wang, and D. B. Johnson. A survey of rollback-recovery protocols in message-passing systems. ACM Comput. Surv., 34(3):375-408, 2002.
  • 6
    • G. Zheng, L. Shi, and L. V. Kale. FTC-Charm++: an in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI. In 2004 IEEE International Conference on Cluster Computing, San Diego, CA, September 2004.
  • 7
    • F. Galilée, J.-L. Roch, G. Cavalheiro, and M. Doreille. Athapascan-1: on-line building data flow graph in a parallel language. In PACT'98, pages 88-95, Paris, France, October 1998. IEEE.
  • 8
    • R. L. Graham. Bounds on multiprocessing timing anomalies. SIAM Journal on Applied Mathematics, 17(2):416-429, 1969.
  • 10
    • S. Jafar, S. Varrette, and J.-L. Roch. Using data-flow analysis for resilience and result checking in peer-to-peer computations. In IEEE DEXA'2004, Zaragoza, Spain, August 2004.
  • 11
    • K. M. Chandy and L. Lamport. Distributed snapshots: determining global states of distributed systems. ACM Trans. Comput. Syst., 3(1):63-75, 1985.
  • 12
    • L. Alvisi and K. Marzullo. Message logging: pessimistic, optimistic, causal, and optimal. IEEE Transactions on Software Engineering, 24(2):149-159, 1998.
  • 13
    • M. Litzkow, T. Tannenbaum, J. Basney, and M. Livny. Checkpoint and migration of UNIX processes in the Condor distributed processing system. Technical Report CS-TR-97-1346, Univ. Wisconsin, Madison, 1997.
  • 16
    • R. Strom and S. Yemini. Optimistic recovery in distributed systems. ACM Trans. Comput. Syst., 3(3):204-226, 1985.
  • 18
    • S. Chakravorty and L. V. Kale. A fault tolerant protocol for massively parallel machines. In FT-PDS Workshop for IPDPS 2004. IEEE Press, 2004.
  • 20
    • V. Strumpen. Compiler technology for portable checkpoints. Technical report, MIT Laboratory for Computer Science, Cambridge, MA 02139, 1998.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.