SCOPUS Information Retrieval Platform

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Volume 6067 LNCS, Issue PART 1, 2010, Pages 206-215

A flexible checkpoint/restart model in distributed systems

(4) Bouguerra, Mohamed Slim (a, b); Gautier, Thierry (b); Trystram, Denis (a); Vincent, Jean Marc (a)

a UNIV GRENOBLE ALPES (France)

b INRIA RHÔNE ALPES (France)

Author keywords

Checkpointing; Fault tolerance; Reliability modeling

Indexed keywords

CHECKPOINT/RESTART; CHECKPOINTING; COMPLETION TIME; COMPUTATIONAL RESOURCES; COMPUTING PLATFORM; COORDINATED CHECKPOINTING; DISTRIBUTED SYSTEMS; FAULT TOLERANCE MECHANISMS; GLOBAL CONSISTENT STATE; LARGE-SCALE APPLICATIONS; MATHEMATICAL ANALYSIS; NEW MODEL; PROCESS FAILURE; RANDOM FAILURES; RELIABILITY MODELING; RELIABILITY PROBLEMS; RELIABLE EXECUTION; SINGLE PROCESSORS; WEIBULL;

DISTRIBUTED COMPUTER SYSTEMS; FAULT TOLERANCE; FAULT TOLERANT COMPUTER SYSTEMS; POISSON DISTRIBUTION; RELIABILITY ANALYSIS; WEIBULL DISTRIBUTION;

QUALITY ASSURANCE;

EID: 77955097389 PISSN: 03029743 EISSN: 16113349 Source Type: Book Series
DOI: 10.1007/978-3-642-14390-8_22 Document Type: Conference Paper

Times cited: (35)

References (14)

1
- 16244423775
- An overview of the bluegene/L supercomputer
- Adiga, N., et al.: An Overview of the BlueGene/L Supercomputer. In: ACM/IEEE 2002 Conference on Supercomputing, p. 60 (2002)
- (2002) ACM/IEEE 2002 Conference on Supercomputing , pp. 60
- Adiga, N.¹

2
- 33845593340
- A large-scale study of failures in high-performance computing systems
- Washington, DC, USA
- Schroeder, B., Gibson, G.A.: A large-scale study of failures in high-performance computing systems. In: DSN 2006: Proceedings of the International Conference on Dependable Systems and Networks, Washington, DC, USA, pp. 249-258 (2006)
- (2006) DSN 2006: Proceedings of the International Conference on Dependable Systems and Networks , pp. 249-258
- Schroeder, B.¹ Gibson, G.A.²

3
- 67349271621
- An analysis of clustered failures on large supercomputing systems
- Hacker, T.J., Romero, F., Carothers, C.D.: An analysis of clustered failures on large supercomputing systems. J. Parallel Distrib. Comput. 69(7), 652-665 (2009)
- (2009) J. Parallel Distrib. Comput. , vol.69 , Issue.7 , pp. 652-665
- Hacker, T.J.¹ Romero, F.² Carothers, C.D.³

4
- 28044460018
- A higher order estimate of the optimum checkpoint interval for restart dumps
- Daly, J.T.: A higher order estimate of the optimum checkpoint interval for restart dumps. Future Generation Computer Systems 22(3), 303-312 (2006)
- (2006) Future Generation Computer Systems , vol.22 , Issue.3 , pp. 303-312
- Daly, J.T.¹

5
- 9144223280
- Checkpointing for peta-scale systems: A look into the future of practical rollback-recovery
- Elnozahy, E.N., Plank, J.S.: Checkpointing for peta-scale systems: A look into the future of practical rollback-recovery. IEEE Trans. Dependable Secur. Comput. 1(2), 97-108 (2004)
- (2004) IEEE Trans. Dependable Secur. Comput. , vol.1 , Issue.2 , pp. 97-108
- Elnozahy, E.N.¹ Plank, J.S.²

6
- 51049108820
- An optimal checkpoint/restart model for a large scale high performance computing system
- Liu, Y., Nassar, R., Leangsuksun, C., Naksinehaboon, N., Paun, M., Scott, S.: An optimal checkpoint/restart model for a large scale high performance computing system. In: IEEE International Symposium on Parallel and Distributed Processing, pp. 1-9 (2008)
- (2008) IEEE International Symposium on Parallel and Distributed Processing , pp. 1-9
- Liu, Y.¹ Nassar, R.² Leangsuksun, C.³ Naksinehaboon, N.⁴ Paun, M.⁵ Scott, S.⁶

7
- 34547424386
- Cooperative checkpointing: A robust approach to large-scale systems reliability
- ACM, New York
- Oliner, A.J., Rudolph, L., Sahoo, R.K.: Cooperative checkpointing: a robust approach to large-scale systems reliability. In: Proceedings of The 20th Annual International Conference on Supercomputing, pp. 14-23. ACM, New York (2006)
- (2006) Proceedings of the 20th Annual International Conference on Supercomputing , pp. 14-23
- Oliner, A.J.¹ Rudolph, L.² Sahoo, R.K.³

8
- 84976846528
- A first order approximation to the optimum checkpoint interval
- Young, J.W.: A first order approximation to the optimum checkpoint interval. ACM Commun. 17(9), 530-531 (1974)
- (1974) ACM Commun. , vol.17 , Issue.9 , pp. 530-531
- Young, J.W.¹

9
- 0022020346
- Distributed snapshots: Determining global states of distributed systems
- Chandy, K.M., Lamport, L.: Distributed snapshots: determining global states of distributed systems. ACM Trans. Comput. Syst. 3(1), 63-75 (1985)
- (1985) ACM Trans. Comput. Syst. , vol.3 , Issue.1 , pp. 63-75
- Chandy, K.M.¹ Lamport, L.²

10
- 77955115718
- A new flexible checkpoint/restart model
- INRIA
- Bouguerra, M.S., Gautier, T., Trystram, D., Vincent, J.M.: A new flexible checkpoint/restart model. Technical report, RR-6751, INRIA (2008)
- (2008) Technical Report RR-6751
- Bouguerra, M.S.¹ Gautier, T.² Trystram, D.³ Vincent, J.M.⁴

11
- 0000652719
- Selection of a checkpoint interval in a critical-task environment
- Geist, R., Reynolds, R., Westall, J.: Selection of a checkpoint interval in a critical-task environment. IEEE Transactions on Reliability 37, 395-400 (1988)
- (1988) IEEE Transactions on Reliability , vol.37 , pp. 395-400
- Geist, R.¹ Reynolds, R.² Westall, J.³

12
- 0032597646
- The average availability of parallel checkpointing systems and its importance in selecting runtime parameters
- Plank, J.S., Thomason, M.G.: The average availability of parallel checkpointing systems and its importance in selecting runtime parameters. In: 29th International Symposium on Fault-Tolerant Computing, pp. 250-259 (1999)
- (1999) 29th International Symposium on Fault-Tolerant Computing , pp. 250-259
- Plank, J.S.¹ Thomason, M.G.²

13
- 50649087527
- Reliability-aware approach: An incremental checkpoint/restart model in HPC environments
- Naksinehaboon, N., Liu, Y., Leangsuksun, C., Nassar, R., Paun, M., Scott, S.: Reliability-Aware Approach: An Incremental Checkpoint/Restart Model in HPC Environments. In: IEEE International Symposium on Cluster Computing and the Grid, pp. 783-788 (2008)
- (2008) IEEE International Symposium on Cluster Computing and the Grid , pp. 783-788
- Naksinehaboon, N.¹ Liu, Y.² Leangsuksun, C.³ Nassar, R.⁴ Paun, M.⁵ Scott, S.⁶

14
- 2442632432
- John Wiley, Chichester
- Tijms, H.C.: A First Course in Stochastic Models. John Wiley, Chichester (2003)
- (2003) A First Course in Stochastic Models
- Tijms, H.C.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.