



Volume , Issue , 2005, Pages 812-821

Modeling coordinated checkpointing for large-scale supercomputers

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER NETWORKS; COMPUTER SIMULATION; FAILURE ANALYSIS; MATHEMATICAL MODELS; NETWORK PROTOCOLS; RANDOM PROCESSES; RELIABILITY THEORY; SECURITY OF DATA; SYSTEMS ANALYSIS

EID: 27544513113     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 49

References (25)
  • 2
    • 0003820750
    • An overview of checkpointing in uniprocessor and distributed systems, focusing on implementation and performance
    • J. S. Plank, "An Overview of Checkpointing in Uniprocessor and Distributed Systems, Focusing on Implementation and Performance," Technical Report of University of Tennessee, UT-CS, 1997.
    • (1997) Technical Report of University of Tennessee , vol.UT-CS
    • Plank, J.S.1
  • 3
    • 0022020346
    • Distributed snapshots: Determining global states of distributed systems
    • M. Chandy, L. Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems," ACM Trans. on Computer Systems, 3(1), 1985.
    • (1985) ACM Trans. on Computer Systems , vol.3 , Issue.1
    • Chandy, M.1    Lamport, L.2
  • 4
    • 0023090161
    • Checkpointing and rollback-recovery for distributed systems
    • R. Koo, S. Toueg, "Checkpointing and Rollback-Recovery for Distributed Systems," IEEE Trans. on Software Engineering, Vol. SE-13, No. 1, 1987.
    • (1987) IEEE Trans. on Software Engineering , vol.SE-13 , Issue.1
    • Koo, R.1    Toueg, S.2
  • 6
    • 0026869241
    • Analysis and modeling of correlated failures in multicomputer systems
    • D. Tang, R. K. Iyer, "Analysis and Modeling of Correlated Failures in Multicomputer Systems," IEEE Trans. on Computers, Vol. 41, Num. 5, 1992.
    • (1992) IEEE Trans. on Computers , vol.41 , Issue.5
    • Tang, D.1    Iyer, R.K.2
  • 7
    • 84976846528
    • A first order approximation to the optimum checkpoint interval
    • J. W. Young, "A First Order Approximation to the Optimum Checkpoint Interval," Communications of the ACM, Vol. 17, Num. 9, 1974.
    • (1974) Communications of the ACM , vol.17 , Issue.9
    • Young, J.W.1
  • 10
    • 0032597646
    • The average availability of parallel checkpointing systems and its importance in selecting runtime parameters
    • J. S. Plank, M. G. Thomason, "The Average Availability of Parallel Checkpointing Systems and Its Importance in Selecting Runtime Parameters," IEEE Proc. Int'l Symp. on Fault-Tolerant Computing, 1999.
    • (1999) IEEE Proc. Int'l Symp. on Fault-Tolerant Computing
    • Plank, J.S.1    Thomason, M.G.2
  • 11
    • 9144223280
    • Checkpointing for peta-scale systems: A look into the future of practical rollback-recovery
    • E. N. Elnozahy, J. S. Plank, W. K. Fuchs, "Checkpointing for Peta-Scale Systems: A Look into the Future of Practical Rollback-Recovery," IEEE Trans. on Dependable and Secure Computing, Vol. 1, Num. 2, 2004.
    • (2004) IEEE Trans. on Dependable and Secure Computing , vol.1 , Issue.2
    • Elnozahy, E.N.1    Plank, J.S.2    Fuchs, W.K.3
  • 13
    • 0025467711
    • A bridging model for parallel computation
    • L. G. Valiant, "A Bridging Model for Parallel Computation," Communications of the ACM, Vol. 33, 1990.
    • (1990) Communications of the ACM , vol.33
    • Valiant, L.G.1
  • 15
    • 0036504721
    • Models of parallel applications with large computation and I/O requirements
    • E. Rosti et al., "Models of Parallel Applications with Large Computation and I/O Requirements," IEEE Trans. on Software Engineering, Vol. 28, Num. 3, 2002.
    • (2002) IEEE Trans. on Software Engineering , vol.28 , Issue.3
    • Rosti, E.1
  • 20
    • 0022734032
    • A measurement-based model for workload dependence of CPU errors
    • R. Iyer, D. Rossetti, "A Measurement-based Model for Workload Dependence of CPU Errors," IEEE Trans. on Computers, Vol. C-35, 1986.
    • (1986) IEEE Trans. on Computers , vol.C-35
    • Iyer, R.1    Rossetti, D.2
  • 22
    • 0033314330
    • IBM S/390 parallel enterprise server G5 fault tolerance: A historical perspective
    • L. Spainhower, T. A. Gregg, "IBM S/390 Parallel Enterprise Server G5 Fault Tolerance: A Historical Perspective," IBM Journal of Research and Development, Vol. 43, Num. 5/6, 1999.
    • (1999) IBM Journal of Research and Development , vol.43 , Issue.5-6
    • Spainhower, L.1    Gregg, T.A.2
  • 25
    • 27544480041
    • Modeling coordinated checkpointing for large-scale supercomputers
    • L. Wang et al., "Modeling Coordinated Checkpointing for Large-Scale Supercomputers," Technical Report of University of Illinois, 2005.
    • (2005) Technical Report of University of Illinois
    • Wang, L.1


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.