Volume 2006, 2006, Pages 249-258

A large-scale study of failures in high-performance computing systems

Author keywords

[No Author keywords available]

Indexed keywords

DATA ACQUISITION; DATA REDUCTION; DATA STRUCTURES; FAILURE ANALYSIS; WEIBULL DISTRIBUTION;

EID: 33845593340     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/DSN.2006.5     Document Type: Conference Paper
Times cited : (452)

References (25)
  • 1
    • 33845566561
    • http://www.lanl.gov/projects/computerscience/data/
    • The raw data and more information is available at the following two URLs: http://www.pdl.cmu.edu/FailureData/ and http://www.lanl.gov/projects/computerscience/data/, 2006.
    • (2006)
  • 2
    • 0019661017
    • Workload, performance, and reliability of digital computing systems
    • X. Castillo and D. Siewiorek. Workload, performance, and reliability of digital computing systems. In FTCS-11, 1981.
    • (1981) FTCS-11
    • Castillo, X.1    Siewiorek, D.2
  • 4
    • 0025505070
    • A census of tandem system availability between 1985 and 1990
    • J. Gray. A census of tandem system availability between 1985 and 1990. IEEE Trans. on Reliability, 39(4), 1990.
    • (1990) IEEE Trans. on Reliability , vol.39 , Issue.4
    • Gray, J.1
  • 6
    • 84976815079
    • Measurement and modeling of computer reliability as affected by system activity
    • R. K. Iyer, D. J. Rossetti, and M. C. Hsueh. Measurement and modeling of computer reliability as affected by system activity. ACM Trans. Comput. Syst., 4(3), 1986.
    • (1986) ACM Trans. Comput. Syst. , vol.4 , Issue.3
    • Iyer, R.K.1    Rossetti, D.J.2    Hsueh, M.C.3
  • 7
    • 0033344278
    • Failure data analysis of a LAN of Windows NT based computers
    • M. Kalyanakrishnam, Z. Kalbarczyk, and R. Iyer. Failure data analysis of a LAN of Windows NT based computers. In SRDS-18, 1999.
    • (1999) SRDS-18
    • Kalyanakrishnam, M.1    Kalbarczyk, Z.2    Iyer, R.3
  • 9
    • 0025502686
    • Error log analysis: Statistical modeling and heuristic trend analysis
    • T.-T. Y. Lin and D. P. Siewiorek. Error log analysis: Statistical modeling and heuristic trend analysis. IEEE Trans. on Reliability, 39, 1990.
    • (1990) IEEE Trans. on Reliability , vol.39
    • Lin, T.-T.Y.1    Siewiorek, D.P.2
  • 10
    • 0029204130
    • A longitudinal survey of internet host reliability
    • D. Long, A. Muir, and R. Golding. A longitudinal survey of internet host reliability. In SRDS-14, 1995.
    • (1995) SRDS-14
    • Long, D.1    Muir, A.2    Golding, R.3
  • 11
    • 0024132220
    • Analysis of workload influence on dependability
    • J. Meyer and L. Wei. Analysis of workload influence on dependability. In FTCS, 1988.
    • (1988) FTCS
    • Meyer, J.1    Wei, L.2
  • 13
    • 0029368189
    • Measuring system and software reliability using an automated data collection process
    • B. Murphy and T. Gent. Measuring system and software reliability using an automated data collection process. Quality and Reliability Engineering International, 11(5), 1995.
    • (1995) Quality and Reliability Engineering International , vol.11 , Issue.5
    • Murphy, B.1    Gent, T.2
  • 15
    • 45749113088
    • Modeling machine availability in enterprise and wide-area distributed computing environments
    • D. Nurmi, J. Brevik, and R. Wolski. Modeling machine availability in enterprise and wide-area distributed computing environments. In Euro-Par'05, 2005.
    • (2005) Euro-Par'05
    • Nurmi, D.1    Brevik, J.2    Wolski, R.3
  • 17
    • 85014175705
    • Experimental assessment of workstation failures and their impact on checkpointing systems
    • J. S. Plank and W. R. Elwasif. Experimental assessment of workstation failures and their impact on checkpointing systems. In FTCS'98, 1998.
    • (1998) FTCS'98
    • Plank, J.S.1    Elwasif, W.R.2
  • 19
    • 0025693296
    • Failure analysis and modelling of a VAX cluster system
    • D. Tang, R. K. Iyer, and S. S. Subramani. Failure analysis and modelling of a VAX cluster system. In FTCS, 1990.
    • (1990) FTCS
    • Tang, D.1    Iyer, R.K.2    Subramani, S.S.3
  • 21
    • 84877699694
    • A case for two-level distributed recovery schemes
    • N. H. Vaidya. A case for two-level distributed recovery schemes. In Proc. of ACM SIGMETRICS, 1995.
    • (1995) Proc. of ACM SIGMETRICS
    • Vaidya, N.H.1
  • 22
    • 0031078972
    • Self-similarity through high-variability: Statistical analysis of Ethernet LAN traffic at the source level
    • W. Willinger, M. S. Taqqu, R. Sherman, and D. V. Wilson. Self-similarity through high-variability: statistical analysis of Ethernet LAN traffic at the source level. IEEE/ACM Trans. on Networking, 5(1):71-86, 1997.
    • (1997) IEEE/ACM Trans. on Networking , vol.5 , Issue.1 , pp. 71-86
    • Willinger, W.1    Taqqu, M.S.2    Sherman, R.3    Wilson, D.V.4
  • 23
    • 0030600996
    • Checkpointing in distributed computing systems
    • K. F. Wong and M. Franklin. Checkpointing in distributed computing systems. J. Par. Distrib. Comput., 35(1), 1996.
    • (1996) J. Par. Distrib. Comput. , vol.35 , Issue.1
    • Wong, K.F.1    Franklin, M.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.