



Volume 3277, 2005, Pages 233-252

Performance implications of failures in large-scale cluster scheduling

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER SYSTEM RECOVERY; INFORMATION RETRIEVAL; JOB ANALYSIS; LARGE SCALE SYSTEMS; PARALLEL PROCESSING SYSTEMS; PROBLEM SOLVING;

EID: 23944448107     PISSN: 03029743     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1007/11407522_13     Document Type: Conference Paper
Times cited : (52)

References (44)
  • 1
    • 0035877334
    • Scheduling with unexpected machine breakdowns
    • S. Albers and G. Schmidt. Scheduling with unexpected machine breakdowns. Discrete Applied Mathematics, 110(2-3):85-99, 2001.
    • (2001) Discrete Applied Mathematics , vol.110 , Issue.2-3 , pp. 85-99
    • Albers, S.1    Schmidt, G.2
  • 2
    • 1442309284
    • On the reliability of the IBM MVS/XA operating system
    • S. M. and D. Andrews. On the reliability of the IBM MVS/XA operating system. In IEEE Trans. Software Engineering, volume October, 1987.
    • (1987) IEEE Trans. Software Engineering , vol.OCTOBER
    • M., S.1    Andrews, D.2
  • 3
    • 13944260324
    • Workload Characterization of the 1998 World Cup E-Commerce Site
    • M. Arlitt and T. Jin. Workload Characterization of the 1998 World Cup E-Commerce Site. Technical Report HPL-1999-62, HP, May 1999.
    • (1999) Technical Report , vol.HPL-1999-62
    • Arlitt, M.1    Jin, T.2
  • 4
    • 0031358595
    • Optimal fault-tolerant computing on multiprocess systems
    • J. L. Bruno and E. G. Coffman. Optimal Fault-Tolerant Computing on Multiprocess Systems. Acta Informatica, 34:881-904, 1997.
    • (1997) Acta Informatica , vol.34 , pp. 881-904
    • Bruno, J.L.1    Coffman, E.G.2
  • 8
    • 0003520524
    • A survey of scheduling in multiprogrammed parallel systems
    • D. Feitelson. A survey of scheduling in multiprogrammed parallel systems. IBM Research Technical Report, RC 19790, 1994.
    • (1994) IBM Research Technical Report , vol.RC 19790
    • Feitelson, D.1
  • 12
    • 0011625222
    • Time sharing massively parallel machines
    • B. Gorda and R. Wolski. Time sharing massively parallel machines. In Proc. of ICPP'95, Portland, OR, pages 214-217, August 1995.
    • (1995) Proc. of ICPP'95, Portland, OR , pp. 214-217
    • Gorda, B.1    Wolski, R.2
  • 15
    • 0022209689
    • Effect of system workload on operating system reliability: A study on IBM 3081
    • R. K. Iyer and D. J. Rossetti. Effect of system workload on operating system reliability: A study on IBM 3081. In IEEE Trans. Software Engineering, volume SE-11, pages 1438-1448, 1985.
    • (1985) IEEE Trans. Software Engineering , vol.SE-11 , pp. 1438-1448
    • Iyer, R.K.1    Rossetti, D.J.2
  • 17
    • 0000412757
    • Task allocation algorithms for maximizing reliability of distributed computing systems
    • S. Kartik and C. S. R. Murthy. Task allocation algorithms for maximizing reliability of distributed computing systems. In IEEE Transactions on Computers, volume 46, pages 719-724, 1997.
    • (1997) IEEE Transactions on Computers , vol.46 , pp. 719-724
    • Kartik, S.1    Murthy, C.S.R.2
  • 18
  • 20
    • 0025502686
    • Error log analysis: Statistical modelling and heuristic trend analysis
    • T. Y. Lin and D. P. Siewiorek. Error log analysis: Statistical modelling and heuristic trend analysis. IEEE Trans. on Reliability, 39(4):419-432, October 1990.
    • (1990) IEEE Trans. on Reliability , vol.39 , Issue.4 , pp. 419-432
    • Lin, T.Y.1    Siewiorek, D.P.2
  • 21
    • 0035390088
    • A variational calculus approach to optimal checkpoint placement
    • Y. Ling, J. Mi, and X. Lin. A Variational Calculus Approach to Optimal Checkpoint Placement. IEEE Transactions on Computers, 50(7):699-708, July 2001.
    • (2001) IEEE Transactions on Computers , vol.50 , Issue.7 , pp. 699-708
    • Ling, Y.1    Mi, J.2    Lin, X.3
  • 22
    • 0017532566
    • Optimal policy for batch operations: Backup, checkpointing, reorganization, and updating
    • G. M. Lohman and J. A. Muckstadt. Optimal Policy for Batch Operations: Backup, Checkpointing, Reorganization, and Updating. ACM Transactions on Database Systems, 2(3):209-222, 1977.
    • (1977) ACM Transactions on Database Systems , vol.2 , Issue.3 , pp. 209-222
    • Lohman, G.M.1    Muckstadt, J.A.2
  • 23
    • 0032598743
    • Software fault tolerance in a clustered architecture: Techniques and reliability modeling
    • M. Lyu and V. Mendiratta. Software Fault Tolerance in a Clustered Architecture: Techniques and Reliability Modeling. In Proceedings 1999 IEEE Aerospace Conference, pages 141-150, 1999.
    • (1999) Proceedings 1999 IEEE Aerospace Conference , pp. 141-150
    • Lyu, M.1    Mendiratta, V.2
  • 26
    • 0035201417
    • Processor allocation and checkpoint interval selection in cluster computing systems
    • J. S. Plank and M. G. Thomason. Processor allocation and checkpoint interval selection in cluster computing systems. Journal of Parallel and Distributed Computing, 61(11):1570-1590, November 2001.
    • (2001) Journal of Parallel and Distributed Computing , vol.61 , Issue.11 , pp. 1570-1590
    • Plank, J.S.1    Thomason, M.G.2
  • 30
    • 0026923304
    • Task allocation for maximizing reliability of distributed computer systems
    • S. M. Shatz, J. P. Wang, and M. Goto. Task allocation for maximizing reliability of distributed computer systems. In IEEE Transactions on Computers, volume 41, pages 1156-1168, 1992.
    • (1992) IEEE Transactions on Computers , vol.41 , pp. 1156-1168
    • Shatz, S.M.1    Wang, J.P.2    Goto, M.3
  • 39
    • 0034832697
    • Analysis and implementation of software rejuvenation in cluster systems
    • K. Vaidyanathan, R. E. Harper, S. W. Hunter, and K. S. Trivedi. Analysis and implementation of software rejuvenation in cluster systems. In SIGMETRICS 2001, pages 62-71, 2001.
    • (2001) SIGMETRICS 2001 , pp. 62-71
    • Vaidyanathan, K.1    Harper, R.E.2    Hunter, S.W.3    Trivedi, K.S.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.