



Volume , Issue , 2013, Pages

A 'cool' way of improving the reliability of HPC machines

Author keywords

Actionable modeling; Checkpointing restart; Energy minimization; Fault tolerance; Load balancing; Temperature capping; Temperature thresholds; Thermal control

Indexed keywords

ENERGY UTILIZATION; FAULT TOLERANCE; HARDWARE; PARALLEL ARCHITECTURES; RESOURCE ALLOCATION; SUPERCOMPUTERS;

EID: 84899668006     PISSN: 21674329     EISSN: 21674337     Source Type: Conference Proceeding    
DOI: 10.1145/2503210.2503228     Document Type: Conference Paper
Times cited : (24)

References (39)
  • 3
    • 34247594029
    • Alternating cold and hot aisles provides more reliable cooling for server farms
    • R. F. Sullivan, "Alternating cold and hot aisles provides more reliable cooling for server farms," White Paper, Uptime Institute, 2000.
    • (2000) White Paper, Uptime Institute
    • Sullivan, R.F.1
  • 5
    • 84899696017
    • American Society of Heating, Refrigerating and Air-Conditioning Engineers
    • American Society of Heating, Refrigerating and Air-Conditioning Engineers, "2008 ashrae environmental guidelines for datacom equipment." [Online]. Available: http://tc99.ashraetcs.org/documents/ASHRAE Extended Environmental Envelope Final Aug 1 2008.pdf
    • 2008 ashrae environmental guidelines for datacom equipment
  • 10
    • 85024266067
    • Making a case for efficient supercomputing
    • W.-c. Feng, "Making a case for efficient supercomputing," vol. 1, no. 7. New York, NY, USA: ACM, Oct. 2003, pp. 54-64. [Online]. Available: http://doi.acm.org/10.1145/957717.957772
    • Making a case for efficient supercomputing , vol.1 , pp. 54-64
    • Feng, W.-C.1
  • 11
    • 34548715722
    • The importance of being low power in high-performance computing
    • "The Importance of Being Low Power in High-Performance Computing," Cyberinfrastructure Technology Watch Quarterly (CTWatch Quarterly), vol. 1, no. 3, August 2005.
    • (2005) Cyberinfrastructure Technology Watch Quarterly (CTWatch Quarterly) , vol.1 , Issue.3
  • 12
    • 84899703545
    • Reliability Aspects on Power Supplies
    • Ericsson, "Reliability Aspects on Power Supplies," Technical Report Design Note 002, Ericsson Microelectronics, April 2000.
    • (2000) Reliability Aspects on Power Supplies
  • 18
    • 28044460018
    • A higher order estimate of the optimum checkpoint interval for restart dumps
    • J. T. Daly, "A higher order estimate of the optimum checkpoint interval for restart dumps," Future Generation Comp. Syst., vol. 22, no. 3, pp. 303-312, 2006.
    • (2006) Future Generation Comp. Syst. , vol.22 , Issue.3 , pp. 303-312
    • Daly, J.T.1
  • 19
    • 84976846528
    • A first order approximation to the optimal checkpoint interval
    • J. W. Young, "A first order approximation to the optimal checkpoint interval," Commun. ACM, vol. 17, no. 9, pp. 530-531, 1974.
    • (1974) Commun. ACM , vol.17 , Issue.9 , pp. 530-531
    • Young, J.W.1
  • 21
    • 74049121711
    • Berkeley lab checkpoint/restart (blcr) for linux clusters
    • P. H. Hargrove and J. C. Duell, "Berkeley lab checkpoint/restart (blcr) for linux clusters," in SciDAC, 2006.
    • (2006) SciDAC
    • Hargrove, P.H.1    Duell, J.C.2
  • 22
    • 78650831692
    • Design, modeling, and evaluation of a scalable multi-level checkpointing system
    • A. Moody, G. Bronevetsky, K. Mohror, and B. R. de Supinski, "Design, modeling, and evaluation of a scalable multi-level checkpointing system," in SC, 2010, pp. 1-11.
    • (2010) SC , pp. 1-11
    • Moody, A.1    Bronevetsky, G.2    Mohror, K.3    De Supinski, B.R.4
  • 23
    • 84899694124
    • "Lulesh," http://computation.llnl.gov/casc/ShockHydro/.
    • Lulesh
  • 28
    • 84899672036
    • Intel turbo boost technology
    • "Intel turbo boost technology," http://www.intel.com/technology/turboboost/.
  • 39
    • 22944456833
    • Lifetime reliability: Toward an architectural solution
    • J. Srinivasan, S. Adve, P. Bose, and J. Rivers, "Lifetime reliability: toward an architectural solution," Micro, IEEE, vol. 25, no. 3, pp. 70-80, 2005.
    • (2005) Micro, IEEE , vol.25 , Issue.3 , pp. 70-80
    • Srinivasan, J.1    Adve, S.2    Bose, P.3    Rivers, J.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.