Volume 29, 2014, Pages 599-613

HP-DAEMON: High performance distributed adaptive energy-efficient matrix-multiplication

Author keywords

Adaptive; Binomial tree broadcast; DPLASMA; DVFS; Energy; Memory aware; Performance; Pipeline broadcast; ScaLAPACK

Indexed keywords

DYNAMIC FREQUENCY SCALING; ENERGY CONSERVATION; HARDWARE; MATRIX ALGEBRA; PIPELINES; TIME SWITCHES; VOLTAGE SCALING

EID: 84902816979     PISSN: 18770509     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1016/j.procs.2014.05.054     Document Type: Conference Paper
Times cited: 7

References (24)
  • 4
    • 84862137426
    • Saving energy in the LU factorization with partial pivoting on multi-core processors
    • P. Alonso, M. F. Dolz, F. D. Igual, R. Mayo, and E. S. Quintana-Orti. Saving energy in the LU factorization with partial pivoting on multi-core processors. In Proc. PDP, pages 353-358, 2012.
    • (2012) Proc. PDP , pp. 353-358
    • Alonso, P.1    Dolz, M.F.2    Igual, F.D.3    Mayo, R.4    Quintana-Orti, E.S.5
  • 5
    • 33751022826
    • Collective communication on architectures that support simultaneous communication over multiple links
    • E. Chan, R. van de Geijn, W. Gropp, and R. Thakur. Collective communication on architectures that support simultaneous communication over multiple links. In Proc. PPoPP, pages 2-11, 2006.
    • (2006) Proc. PPoPP , pp. 2-11
    • Chan, E.1    Van De Geijn, R.2    Gropp, W.3    Thakur, R.4
  • 6
    • 33746318690
    • Reducing power with performance constraints for parallel sparse applications
    • G. Chen, K. Malkowski, M. Kandemir, and P. Raghavan. Reducing power with performance constraints for parallel sparse applications. In Proc. IPDPS, pages 1-8, 2005.
    • (2005) Proc. IPDPS , pp. 1-8
    • Chen, G.1    Malkowski, K.2    Kandemir, M.3    Raghavan, P.4
  • 7
    • 0030683462
    • A new parallel matrix multiplication algorithm on distributed-memory concurrent computers
    • J. Choi. A new parallel matrix multiplication algorithm on distributed-memory concurrent computers. In Proc. HPC-Asia, pages 224-229, 1997.
    • (1997) Proc. HPC-Asia , pp. 224-229
    • Choi, J.1
  • 9
    • 32844458895
    • Automatic generation and tuning of MPI collective communication routines
    • A. Faraj and X. Yuan. Automatic generation and tuning of MPI collective communication routines. In Proc. ICS, pages 393-402, 2005.
    • (2005) Proc. ICS , pp. 393-402
    • Faraj, A.1    Yuan, X.2
  • 10
    • 31844450952
    • Using multiple energy gears in MPI programs on a power-scalable cluster
    • V. W. Freeh and D. K. Lowenthal. Using multiple energy gears in MPI programs on a power-scalable cluster. In Proc. PPoPP, pages 164-173, 2005.
    • (2005) Proc. PPoPP , pp. 164-173
    • Freeh, V.W.1    Lowenthal, D.K.2
  • 11
    • 77950629423
    • PowerPack: Energy profiling and analysis of high-performance systems and applications
    • R. Ge, X. Feng, S. Song, H.-C. Chang, D. Li, and K. W. Cameron. PowerPack: Energy profiling and analysis of high-performance systems and applications. IEEE Trans. Parallel Distrib. Syst., 21(5):658-671, May 2010.
    • (2010) IEEE Trans. Parallel Distrib. Syst. , vol.21 , Issue.5 , pp. 658-671
    • Ge, R.1    Feng, X.2    Song, S.3    Chang, H.-C.4    Li, D.5    Cameron, K.W.6
  • 12
    • 34548012303
    • A power-aware run-time system for high-performance computing
    • C.-H. Hsu and W.-C. Feng. A power-aware run-time system for high-performance computing. In Proc. SC, page 1, 2005.
    • (2005) Proc. SC , pp. 1
    • Hsu, C.-H.1    Feng, W.-C.2
  • 13
    • 27144479899
    • Automatic tuning of PDGEMM towards optimal performance
    • S. Hunold and T. Rauber. Automatic tuning of PDGEMM towards optimal performance. In Proc. Euro-Par, pages 837-846, 2005.
    • (2005) Proc. Euro-Par , pp. 837-846
    • Hunold, S.1    Rauber, T.2
  • 14
    • 33746618548
    • Just in time dynamic voltage scaling: Exploiting inter-node slack to save energy in MPI programs
    • N. Kappiah, V. W. Freeh, and D. K. Lowenthal. Just in time dynamic voltage scaling: Exploiting inter-node slack to save energy in MPI programs. In Proc. SC, page 33, 2005.
    • (2005) Proc. SC , pp. 33
    • Kappiah, N.1    Freeh, V.W.2    Lowenthal, D.K.3
  • 15
    • 80155187635
    • Optimizing process-to-core mappings for two dimensional broadcast/reduce on multicore architectures
    • C. Karlsson, T. Davies, C. Ding, H. Liu, and Z. Chen. Optimizing process-to-core mappings for two dimensional broadcast/reduce on multicore architectures. In Proc. ICPP, pages 404-413, 2011.
    • (2011) Proc. ICPP , pp. 404-413
    • Karlsson, C.1    Davies, T.2    Ding, C.3    Liu, H.4    Chen, Z.5
  • 16
    • 0038040084
    • CC-MPI: A compiled communication capable MPI prototype for ethernet switched clusters
    • A. Karwande, X. Yuan, and D. K. Lowenthal. CC-MPI: A compiled communication capable MPI prototype for ethernet switched clusters. In Proc. PPoPP, pages 95-106, 2003.
    • (2003) Proc. PPoPP , pp. 95-106
    • Karwande, A.1    Yuan, X.2    Lowenthal, D.K.3
  • 17
    • 46049087938
    • Empirical study on reducing energy of parallel programs using slack reclamation by DVFS in a power-scalable high performance cluster
    • H. Kimura, M. Sato, Y. Hotta, T. Boku, and D. Takahashi. Empirical study on reducing energy of parallel programs using slack reclamation by DVFS in a power-scalable high performance cluster. In Proc. CLUSTER, pages 1-10, 2006.
    • (2006) Proc. CLUSTER , pp. 1-10
    • Kimura, H.1    Sato, M.2    Hotta, Y.3    Boku, T.4    Takahashi, D.5
  • 19
    • 0036374185
    • Critical power slope: Understanding the runtime effects of frequency scaling
    • A. Miyoshi, C. Lefurgy, E. V. Hensbergen, R. Rajamony, and R. Rajkumar. Critical power slope: Understanding the runtime effects of frequency scaling. In Proc. ICS, pages 35-44, 2002.
    • (2002) Proc. ICS , pp. 35-44
    • Miyoshi, A.1    Lefurgy, C.2    Hensbergen, E.V.3    Rajamony, R.4    Rajkumar, R.5
  • 21
    • 33751054291
    • Minimizing execution time in MPI programs on an energy-constrained, power-scalable cluster
    • R. Springer, D. K. Lowenthal, B. Rountree, and V. W. Freeh. Minimizing execution time in MPI programs on an energy-constrained, power-scalable cluster. In Proc. PPoPP, pages 230-238, 2006.
    • (2006) Proc. PPoPP , pp. 230-238
    • Springer, R.1    Lowenthal, D.K.2    Rountree, B.3    Freeh, V.W.4
  • 22
    • 84893617587
    • Improving performance and energy efficiency of matrix multiplication via pipeline broadcast
    • L. Tan, L. Chen, Z. Chen, Z. Zong, R. Ge, and D. Li. Improving performance and energy efficiency of matrix multiplication via pipeline broadcast. In Proc. CLUSTER, pages 1-5, 2013.
    • (2013) Proc. CLUSTER , pp. 1-5
    • Tan, L.1    Chen, L.2    Chen, Z.3    Zong, Z.4    Ge, R.5    Li, D.6
  • 23
    • 84897784379
    • A2E: Adaptively aggressive energy efficient DVFS scheduling for data intensive applications
    • L. Tan, Z. Chen, Z. Zong, R. Ge, and D. Li. A2E: Adaptively aggressive energy efficient DVFS scheduling for data intensive applications. In Proc. IPCCC, pages 1-10, 2013.
    • (2013) Proc. IPCCC , pp. 1-10
    • Tan, L.1    Chen, Z.2    Zong, Z.3    Ge, R.4    Li, D.5


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.