SCOPUS Information Retrieval Platform

Procedia Computer Science

Volume 29, 2014, Pages 599-613

HP-DAEMON: High performance distributed adaptive energy-efficient matrix-multiplication

Authors (6): Tan, Li (a); Chen, Longxiang (a); Chen, Zizhong (a); Zong, Ziliang (b); Ge, Rong (c); Li, Dong (d)

a UNIVERSITY OF CALIFORNIA (United States)

b TEXAS STATE UNIVERSITY (United States)

c MARQUETTE UNIVERSITY (United States)

d OAK RIDGE NATIONAL LABORATORY (United States)

Author keywords

Adaptive; Binomial tree broadcast; DPLASMA; DVFS; Energy; Memory aware; Performance; Pipeline broadcast; ScaLAPACK

Indexed keywords

DYNAMIC FREQUENCY SCALING; ENERGY CONSERVATION; HARDWARE; MATRIX ALGEBRA; PIPELINES; TIME SWITCHES; VOLTAGE SCALING;

ADAPTIVE; BINOMIAL TREE; DPLASMA; DVFS; ENERGY; MEMORY AWARE; PERFORMANCE; SCALAPACK;

ENERGY EFFICIENCY;

EID: 84902816979 PISSN: 18770509 EISSN: None Source Type: Conference Proceeding
DOI: 10.1016/j.procs.2014.05.054 Document Type: Conference Paper

Times cited : (7)

References (24)

1
- 84870413589
- Automatically Tuned Linear Algebra Software (ATLAS). http://math-atlas.sourceforge.net/.
- Automatically Tuned Linear Algebra Software (ATLAS)

2
- 84902778071
- DPLASMA: Distributed Parallel Linear Algebra Software for Multicore Architectures. http://icl.cs.utk.edu/dplasma/.
- DPLASMA: Distributed Parallel Linear Algebra Software for Multicore Architectures

3
- 84892562689
- ScaLAPACK - Scalable Linear Algebra PACKage. http://www.netlib.org/scalapack/.
- ScaLAPACK - Scalable Linear Algebra PACKage

4
- 84862137426
- Saving energy in the LU factorization with partial pivoting on multi-core processors
- P. Alonso, M. F. Dolz, F. D. Igual, R. Mayo, and E. S. Quintana-Orti. Saving energy in the LU factorization with partial pivoting on multi-core processors. In Proc. PDP, pages 353-358, 2012.
- (2012) Proc. PDP , pp. 353-358
- Alonso, P.¹ Dolz, M.F.² Igual, F.D.³ Mayo, R.⁴ Quintana-Orti, E.S.⁵

5
- 33751022826
- Collective communication on architectures that support simultaneous communication over multiple links
- E. Chan, R. van de Geijn, W. Gropp, and R. Thakur. Collective communication on architectures that support simultaneous communication over multiple links. In Proc. PPoPP, pages 2-11, 2006.
- (2006) Proc. PPoPP , pp. 2-11
- Chan, E.¹ Van De Geijn, R.² Gropp, W.³ Thakur, R.⁴

6
- 33746318690
- Reducing power with performance constraints for parallel sparse applications
- G. Chen, K. Malkowski, M. Kandemir, and P. Raghavan. Reducing power with performance constraints for parallel sparse applications. In Proc. IPDPS, pages 1-8, 2005.
- (2005) Proc. IPDPS , pp. 1-8
- Chen, G.¹ Malkowski, K.² Kandemir, M.³ Raghavan, P.⁴

7
- 0030683462
- A new parallel matrix multiplication algorithm on distributed-memory concurrent computers
- J. Choi. A new parallel matrix multiplication algorithm on distributed-memory concurrent computers. In Proc. HPC-Asia, pages 224-229, 1997.
- (1997) Proc. HPC-Asia , pp. 224-229
- Choi, J.¹

8
- 0030244536
- The design and implementation of the ScaLAPACK LU, QR and Cholesky factorization routines
- J. Choi, J. J. Dongarra, L. S. Ostrouchov, A. P. Petitet, D. W. Walker, and R. C. Whaley. The design and implementation of the ScaLAPACK LU, QR and Cholesky factorization routines. Scientific Programming, 5(3):173-184, August 1996.
- (1996) Scientific Programming , vol.5 , Issue.3 , pp. 173-184
- Choi, J.¹ Dongarra, J.J.² Ostrouchov, L.S.³ Petitet, A.P.⁴ Walker, D.W.⁵ Whaley, R.C.⁶

9
- 32844458895
- Automatic generation and tuning of MPI collective communication routines
- A. Faraj and X. Yuan. Automatic generation and tuning of MPI collective communication routines. In Proc. ICS, pages 393-402, 2005.
- (2005) Proc. ICS , pp. 393-402
- Faraj, A.¹ Yuan, X.²

10
- 31844450952
- Using multiple energy gears in MPI programs on a power-scalable cluster
- V. W. Freeh and D. K. Lowenthal. Using multiple energy gears in MPI programs on a power-scalable cluster. In Proc. PPoPP, pages 164-173, 2005.
- (2005) Proc. PPoPP , pp. 164-173
- Freeh, V.W.¹ Lowenthal, D.K.²

11
- 77950629423
- PowerPack: Energy profiling and analysis of high-performance systems and applications
- R. Ge, X. Feng, S. Song, H.-C. Chang, D. Li, and K. W. Cameron. PowerPack: Energy profiling and analysis of high-performance systems and applications. IEEE Trans. Parallel Distrib. Syst., 21(5):658-671, May 2010.
- (2010) IEEE Trans. Parallel Distrib. Syst. , vol.21 , Issue.5 , pp. 658-671
- Ge, R.¹ Feng, X.² Song, S.³ Chang, H.-C.⁴ Li, D.⁵ Cameron, K.W.⁶

12
- 34548012303
- A power-aware run-time system for high-performance computing
- C.-H. Hsu and W.-C. Feng. A power-aware run-time system for high-performance computing. In Proc. SC, page 1, 2005.
- (2005) Proc. SC , pp. 1
- Hsu, C.-H.¹ Feng, W.-C.²

13
- 27144479899
- Automatic tuning of PDGEMM towards optimal performance
- S. Hunold and T. Rauber. Automatic tuning of PDGEMM towards optimal performance. In Proc. Euro-Par, pages 837-846, 2005.
- (2005) Proc. Euro-Par , pp. 837-846
- Hunold, S.¹ Rauber, T.²

14
- 33746618548
- Just in time dynamic voltage scaling: Exploiting inter-node slack to save energy in MPI programs
- N. Kappiah, V. W. Freeh, and D. K. Lowenthal. Just in time dynamic voltage scaling: Exploiting inter-node slack to save energy in MPI programs. In Proc. SC, page 33, 2005.
- (2005) Proc. SC , pp. 33
- Kappiah, N.¹ Freeh, V.W.² Lowenthal, D.K.³

15
- 80155187635
- Optimizing process-to-core mappings for two dimensional broadcast/reduce on multicore architectures
- C. Karlsson, T. Davies, C. Ding, H. Liu, and Z. Chen. Optimizing process-to-core mappings for two dimensional broadcast/reduce on multicore architectures. In Proc. ICPP, pages 404-413, 2011.
- (2011) Proc. ICPP , pp. 404-413
- Karlsson, C.¹ Davies, T.² Ding, C.³ Liu, H.⁴ Chen, Z.⁵

16
- 0038040084
- CC-MPI: A compiled communication capable MPI prototype for ethernet switched clusters
- A. Karwande, X. Yuan, and D. K. Lowenthal. CC-MPI: A compiled communication capable MPI prototype for ethernet switched clusters. In Proc. PPoPP, pages 95-106, 2003.
- (2003) Proc. PPoPP , pp. 95-106
- Karwande, A.¹ Yuan, X.² Lowenthal, D.K.³

17
- 46049087938
- Empirical study on reducing energy of parallel programs using slack reclamation by DVFS in a power-scalable high performance cluster
- H. Kimura, M. Sato, Y. Hotta, T. Boku, and D. Takahashi. Empirical study on reducing energy of parallel programs using slack reclamation by DVFS in a power-scalable high performance cluster. In Proc. CLUSTER, pages 1-10, 2006.
- (2006) Proc. CLUSTER , pp. 1-10
- Kimura, H.¹ Sato, M.² Hotta, Y.³ Boku, T.⁴ Takahashi, D.⁵

18
- 77953990600
- Hybrid MPI/OpenMP power-aware computing
- D. Li, B. R. de Supinski, M. Schulz, K. W. Cameron, and D. S. Nikolopoulos. Hybrid MPI/OpenMP power-aware computing. In Proc. IPDPS, pages 1-12, 2010.
- (2010) Proc. IPDPS , pp. 1-12
- Li, D.¹ De Supinski, B.R.² Schulz, M.³ Cameron, K.W.⁴ Nikolopoulos, D.S.⁵

19
- 0036374185
- Critical power slope: Understanding the runtime effects of frequency scaling
- A. Miyoshi, C. Lefurgy, E. V. Hensbergen, R. Rajamony, and R. Rajkumar. Critical power slope: Understanding the runtime effects of frequency scaling. In Proc. ICS, pages 35-44, 2002.
- (2002) Proc. ICS , pp. 35-44
- Miyoshi, A.¹ Lefurgy, C.² Hensbergen, E.V.³ Rajamony, R.⁴ Rajkumar, R.⁵

20
- 84857781512
- Bounding energy consumption in large-scale MPI programs
- B. Rountree, D. K. Lowenthal, S. Funk, V. W. Freeh, B. R. de Supinski, and M. Schulz. Bounding energy consumption in large-scale MPI programs. In Proc. SC, pages 1-9, 2007.
- (2007) Proc. SC , pp. 1-9
- Rountree, B.¹ Lowenthal, D.K.² Funk, S.³ Freeh, V.W.⁴ De Supinski, B.R.⁵ Schulz, M.⁶

21
- 33751054291
- Minimizing execution time in MPI programs on an energy-constrained, power-scalable cluster
- R. Springer, D. K. Lowenthal, B. Rountree, and V. W. Freeh. Minimizing execution time in MPI programs on an energy-constrained, power-scalable cluster. In Proc. PPoPP, pages 230-238, 2006.
- (2006) Proc. PPoPP , pp. 230-238
- Springer, R.¹ Lowenthal, D.K.² Rountree, B.³ Freeh, V.W.⁴

22
- 84893617587
- Improving performance and energy efficiency of matrix multiplication via pipeline broadcast
- L. Tan, L. Chen, Z. Chen, Z. Zong, R. Ge, and D. Li. Improving performance and energy efficiency of matrix multiplication via pipeline broadcast. In Proc. CLUSTER, pages 1-5, 2013.
- (2013) Proc. CLUSTER , pp. 1-5
- Tan, L.¹ Chen, L.² Chen, Z.³ Zong, Z.⁴ Ge, R.⁵ Li, D.⁶

23
- 84897784379
- A2E: Adaptively aggressive energy efficient DVFS scheduling for data intensive applications
- L. Tan, Z. Chen, Z. Zong, R. Ge, and D. Li. A2E: Adaptively aggressive energy efficient DVFS scheduling for data intensive applications. In Proc. IPCCC, pages 1-10, 2013.
- (2013) Proc. IPCCC , pp. 1-10
- Tan, L.¹ Chen, Z.² Zong, Z.³ Ge, R.⁴ Li, D.⁵

24
- 85029600625
- Scheduling for reduced CPU energy
- M. Weiser, B. Welch, A. Demers, and S. Shenker. Scheduling for reduced CPU energy. In Proc. OSDI, page 2, 1994.
- (1994) Proc. OSDI , pp. 2
- Weiser, M.¹ Welch, B.² Demers, A.³ Shenker, S.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.