-
4
-
-
84862137426
-
Saving energy in the LU factorization with partial pivoting on multi-core processors
-
P. Alonso, M. F. Dolz, F. D. Igual, R. Mayo, and E. S. Quintana-Orti. Saving energy in the LU factorization with partial pivoting on multi-core processors. In Proc. PDP, pages 353-358, 2012.
-
(2012)
Proc. PDP
, pp. 353-358
-
-
Alonso, P.1
Dolz, M.F.2
Igual, F.D.3
Mayo, R.4
Quintana-Orti, E.S.5
-
5
-
-
33751022826
-
Collective communication on architectures that support simultaneous communication over multiple links
-
E. Chan, R. van de Geijn, W. Gropp, and R. Thakur. Collective communication on architectures that support simultaneous communication over multiple links. In Proc. PPoPP, pages 2-11, 2006.
-
(2006)
Proc. PPoPP
, pp. 2-11
-
-
Chan, E.1
Van De Geijn, R.2
Gropp, W.3
Thakur, R.4
-
6
-
-
33746318690
-
Reducing power with performance constraints for parallel sparse applications
-
G. Chen, K. Malkowski, M. Kandemir, and P. Raghavan. Reducing power with performance constraints for parallel sparse applications. In Proc. IPDPS, pages 1-8, 2005.
-
(2005)
Proc. IPDPS
, pp. 1-8
-
-
Chen, G.1
Malkowski, K.2
Kandemir, M.3
Raghavan, P.4
-
7
-
-
0030683462
-
A new parallel matrix multiplication algorithm on distributed-memory concurrent computers
-
J. Choi. A new parallel matrix multiplication algorithm on distributed-memory concurrent computers. In Proc. HPC-Asia, pages 224-229, 1997.
-
(1997)
Proc. HPC-Asia
, pp. 224-229
-
-
Choi, J.1
-
8
-
-
0030244536
-
The design and implementation of the ScaLAPACK LU, QR and Cholesky factorization routines
-
August
-
J. Choi, J. J. Dongarra, L. S. Ostrouchov, A. P. Petitet, D. W. Walker, and R. C. Whaley. The design and implementation of the ScaLAPACK LU, QR and Cholesky factorization routines. Scientific Programming, 5(3):173-184, August 1996.
-
(1996)
Scientific Programming
, vol.5
, Issue.3
, pp. 173-184
-
-
Choi, J.1
Dongarra, J.J.2
Ostrouchov, L.S.3
Petitet, A.P.4
Walker, D.W.5
Whaley, R.C.6
-
9
-
-
32844458895
-
Automatic generation and tuning of MPI collective communication routines
-
A. Faraj and X. Yuan. Automatic generation and tuning of MPI collective communication routines. In Proc. ICS, pages 393-402, 2005.
-
(2005)
Proc. ICS
, pp. 393-402
-
-
Faraj, A.1
Yuan, X.2
-
10
-
-
31844450952
-
Using multiple energy gears in MPI programs on a power-scalable cluster
-
V. W. Freeh and D. K. Lowenthal. Using multiple energy gears in MPI programs on a power-scalable cluster. In Proc. PPoPP, pages 164-173, 2005.
-
(2005)
Proc. PPoPP
, pp. 164-173
-
-
Freeh, V.W.1
Lowenthal, D.K.2
-
11
-
-
77950629423
-
PowerPack: Energy profiling and analysis of high-performance systems and applications
-
May
-
R. Ge, X. Feng, S. Song, H.-C. Chang, D. Li, and K. W. Cameron. PowerPack: Energy profiling and analysis of high-performance systems and applications. IEEE Trans. Parallel Distrib. Syst., 21(5):658-671, May 2010.
-
(2010)
IEEE Trans. Parallel Distrib. Syst.
, vol.21
, Issue.5
, pp. 658-671
-
-
Ge, R.1
Feng, X.2
Song, S.3
Chang, H.-C.4
Li, D.5
Cameron, K.W.6
-
12
-
-
34548012303
-
A power-aware run-time system for high-performance computing
-
C.-H. Hsu and W.-C. Feng. A power-aware run-time system for high-performance computing. In Proc. SC, page 1, 2005.
-
(2005)
Proc. SC
, p. 1
-
-
Hsu, C.-H.1
Feng, W.-C.2
-
13
-
-
27144479899
-
Automatic tuning of PDGEMM towards optimal performance
-
S. Hunold and T. Rauber. Automatic tuning of PDGEMM towards optimal performance. In Proc. Euro-Par, pages 837-846, 2005.
-
(2005)
Proc. Euro-Par
, pp. 837-846
-
-
Hunold, S.1
Rauber, T.2
-
14
-
-
33746618548
-
Just in time dynamic voltage scaling: Exploiting inter-node slack to save energy in MPI programs
-
N. Kappiah, V. W. Freeh, and D. K. Lowenthal. Just in time dynamic voltage scaling: Exploiting inter-node slack to save energy in MPI programs. In Proc. SC, page 33, 2005.
-
(2005)
Proc. SC
, p. 33
-
-
Kappiah, N.1
Freeh, V.W.2
Lowenthal, D.K.3
-
15
-
-
80155187635
-
Optimizing process-to-core mappings for two dimensional broadcast/reduce on multicore architectures
-
C. Karlsson, T. Davies, C. Ding, H. Liu, and Z. Chen. Optimizing process-to-core mappings for two dimensional broadcast/reduce on multicore architectures. In Proc. ICPP, pages 404-413, 2011.
-
(2011)
Proc. ICPP
, pp. 404-413
-
-
Karlsson, C.1
Davies, T.2
Ding, C.3
Liu, H.4
Chen, Z.5
-
16
-
-
0038040084
-
CC-MPI: A compiled communication capable MPI prototype for Ethernet switched clusters
-
A. Karwande, X. Yuan, and D. K. Lowenthal. CC-MPI: A compiled communication capable MPI prototype for Ethernet switched clusters. In Proc. PPoPP, pages 95-106, 2003.
-
(2003)
Proc. PPoPP
, pp. 95-106
-
-
Karwande, A.1
Yuan, X.2
Lowenthal, D.K.3
-
17
-
-
46049087938
-
Empirical study on reducing energy of parallel programs using slack reclamation by DVFS in a power-scalable high performance cluster
-
H. Kimura, M. Sato, Y. Hotta, T. Boku, and D. Takahashi. Empirical study on reducing energy of parallel programs using slack reclamation by DVFS in a power-scalable high performance cluster. In Proc. CLUSTER, pages 1-10, 2006.
-
(2006)
Proc. CLUSTER
, pp. 1-10
-
-
Kimura, H.1
Sato, M.2
Hotta, Y.3
Boku, T.4
Takahashi, D.5
-
18
-
-
77953990600
-
Hybrid MPI/OpenMP power-aware computing
-
D. Li, B. R. de Supinski, M. Schulz, K. W. Cameron, and D. S. Nikolopoulos. Hybrid MPI/OpenMP power-aware computing. In Proc. IPDPS, pages 1-12, 2010.
-
(2010)
Proc. IPDPS
, pp. 1-12
-
-
Li, D.1
De Supinski, B.R.2
Schulz, M.3
Cameron, K.W.4
Nikolopoulos, D.S.5
-
19
-
-
0036374185
-
Critical power slope: Understanding the runtime effects of frequency scaling
-
A. Miyoshi, C. Lefurgy, E. V. Hensbergen, R. Rajamony, and R. Rajkumar. Critical power slope: Understanding the runtime effects of frequency scaling. In Proc. ICS, pages 35-44, 2002.
-
(2002)
Proc. ICS
, pp. 35-44
-
-
Miyoshi, A.1
Lefurgy, C.2
Hensbergen, E.V.3
Rajamony, R.4
Rajkumar, R.5
-
20
-
-
84857781512
-
Bounding energy consumption in large-scale MPI programs
-
B. Rountree, D. K. Lowenthal, S. Funk, V. W. Freeh, B. R. de Supinski, and M. Schulz. Bounding energy consumption in large-scale MPI programs. In Proc. SC, pages 1-9, 2007.
-
(2007)
Proc. SC
, pp. 1-9
-
-
Rountree, B.1
Lowenthal, D.K.2
Funk, S.3
Freeh, V.W.4
De Supinski, B.R.5
Schulz, M.6
-
21
-
-
33751054291
-
Minimizing execution time in MPI programs on an energy-constrained, power-scalable cluster
-
R. Springer, D. K. Lowenthal, B. Rountree, and V. W. Freeh. Minimizing execution time in MPI programs on an energy-constrained, power-scalable cluster. In Proc. PPoPP, pages 230-238, 2006.
-
(2006)
Proc. PPoPP
, pp. 230-238
-
-
Springer, R.1
Lowenthal, D.K.2
Rountree, B.3
Freeh, V.W.4
-
22
-
-
84893617587
-
Improving performance and energy efficiency of matrix multiplication via pipeline broadcast
-
L. Tan, L. Chen, Z. Chen, Z. Zong, R. Ge, and D. Li. Improving performance and energy efficiency of matrix multiplication via pipeline broadcast. In Proc. CLUSTER, pages 1-5, 2013.
-
(2013)
Proc. CLUSTER
, pp. 1-5
-
-
Tan, L.1
Chen, L.2
Chen, Z.3
Zong, Z.4
Ge, R.5
Li, D.6
-
23
-
-
84897784379
-
A2E: Adaptively aggressive energy efficient DVFS scheduling for data intensive applications
-
L. Tan, Z. Chen, Z. Zong, R. Ge, and D. Li. A2E: Adaptively aggressive energy efficient DVFS scheduling for data intensive applications. In Proc. IPCCC, pages 1-10, 2013.
-
(2013)
Proc. IPCCC
, pp. 1-10
-
-
Tan, L.1
Chen, Z.2
Zong, Z.3
Ge, R.4
Li, D.5
-
24
-
-
85029600625
-
Scheduling for reduced CPU energy
-
M. Weiser, B. Welch, A. Demers, and S. Shenker. Scheduling for reduced CPU energy. In Proc. OSDI, page 2, 1994.
-
(1994)
Proc. OSDI
, p. 2
-
-
Weiser, M.1
Welch, B.2
Demers, A.3
Shenker, S.4