-
1
-
-
84869690427
-
-
NVIDIA CUDA
-
NVIDIA CUDA. http://www.nvidia.com/cuda.
-
-
-
-
2
-
-
0037810283
-
Online feedback-directed optimization of Java
-
M. Arnold, M. Hind, and B. G. Ryder. Online feedback-directed optimization of Java. In Proceedings of ACM Conference on Object-Oriented Programming Systems, Languages and Applications, pages 111-129, 2002.
-
(2002)
Proceedings of ACM Conference on Object-Oriented Programming Systems, Languages and Applications
, pp. 111-129
-
-
Arnold, M.1
Hind, M.2
Ryder, B.G.3
-
3
-
-
57349180412
-
-
A compiler framework for optimization of affine loop nests for GPGPUs
-
M. M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. A compiler framework for optimization of affine loop nests for GPGPUs. In ICS'08: Proceedings of the 22nd Annual International Conference on Supercomputing, pages 225-234, 2008.
-
-
-
-
4
-
-
0030661485
-
Optimizing matrix multiply using PHiPAC: A portable, high-performance, ANSI C coding methodology
-
J. Bilmes, K. Asanovic, C.-W. Chin, and J. Demmel. Optimizing matrix multiply using PHiPAC: A portable, high-performance, ANSI C coding methodology. In Proceedings of the ACM International Conference on Supercomputing, pages 340-347, 1997.
-
(1997)
Proceedings of the ACM International Conference on Supercomputing
, pp. 340-347
-
-
Bilmes, J.1
Asanovic, K.2
Chin, C.-W.3
Demmel, J.4
-
7
-
-
57349184047
-
Fast scan algorithms on graphics processors
-
Y. Dotsenko, N. K. Govindaraju, P. Sloan, C. Boyd, and J. Manferdelli. Fast scan algorithms on graphics processors. In ICS'08: Proceedings of the 22nd Annual International Conference on Supercomputing, pages 205-213, 2008.
-
(2008)
ICS'08: Proceedings of the 22nd Annual International Conference on Supercomputing
, pp. 205-213
-
-
Dotsenko, Y.1
Govindaraju, N.K.2
Sloan, P.3
Boyd, C.4
Manferdelli, J.5
-
8
-
-
20744449792
-
The design and implementation of FFTW3
-
M. Frigo and S. G. Johnson. The design and implementation of FFTW3. Proceedings of the IEEE, 93(2):216-231, 2005.
-
(2005)
Proceedings of the IEEE
, vol.93
, Issue.2
, pp. 216-231
-
-
Frigo, M.1
Johnson, S.G.2
-
12
-
-
1542501019
-
Sparsity: Optimization framework for sparse matrix kernels
-
Eun-Jin Im, Katherine Yelick, and Richard Vuduc. Sparsity: Optimization framework for sparse matrix kernels. Int. J. High Perform. Comput. Appl., 18(1):135-158, 2004.
-
(2004)
Int. J. High Perform. Comput. Appl
, vol.18
, Issue.1
, pp. 135-158
-
-
Im, E.-J.1
Yelick, K.2
Vuduc, R.3
-
13
-
-
20744444866
-
Telescoping languages: A system for automatic generation of domain languages
-
Ken Kennedy, Bradley Broom, Arun Chauhan, Rob Fowler, John Garvin, Charles Koelbel, Cheryl McCosh, and John Mellor-Crummey. Telescoping languages: A system for automatic generation of domain languages. Proceedings of the IEEE, 93(2):387-408, 2005.
-
(2005)
Proceedings of the IEEE
, vol.93
, Issue.2
, pp. 387-408
-
-
Kennedy, K.1
Broom, B.2
Chauhan, A.3
Fowler, R.4
Garvin, J.5
Koelbel, C.6
McCosh, C.7
Mellor-Crummey, J.8
-
14
-
-
35048854568
-
-
Cetus - an extensible compiler infrastructure for source-to-source transformation
-
S. Lee, T. Johnson, and R. Eigenmann. Cetus - an extensible compiler infrastructure for source-to-source transformation. In Proceedings of the 16th Annual Workshop on Languages and Compilers for Parallel Computing (LCPC), pages 539-553, 2003.
-
-
-
-
16
-
-
0032676575
-
Efficient incremental run-time specialization for free
-
Atlanta, GA, May
-
R. Marlet, C. Consel, and P. Boinot. Efficient incremental run-time specialization for free. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 281-292, Atlanta, GA, May 1999.
-
(1999)
Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation
, pp. 281-292
-
-
Marlet, R.1
Consel, C.2
Boinot, P.3
-
17
-
-
78651550268
-
Scalable parallel programming with CUDA
-
March/April
-
John Nickolls, Ian Buck, Michael Garland, and Kevin Skadron. Scalable parallel programming with CUDA. ACM Queue, pages 40-53, March/April 2008.
-
(2008)
ACM Queue
, pp. 40-53
-
-
Nickolls, J.1
Buck, I.2
Garland, M.3
Skadron, K.4
-
18
-
-
70350759823
-
Bandwidth intensive 3-D FFT kernel for GPUs using CUDA
-
A. Nukada, Y. Ogata, T. Endo, and S. Matsuoka. Bandwidth intensive 3-D FFT kernel for GPUs using CUDA. In SC'08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, pages 1-11, 2008.
-
(2008)
SC'08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing
, pp. 1-11
-
-
Nukada, A.1
Ogata, Y.2
Endo, T.3
Matsuoka, S.4
-
19
-
-
19344368072
-
SPIRAL: Code generation for DSP transforms
-
M. Puschel, J.M.F. Moura, J.R. Johnson, D. Padua, M.M. Veloso, B.W. Singer, Jianxin Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R.W. Johnson, and N. Rizzolo. SPIRAL: code generation for DSP transforms. Proceedings of the IEEE, 93(2):232-275, 2005.
-
(2005)
Proceedings of the IEEE
, vol.93
, Issue.2
, pp. 232-275
-
-
Puschel, M.1
Moura, J.M.F.2
Johnson, J.R.3
Padua, D.4
Veloso, M.M.5
Singer, B.W.6
Xiong, J.7
Franchetti, F.8
Gacic, A.9
Voronenko, Y.10
Chen, K.11
Johnson, R.W.12
Rizzolo, N.13
-
20
-
-
79959466764
-
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
-
S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, and W. W. Hwu. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In PPoPP '08: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 73-82, 2008.
-
(2008)
PPoPP '08: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
, pp. 73-82
-
-
Ryoo, S.1
Rodrigues, C.I.2
Baghsorkhi, S.S.3
Stone, S.S.4
Kirk, D.B.5
Hwu, W.W.6
-
21
-
-
43449094719
-
Program optimization space pruning for a multithreaded GPU
-
S. Ryoo, C. I. Rodrigues, S. S. Stone, S. S. Baghsorkhi, S. Ueng, J. A. Stratton, and W. W. Hwu. Program optimization space pruning for a multithreaded GPU. In CGO'08: Proceedings of the Sixth Annual IEEE/ACM International Symposium on Code Generation and Optimization, pages 195-204, 2008.
-
(2008)
CGO'08: Proceedings of the Sixth Annual IEEE/ACM International Symposium on Code Generation and Optimization
, pp. 195-204
-
-
Ryoo, S.1
Rodrigues, C.I.2
Stone, S.S.3
Baghsorkhi, S.S.4
Ueng, S.5
Stratton, J.A.6
Hwu, W.W.7
-
22
-
-
56849102474
-
Efficient computation of sum-products on GPUs through software-managed cache
-
June
-
Mark Silberstein, Assaf Schuster, Dan Geiger, Anjul Patney, and John D. Owens. Efficient computation of sum-products on GPUs through software-managed cache. In Proceedings of the 22nd ACM International Conference on Supercomputing, pages 309-318, June 2008.
-
(2008)
Proceedings of the 22nd ACM International Conference on Supercomputing
, pp. 309-318
-
-
Silberstein, M.1
Schuster, A.2
Geiger, D.3
Patney, A.4
Owens, J.D.5
-
23
-
-
31844454218
-
A framework for adaptive algorithm selection in STAPL
-
N. Thomas, G. Tanase, O. Tkachyshyn, J. Perdue, N. M. Amato, and L. Rauchwerger. A framework for adaptive algorithm selection in STAPL. In Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 277-288, 2005.
-
(2005)
Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
, pp. 277-288
-
-
Thomas, N.1
Tanase, G.2
Tkachyshyn, O.3
Perdue, J.4
Amato, N.M.5
Rauchwerger, L.6
-
25
-
-
0343462141
-
Automated empirical optimizations of software and the ATLAS project
-
R. C. Whaley, A. Petitet, and J. Dongarra. Automated empirical optimizations of software and the ATLAS project. Parallel Computing, 27(1-2):3-35, 2001.
-
(2001)
Parallel Computing
, vol.27
, Issue.1-2
, pp. 3-35
-
-
Whaley, R.C.1
Petitet, A.2
Dongarra, J.3