[1] NVIDIA CUDA. http://www.nvidia.com/cuda.
[2] M. M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. A compiler framework for optimization of affine loop nests for GPGPUs. In ICS'08: Proceedings of the 22nd Annual International Conference on Supercomputing, pages 225-234, 2008.
[6] C. Ding and K. Kennedy. Improving effective bandwidth through compiler enhancement of global cache reuse. Journal of Parallel and Distributed Computing, 64(1):108-134, 2004.
[7] Y. Dotsenko, N. K. Govindaraju, P. Sloan, C. Boyd, and J. Manferdelli. Fast scan algorithms on graphics processors. In ICS'08: Proceedings of the 22nd Annual International Conference on Supercomputing, pages 205-213, 2008.
[8] W. Fung, I. Sham, G. Yuan, and T. Aamodt. Dynamic warp formation and scheduling for efficient GPU control flow. In MICRO '07: Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, pages 407-420, Washington, DC, USA, 2007. IEEE Computer Society.
[12] A. Nukada, Y. Ogata, T. Endo, and S. Matsuoka. Bandwidth intensive 3-D FFT kernel for GPUs using CUDA. In SC'08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, pages 1-11, 2008.
[13] S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, and W. W. Hwu. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In PPoPP '08: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 73-82, 2008.
[14] S. Ryoo, C. I. Rodrigues, S. S. Stone, S. S. Baghsorkhi, S. Ueng, J. A. Stratton, and W. W. Hwu. Program optimization space pruning for a multithreaded GPU. In CGO'08: Proceedings of the Sixth Annual IEEE/ACM International Symposium on Code Generation and Optimization, pages 195-204, 2008.
[15] M. Silberstein, A. Schuster, D. Geiger, A. Patney, and J. D. Owens. Efficient computation of sum-products on GPUs through software-managed cache. In Proceedings of the 22nd ACM International Conference on Supercomputing, pages 309-318, June 2008.
[18] Y. Zhao. Lattice Boltzmann based PDE solver on the GPU. The Visual Computer, (5):323-333, 2008.