-
2
-
-
10644248153
-
Brook for GPUs: Stream computing on graphics hardware
-
Aug
-
I. Buck et al. Brook for GPUs: Stream computing on graphics hardware. ACM Transactions on Graphics, 23(3):777-786, Aug. 2004.
-
(2004)
ACM Transactions on Graphics
, vol.23
, Issue.3
, pp. 777-786
-
-
Buck, I.1
-
6
-
-
34547423880
-
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
-
M. I. Gordon, W. Thies, and S. Amarasinghe. Exploiting coarse-grained task, data, and pipeline parallelism in stream programs. In 14th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 151-162, 2006.
-
(2006)
14th International Conference on Architectural Support for Programming Languages and Operating Systems
, pp. 151-162
-
-
Gordon, M.I.1
Thies, W.2
Amarasinghe, S.3
-
7
-
-
79952593905
-
CnC-CUDA: Declarative programming for GPUs
-
M. Grossman, A. Simion, Z. Budimlić, and V. Sarkar. CnC-CUDA: Declarative Programming for GPUs. In Proc. of the 23rd Workshop on Languages and Compilers for Parallel Computing, pages 230-245, 2010.
-
(2010)
Proc. of the 23rd Workshop on Languages and Compilers for Parallel Computing
, pp. 230-245
-
-
Grossman, M.1
Simion, A.2
Budimlić, Z.3
Sarkar, V.4
-
9
-
-
78149231331
-
MapCG: Writing parallel program portable between CPU and GPU
-
C. Hong, D. Chen, W. Chen, W. Zheng, and H. Lin. MapCG: writing parallel program portable between CPU and GPU. In Proc. of the 19th International Conference on Parallel Architectures and Compilation Techniques, pages 217-226, 2010.
-
(2010)
Proc. of the 19th International Conference on Parallel Architectures and Compilation Techniques
, pp. 217-226
-
-
Hong, C.1
Chen, D.2
Chen, W.3
Zheng, W.4
Lin, H.5
-
11
-
-
77952252026
-
Macross: Macro-simdization of streaming applications
-
A. Hormati, Y. Choi, M. Woh, M. Kudlur, R. Rabbah, T. Mudge, and S. Mahlke. Macross: Macro-simdization of streaming applications. In 18th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 285-296, 2010.
-
(2010)
18th International Conference on Architectural Support for Programming Languages and Operating Systems
, pp. 285-296
-
-
Hormati, A.1
Choi, Y.2
Woh, M.3
Kudlur, M.4
Rabbah, R.5
Mudge, T.6
Mahlke, S.7
-
12
-
-
79953071805
-
Sponge: Portable stream programming on graphics engines
-
A. H. Hormati, M. Samadi, M. Woh, T. Mudge, and S. Mahlke. Sponge: portable stream programming on graphics engines. In 19th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 381-392, 2011.
-
(2011)
19th International Conference on Architectural Support for Programming Languages and Operating Systems
, pp. 381-392
-
-
Hormati, A.H.1
Samadi, M.2
Woh, M.3
Mudge, T.4
Mahlke, S.5
-
15
-
-
77954995885
-
Debunking the 100x GPU vs. CPU myth: An evaluation of throughput computing on CPU and GPU
-
V. W. Lee, C. Kim, J. Chhugani, M. Deisher, D. Kim, A. D. Nguyen, N. Satish, M. Smelyanskiy, S. Chennupaty, P. Hammarlund, R. Singhal, and P. Dubey. Debunking the 100x GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU. In Proc. of the 37th Annual International Symposium on Computer Architecture, pages 451-460, 2010.
-
(2010)
Proc. of the 37th Annual International Symposium on Computer Architecture
, pp. 451-460
-
-
Lee, V.W.1
Kim, C.2
Chhugani, J.3
Deisher, M.4
Kim, D.5
Nguyen, A.D.6
Satish, N.7
Smelyanskiy, M.8
Chennupaty, S.9
Hammarlund, P.10
Singhal, R.11
Dubey, P.12
-
19
-
-
70449723385
-
Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs
-
J. Meng and K. Skadron. Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs. In Proc. of the 2009 International Conference on Supercomputing, pages 256-265, 2009.
-
(2009)
Proc. of the 2009 International Conference on Supercomputing
, pp. 256-265
-
-
Meng, J.1
Skadron, K.2
-
22
-
-
77954709868
-
Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations
-
V. T. Ravi, W. Ma, D. Chiu, and G. Agrawal. Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations. In Proc. of the 2010 International Conference on Supercomputing, pages 137-146, 2010.
-
(2010)
Proc. of the 2010 International Conference on Supercomputing
, pp. 137-146
-
-
Ravi, V.T.1
Ma, W.2
Chiu, D.3
Agrawal, G.4
-
24
-
-
79959466764
-
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
-
S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, and W.-m. W. Hwu. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In Proc. of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 73-82, 2008.
-
(2008)
Proc. of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
, pp. 73-82
-
-
Ryoo, S.1
Rodrigues, C.I.2
Baghsorkhi, S.S.3
Stone, S.S.4
Kirk, D.B.5
Hwu, W.W.6
-
28
-
-
31844454218
-
A framework for adaptive algorithm selection in STAPL
-
Proceedings of the 2005 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 05
-
N. Thomas, G. Tanase, O. Tkachyshyn, J. Perdue, N. M. Amato, and L. Rauchwerger. A framework for adaptive algorithm selection in STAPL. In Proc. of the 10th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 277-288, 2005.
-
(2005)
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP
, pp. 277-288
-
-
Thomas, N.1
Tanase, G.2
Tkachyshyn, O.3
Perdue, J.4
Amato, N.M.5
Rauchwerger, L.6
-
29
-
-
78650086311
-
An input-centric paradigm for program dynamic optimizations
-
K. Tian, Y. Jiang, E. Z. Zhang, and X. Shen. An input-centric paradigm for program dynamic optimizations. In Proceedings of the OOPSLA'10, pages 125-139, 2010.
-
(2010)
Proceedings of the OOPSLA'10
, pp. 125-139
-
-
Tian, K.1
Jiang, Y.2
Zhang, E.Z.3
Shen, X.4
-
33
-
-
77954691442
-
A GPGPU compiler for memory optimization and parallelism management
-
Y. Yang, P. Xiang, J. Kong, and H. Zhou. A GPGPU compiler for memory optimization and parallelism management. In Proc. of the '10 Conference on Programming Language Design and Implementation, pages 86-97, 2010.
-
(2010)
Proc. of the '10 Conference on Programming Language Design and Implementation
, pp. 86-97
-
-
Yang, Y.1
Xiang, P.2
Kong, J.3
Zhou, H.4
-
34
-
-
58449127539
-
CUDALite: Reducing GPU programming complexity
-
S.-Z. Ueng, M. Lathara, S. S. Baghsorkhi, and W.-m. W. Hwu. CUDALite: Reducing GPU programming complexity. In Proc. of the 21st Workshop on Languages and Compilers for Parallel Computing, pages 1-15, 2008.
-
(2008)
Proc. of the 21st Workshop on Languages and Compilers for Parallel Computing
, pp. 1-15
-
-
Ueng, S.Z.1
Lathara, M.2
Baghsorkhi, S.S.3
Hwu, W.W.4