-
1
-
-
10644248153
-
Brook for GPUs: Stream computing on graphics hardware
-
Aug
-
I. Buck et al. Brook for GPUs: Stream computing on graphics hardware. ACM Transactions on Graphics, 23(3):777-786, Aug. 2004.
-
(2004)
ACM Transactions on Graphics
, vol.23
, Issue.3
, pp. 777-786
-
-
Buck, I.1
-
2
-
-
77952565941
-
Weak execution ordering - Exploiting iterative methods on many-core GPUs
-
J. Chen, Z. Huang, F. Su, J.-K. Peir, J. Ho, and L. Peng. Weak execution ordering - exploiting iterative methods on many-core GPUs. In Proc. of the 2010 IEEE Symposium on Performance Analysis of Systems and Software, pages 154-163, 2010.
-
(2010)
Proc. of the 2010 IEEE Symposium on Performance Analysis of Systems and Software
, pp. 154-163
-
-
Chen, J.1
Huang, Z.2
Su, F.3
Peir, J.-K.4
Ho, J.5
Peng, L.6
-
3
-
-
34548207355
-
Sequoia: Programming the memory hierarchy
-
K. Fatahalian, D. R. Horn, T. J. Knight, L. Leem, M. Houston, J. Y. Park, M. Erez, M. Ren, A. Aiken, W. J. Dally, and P. Hanrahan. Sequoia: programming the memory hierarchy. In Proceedings of the 2006 ACM/IEEE conference on Supercomputing, page 83, 2006.
-
(2006)
Proceedings of the 2006 ACM/IEEE Conference on Supercomputing
, pp. 83
-
-
Fatahalian, K.1
Horn, D.R.2
Knight, T.J.3
Leem, L.4
Houston, M.5
Park, J.Y.6
Erez, M.7
Ren, M.8
Aiken, A.9
Dally, W.J.10
Hanrahan, P.11
-
4
-
-
47349104432
-
Dynamic warp formation and scheduling for efficient GPU control flow
-
W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt. Dynamic warp formation and scheduling for efficient GPU control flow. In Proc. of the 40th Annual International Symposium on Microarchitecture, pages 407-420, 2007.
-
(2007)
Proc. of the 40th Annual International Symposium on Microarchitecture
, pp. 407-420
-
-
Fung, W.W.L.1
Sham, I.2
Yuan, G.3
Aamodt, T.M.4
-
5
-
-
34547423880
-
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
-
M. I. Gordon, W. Thies, and S. Amarasinghe. Exploiting coarse-grained task, data, and pipeline parallelism in stream programs. In 14th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 151-162, 2006.
-
(2006)
14th International Conference on Architectural Support for Programming Languages and Operating Systems
, pp. 151-162
-
-
Gordon, M.I.1
Thies, W.2
Amarasinghe, S.3
-
6
-
-
0036959649
-
A stream compiler for communication-exposed architectures
-
Oct
-
M. I. Gordon, W. Thies, M. Karczmarek, J. Lin, A. S. Meli, A. A. Lamb, C. Leger, J. Wong, H. Hoffmann, D. Maze, and S. Amarasinghe. A stream compiler for communication-exposed architectures. In Tenth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 291-303, Oct. 2002.
-
(2002)
Tenth International Conference on Architectural Support for Programming Languages and Operating Systems
, pp. 291-303
-
-
Gordon, M.I.1
Thies, W.2
Karczmarek, M.3
Lin, J.4
Meli, A.S.5
Lamb, A.A.6
Leger, C.7
Wong, J.8
Hoffmann, H.9
Maze, D.10
Amarasinghe, S.11
-
7
-
-
0026255448
-
Compile-time scheduling and assignment of data-flow program graphs with data-dependent iteration
-
S. Ha and E. A. Lee. Compile-time scheduling and assignment of data-flow program graphs with data-dependent iteration. IEEE Transactions on Computers, 40(11):1225-1238, 1991.
-
(1991)
IEEE Transactions on Computers
, vol.40
, Issue.11
, pp. 1225-1238
-
-
Ha, S.1
Lee, E.A.2
-
10
-
-
63349092007
-
Optimus: Efficient realization of streaming applications on FPGAs
-
Oct
-
A. Hormati, M. Kudlur, D. Bacon, S. Mahlke, and R. Rabbah. Optimus: Efficient realization of streaming applications on FPGAs. In Proc. of the 2008 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, pages 41-50, Oct. 2008.
-
(2008)
Proc. of the 2008 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems
, pp. 41-50
-
-
Hormati, A.1
Kudlur, M.2
Bacon, D.3
Mahlke, S.4
Rabbah, R.5
-
11
-
-
70449669477
-
Flextream: Adaptive compilation of streaming applications for heterogeneous architectures
-
A. H. Hormati, Y. Choi, M. Kudlur, R. Rabbah, T. Mudge, and S. Mahlke. Flextream: Adaptive compilation of streaming applications for heterogeneous architectures. In Proc. of the 18th International Conference on Parallel Architectures and Compilation Techniques, pages 214-223, 2009.
-
(2009)
Proc. of the 18th International Conference on Parallel Architectures and Compilation Techniques
, pp. 214-223
-
-
Hormati, A.H.1
Choi, Y.2
Kudlur, M.3
Rabbah, R.4
Mudge, T.5
Mahlke, S.6
-
12
-
-
77952252026
-
MacRoss: Macro-simdization of streaming applications
-
A. H. Hormati, Y. Choi, M. Woh, M. Kudlur, T. Mudge, and S. Mahlke. MacRoss: Macro-simdization of streaming applications. In 18th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 285-296, 2010.
-
(2010)
18th International Conference on Architectural Support for Programming Languages and Operating Systems
, pp. 285-296
-
-
Hormati, A.H.1
Choi, Y.2
Woh, M.3
Kudlur, M.4
Mudge, T.5
Mahlke, S.6
-
17
-
-
77954995885
-
Debunking the 100x GPU vs. CPU myth: An evaluation of throughput computing on CPU and GPU
-
V. W. Lee, C. Kim, J. Chhugani, M. Deisher, D. Kim, A. D. Nguyen, N. Satish, M. Smelyanskiy, S. Chennupaty, P. Hammarlund, R. Singhal, and P. Dubey. Debunking the 100x GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU. In Proc. of the 37th Annual International Symposium on Computer Architecture, pages 451-460, 2010.
-
(2010)
Proc. of the 37th Annual International Symposium on Computer Architecture
, pp. 451-460
-
-
Lee, V.W.1
Kim, C.2
Chhugani, J.3
Deisher, M.4
Kim, D.5
Nguyen, A.D.6
Satish, N.7
Smelyanskiy, M.8
Chennupaty, S.9
Hammarlund, P.10
Singhal, R.11
Dubey, P.12
-
18
-
-
77953983400
-
Cg: A system for programming graphics hardware in a C-like language
-
July
-
W. Mark, R. Glanville, K. Akeley, and J. Kilgard. Cg: A system for programming graphics hardware in a C-like language. In Proc. of the 30th International Conference on Computer Graphics and Interactive Techniques, pages 893-907, July 2003.
-
(2003)
Proc. of the 30th International Conference on Computer Graphics and Interactive Techniques
, pp. 893-907
-
-
Mark, W.1
Glanville, R.2
Akeley, K.3
Kilgard, J.4
-
19
-
-
34547309668
-
-
June
-
NVIDIA. CUDA Programming Guide, June 2007. http://developer.download.nvidia.com/compute/cuda.
-
(2007)
CUDA Programming Guide
-
-
-
22
-
-
33846545187
-
A hierarchical multiprocessor scheduling framework for synchronous dataflow graphs
-
University of California, Berkeley, May
-
J. L. Pino, S. S. Bhattacharyya, and E. A. Lee. A hierarchical multiprocessor scheduling framework for synchronous dataflow graphs. Technical Report UCB/ERL M95/36, University of California, Berkeley, May 1995.
-
(1995)
Technical Report UCB/ERL M95/36
-
-
Pino, J.L.1
Bhattacharyya, S.S.2
Lee, E.A.3
-
23
-
-
79959466764
-
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
-
S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, and W.-M. W. Hwu. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In Proc. of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 73-82, 2008.
-
(2008)
Proc. of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
, pp. 73-82
-
-
Ryoo, S.1
Rodrigues, C.I.2
Baghsorkhi, S.S.3
Stone, S.S.4
Kirk, D.B.5
Hwu, W.-M.W.6
-
28
-
-
57349101237
-
Data and computation transformations for Brook streaming applications on multiprocessors
-
S.-W. Liao, Z. Du, G. Wu, and G.-Y. Lueh. Data and computation transformations for Brook streaming applications on multiprocessors. Proc. of the 2006 International Symposium on Code Generation and Optimization, 0(1):196-207, 2006.
-
(2006)
Proc. of the 2006 International Symposium on Code Generation and Optimization
, Issue.1
, pp. 196-207
-
-
Liao, S.-W.1
Du, Z.2
Wu, G.3
Lueh, G.-Y.4
-
29
-
-
77954691442
-
A GPGPU compiler for memory optimization and parallelism management
-
Y. Yang, P. Xiang, J. Kong, and H. Zhou. A GPGPU compiler for memory optimization and parallelism management. In Proc. of the '10 Conference on Programming Language Design and Implementation, pages 86-97, 2010.
-
(2010)
Proc. of the '10 Conference on Programming Language Design and Implementation
, pp. 86-97
-
-
Yang, Y.1
Xiang, P.2
Kong, J.3
Zhou, H.4
-
30
-
-
58449127539
-
CUDA-lite: Reducing GPU programming complexity
-
S.-Z. Ueng, M. Lathara, S. S. Baghsorkhi, and W.-M. W. Hwu. CUDA-lite: Reducing GPU programming complexity. In Proc. of the 21st Workshop on Languages and Compilers for Parallel Computing, pages 1-15, 2008.
-
(2008)
Proc. of the 21st Workshop on Languages and Compilers for Parallel Computing
, pp. 1-15
-
-
Ueng, S.-Z.1
Lathara, M.2
Baghsorkhi, S.S.3
Hwu, W.-M.W.4