-
1
-
-
84899700709
-
-
AMD Inc., AMD APP Profiler
-
AMD Inc., AMD APP Profiler. http://developer.amd.com/tools/heterogeneous-computing/amd-app-profiler/.
-
-
-
-
2
-
-
84899683908
-
-
The LLVM compiler infrastructure
-
The LLVM compiler infrastructure. http://llvm.org.
-
-
-
-
3
-
-
84899683649
-
-
NVIDIA Corporation, NVIDIA Profiler
-
NVIDIA Corporation, NVIDIA Profiler. http://docs.nvidia.com/cuda/profiler-users-guide/.
-
-
-
-
4
-
-
84899696089
-
-
Nvidia's Next Generation CUDA Compute Architecture: Fermi
-
Nvidia's Next Generation CUDA Compute Architecture: Fermi. http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf, 2009.
-
(2009)
-
-
-
8
-
-
84856530584
-
Divergence analysis and optimizations
-
oct.
-
B. Coutinho, D. Sampaio, F. Pereira, and W. Meira. Divergence analysis and optimizations. PACT, pages 320-329, Oct. 2011.
-
(2011)
PACT
, pp. 320-329
-
-
Coutinho, B.1
Sampaio, D.2
Pereira, F.3
Meira, W.4
-
9
-
-
78149233155
-
Ocelot: A dynamic optimization framework for bulk-synchronous applications in heterogeneous systems
-
New York, NY, USA, ACM
-
G. F. Diamos, A. R. Kerr, S. Yalamanchili, and N. Clark. Ocelot: A dynamic optimization framework for bulk-synchronous applications in heterogeneous systems. PACT'10, pages 353-364, New York, NY, USA, 2010. ACM.
-
(2010)
PACT'10
, pp. 353-364
-
-
Diamos, G.F.1
Kerr, A.R.2
Yalamanchili, S.3
Clark, N.4
-
10
-
-
84863463369
-
Compiling a high-level language for GPUs: (via language support for architectures and compilers)
-
C. Dubach, P. Cheng, R. M. Rabbah, D. F. Bacon, and S. J. Fink. Compiling a high-level language for GPUs: (via language support for architectures and compilers). In PLDI, pages 1-12, 2012.
-
(2012)
PLDI
, pp. 1-12
-
-
Dubach, C.1
Cheng, P.2
Rabbah, R.M.3
Bacon, D.F.4
Fink, S.J.5
-
11
-
-
84876937393
-
Portable mapping of data parallel programs to OpenCL for heterogeneous systems
-
D. Grewe, Z. Wang, and M. F. O'Boyle. Portable mapping of data parallel programs to OpenCL for heterogeneous systems. CGO'13. ACM, 2013.
-
(2013)
CGO'13. ACM
-
-
Grewe, D.1
Wang, Z.2
O'Boyle, M.F.3
-
12
-
-
79953071805
-
Sponge: Portable stream programming on graphics engines
-
New York, NY, USA, ACM
-
A. H. Hormati, M. Samadi, M. Woh, T. Mudge, and S. Mahlke. Sponge: portable stream programming on graphics engines. ASPLOS'11, pages 381-392, New York, NY, USA, 2011. ACM.
-
(2011)
ASPLOS'11
, pp. 381-392
-
-
Hormati, A.H.1
Samadi, M.2
Woh, M.3
Mudge, T.4
Mahlke, S.5
-
14
-
-
79957502935
-
Whole-function vectorization
-
april
-
R. Karrenberg and S. Hack. Whole-function vectorization. CGO'11, pages 141-150, Apr. 2011.
-
(2011)
CGO'11
, pp. 141-150
-
-
Karrenberg, R.1
Hack, S.2
-
15
-
-
84859143447
-
Improving performance of OpenCL on CPUs
-
R. Karrenberg and S. Hack. Improving performance of OpenCL on CPUs. CC, pages 1-20, 2012.
-
(2012)
CC
, pp. 1-20
-
-
Karrenberg, R.1
Hack, S.2
-
16
-
-
77952256778
-
Modeling GPU-CPU workloads and systems
-
New York, NY, USA, ACM
-
A. Kerr, G. Diamos, and S. Yalamanchili. Modeling GPU-CPU workloads and systems. GPGPU'10, pages 31-42, New York, NY, USA, 2010. ACM.
-
(2010)
GPGPU'10
, pp. 31-42
-
-
Kerr, A.1
Diamos, G.2
Yalamanchili, S.3
-
17
-
-
70450103746
-
A cross-input adaptive framework for GPU program optimizations
-
may
-
Y. Liu, E. Zhang, and X. Shen. A cross-input adaptive framework for GPU program optimizations. IPDPS'09, pages 1-10, May 2009.
-
(2009)
IPDPS'09
, pp. 1-10
-
-
Liu, Y.1
Zhang, E.2
Shen, X.3
-
18
-
-
33745304805
-
Pin: Building customized program analysis tools with dynamic instrumentation
-
June
-
C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood. Pin: building customized program analysis tools with dynamic instrumentation. SIGPLAN Not., 40(6):190-200, June 2005.
-
(2005)
SIGPLAN Not.
, vol.40
, Issue.6
, pp. 190-200
-
-
Luk, C.-K.1
Cohn, R.2
Muth, R.3
Patil, H.4
Klauser, A.5
Lowney, G.6
Wallace, S.7
Reddi, V.J.8
Hazelwood, K.9
-
21
-
-
79959466764
-
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
-
New York, NY, USA, ACM
-
S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, and W.-m. W. Hwu. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. PPoPP'08, pages 73-82, New York, NY, USA, 2008. ACM.
-
(2008)
PPoPP'08
, pp. 73-82
-
-
Ryoo, S.1
Rodrigues, C.I.2
Baghsorkhi, S.S.3
Stone, S.S.4
Kirk, D.B.5
Hwu, W.-M.W.6
-
22
-
-
84863347222
-
A performance analysis framework for identifying potential benefits in GPGPU applications
-
New York, NY, USA, ACM
-
J. Sim, A. Dasgupta, H. Kim, and R. Vuduc. A performance analysis framework for identifying potential benefits in GPGPU applications. PPoPP'12, pages 11-22, New York, NY, USA, 2012. ACM.
-
(2012)
PPoPP'12
, pp. 11-22
-
-
Sim, J.1
Dasgupta, A.2
Kim, H.3
Vuduc, R.4
-
23
-
-
84859153100
-
Automatic restructuring of GPU kernels for exploiting inter-thread data locality
-
S. Unkule, C. Shaltz, and A. Qasem. Automatic restructuring of GPU kernels for exploiting inter-thread data locality. CC, pages 21-40, 2012.
-
(2012)
CC
, pp. 21-40
-
-
Unkule, S.1
Shaltz, C.2
Qasem, A.3
-
24
-
-
70350771131
-
Benchmarking GPUs to tune dense linear algebra
-
Piscataway, NJ, USA, IEEE Press
-
V. Volkov and J. W. Demmel. Benchmarking GPUs to tune dense linear algebra. SC'08, pages 31:1-31:11, Piscataway, NJ, USA, 2008. IEEE Press.
-
(2008)
SC'08
, pp. 31:1-31:11
-
-
Volkov, V.1
Demmel, J.W.2
-
25
-
-
85050273691
-
Program slicing
-
Piscataway, NJ, USA, IEEE Press
-
M. Weiser. Program slicing. ICSE'81, pages 439-449, Piscataway, NJ, USA, 1981. IEEE Press.
-
(1981)
ICSE'81
, pp. 439-449
-
-
Weiser, M.1
-
26
-
-
84863053984
-
Linear-time modeling of program working set in shared cache
-
X. Xiang, B. Bao, C. Ding, and Y. Gao. Linear-time modeling of program working set in shared cache. In Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on, pages 350-360, 2011.
-
(2011)
Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on
, pp. 350-360
-
-
Xiang, X.1
Bao, B.2
Ding, C.3
Gao, Y.4
-
27
-
-
84863663143
-
A unified optimizing compiler framework for different GPGPU architectures
-
Y. Yang, P. Xiang, J. Kong, M. Mantor, and H. Zhou. A unified optimizing compiler framework for different GPGPU architectures. TACO, 9(2):9, 2012.
-
(2012)
TACO
, vol.9
, Issue.2
, pp. 9
-
-
Yang, Y.1
Xiang, P.2
Kong, J.3
Mantor, M.4
Zhou, H.5
-
28
-
-
79953126288
-
On-the-fly elimination of dynamic irregularities for GPU computing
-
New York, NY, USA, ACM
-
E. Z. Zhang, Y. Jiang, Z. Guo, K. Tian, and X. Shen. On-the-fly elimination of dynamic irregularities for GPU computing. ASPLOS'11, pages 369-380, New York, NY, USA, 2011. ACM.
-
(2011)
ASPLOS'11
, pp. 369-380
-
-
Zhang, E.Z.1
Jiang, Y.2
Guo, Z.3
Tian, K.4
Shen, X.5