[1] S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, and W.-m. W. Hwu, "Optimization principles and application performance evaluation of a multithreaded GPU using CUDA," in Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2008). Salt Lake City, UT: ACM SIGPLAN, Feb. 2008, pp. 73-82.
[2] S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, and K. Skadron, "A performance study of general-purpose applications on graphics processors using CUDA," J. Parallel Distrib. Comput., vol. 68, no. 10, pp. 1370-1380, 2008.
[3] G. Quintana-Ortí, F. D. Igual, E. S. Quintana-Ortí, and R. A. van de Geijn, "Solving dense linear systems on platforms with multiple hardware accelerators," in PPoPP, D. A. Reed and V. Sarkar, Eds. ACM, 2009, pp. 121-130.
[4] N. K. Govindaraju and D. Manocha, "Cache-efficient numerical algorithms using graphics hardware," Parallel Comput., vol. 33, no. 10-11, pp. 663-684, 2007.
[5] I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and P. Hanrahan, "Brook for GPUs: Stream computing on graphics hardware," ACM Transactions on Graphics, vol. 23, no. 3, pp. 777-786, Aug. 2004.
[8] Z. Li and Y. Song, "Automatic tiling of iterative stencil loops," ACM Trans. Program. Lang. Syst., vol. 26, no. 6, pp. 975-1028, 2004.
[9] T. Mohan, B. R. de Supinski, S. A. McKee, F. Mueller, A. Yoo, and M. Schulz, "Identifying and exploiting spatial regularity in data memory references," in SC '03: Proceedings of the 2003 ACM/IEEE Conference on Supercomputing. Washington, DC, USA: IEEE Computer Society, 2003, p. 49.
[11] B. Jang, S. Do, H. Pien, and D. Kaeli, "Architecture-aware optimization targeting multithreaded stream computing," in GPGPU-2: Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units. New York, NY, USA: ACM, 2009, pp. 62-70.
[12] S. Lee, S.-J. Min, and R. Eigenmann, "OpenMP to GPGPU: A compiler framework for automatic translation and optimization," in PPoPP '09: Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. New York, NY, USA: ACM, 2009, pp. 101-110.
[13] S. Ryoo, C. I. Rodrigues, S. S. Stone, S. S. Baghsorkhi, S.-Z. Ueng, J. A. Stratton, and W.-m. W. Hwu, "Program optimization space pruning for a multithreaded GPU," in CGO, M. L. Soffa and E. Duesterwald, Eds. ACM, 2008, pp. 195-204.
[14] M. A. Suleman, M. K. Qureshi, and Y. N. Patt, "Feedback-driven threading: Power-efficient and high-performance execution of multi-threaded workloads on CMPs," SIGARCH Comput. Archit. News, vol. 36, no. 1, pp. 277-286, 2008.
[15] A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt, "Analyzing CUDA Workloads Using a Detailed GPU Simulator," in IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2009), Apr. 2009, pp. 163-174.