-
1
-
-
79959578272
-
-
"http://www.top500.org/."
-
-
-
-
2
-
-
74049140954
-
Scalable parallel programming with CUDA
-
ACM
-
J. Nickolls, I. Buck, M. Garland, and K. Skadron, "Scalable parallel programming with CUDA," in SIGGRAPH '08: ACM SIGGRAPH 2008 classes, pp. 1-14, ACM, 2008.
-
(2008)
SIGGRAPH '08: ACM SIGGRAPH 2008 Classes
, pp. 1-14
-
-
Nickolls, J.1
Buck, I.2
Garland, M.3
Skadron, K.4
-
4
-
-
74049143158
-
Implementing sparse matrix-vector multiplication on throughput-oriented processors
-
ACM
-
N. Bell and M. Garland, "Implementing sparse matrix-vector multiplication on throughput-oriented processors," in Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, pp. 18:1-18:11, ACM, 2009.
-
(2009)
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09
-
-
Bell, N.1
Garland, M.2
-
6
-
-
70350771127
-
Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures
-
IEEE Press
-
K. Datta, M. Murphy, V. Volkov, S. Williams, J. Carter, L. Oliker, D. Patterson, J. Shalf, and K. Yelick, "Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures," in Proceedings of the 2008 ACM/IEEE conference on Supercomputing, SC '08, pp. 4:1-4:12, IEEE Press, 2008.
-
(2008)
Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, SC '08
-
-
Datta, K.1
Murphy, M.2
Volkov, V.3
Williams, S.4
Carter, J.5
Oliker, L.6
Patterson, D.7
Shalf, J.8
Yelick, K.9
-
7
-
-
35048828869
-
The FFT on a GPU
-
Eurographics Association
-
K. Moreland and E. Angel, "The FFT on a GPU," in Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, HWWS '03, pp. 112-119, Eurographics Association, 2003.
-
(2003)
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, HWWS '03
, pp. 112-119
-
-
Moreland, K.1
Angel, E.2
-
11
-
-
84966549063
-
Treating a user-defined parallel library as a domain-specific language
-
IEEE Computer Society
-
D. J. Quinlan, B. Miller, B. Philip, and M. Schordan, "Treating a user-defined parallel library as a domain-specific language," in Proceedings of the 16th International Parallel and Distributed Processing Symposium, IPDPS '02, pp. 324-, IEEE Computer Society, 2002.
-
(2002)
Proceedings of the 16th International Parallel and Distributed Processing Symposium, IPDPS '02
-
-
Quinlan, D.J.1
Miller, B.2
Philip, B.3
Schordan, M.4
-
12
-
-
79959599884
-
-
"Rose." http://www.rosecompiler.org.
-
-
-
-
15
-
-
34250216007
-
Scientific computing kernels on the Cell processor
-
June
-
S. Williams, J. Shalf, L. Oliker, S. Kamil, P. Husbands, and K. Yelick, "Scientific computing kernels on the Cell processor," Int. J. Parallel Program., vol. 35, pp. 263-298, June 2007.
-
(2007)
Int. J. Parallel Program.
, vol.35
, pp. 263-298
-
-
Williams, S.1
Shalf, J.2
Oliker, L.3
Kamil, S.4
Husbands, P.5
Yelick, K.6
-
16
-
-
25844503119
-
Introduction to the Cell multiprocessor
-
July
-
J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns, T. R. Maeurer, and D. Shippy, "Introduction to the Cell multiprocessor," IBM J. Res. Dev., vol. 49, pp. 589-604, July 2005.
-
(2005)
IBM J. Res. Dev.
, vol.49
, pp. 589-604
-
-
Kahle, J.A.1
Day, M.N.2
Hofstee, H.P.3
Johns, C.R.4
Maeurer, T.R.5
Shippy, D.6
-
17
-
-
84971423310
-
Auto-tuning the 27-point stencil for multicore
-
K. Datta, S. Williams, V. Volkov, J. Carter, L. Oliker, J. Shalf, and K. Yelick, "Auto-tuning the 27-point stencil for multicore," in iWAPT, 4th International Workshop on Automatic Performance Tuning, 2009.
-
iWAPT, 4th International Workshop on Automatic Performance Tuning, 2009
-
-
Datta, K.1
Williams, S.2
Volkov, V.3
Carter, J.4
Oliker, L.5
Shalf, J.6
Yelick, K.7
-
18
-
-
70450103746
-
A cross-input adaptive framework for GPU program optimizations
-
Y. Liu, E. Z. Zhang, and X. Shen, "A cross-input adaptive framework for GPU program optimizations," in Int. Parallel and Distributed Processing Symp., pp. 1-10, 2009.
-
(2009)
Int. Parallel and Distributed Processing Symp.
, pp. 1-10
-
-
Liu, Y.1
Zhang, E.Z.2
Shen, X.3
-
19
-
-
78349275320
-
Source-to-source optimization of CUDA C for GPU accelerated cardiac cell modeling
-
Springer-Verlag
-
F. V. Lionetti, A. D. McCulloch, and S. B. Baden, "Source-to-source optimization of CUDA C for GPU accelerated cardiac cell modeling," in Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I, EuroPar'10, pp. 38-49, Springer-Verlag, 2010.
-
(2010)
Proceedings of the 16th International Euro-Par Conference on Parallel Processing: Part I, EuroPar'10
, pp. 38-49
-
-
Lionetti, F.V.1
McCulloch, A.D.2
Baden, S.B.3
-
20
-
-
77954022347
-
An auto-tuning framework for parallel multicore stencil computations
-
S. Kamil, C. Chan, L. Oliker, J. Shalf, and S. Williams, "An auto-tuning framework for parallel multicore stencil computations," in International Conference on Parallel and Distributed Computing Systems (IPDPS), 2010.
-
International Conference on Parallel and Distributed Computing Systems (IPDPS), 2010
-
-
Kamil, S.1
Chan, C.2
Oliker, L.3
Shalf, J.4
Williams, S.5
-
21
-
-
33646558229
-
Using advanced compiler technology to exploit the performance of the Cell Broadband Engine architecture
-
January
-
A. E. Eichenberger, J. K. O'Brien, K. M. O'Brien, P. Wu, T. Chen, P. H. Oden, D. A. Prener, J. C. Shepherd, B. So, Z. Sura, A. Wang, T. Zhang, P. Zhao, M. K. Gschwind, R. Archambault, Y. Gao, and R. Koo, "Using advanced compiler technology to exploit the performance of the Cell Broadband Engine architecture," IBM Syst. J., vol. 45, pp. 59-84, January 2006.
-
(2006)
IBM Syst. J.
, vol.45
, pp. 59-84
-
-
Eichenberger, A.E.1
O'Brien, J.K.2
O'Brien, K.M.3
Wu, P.4
Chen, T.5
Oden, P.H.6
Prener, D.A.7
Shepherd, J.C.8
So, B.9
Sura, Z.10
Wang, A.11
Zhang, T.12
Zhao, P.13
Gschwind, M.K.14
Archambault, R.15
Gao, Y.16
Koo, R.17
-
23
-
-
67650081010
-
OpenMP to GPGPU: A compiler framework for automatic translation and optimization
-
February
-
S. Lee, S.-J. Min, and R. Eigenmann, "OpenMP to GPGPU: a compiler framework for automatic translation and optimization," SIGPLAN Not., vol. 44, pp. 101-110, February 2009.
-
(2009)
SIGPLAN Not.
, vol.44
, pp. 101-110
-
-
Lee, S.1
Min, S.-J.2
Eigenmann, R.3
-
24
-
-
78650802947
-
OpenMPC: Extended OpenMP Programming and Tuning for GPUs
-
IEEE Computer Society
-
S. Lee and R. Eigenmann, "OpenMPC: Extended OpenMP Programming and Tuning for GPUs," in Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC '10, pp. 1-11, IEEE Computer Society, 2010.
-
(2010)
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC '10
, pp. 1-11
-
-
Lee, S.1
Eigenmann, R.2