-
1
-
-
79958694059
-
-
NVIDIA, "CUDA Zone," http://developer.nvidia.com/category/zone/cuda-zone, 2011.
-
(2011)
CUDA Zone
-
-
-
2
-
-
84885144878
-
-
Khronos Group
-
Khronos Group, "OpenCL," http://www.khronos.org/opencl/, 2011.
-
(2011)
-
-
-
4
-
-
80053211847
-
-
May
-
NVIDIA, "CUDA C Programming Guide 4.0," http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDACProgrammingGuide.pdf, May 2011.
-
(2011)
CUDA C Programming Guide 4.0
-
-
-
5
-
-
80053212291
-
-
May
-
NVIDIA, "CUDA C Best Practices Guide 4.0," http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDACBestPracticesGuide.pdf, May 2011.
-
(2011)
CUDA C Best Practices Guide 4.0
-
-
-
6
-
-
51449112813
-
Program optimization carving for GPU computing
-
S. Ryoo, C. I. Rodrigues, S. S. Stone, J. A. Stratton, S.-Z. Ueng, S. S. Baghsorkhi, and W.-m. W. Hwu, "Program Optimization Carving for GPU Computing," J. Parallel Distributed Computing, vol. 68, no. 10, pp. 1389-1401, 2008.
-
(2008)
J. Parallel Distributed Computing
, vol.68
, Issue.10
, pp. 1389-1401
-
-
Ryoo, S.1
Rodrigues, C.I.2
Stone, S.S.3
Stratton, J.A.4
Ueng, S.-Z.5
Baghsorkhi, S.S.6
Hwu, W.-M.W.7
-
7
-
-
77957561221
-
An adaptive performance modeling tool for GPU architectures
-
Feb.
-
S. S. Baghsorkhi, M. Delahaye, S. J. Patel, W. D. Gropp, and W.-M. W. Hwu, "An Adaptive Performance Modeling Tool for GPU Architectures," Proc. 15th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP '10), pp. 105-114, Feb. 2010.
-
(2010)
Proc. 15th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP '10)
, pp. 105-114
-
-
Baghsorkhi, S.S.1
Delahaye, M.2
Patel, S.J.3
Gropp, W.D.4
Hwu, W.-M.W.5
-
8
-
-
70450231944
-
An Analytical Model for a GPU Architecture with Memory-Level and Thread-Level Parallelism Awareness
-
S. Hong and H. Kim, "An Analytical Model for a GPU Architecture with Memory-Level and Thread-Level Parallelism Awareness," Proc. 36th Ann. Int'l Symp. Computer Architecture (ISCA '09), pp. 152-163, 2009.
-
(2009)
Proc. 36th Ann. Int'l Symp. Computer Architecture (ISCA '09)
, pp. 152-163
-
-
Hong, S.1
Kim, H.2
-
9
-
-
77952579552
-
Demystifying GPU Microarchitecture through Microbenchmarking
-
Mar.
-
H. Wong, M.-M. Papadopoulou, M. Sadooghi-Alvandi, and A. Moshovos, "Demystifying GPU Microarchitecture through Microbenchmarking," Proc. IEEE Int'l Symp. Performance Analysis of Systems Software (ISPASS), pp. 235-246, Mar. 2010.
-
(2010)
Proc. IEEE Int'l Symp. Performance Analysis of Systems Software (ISPASS)
, pp. 235-246
-
-
Wong, H.1
Papadopoulou, M.-M.2
Sadooghi-Alvandi, M.3
Moshovos, A.4
-
12
-
-
84858379069
-
Efficient performance evaluation of memory hierarchy for highly multithreaded graphics processors
-
S. S. Baghsorkhi, I. Gelado, M. Delahaye, and W.-M. W. Hwu, "Efficient Performance Evaluation of Memory Hierarchy for Highly Multithreaded Graphics Processors," Proc. 17th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP '12), pp. 23-34, 2012.
-
(2012)
Proc. 17th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP '12)
, pp. 23-34
-
-
Baghsorkhi, S.S.1
Gelado, I.2
Delahaye, M.3
Hwu, W.-M.W.4
-
13
-
-
84879549908
-
-
White Paper
-
NVIDIA, "Fermi Compute Architecture. White Paper," http://www.nvidia.com/content/PDF/fermiwhitepapers/NVIDIA FermiComputeArchitectureWhitepaper.pdf, 2009.
-
(2009)
Fermi Compute Architecture
-
-
-
14
-
-
84885152951
-
-
Patent, US 8055856, Nov.
-
B. W. Coon, J. R. Nickolls, L. Nyland, and P. C. Mills, "Lock Mechanism to Enable Atomic Updates to Shared Memory," Patent, US 8055856, Nov. 2011.
-
(2011)
Lock Mechanism to Enable Atomic Updates to Shared Memory
-
-
Coon, B.W.1
Nickolls, J.R.2
Nyland, L.3
Mills, P.C.4
-
15
-
-
0001457509
-
Some methods for classification and analysis of multivariate observations
-
J. B. MacQueen, "Some Methods for Classification and Analysis of Multivariate Observations," Proc. Fifth Berkeley Symp. Math. Statistics and Probability, vol. 1, pp. 281-297, 1967.
-
(1967)
Proc. Fifth Berkeley Symp. Math. Statistics and Probability
, vol.1
, pp. 281-297
-
-
MacQueen, J.B.1
-
16
-
-
47349098275
-
Minebench: A benchmark suite for data mining workloads
-
Oct.
-
R. Narayanan, B. Ozisikyilmaz, J. Zambreno, G. Memik, and A. Choudhary, "Minebench: A Benchmark Suite for Data Mining Workloads," Proc. IEEE Int'l Symp. Workload Characterization, pp. 182-188, Oct. 2006.
-
(2006)
Proc. IEEE Int'l Symp. Workload Characterization
, pp. 182-188
-
-
Narayanan, R.1
Ozisikyilmaz, B.2
Zambreno, J.3
Memik, G.4
Choudhary, A.5
-
17
-
-
70449088579
-
KMeans on commodity GPUs with CUDA
-
B. Hong-tao, H. Li-li, O. Dan-tong, L. Zhan-shan, and L. He, "KMeans on Commodity GPUs with CUDA," Proc. WRI World Congress Computer Science and Information Eng. (CSIE '09), vol. 3, pp. 651-655, 2009.
-
(2009)
Proc. WRI World Congress Computer Science and Information Eng. (CSIE '09)
, vol.3
, pp. 651-655
-
-
Hong-Tao, B.1
Li-Li, H.2
Dan-Tong, O.3
Zhan-Shan, L.4
He, L.5
-
20
-
-
84865088781
-
K-means image segmentation on massively parallel GPU architecture
-
J. Sirotkovic, H. Dujmic, and V. Papic, "K-Means Image Segmentation on Massively Parallel GPU Architecture," Proc. 35th Int'l Convention MIPRO, pp. 489-494, 2012.
-
(2012)
Proc. 35th Int'l Convention MIPRO
, pp. 489-494
-
-
Sirotkovic, J.1
Dujmic, H.2
Papic, V.3
-
21
-
-
72449121016
-
Clustering billions of data points using GPUs
-
R. Wu, B. Zhang, and M. Hsu, "Clustering Billions of Data Points Using GPUs," Proc. Combined Workshops UnConventional High Performance Computing Workshop Plus Memory Access Workshop (UCHPC-MAW '09), pp. 1-6, 2009.
-
(2009)
Proc. Combined Workshops UnConventional High Performance Computing Workshop Plus Memory Access Workshop (UCHPC-MAW '09)
, pp. 1-6
-
-
Wu, R.1
Zhang, B.2
Hsu, M.3
-
22
-
-
73449125780
-
-
technical report, Hong Kong Univ. of Science and Technology
-
W. Fang, K. K. Lau, M. Lu, X. Xiao, C. K. Lam, P. Y. Yang, B. He, Q. Luo, P. V. Sander, and K. Yang, "Parallel Data Mining on Graphics Processors," technical report, Hong Kong Univ. of Science and Technology, 2008.
-
(2008)
Parallel Data Mining on Graphics Processors
-
-
Fang, W.1
Lau, K.K.2
Lu, M.3
Xiao, X.4
Lam, C.K.5
Yang, P.Y.6
He, B.7
Luo, Q.8
Sander, P.V.9
Yang, K.10
-
23
-
-
51449118065
-
A performance study of general-purpose applications on graphics processors using CUDA
-
Oct.
-
S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, and K. Skadron, "A Performance Study of General-Purpose Applications on Graphics Processors Using CUDA," J. Parallel Distributed Computing, vol. 68, no. 10, pp. 1370-1380, Oct. 2008.
-
(2008)
J. Parallel Distributed Computing
, vol.68
, Issue.10
, pp. 1370-1380
-
-
Che, S.1
Boyer, M.2
Meng, J.3
Tarjan, D.4
Sheaffer, J.W.5
Skadron, K.6
-
24
-
-
78249233591
-
Speeding up k-Means Algorithm by GPUs
-
Y. Li, K. Zhao, X. Chu, and J. Liu, "Speeding up k-Means Algorithm by GPUs," Proc. IEEE 10th Int'l Conf. Computer and Information Technology (CIT), pp. 115-122, 2010.
-
(2010)
Proc. IEEE 10th Int'l Conf. Computer and Information Technology (CIT)
, pp. 115-122
-
-
Li, Y.1
Zhao, K.2
Chu, X.3
Liu, J.4
-
25
-
-
0032492432
-
Independent component filters of natural images compared with simple cells in primary visual cortex
-
Mar.
-
J. H. van Hateren and A. van der Schaaf, "Independent Component Filters of Natural Images Compared with Simple Cells in Primary Visual Cortex," Proceedings: Biological Sciences, vol. 265, no. 1394, pp. 359-366, Mar. 1998.
-
(1998)
Proceedings: Biological Sciences
, vol.265
, Issue.1394
, pp. 359-366
-
-
Van Hateren, J.H.1
Van der Schaaf, A.2
-
27
-
-
84870724724
-
High performance predictable histogramming on GPUs: Exploring and evaluating algorithm trade-offs
-
C. Nugteren, G.-J. van den Braak, H. Corporaal, and B. Mesman, "High Performance Predictable Histogramming on GPUs: Exploring and Evaluating Algorithm Trade-Offs," Proc. Fourth Workshop General Purpose Processing Graphics Processing Units (GPGPU-4), pp. 1:1-1:8, 2011.
-
(2011)
Proc. Fourth Workshop General Purpose Processing Graphics Processing Units (GPGPU-4)
, pp. 1:1-1:8
-
-
Nugteren, C.1
Van den Braak, G.-J.2
Corporaal, H.3
Mesman, B.4
-
28
-
-
57349086588
-
-
White Paper
-
V. Podlozhnyuk, "Histogram Calculation in CUDA. White Paper," http://developer.download.nvidia.com/compute/cuda/11/Website/projects/histogram256/doc/histogram.pdf, 2007.
-
(2007)
Histogram Calculation in CUDA
-
-
Podlozhnyuk, V.1