-
1
-
-
84866846609
-
-
Available
-
NVIDIA. CUDA. Available: http://developer.nvidia.com/object/cuda.html
-
CUDA
-
-
-
3
-
-
77957561221
-
An adaptive performance modeling tool for GPU architectures
-
S. S. Baghsorkhi, M. Delahaye, S. J. Patel, W. D. Gropp, and W.-m. W. Hwu, "An adaptive performance modeling tool for GPU architectures," PPoPP, 2010, pp. 105-114
-
(2010)
PPoPP
, pp. 105-114
-
-
Baghsorkhi, S.S.1
Delahaye, M.2
Patel, S.J.3
Gropp, W.D.4
Hwu, W.-M.W.5
-
4
-
-
77954724148
-
Streamlining GPU applications on the fly: Thread divergence elimination through runtime thread-data remapping
-
E. Z. Zhang, Y. Jiang, Z. Guo, and X. Shen, "Streamlining GPU applications on the fly: thread divergence elimination through runtime thread-data remapping," ICS, 2010, pp. 115-126
-
(2010)
ICS
, pp. 115-126
-
-
Zhang, E.Z.1
Jiang, Y.2
Guo, Z.3
Shen, X.4
-
5
-
-
79953126288
-
On-the-fly elimination of dynamic irregularities for GPU computing
-
E. Z. Zhang, Y. Jiang, Z. Guo, K. Tian, and X. Shen, "On-the-fly elimination of dynamic irregularities for GPU computing," ASPLOS, 2011, pp. 369-380
-
(2011)
ASPLOS
, pp. 369-380
-
-
Zhang, E.Z.1
Jiang, Y.2
Guo, Z.3
Tian, K.4
Shen, X.5
-
6
-
-
47349104432
-
Dynamic warp formation and scheduling for efficient GPU control flow
-
W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt, "Dynamic warp formation and scheduling for efficient GPU control flow," MICRO, 2007, pp. 407-420.
-
(2007)
MICRO
, pp. 407-420
-
-
Fung, W.W.L.1
Sham, I.2
Yuan, G.3
Aamodt, T.M.4
-
7
-
-
77954976292
-
Dynamic warp subdivision for integrated branch and memory divergence tolerance
-
J. Meng, D. Tarjan, and K. Skadron, "Dynamic warp subdivision for integrated branch and memory divergence tolerance," ISCA, 2010, pp. 236-246.
-
(2010)
ISCA
, pp. 236-246
-
-
Meng, J.1
Tarjan, D.2
Skadron, K.3
-
8
-
-
70450231944
-
An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
-
S. Hong and H. Kim, "An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness," ISCA, 2009, pp. 152-163.
-
(2009)
ISCA
, pp. 152-163
-
-
Hong, S.1
Kim, H.2
-
9
-
-
84862182989
-
-
Available
-
NVIDIA. (2011). Compute Visual Profiler User Guide. Available: http://developer.nvidia.com/nvidia-gpu-computing-documentation
-
(2011)
Compute Visual Profiler User Guide
-
-
-
10
-
-
0035182089
-
Basic block distribution analysis to find periodic behavior and simulation points in applications
-
T. Sherwood, E. Perelman, and B. Calder, "Basic block distribution analysis to find periodic behavior and simulation points in applications," PACT, 2001, pp. 3-14.
-
(2001)
PACT
, pp. 3-14
-
-
Sherwood, T.1
Perelman, E.2
Calder, B.3
-
11
-
-
70349169075
-
Analyzing CUDA workloads using a detailed GPU simulator
-
A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt, "Analyzing CUDA workloads using a detailed GPU simulator," ISPASS, 2009, pp. 163-174.
-
(2009)
ISPASS
, pp. 163-174
-
-
Bakhoda, A.1
Yuan, G.L.2
Fung, W.W.L.3
Wong, H.4
Aamodt, T.M.5
-
12
-
-
0024700878
-
Determining average program execution times and their variance
-
V. Sarkar, "Determining average program execution times and their variance," PLDI, 1989, pp. 298-312
-
(1989)
PLDI
, pp. 298-312
-
-
Sarkar, V.1
-
14
-
-
79551704836
-
-
Available
-
NVIDIA. (2011). NVIDIA CUDA C Programming Guide. Available: http://developer.nvidia.com/nvidia-gpu-computing-documentation
-
(2011)
NVIDIA CUDA C Programming Guide
-
-
-
15
-
-
0036647190
-
An efficient k-means clustering algorithm: Analysis and implementation
-
T. Kanungo, et al., "An efficient k-means clustering algorithm: Analysis and implementation," IEEE TPAMI, pp. 881-892, 2002.
-
(2002)
IEEE TPAMI
, pp. 881-892
-
-
Kanungo, T.1
-
16
-
-
0023381475
-
Marching cubes: A high resolution 3D surface construction algorithm
-
W. E. Lorensen and H. E. Cline, "Marching cubes: A high resolution 3D surface construction algorithm," ACM Siggraph Computer Graphics, vol. 21, pp. 163-169, 1987.
-
(1987)
ACM Siggraph Computer Graphics
, vol.21
, pp. 163-169
-
-
Lorensen, W.E.1
Cline, H.E.2
-
17
-
-
70449844290
-
Sequence alignment with GPU: Performance and design challenges
-
G. M. Striemer and A. Akoglu, "Sequence alignment with GPU: Performance and design challenges," IPDPS, 2009, pp. 1-10
-
(2009)
IPDPS
, pp. 1-10
-
-
Striemer, G.M.1
Akoglu, A.2
-
18
-
-
38849131252
-
High-throughput sequence alignment using Graphics Processing Units
-
M. Schatz, C. Trapnell, A. Delcher, A. Varshney, "High-throughput sequence alignment using Graphics Processing Units," BMC bioinformatics, vol. 8, p. 474, 2007.
-
(2007)
BMC Bioinformatics
, vol.8
, pp. 474
-
-
Schatz, M.1
Trapnell, C.2
Delcher, A.3
Varshney, A.4
-
19
-
-
80052657491
-
-
NVIDIA. Occupancy Calculator. http://developer.nvidia.com/object/cuda-3-2-tooklit-rc.html.
-
Occupancy Calculator
-
-
-
20
-
-
84866849151
-
-
CUDA-EC
-
CUDA-EC, NVIDIA Tesla Bio Workbench. http://www.nvidia.com/object/ec-on-tesla.html.
-
-
-
-
21
-
-
79952767559
-
Accelerating global sequence alignment using CUDA compatible multi-core GPU
-
T. R. P. Siriwardena and D. N. Ranasinghe, "Accelerating global sequence alignment using CUDA compatible multi-core GPU," ICIAfS, 2010, pp. 201-206.
-
(2010)
ICIAfS
, pp. 201-206
-
-
Siriwardena, T.R.P.1
Ranasinghe, D.N.2
-
22
-
-
77952265152
-
Optimizing Matrix Transpose in CUDA
-
G. Ruetsch and P. Micikevicius, "Optimizing Matrix Transpose in CUDA," NVIDIA, 2009.
-
(2009)
NVIDIA
-
-
Ruetsch, G.1
Micikevicius, P.2
-
23
-
-
77952579552
-
Demystifying GPU Microarchitecture through Microbenchmarking
-
H. Wong, M. M. Papadopoulou, M. Sadooghi-Alvandi, and A. Moshovos, "Demystifying GPU Microarchitecture through Microbenchmarking," ISPASS, 2010, pp. 235-246.
-
(2010)
ISPASS
, pp. 235-246
-
-
Wong, H.1
Papadopoulou, M.M.2
Sadooghi-Alvandi, M.3
Moshovos, A.4
-
24
-
-
68549096107
-
Dynamic Warp Formation: Efficient MIMD Control Flow on SIMD Graphics Hardware
-
Article 7
-
W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt. "Dynamic Warp Formation: Efficient MIMD Control Flow on SIMD Graphics Hardware". ACM TACO, Vol. 6, No.2, Article 7, 2009, pp. 1-37.
-
(2009)
ACM TACO
, vol.6
, Issue.2
, pp. 1-37
-
-
Fung, W.W.L.1
Sham, I.2
Yuan, G.3
Aamodt, T.M.4
-
25
-
-
84863011842
-
A Revisit to Cost Aggregation in Stereo Matching: How Far Can We Reduce Its Computational Redundancy?
-
D. Min, J. Lu, and M. Do, "A Revisit to Cost Aggregation in Stereo Matching: How Far Can We Reduce Its Computational Redundancy?" ICCV, 2011.
-
(2011)
ICCV
-
-
Min, D.1
Lu, J.2
Do, M.3
-
26
-
-
84862069040
-
Real-time Implementation and Performance Optimization of 3D Sound Localization on GPUs
-
Y. Liang, Z. Cui, S. Zhao, K. Rupnow, Y. Zhang, D. L. Jones, and D. Chen, "Real-time Implementation and Performance Optimization of 3D Sound Localization on GPUs," DATE, 2012.
-
(2012)
DATE
-
-
Liang, Y.1
Cui, Z.2
Zhao, S.3
Rupnow, K.4
Zhang, Y.5
Jones, D.L.6
Chen, D.7