[1] T. Aila and S. Laine. Understanding the Efficiency of Ray Traversal on GPUs. In HPG '09, 2009.
[4] W. Bouknight et al. The Illiac IV System. Proceedings of the IEEE, 60(4):369-388, Apr. 1972.
[5] S. Che et al. Rodinia: A Benchmark Suite for Heterogeneous Computing. In Int'l Symp. on Workload Characterization (IISWC), pages 44-54, 2009.
[6] Y.-K. Chen et al. Convergence of Recognition, Mining, and Synthesis Workloads and its Implications. Proceedings of the IEEE, 96(5), May 2008.
[7] B. W. Coon et al. United States Patent #7,353,369: System and Method for Managing Divergent Threads in a SIMD Architecture (Assignee NVIDIA Corp.), April 2008.
[8] B. W. Coon et al. United States Patent #7,434,032: Tracking Register Usage During Multithreaded Processing Using a Scoreboard having Separate Memory Regions and Storing Sequential Register Size Indicators (Assignee NVIDIA Corp.), October 2008.
[10] W. Fung et al. Dynamic Warp Formation: Efficient MIMD Control Flow on SIMD Graphics Hardware. ACM Trans. Archit. Code Optim., 6(2):1-37, 2009.
[11] A. Gharaibeh and M. Ripeanu. Size Matters: Space/Time Tradeoffs to Improve GPGPU Applications Performance. In IEEE/ACM Supercomputing (SC 2010), 2010.
[12] U. J. Kapasi et al. Efficient Conditional Operations for Data-Parallel Architectures. In Proc. 33rd IEEE/ACM Int'l Symp. on Microarchitecture, pages 159-170, 2000.
[13] J. H. Kelm et al. Rigel: An Architecture and Scalable Programming Interface for a 1000-core Accelerator. In Proc. 36th Int'l Symp. on Computer Architecture (ISCA), pages 140-151, 2009.
[16] E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym. NVIDIA Tesla: A Unified Graphics and Computing Architecture. IEEE Micro, 28(2):39-55, March-April 2008.
[17] E. Lindholm et al. United States Patent Application #2010/0122067: Across-Thread Out-of-Order Instruction Dispatch in a Multithreaded Microprocessor (Assignee NVIDIA Corp.), May 2010.
[19] A. Mahesri et al. Tradeoffs in designing accelerator architectures for visual computing. In Proc. 41st IEEE/ACM Int'l Symp. on Microarchitecture, pages 164-175, 2008.
[20] J. Meng et al. Dynamic Warp Subdivision for Integrated Branch and Memory Divergence Tolerance. In Proc. 37th Int'l Symp. on Computer Architecture (ISCA), pages 235-246, 2010.
[21] J. Nickolls et al. Scalable Parallel Programming with CUDA. ACM Queue, 6(2):40-53, Mar.-Apr. 2008.
[24] NVIDIA Corporation. NVIDIA CUDA Programming Guide, 3.1 edition, 2010.
[26] M. Schatz et al. High-Throughput Sequence Alignment Using Graphics Processing Units. BMC Bioinformatics, 8(1):474, 2007.