-
1
-
-
84882609297
-
-
Advanced Micro Devices, Inc. ATI Stream Technology. http://www.amd.com/stream.
-
ATI Stream Technology
-
-
-
2
-
-
0025431380
-
APRIL: A processor architecture for multiprocessing
-
A. Agarwal et al. APRIL: A processor architecture for multiprocessing. In ISCA-17, 1990.
-
(1990)
ISCA-17
-
-
Agarwal, A.1
-
3
-
-
0033895964
-
Speed and power scaling of SRAMs
-
Feb.
-
B. Amrutur and M. Horowitz. Speed and power scaling of SRAMs. IEEE JSSC, 35(2):175-185, Feb. 2000.
-
(2000)
IEEE JSSC
, vol.35
, Issue.2
, pp. 175-185
-
-
Amrutur, B.1
Horowitz, M.2
-
4
-
-
0015330108
-
The Illiac IV system
-
Apr.
-
W. J. Bouknight et al. The Illiac IV system. Proceedings of the IEEE, 60(4):369-388, Apr. 1972.
-
(1972)
Proceedings of the IEEE
, vol.60
, Issue.4
, pp. 369-388
-
-
Bouknight, W.J.1
-
5
-
-
70649092154
-
Rodinia: A benchmark suite for heterogeneous computing
-
S. Che et al. Rodinia: A benchmark suite for heterogeneous computing. In IISWC, 2009.
-
(2009)
IISWC
-
-
Che, S.1
-
6
-
-
79955923056
-
Thread block compaction for efficient SIMT control flow
-
W. W. L. Fung and T. Aamodt. Thread block compaction for efficient SIMT control flow. In HPCA-17, 2011.
-
(2011)
HPCA-17
-
-
Fung, W.W.L.1
Aamodt, T.2
-
7
-
-
47349104432
-
Dynamic warp formation and scheduling for efficient GPU control flow
-
W. W. L. Fung et al. Dynamic warp formation and scheduling for efficient GPU control flow. In MICRO-40, 2007.
-
(2007)
MICRO-40
-
-
Fung, W.W.L.1
-
8
-
-
68549096107
-
Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware
-
June
-
W. W. L. Fung et al. Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware. ACM TACO, 6(2):1-37, June 2009.
-
(2009)
ACM TACO
, vol.6
, Issue.2
, pp. 1-37
-
-
Fung, W.W.L.1
-
9
-
-
65349159175
-
Compute unified device architecture application suitability
-
May-June
-
W.-M. Hwu et al. Compute unified device architecture application suitability. Computing in Science & Engineering, May-June 2009.
-
(2009)
Computing in Science & Engineering
-
-
Hwu, W.-M.1
-
10
-
-
2342652812
-
Stream register files with indexed access
-
N. Jayasena et al. Stream register files with indexed access. In HPCA-10, 2004.
-
(2004)
HPCA-10
-
-
Jayasena, N.1
-
11
-
-
77954999879
-
Efficient conditional operations for data-parallel architectures
-
U. Kapasi et al. Efficient conditional operations for data-parallel architectures. In MICRO-33, 2000.
-
(2000)
MICRO-33
-
-
Kapasi, U.1
-
12
-
-
0036398375
-
VLSI design and verification of the Imagine processor
-
B. Khailany et al. VLSI design and verification of the Imagine processor. In ICCD, 2002.
-
(2002)
ICCD
-
-
Khailany, B.1
-
13
-
-
84863372818
-
-
Khronos Group. OpenCL. http://www.khronos.org/opencl.
-
OpenCL
-
-
-
15
-
-
4644337990
-
The vector-thread architecture
-
R. Krashinsky et al. The vector-thread architecture. In ISCA-31, 2004.
-
(2004)
ISCA-31
-
-
Krashinsky, R.1
-
17
-
-
77954976292
-
Dynamic warp subdivision for integrated branch and memory divergence tolerance
-
J. Meng et al. Dynamic warp subdivision for integrated branch and memory divergence tolerance. In ISCA-37, 2010.
-
(2010)
ISCA-37
-
-
Meng, J.1
-
18
-
-
47349098275
-
MineBench: A benchmark suite for data mining workloads
-
R. Narayanan et al. MineBench: A benchmark suite for data mining workloads. In IISWC, 2006.
-
(2006)
IISWC
-
-
Narayanan, R.1
-
19
-
-
84863390635
-
-
NVIDIA. CUDA GPU Computing SDK. http://developer.nvidia.com/gpu-computing-sdk.
-
CUDA GPU Computing SDK
-
-
-
22
-
-
0017922490
-
The CRAY-1 computer system
-
Jan.
-
R. M. Russell. The CRAY-1 computer system. Communications of the ACM, 21(1):63-72, Jan. 1978.
-
(1978)
Communications of the ACM
, vol.21
, Issue.1
, pp. 63-72
-
-
Russell, R.M.1
-
23
-
-
79959466764
-
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
-
S. Ryoo et al. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In PPoPP, 2008.
-
(2008)
PPoPP
-
-
Ryoo, S.1
-
24
-
-
0018282603
-
A pipelined shared resource MIMD computer
-
B. J. Smith. A pipelined shared resource MIMD computer. In ICPP, 1978.
-
(1978)
ICPP
-
-
Smith, B.J.1
-
25
-
-
0033727057
-
Vector instruction set support for conditional operations
-
J. E. Smith et al. Vector instruction set support for conditional operations. In ISCA-27, 2000.
-
(2000)
ISCA-27
-
-
Smith, J.E.1
-
26
-
-
84863352139
-
Parallel operation in the Control Data 6600
-
J. E. Thornton. Parallel operation in the Control Data 6600. In AFIPS, 1965.
-
(1965)
AFIPS
-
-
Thornton, J.E.1
-
27
-
-
0035696665
-
Handling long-latency loads in a simultaneous multithreading processor
-
D. M. Tullsen and J. A. Brown. Handling long-latency loads in a simultaneous multithreading processor. In MICRO-34, 2001.
-
(2001)
MICRO-34
-
-
Tullsen, D.M.1
Brown, J.A.2