[1] NVIDIA's next generation CUDA compute architecture: Fermi. NVIDIA Corporation, 2009.
[2] ATI. Radeon 9700 Pro. http://mirror.ati.com/products/pc/radeon9700pro, 2002.
[3] N. L. Binkert, R. G. Dreslinski, L. R. Hsu, K. T. Lim, A. G. Saidi, and S. K. Reinhardt. The M5 simulator: Modeling networked systems. IEEE Micro, 26(4), 2006.
[4] S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, and K. Skadron. A performance study of general purpose applications on graphics processors using CUDA. JPDC, 2008.
[5] S. Choi and D. Yeung. Learning-based SMT processor resource distribution via hill-climbing. In ISCA, 2006.
[9] R. Espasa, F. Ardanaz, J. Emer, S. Felix, J. Gago, R. Gramunt, I. Hern, T. Juan, G. Lowney, M. Mattina, and A. Seznec. Tarantula: A vector extension to the Alpha architecture. In ISCA, 2002.
[10] W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt. Dynamic warp formation and scheduling for efficient GPU control flow. In MICRO, 2007.
[11] M. Gschwind. Chip multiprocessing and the Cell Broadband Engine. In CF, 2006.
[12] U. J. Kapasi, W. J. Dally, S. Rixner, P. R. Mattson, J. D. Owens, and B. Khailany. Efficient conditional operations for data-parallel architectures. In MICRO 33, 2000.
[14] R. Krashinsky, C. Batten, M. Hampton, S. Gerding, B. Pharris, J. Casper, and K. Asanovic. The Vector-Thread architecture. In ISCA, 2004.
[16] J. Meng, J. W. Sheaffer, and K. Skadron. Exploiting inter-thread temporal locality for chip multithreading. In IPDPS, 2010.
[17] J. Meng and K. Skadron. Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling. In ICCD, 2007.
[18] J. Meng, D. Tarjan, and K. Skadron. Dynamic warp subdivision for integrated branch and memory divergence tolerance: Extended results. U.Va. Tech. Report CS-2010-5, 2010.
[19] R. Narayanan, B. Ozisikyilmaz, J. Zambreno, G. Memik, and A. Choudhary. MineBench: A benchmark suite for data mining workloads. In IISWC, 2006.
[20] Y. Nishikawa, M. Koibuchi, M. Yoshimi, K. Miura, and H. Amano. Performance improvement methodology for ClearSpeed's CSX600. In ICPP, 2007.
[21] S. E. Orcutt. Implementation of permutation functions in ILLIAC IV-type computers. IEEE Trans. Comput., 25(9), 1976.
[25] S. Rixner, W. J. Dally, U. J. Kapasi, B. Khailany, A. López-Lagunas, P. R. Mattson, and J. D. Owens. A bandwidth-efficient architecture for media processing. In MICRO 31, 1998.
[26] R. M. Russell. The CRAY-1 computer system. Commun. ACM, 21(1), 1978.
[27] L. Seiler, D. Carmean, E. Sprangle, T. Forsyth, M. Abrash, P. Dubey, S. Junkins, A. Lake, J. Sugerman, R. Cavin, R. Espasa, E. Grochowski, T. Juan, and P. Hanrahan. Larrabee: A many-core x86 architecture for visual computing. ACM Trans. Graph., 27(3), 2008.
[28] Y. Takahashi. A mechanism for SIMD execution of SPMD programs. In HPC-ASIA, 1997.
[29] D. Talla and L. K. John. Cost-effective hardware acceleration of multimedia applications. In ICCD, 2001.
[30] D. Tarjan, J. Meng, and K. Skadron. Increasing memory miss tolerance for SIMD cores. In SC, 2009.
[31] D. M. Tullsen and J. A. Brown. Handling long-latency loads in a simultaneous multithreading processor. In MICRO 34, 2001.
[32] S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta. The SPLASH-2 programs: Characterization and methodological considerations. In ISCA, 1995.