-
1
-
-
34547471544
-
Design tradeoffs for tiled CMP on-chip networks
-
J. Balfour and W. J. Dally. Design tradeoffs for tiled CMP on-chip networks. In ICS'06, 2006.
-
(2006)
ICS'06
-
-
Balfour, J.1
Dally, W.J.2
-
2
-
-
0024770039
-
Scans as primitive parallel operations
-
G. E. Blelloch. Scans as primitive parallel operations. IEEE Trans. Comput., 38(11), 1989.
-
(1989)
IEEE Trans. Comput
, vol.38
, Issue.11
-
-
Blelloch, G.E.1
-
3
-
-
0029666646
-
Memory bandwidth limitations of future microprocessors
-
D. Burger, J. R. Goodman, and A. Kägi. Memory bandwidth limitations of future microprocessors. In ISCA'96, 1996.
-
(1996)
ISCA'96
-
-
Burger, D.1
Goodman, J.R.2
Kägi, A.3
-
4
-
-
0029209574
-
A hierarchical task queue organization for shared-memory multiprocessor systems
-
S. P. Dandamudi and P. S. P. Cheng. A hierarchical task queue organization for shared-memory multiprocessor systems. IEEE Trans. Parallel Distrib. Syst., 6(1), 1995.
-
(1995)
IEEE Trans. Parallel Distrib. Syst
, vol.6
, Issue.1
-
-
Dandamudi, S.P.1
Cheng, P.S.P.2
-
5
-
-
80052037090
-
Poster session - N-body simulation on GPUs
-
E. Elsen, M. Houston, V. Vishal, E. Darve, P. Hanrahan, and V. Pande. Poster session - N-body simulation on GPUs. In SC'06, 2006.
-
(2006)
SC'06
-
-
Elsen, E.1
Houston, M.2
Vishal, V.3
Darve, E.4
Hanrahan, P.5
Pande, V.6
-
6
-
-
34548207355
-
Sequoia: Programming the memory hierarchy
-
K. Fatahalian, D. R. Horn, T. J. Knight, L. Leem, M. Houston, J. Y. Park, M. Erez, M. Ren, A. Aiken, W. J. Dally, and P. Hanrahan. Sequoia: programming the memory hierarchy. In SC'06, 2006.
-
(2006)
SC'06
-
-
Fatahalian, K.1
Horn, D.R.2
Knight, T.J.3
Leem, L.4
Houston, M.5
Park, J.Y.6
Erez, M.7
Ren, M.8
Aiken, A.9
Dally, W.J.10
Hanrahan, P.11
-
7
-
-
56649087761
-
GPUs: A closer look
-
K. Fatahalian and M. Houston. GPUs: a closer look. Queue, 6(2):18-28, 2008.
-
(2008)
Queue
, vol.6
, Issue.2
, pp. 18-28
-
-
Fatahalian, K.1
Houston, M.2
-
8
-
-
70450264487
-
Cedar: A large scale multiprocessor
-
D. Gajski, D. Kuck, D. Lawrie, and A. Sameh. Cedar: a large scale multiprocessor. SIGARCH Comput. Archit. News, 11(1):7-11, 1983.
-
(1983)
SIGARCH Comput. Archit. News
, vol.11
, Issue.1
, pp. 7-11
-
-
Gajski, D.1
Kuck, D.2
Lawrie, D.3
Sameh, A.4
-
9
-
-
70450275953
-
The NYU ultracomputer
-
A. Gottlieb, R. Grishman, C. P. Kruskal, K. P. McAuliffe, L. Rudolph, and M. Snir. The NYU ultracomputer. In ISCA'82, 1982.
-
(1982)
ISCA'82
-
-
Gottlieb, A.1
Grishman, R.2
Kruskal, C.P.3
McAuliffe, K.P.4
Rudolph, L.5
Snir, M.6
-
10
-
-
34247376580
-
Chip multiprocessing and the Cell broadband engine
-
New York, NY, USA
-
M. Gschwind. Chip multiprocessing and the Cell broadband engine. In CF'06, pages 1-8, New York, NY, USA, 2006.
-
(2006)
CF'06
, pp. 1-8
-
-
Gschwind, M.1
-
11
-
-
33847108581
-
Hierarchically tiled arrays for parallelism and locality
-
April
-
J. Guo, G. Bikshandi, D. Hoeflinger, G. Almasi, B. Fraguela, M. Garzaran, D. Padua, and C. von Praun. Hierarchically tiled arrays for parallelism and locality. In Parallel and Distributed Processing Symposium, April 2006.
-
(2006)
Parallel and Distributed Processing Symposium
-
-
Guo, J.1
Bikshandi, G.2
Hoeflinger, D.3
Almasi, G.4
Fraguela, B.5
Garzaran, M.6
Padua, D.7
von Praun, C.8
-
12
-
-
70450249565
-
-
Intel. Intel microprocessor export compliance metrics, February 2009.
-
Intel. Intel microprocessor export compliance metrics, February 2009.
-
-
-
-
13
-
-
0041562664
-
Programmable stream processors
-
U. J. Kapasi, S. Rixner, W. J. Dally, B. Khailany, J. H. Ahn, P. Mattson, and J. D. Owens. Programmable stream processors. Computer, 36(8), 2003.
-
(2003)
Computer
, vol.36
, Issue.8
-
-
Kapasi, U.J.1
Rixner, S.2
Dally, W.J.3
Khailany, B.4
Ahn, J.H.5
Mattson, P.6
Owens, J.D.7
-
14
-
-
35348855586
-
Carbon: Architectural support for fine-grained parallelism on chip multiprocessors
-
New York, NY, USA
-
S. Kumar, C. J. Hughes, and A. Nguyen. Carbon: architectural support for fine-grained parallelism on chip multiprocessors. In ISCA'07, pages 162-173, New York, NY, USA, 2007.
-
(2007)
ISCA'07
, pp. 162-173
-
-
Kumar, S.1
Hughes, C.J.2
Nguyen, A.3
-
15
-
-
16144366475
-
The network architecture of the Connection Machine CM-5
-
C. E. Leiserson, Z. S. Abuhamdeh, D. C. Douglas, C. R. Feynman, M. N. Ganmukhi, J. V. Hill, W. D. Hillis, B. C. Kuszmaul, M. A. St. Pierre, D. S. Wells, M. C. Wong-Chan, S.-W. Yang, and R. Zak. The network architecture of the Connection Machine CM-5. J. Parallel Distrib. Comput., 33(2), 1996.
-
(1996)
J. Parallel Distrib. Comput
, vol.33
, Issue.2
-
-
Leiserson, C.E.1
Abuhamdeh, Z.S.2
Douglas, D.C.3
Feynman, C.R.4
Ganmukhi, M.N.5
Hill, J.V.6
Hillis, W.D.7
Kuszmaul, B.C.8
St. Pierre, M.A.9
Wells, D.S.10
Wong-Chan, M.C.11
Yang, S.-W.12
Zak, R.13
-
16
-
-
44849137198
-
NVIDIA Tesla: A unified graphics and computing architecture
-
E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym. NVIDIA Tesla: A unified graphics and computing architecture. IEEE Micro, 28(2), 2008.
-
(2008)
IEEE Micro
, vol.28
, Issue.2
-
-
Lindholm, E.1
Nickolls, J.2
Oberman, S.3
Montrym, J.4
-
17
-
-
66749170578
-
Tradeoffs in designing accelerator architectures for visual computing
-
A. Mahesri, D. Johnson, N. Crago, and S. J. Patel. Tradeoffs in designing accelerator architectures for visual computing. In MICRO'08, 2008.
-
(2008)
MICRO'08
-
-
Mahesri, A.1
Johnson, D.2
Crago, N.3
Patel, S.J.4
-
18
-
-
84976718540
-
Algorithms for scalable synchronization on shared-memory multiprocessors
-
J. M. Mellor-Crummey and M. L. Scott. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comput. Syst., 9(1):21-65, 1991.
-
(1991)
ACM Trans. Comput. Syst
, vol.9
, Issue.1
, pp. 21-65
-
-
Mellor-Crummey, J.M.1
Scott, M.L.2
-
20
-
-
78651550268
-
Scalable parallel programming with CUDA
-
J. Nickolls, I. Buck, M. Garland, and K. Skadron. Scalable parallel programming with CUDA. Queue, 6(2), 2008.
-
(2008)
Queue
, vol.6
, Issue.2
-
-
Nickolls, J.1
Buck, I.2
Garland, M.3
Skadron, K.4
-
21
-
-
33947588048
-
A survey of general-purpose computation on graphics hardware
-
J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krueger, A. E. Lefohn, and T. J. Purcell. A survey of general-purpose computation on graphics hardware. Computer Graphics Forum, 26(1):80-113, 2007.
-
(2007)
Computer Graphics Forum
, vol.26
, Issue.1
, pp. 80-113
-
-
Owens, J.D.1
Luebke, D.2
Govindaraju, N.3
Harris, M.4
Krueger, J.5
Lefohn, A.E.6
Purcell, T.J.7
-
22
-
-
70349285149
-
A 45nm 8-core enterprise Xeon processor
-
February
-
S. Rusu, S. Tam, H. Muljono, J. Stinson, D. Ayers, R. V. J. Chang, M. Ratta, and S. Kottapalli. A 45nm 8-core enterprise Xeon processor. In ISSCC'09, February 2009.
-
(2009)
ISSCC'09
-
-
Rusu, S.1
Tam, S.2
Muljono, H.3
Stinson, J.4
Ayers, D.5
Chang, R.V.J.6
Ratta, M.7
Kottapalli, S.8
-
23
-
-
40349086066
-
Exploiting fine-grained data parallelism with chip multiprocessors and fast barriers
-
J. Sampson, R. Gonzalez, J.-F. Collard, N. P. Jouppi, M. Schlansker, and B. Calder. Exploiting fine-grained data parallelism with chip multiprocessors and fast barriers. In MICRO'06, 2006.
-
(2006)
MICRO'06
-
-
Sampson, J.1
Gonzalez, R.2
Collard, J.-F.3
Jouppi, N.P.4
Schlansker, M.5
Calder, B.6
-
24
-
-
0030259457
-
Synchronization and communication in the T3E multiprocessor
-
S. L. Scott. Synchronization and communication in the T3E multiprocessor. In ASPLOS'96, pages 26-36, 1996.
-
(1996)
ASPLOS'96
, pp. 26-36
-
-
Scott, S.L.1
-
25
-
-
49249086142
-
Larrabee: A many-core x86 architecture for visual computing
-
L. Seiler, D. Carmean, E. Sprangle, T. Forsyth, M. Abrash, P. Dubey, S. Junkins, A. Lake, J. Sugerman, R. Cavin, R. Espasa, E. Grochowski, T. Juan, and P. Hanrahan. Larrabee: a many-core x86 architecture for visual computing. ACM Trans. Graph., 27(3):1-15, 2008.
-
(2008)
ACM Trans. Graph
, vol.27
, Issue.3
, pp. 1-15
-
-
Seiler, L.1
Carmean, D.2
Sprangle, E.3
Forsyth, T.4
Abrash, M.5
Dubey, P.6
Junkins, S.7
Lake, A.8
Sugerman, J.9
Cavin, R.10
Espasa, R.11
Grochowski, E.12
Juan, T.13
Hanrahan, P.14
-
26
-
-
0009384049
-
The architecture of HEP
-
Massachusetts Institute of Technology
-
B. Smith. The architecture of HEP. In On Parallel MIMD computation, pages 41-55. Massachusetts Institute of Technology, 1985.
-
(1985)
On Parallel MIMD computation
, pp. 41-55
-
-
Smith, B.1
-
27
-
-
35948963714
-
Accelerating molecular modeling applications with graphics processors
-
J. E. Stone, J. C. Phillips, P. L. Freddolino, D. J. Hardy, L. G. Trabuco, and K. Schulten. Accelerating molecular modeling applications with graphics processors. Journal of Computational Chemistry, 28:2618-2640, 2007.
-
(2007)
Journal of Computational Chemistry
, vol.28
, pp. 2618-2640
-
-
Stone, J.E.1
Phillips, J.C.2
Freddolino, P.L.3
Hardy, D.J.4
Trabuco, L.G.5
Schulten, K.6
-
28
-
-
51449100575
-
Accelerating advanced MRI reconstructions on GPUs
-
S. S. Stone, J. P. Haldar, S. C. Tsao, W.-m. W. Hwu, B. P. Sutton, and Z. P. Liang. Accelerating advanced MRI reconstructions on GPUs. J. Parallel Distrib. Comput., 68(10):1307-1318, 2008.
-
(2008)
J. Parallel Distrib. Comput
, vol.68
, Issue.10
, pp. 1307-1318
-
-
Stone, S.S.1
Haldar, J.P.2
Tsao, S.C.3
Hwu, W.M.W.4
Sutton, B.P.5
Liang, Z.P.6
-
30
-
-
49549084422
-
A third-generation 65nm 16-core 32-thread plus 32-scout-thread CMT SPARC processor
-
Feb
-
M. Tremblay and S. Chaudhry. A third-generation 65nm 16-core 32-thread plus 32-scout-thread CMT SPARC processor. In ISSCC'08, Feb. 2008.
-
(2008)
ISSCC'08
-
-
Tremblay, M.1
Chaudhry, S.2
-
31
-
-
0025467711
-
A bridging model for parallel computation
-
L. G. Valiant. A bridging model for parallel computation. Communications of the ACM, 33(8):103-111, 1990.
-
(1990)
Communications of the ACM
, vol.33
, Issue.8
, pp. 103-111
-
-
Valiant, L.G.1