-
1
-
-
0025404493
-
Executing a program on the MIT tagged-token dataflow architecture
-
Mar
-
Arvind and R. Nikhil, "Executing a program on the MIT tagged-token dataflow architecture," IEEE Trans. on Computers, vol. 39, no. 3, pp. 300-318, Mar 1990
-
(1990)
IEEE Trans. on Computers
, vol.39
, Issue.3
, pp. 300-318
-
-
Arvind1
Nikhil, R.2
-
2
-
-
70349169075
-
Analyzing CUDA workloads using a detailed GPU simulator
-
A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt, "Analyzing CUDA workloads using a detailed GPU simulator," in ISPASS. IEEE, 2009, pp. 163-174
-
(2009)
ISPASS IEEE
, pp. 163-174
-
-
Bakhoda, A.1
Yuan, G.L.2
Fung, W.W.L.3
Wong, H.4
Aamodt, T.M.5
-
3
-
-
0034592554
-
Adapting software pipelining for reconfigurable computing
-
T. J. Callahan and J. Wawrzynek, "Adapting software pipelining for reconfigurable computing," in Intl. Conf. on Compilers, Architecture, and Synthesis for Embedded Systems, 2000, pp. 57-64
-
(2000)
Intl. Conf. on Compilers, Architecture, and Synthesis for Embedded Systems
, pp. 57-64
-
-
Callahan, T.J.1
Wawrzynek, J.2
-
4
-
-
70649092154
-
Rodinia: A benchmark suite for heterogeneous computing
-
S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S.-H. Lee, and K. Skadron, "Rodinia: A benchmark suite for heterogeneous computing," in IEEE Intl. Symp. on Workload Characterization (IISWC), ser. IISWC 09, 2009, pp. 44-54
-
(2009)
IEEE Intl. Symp. on Workload Characterization (IISWC), Ser. IISWC 09
, pp. 44-54
-
-
Che, S.1
Boyer, M.2
Meng, J.3
Tarjan, D.4
Sheaffer, J.W.5
Lee, S.-H.6
Skadron, K.7
-
6
-
-
0026243790
-
Efficiently computing static single assignment form and the control dependence graph
-
R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck, "Efficiently computing static single assignment form and the control dependence graph," ACM Trans. on Programming Languages and Systems, vol. 13, no. 4, pp. 451-490, 1991
-
(1991)
ACM Trans. on Programming Languages and Systems
, vol.13
, Issue.4
, pp. 451-490
-
-
Cytron, R.1
Ferrante, J.2
Rosen, B.K.3
Wegman, M.N.4
Zadeck, F.K.5
-
8
-
-
0025750643
-
Properties and performance of folded hypercubes
-
Jan
-
A. El-Amawy and S. Latifi, "Properties and performance of folded hypercubes," IEEE Trans. on Parallel and Distributed Systems, vol. 2, no. 1, pp. 31-42, Jan. 1991
-
(1991)
IEEE Trans. on Parallel and Distributed Systems
, vol.2
, Issue.1
, pp. 31-42
-
-
El-Amawy, A.1
Latifi, S.2
-
9
-
-
0007997616
-
ARB: A hardware mechanism for dynamic reordering of memory references
-
M. Franklin and G. S. Sohi, "ARB: A hardware mechanism for dynamic reordering of memory references," IEEE Trans. on Computers, vol. 45, no. 5, pp. 552-571, 1996
-
(1996)
IEEE Trans. on Computers
, vol.45
, Issue.5
, pp. 552-571
-
-
Franklin, M.1
Sohi, G.S.2
-
10
-
-
0034174187
-
PipeRench: A reconfigurable architecture and compiler
-
Apr
-
S. C. Goldstein, H. Schmit, M. Budiu, S. Cadambi, M. Moe, and R. R. Taylor, "PipeRench: A reconfigurable architecture and compiler," IEEE Computer, vol. 33, no. 4, pp. 70-77, Apr. 2000
-
(2000)
IEEE Computer
, vol.33
, Issue.4
, pp. 70-77
-
-
Goldstein, S.C.1
Schmit, H.2
Budiu, M.3
Cadambi, S.4
Moe, M.5
Taylor, R.R.6
-
12
-
-
84863374615
-
Bundled execution of recurring traces for energy-efficient general purpose processing
-
S. Gupta, S. Feng, A. Ansari, S. Mahlke, and D. August, "Bundled execution of recurring traces for energy-efficient general purpose processing," in Intl. Symp. on Microarchitecture (MICRO), 2011, pp. 12-23
-
(2011)
Intl. Symp. on Microarchitecture (MICRO)
, pp. 12-23
-
-
Gupta, S.1
Feng, S.2
Ansari, A.3
Mahlke, S.4
August, D.5
-
13
-
-
0021831531
-
The Manchester prototype dataflow computer
-
J. R. Gurd, C. C. Kirkham, and I. Watson, "The Manchester prototype dataflow computer," Comm. ACM, vol. 28, no. 1, pp. 34-52, 1985
-
(1985)
Comm. ACM
, vol.28
, Issue.1
, pp. 34-52
-
-
Gurd, J.R.1
Kirkham, C.C.2
Watson, I.3
-
14
-
-
67650635164
-
Many-core vs. many-thread machines: Stay away from the valley
-
Jan
-
Z. Guz, E. Bolotin, I. Keidar, A. Kolodny, A. Mendelson, and U. Weiser, "Many-core vs. many-thread machines: Stay away from the valley," IEEE Computer Architecture Letters, vol. 8, no. 1, pp. 25-28, Jan 2009
-
(2009)
IEEE Computer Architecture Letters
, vol.8
, Issue.1
, pp. 25-28
-
-
Guz, Z.1
Bolotin, E.2
Keidar, I.3
Kolodny, A.4
Mendelson, A.5
Weiser, U.6
-
15
-
-
77954995378
-
Understanding sources of inefficiency in general-purpose chips
-
R. Hameed, W. Qadeer, M. Wachs, O. Azizi, A. Solomatnikov, B. C. Lee, S. Richardson, C. Kozyrakis, and M. Horowitz, "Understanding sources of inefficiency in general-purpose chips," in Intl. Symp. on Computer Architecture (ISCA), 2010, pp. 37-47
-
(2010)
Intl. Symp. on Computer Architecture (ISCA)
, pp. 37-47
-
-
Hameed, R.1
Qadeer, W.2
Wachs, M.3
Azizi, O.4
Solomatnikov, A.5
Lee, B.C.6
Richardson, S.7
Kozyrakis, C.8
Horowitz, M.9
-
17
-
-
79953071805
-
Sponge: Portable stream programming on graphics engines
-
A. H. Hormati, M. Samadi, M. Woh, T. Mudge, and S. Mahlke, "Sponge: portable stream programming on graphics engines," in Intl. Conf. on Arch. Support for Prog. Lang. & Operating Systems (ASPLOS), 2011, pp. 381-392
-
(2011)
Intl. Conf. on Arch. Support for Prog. Lang. & Operating Systems (ASPLOS)
, pp. 381-392
-
-
Hormati, A.H.1
Samadi, M.2
Woh, M.3
Mudge, T.4
Mahlke, S.5
-
18
-
-
84874510087
-
-
Y. Huang, P. Ienne, O. Temam, Y. Chen, and C. Wu, "Elastic CGRAs," in Intl. Symp. on Field Programmable Gate Arrays, 2013, pp. 171-180
-
(2013)
Intl. Symp. on Field Programmable Gate Arrays
, pp. 171-180
-
-
Huang, Y.1
Ienne, P.2
Temam, O.3
Chen, Y.4
Wu, C.5
-
19
-
-
84944414165
-
Runtime power monitoring in high-end processors: Methodology and empirical data
-
C. Isci and M. Martonosi, "Runtime power monitoring in high-end processors: Methodology and empirical data," in Intl. Symp. on Microarchitecture (MICRO), 2003, pp. 93-104
-
(2003)
Intl. Symp. on Microarchitecture (MICRO)
, pp. 93-104
-
-
Isci, C.1
Martonosi, M.2
-
20
-
-
80054875176
-
GPUs and the future of parallel computing
-
S. W. Keckler, W. J. Dally, B. Khailany, M. Garland, and D. Glasco, "GPUs and the future of parallel computing," IEEE Micro, vol. 31, pp. 7-17, 2011
-
(2011)
IEEE Micro
, vol.31
, pp. 7-17
-
-
Keckler, S.W.1
Dally, W.J.2
Khailany, B.3
Garland, M.4
Glasco, D.5
-
21
-
-
84862328133
-
Life after Dennard and how I learned to love the picojoule
-
(keynote)
-
S. Keckler, "Life after Dennard and how I learned to love the picojoule," Intl. Symp. on Microarchitecture (MICRO), 2012, (keynote)
-
(2012)
Intl. Symp. on Microarchitecture (MICRO)
-
-
Keckler, S.1
-
22
-
-
0035271572
-
Imagine: Media processing with streams
-
B. Khailany, W. J. Dally, U. J. Kapasi, P. Mattson, J. Namkoong, J. D. Owens, B. Towles, A. Chang, and S. Rixner, "Imagine: Media processing with streams," IEEE Micro, vol. 21, pp. 35-46, 2001
-
(2001)
IEEE Micro
, vol.21
, pp. 35-46
-
-
Khailany, B.1
Dally, W.J.2
Kapasi, U.J.3
Mattson, P.4
Namkoong, J.5
Owens, J.D.6
Towles, B.7
Chang, A.8
Rixner, S.9
-
24
-
-
0031599788
-
Space-time scheduling of instruction-level parallelism on a raw machine
-
W. Lee, R. Barua, M. Frank, D. Srikrishna, J. Babb, V. Sarkar, and S. Amarasinghe, "Space-time scheduling of instruction-level parallelism on a raw machine," in Intl. Conf. on Arch. Support for Prog. Lang. & Operating Systems (ASPLOS), 1998, pp. 46-57
-
(1998)
Intl. Conf. on Arch. Support for Prog. Lang. & Operating Systems (ASPLOS)
, pp. 46-57
-
-
Lee, W.1
Barua, R.2
Frank, M.3
Srikrishna, D.4
Babb, J.5
Sarkar, V.6
Amarasinghe, S.7
-
25
-
-
84881151222
-
GPUWattch: Enabling energy optimizations in GPGPUs
-
J. Leng, T. Hetherington, A. ElTantawy, S. Gilani, N. S. Kim, T. M. Aamodt, and V. J. Reddi, "GPUWattch: enabling energy optimizations in GPGPUs," in Intl. Symp. on Computer Architecture (ISCA), 2013, pp. 487-498
-
(2013)
Intl. Symp. on Computer Architecture (ISCA)
, pp. 487-498
-
-
Leng, J.1
Hetherington, T.2
ElTantawy, A.3
Gilani, S.4
Kim, N.S.5
Aamodt, T.M.6
Reddi, V.J.7
-
26
-
-
44849137198
-
NVIDIA Tesla: A unified graphics and computing architecture
-
E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym, "NVIDIA Tesla: A unified graphics and computing architecture," IEEE Micro, vol. 28, no. 2, pp. 39-55, 2008
-
(2008)
IEEE Micro
, vol.28
, Issue.2
, pp. 39-55
-
-
Lindholm, E.1
Nickolls, J.2
Oberman, S.3
Montrym, J.4
-
27
-
-
34547456544
-
Tartan: Evaluating spatial computation for whole program execution
-
Oct
-
M. Mishra, T. J. Callahan, T. Chelcea, G. Venkataramani, M. Budiu, and S. C. Goldstein, "Tartan: Evaluating spatial computation for whole program execution," in Intl. Conf. on Arch. Support for Prog. Lang. & Operating Systems (ASPLOS), Oct 2006, pp. 163-174
-
(2006)
Intl. Conf. on Arch. Support for Prog. Lang. & Operating Systems (ASPLOS)
, pp. 163-174
-
-
Mishra, M.1
Callahan, T.J.2
Chelcea, T.3
Venkataramani, G.4
Budiu, M.5
Goldstein, S.C.6
-
28
-
-
77951154340
-
The GPU computing era
-
J. Nickolls and W. Dally, "The GPU computing era," IEEE Micro, vol. 30, no. 2, pp. 56-69, 2010
-
(2010)
IEEE Micro
, vol.30
, Issue.2
, pp. 56-69
-
-
Nickolls, J.1
Dally, W.2
-
29
-
-
78651550268
-
Scalable parallel programming with CUDA
-
J. Nickolls, I. Buck, M. Garland, and K. Skadron, "Scalable parallel programming with CUDA," ACM Queue, vol. 6, no. 2, pp. 40-53, 2008
-
(2008)
ACM Queue
, vol.6
, Issue.2
, pp. 40-53
-
-
Nickolls, J.1
Buck, I.2
Garland, M.3
Skadron, K.4
-
30
-
-
84905492307
-
-
Nvidia, Fermi Compute Architecture Whitepaper
-
Nvidia, Fermi Compute Architecture Whitepaper
-
-
-
-
31
-
-
84905504780
-
-
NVIDIA Tegra 4 family CPU architecture: 4-PLUS-1 quad core
-
NVIDIA, "NVIDIA Tegra 4 family CPU architecture: 4-PLUS-1 quad core," 2013 [Online]. Available: http://www.nvidia.com/docs/IO/116757/NVIDIA-Quad-A15-whitepaper-FINALv2.pdf
-
(2013)
NVIDIA
-
-
-
32
-
-
70349100958
-
-
OpenCL Working Group, Oct 2009, ver. 1.0
-
OpenCL Working Group, "The OpenCL specification," www.khronos.org/opencl, Oct 2009, ver. 1.0
-
The OpenCL Specification
-
-
-
33
-
-
84963624364
-
The program dependence web: A representation supporting control-, data-, and demand-driven interpretation of imperative languages
-
K. J. Ottenstein, R. A. Ballance, and A. B. MacCabe, "The program dependence web: a representation supporting control-, data-, and demand-driven interpretation of imperative languages," in Intl. Conf. on Programming Language Design and Impl. (PLDI), 1990, pp. 257-271
-
(1990)
Intl. Conf. on Programming Language Design and Impl. (PLDI)
, pp. 257-271
-
-
Ottenstein, K.J.1
Ballance, R.A.2
Maccabe, A.B.3
-
35
-
-
0017922490
-
The CRAY-1 computer system
-
Jan
-
R. M. Russell, "The CRAY-1 computer system," Comm. ACM, vol. 21, no. 1, pp. 63-72, Jan. 1978
-
(1978)
Comm. ACM
, vol.21
, Issue.1
, pp. 63-72
-
-
Russell, R.M.1
-
36
-
-
0037669851
-
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture
-
K. Sankaralingam, R. Nagarajan, H. Liu, C. Kim, J. Huh, D. Burger, S. W. Keckler, and C. R. Moore, "Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture," in Intl. Symp. on Computer Architecture (ISCA), 2003, pp. 422-433
-
(2003)
Intl. Symp. on Computer Architecture (ISCA)
, pp. 422-433
-
-
Sankaralingam, K.1
Nagarajan, R.2
Liu, H.3
Kim, C.4
Huh, J.5
Burger, D.6
Keckler, S.W.7
Moore, C.R.8
-
37
-
-
40349095135
-
Dataflow predication
-
A. Smith, R. Nagarajan, K. Sankaralingam, R. McDonald, D. Burger, S. W. Keckler, and K. S. McKinley, "Dataflow predication," in Intl. Symp. on Microarchitecture (MICRO), 2006, pp. 89-102
-
(2006)
Intl. Symp. on Microarchitecture (MICRO)
, pp. 89-102
-
-
Smith, A.1
Nagarajan, R.2
Sankaralingam, K.3
McDonald, R.4
Burger, D.5
Keckler, S.W.6
McKinley, K.S.7
-
38
-
-
84905492308
-
Threads on the cheap: Multithreaded execution in a WaveCache processor
-
S. Swanson, A. Schwerin, A. Petersen, M. Oskin, and S. Eggers, "Threads on the cheap: Multithreaded execution in a WaveCache processor," in Workshop on Complexity-effective Design (WCED), 2004
-
(2004)
Workshop on Complexity-effective Design (WCED)
-
-
Swanson, S.1
Schwerin, A.2
Petersen, A.3
Oskin, M.4
Eggers, S.5
-
39
-
-
84944392428
-
-
Dec
-
S. Swanson, K. Michelson, A. Schwerin, and M. Oskin, "WaveScalar," in Intl. Symp. on Microarchitecture (MICRO), Dec 2003, p. 291
-
(2003)
Intl. Symp. on Microarchitecture (MICRO)
, p. 291
-
-
Swanson, S.1
Michelson, K.2
Schwerin, A.3
Oskin, M.4
-
40
-
-
0036505033
-
The Raw microprocessor: A computational fabric for software circuits and general-purpose programs
-
M. Taylor, J. Kim, J. Miller, D. Wentzlaff, F. Ghodrat, B. Greenwald, H. Hoffman, P. Johnson, J.-W. Lee, W. Lee, A. Ma, A. Saraf, M. Seneski, N. Shnidman, V. Strumpen, M. Frank, S. Amarasinghe, and A. Agarwal, "The Raw microprocessor: a computational fabric for software circuits and general-purpose programs," IEEE Micro, vol. 22, no. 2, pp. 25-35, 2002
-
(2002)
IEEE Micro
, vol.22
, Issue.2
, pp. 25-35
-
-
Taylor, M.1
Kim, J.2
Miller, J.3
Wentzlaff, D.4
Ghodrat, F.5
Greenwald, B.6
Hoffman, H.7
Johnson, P.8
Lee, J.-W.9
Lee, W.10
Ma, A.11
Saraf, A.12
Seneski, M.13
Shnidman, N.14
Strumpen, V.15
Frank, M.16
Amarasinghe, S.17
Agarwal, A.18
-
43
-
-
0028087519
-
Performance estimation of multistreamed, superscalar processors
-
W. Yamamoto, M. Serrano, A. Talcott, R. Wood, and M. Nemirovsky, "Performance estimation of multistreamed, superscalar processors," in Hawaii Intl. Conf. on System Sciences, vol. 1, 1994, pp. 195-204.
-
(1994)
Hawaii Intl. Conf. on System Sciences
, vol.1
, pp. 195-204
-
-
Yamamoto, W.1
Serrano, M.2
Talcott, A.3
Wood, R.4
Nemirovsky, M.5