-
3
-
-
0036469652
-
Simplescalar: An infrastructure for computer system modeling
-
T. M. Austin, E. Larson, and D. Ernst, "Simplescalar: An infrastructure for computer system modeling," IEEE Computer, 2002.
-
(2002)
IEEE Computer
-
-
Austin, T.M.1
Larson, E.2
Ernst, D.3
-
4
-
-
0026867085
-
Dynamic dependency analysis of ordinary programs
-
T. M. Austin and G. S. Sohi, "Dynamic dependency analysis of ordinary programs," in ISCA, 1992.
-
(1992)
ISCA
-
-
Austin, T.M.1
Sohi, G.S.2
-
5
-
-
70349169075
-
Analyzing cuda workloads using a detailed gpu simulator
-
A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt, "Analyzing cuda workloads using a detailed gpu simulator," in ISPASS, 2009.
-
(2009)
ISPASS
-
-
Bakhoda, A.1
Yuan, G.L.2
Fung, W.W.L.3
Wong, H.4
Aamodt, T.M.5
-
6
-
-
84881175680
-
Continuous real-world inputs can open up alternative accelerator designs
-
B. Belhadj, A. Joubert, Z. Li, R. Héliot, and O. Temam, "Continuous real-world inputs can open up alternative accelerator designs," in ISCA, 2013.
-
(2013)
ISCA
-
-
Belhadj, B.1
Joubert, A.2
Li, Z.3
Héliot, R.4
Temam, O.5
-
7
-
-
84859464490
-
The gem5 simulator
-
N. L. Binkert, B. M. Beckmann, G. Black, S. K. Reinhardt, A. G. Saidi, A. Basu, J. Hestness, D. Hower, T. Krishna, S. Sardashti, R. Sen, K. Sewell, M. Shoaib, N. Vaish, M. D. Hill, and D. A. Wood, "The gem5 simulator," SIGARCH Computer Architecture News, 2011.
-
(2011)
SIGARCH Computer Architecture News
-
-
Binkert, N.L.1
Beckmann, B.M.2
Black, G.3
Reinhardt, S.K.4
Saidi, A.G.5
Basu, A.6
Hestness, J.7
Hower, D.8
Krishna, T.9
Sardashti, S.10
Sen, R.11
Sewell, K.12
Shoaib, M.13
Vaish, N.14
Hill, M.D.15
Wood, D.A.16
-
8
-
-
0033719421
-
Wattch: A framework for architectural-level power analysis and optimizations
-
D. Brooks, V. Tiwari, and M. Martonosi, "Wattch: A framework for architectural-level power analysis and optimizations," in ISCA, 2000.
-
(2000)
ISCA
-
-
Brooks, D.1
Tiwari, V.2
Martonosi, M.3
-
9
-
-
0029666646
-
Memory bandwidth limitations of future microprocessors
-
D. Burger, J. R. Goodman, and A. Kägi, "Memory bandwidth limitations of future microprocessors," in ISCA, 1996.
-
(1996)
ISCA
-
-
Burger, D.1
Goodman, J.R.2
Kägi, A.3
-
10
-
-
76949106140
-
A highly flexible, parallel virtual machine: Design and experience of ildjit
-
S. Campanoni, G. Agosta, S. Crespi-Reghizzi, and A. D. Biagio, "A highly flexible, parallel virtual machine: Design and experience of ildjit," Software: Practice and Experience, 2010.
-
(2010)
Software: Practice and Experience
-
-
Campanoni, S.1
Agosta, G.2
Crespi-Reghizzi, S.3
Biagio, A.D.4
-
11
-
-
84874530623
-
An fpga memcached appliance
-
S. R. Chalamalasetti, K. Lim, M. Wright, A. AuYoung, P. Ranganathan, and M. Margala, "An fpga memcached appliance," in FPGA, 2013.
-
(2013)
FPGA
-
-
Chalamalasetti, S.R.1
Lim, K.2
Wright, M.3
AuYoung, A.4
Ranganathan, P.5
Margala, M.6
-
12
-
-
84881142714
-
Linqits: Big data on little clients
-
E. S. Chung, J. D. Davis, and J. Lee, "Linqits: Big data on little clients," in ISCA, 2013.
-
(2013)
ISCA
-
-
Chung, E.S.1
Davis, J.D.2
Lee, J.3
-
13
-
-
79951696448
-
Single-chip heterogeneous computing: Does the future include custom logic, fpgas, and gpgpus?
-
E. S. Chung, P. A. Milder, J. C. Hoe, and K. Mai, "Single-chip heterogeneous computing: Does the future include custom logic, fpgas, and gpgpus?" in MICRO, 2010.
-
(2010)
MICRO
-
-
Chung, E.S.1
Milder, P.A.2
Hoe, J.C.3
Mai, K.4
-
14
-
-
52649095061
-
Veal: Virtualized execution accelerator for loops
-
N. Clark, A. Hormati, and S. A. Mahlke, "Veal: Virtualized execution accelerator for loops," in ISCA, 2008.
-
(2008)
ISCA
-
-
Clark, N.1
Hormati, A.2
Mahlke, S.A.3
-
15
-
-
2442428419
-
Application-specific instruction generation for configurable processor architectures
-
J. Cong, Y. Fan, G. Han, and Z. Zhang, "Application-specific instruction generation for configurable processor architectures," in FPGA, 2004.
-
(2004)
FPGA
-
-
Cong, J.1
Fan, Y.2
Han, G.3
Zhang, Z.4
-
16
-
-
67650692183
-
Synthesis of reconfigurable high-performance multicore systems
-
J. Cong, K. Gururaj, and G. Han, "Synthesis of reconfigurable high-performance multicore systems," in FPGA, 2009.
-
(2009)
FPGA
-
-
Cong, J.1
Gururaj, K.2
Han, G.3
-
17
-
-
77952273045
-
The scalable heterogeneous computing (shoc) benchmark suite
-
A. Danalis, G. Marin, C. McCurdy, J. S. Meredith, P. C. Roth, K. Spafford, V. Tipparaju, and J. S. Vetter, "The scalable heterogeneous computing (shoc) benchmark suite," in Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, 2010.
-
(2010)
Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
-
-
Danalis, A.1
Marin, G.2
McCurdy, C.3
Meredith, J.S.4
Roth, P.C.5
Spafford, K.6
Tipparaju, V.7
Vetter, J.S.8
-
18
-
-
84861950149
-
Dark silicon and the end of multicore scaling
-
H. Esmaeilzadeh, E. Blem, R. St. Amant, K. Sankaralingam, and D. Burger, "Dark silicon and the end of multicore scaling," IEEE Micro, 2012.
-
(2012)
IEEE Micro
-
-
Esmaeilzadeh, H.1
Blem, E.2
St. Amant, R.3
Sankaralingam, K.4
Burger, D.5
-
19
-
-
84876591853
-
Neural acceleration for general-purpose approximate programs
-
H. Esmaeilzadeh, A. Sampson, L. Ceze, and D. Burger, "Neural acceleration for general-purpose approximate programs," in MICRO, 2012.
-
(2012)
MICRO
-
-
Esmaeilzadeh, H.1
Sampson, A.2
Ceze, L.3
Burger, D.4
-
20
-
-
80052679438
-
Buffer-integrated-cache: A cost-effective sram architecture for handheld and embedded platforms
-
C. F. Fajardo, Z. Fang, R. Iyer, G. F. Garcia, S. E. Lee, and L. Zhao, "Buffer-integrated-cache: A cost-effective sram architecture for handheld and embedded platforms," in DAC, 2011.
-
(2011)
DAC
-
-
Fajardo, C.F.1
Fang, Z.2
Iyer, R.3
Garcia, G.F.4
Lee, S.E.5
Zhao, L.6
-
22
-
-
0036296821
-
Slack: Maximizing performance under technological constraints
-
B. A. Fields, R. Bodík, and M. D. Hill, "Slack: Maximizing performance under technological constraints," in ISCA, 2002.
-
(2002)
ISCA
-
-
Fields, B.A.1
Bodík, R.2
Hill, M.D.3
-
24
-
-
79959906704
-
Kremlin: Rethinking and rebooting gprof for the multicore age
-
S. Garcia, D. Jeon, C. M. Louie, and M. B. Taylor, "Kremlin: rethinking and rebooting gprof for the multicore age," in PLDI, 2011.
-
(2011)
PLDI
-
-
Garcia, S.1
Jeon, D.2
Louie, C.M.3
Taylor, M.B.4
-
25
-
-
84869168810
-
Dyser: Unifying functionality and parallelism specialization for energy-efficient computing
-
V. Govindaraju, C.-H. Ho, T. Nowatzki, J. Chhugani, N. Satish, K. Sankaralingam, and C. Kim, "Dyser: Unifying functionality and parallelism specialization for energy-efficient computing," IEEE Micro, 2012.
-
(2012)
IEEE Micro
-
-
Govindaraju, V.1
Ho, C.-H.2
Nowatzki, T.3
Chhugani, J.4
Satish, N.5
Sankaralingam, K.6
Kim, C.7
-
26
-
-
84887502088
-
Breaking simd shackles with an exposed flexible microarchitecture and the access execute pdg
-
V. Govindaraju, T. Nowatzki, and K. Sankaralingam, "Breaking simd shackles with an exposed flexible microarchitecture and the access execute pdg," in PACT, 2013.
-
(2013)
PACT
-
-
Govindaraju, V.1
Nowatzki, T.2
Sankaralingam, K.3
-
27
-
-
84863374615
-
Bundled execution of recurring traces for energy-efficient general purpose processing
-
S. Gupta, S. Feng, A. Ansari, S. Mahlke, and D. August, "Bundled execution of recurring traces for energy-efficient general purpose processing," in MICRO, 2011.
-
(2011)
MICRO
-
-
Gupta, S.1
Feng, S.2
Ansari, A.3
Mahlke, S.4
August, D.5
-
28
-
-
77954995378
-
Understanding sources of inefficiency in general-purpose chips
-
R. Hameed, W. Qadeer, M. Wachs, O. Azizi, A. Solomatnikov, B. C. Lee, S. Richardson, C. Kozyrakis, and M. Horowitz, "Understanding sources of inefficiency in general-purpose chips," in ISCA, 2010.
-
(2010)
ISCA
-
-
Hameed, R.1
Qadeer, W.2
Wachs, M.3
Azizi, O.4
Solomatnikov, A.5
Lee, B.C.6
Richardson, S.7
Kozyrakis, C.8
Horowitz, M.9
-
29
-
-
84905475765
-
Optimal huffman tree-height reduction for instruction-level parallelism
-
Department of Computer Sciences, The University of Texas at Austin
-
W. Hunt, B. A. Maher, D. Burger, and K. S. McKinley, "Optimal huffman tree-height reduction for instruction-level parallelism," Technical Report TR-08-34, Department of Computer Sciences, The University of Texas at Austin, 2008.
-
(2008)
Technical Report TR-08-34
-
-
Hunt, W.1
Maher, B.A.2
Burger, D.3
McKinley, K.S.4
-
30
-
-
77952985184
-
Code coverage and input variability: Effects on architecture and compiler research
-
H. C. Hunter and W.-m. W. Hwu, "Code coverage and input variability: Effects on architecture and compiler research," in CASES, 2002.
-
(2002)
CASES
-
-
Hunter, H.C.1
Hwu, W.-M.W.2
-
31
-
-
81455154902
-
Kismet: Parallel speedup estimates for serial programs
-
D. Jeon, S. Garcia, C. M. Louie, and M. B. Taylor, "Kismet: parallel speedup estimates for serial programs," in OOPSLA, 2011.
-
(2011)
OOPSLA
-
-
Jeon, D.1
Garcia, S.2
Louie, C.M.3
Taylor, M.B.4
-
32
-
-
79951696651
-
Sd3: A scalable approach to dynamic data-dependence profiling
-
M. Kim, H. Kim, and C.-K. Luk, "Sd3: A scalable approach to dynamic data-dependence profiling," in MICRO, 2010.
-
(2010)
MICRO
-
-
Kim, M.1
Kim, H.2
Luk, C.-K.3
-
33
-
-
0024068822
-
Measuring parallelism in computation-intensive scientific/engineering applications
-
M. Kumar, "Measuring parallelism in computation-intensive scientific/engineering applications," IEEE Trans. Computers, 1988.
-
(1988)
IEEE Trans. Computers
-
-
Kumar, M.1
-
34
-
-
0026867146
-
Limits of control flow on parallelism
-
M. S. Lam and R. P. Wilson, "Limits of control flow on parallelism," in ISCA, 1992.
-
(1992)
ISCA
-
-
Lam, M.S.1
Wilson, R.P.2
-
36
-
-
84881151222
-
Gpuwattch: Enabling energy optimizations in gpgpus
-
J. Leng, T. H. Hetherington, A. ElTantawy, S. Z. Gilani, N. S. Kim, T. M. Aamodt, and V. J. Reddi, "Gpuwattch: enabling energy optimizations in gpgpus," in ISCA, 2013.
-
(2013)
ISCA
-
-
Leng, J.1
Hetherington, T.H.2
ElTantawy, A.3
Gilani, S.Z.4
Kim, N.S.5
Aamodt, T.M.6
Reddi, V.J.7
-
37
-
-
76749146060
-
Mcpat: An integrated power, area, and timing modeling framework for multicore and manycore architectures
-
S. Li, J. H. Ahn, R. D. Strong, J. B. Brockman, D. M. Tullsen, and N. P. Jouppi, "Mcpat: an integrated power, area, and timing modeling framework for multicore and manycore architectures," in MICRO, 2009.
-
(2009)
MICRO
-
-
Li, S.1
Ahn, J.H.2
Strong, R.D.3
Brockman, J.B.4
Tullsen, D.M.5
Jouppi, N.P.6
-
38
-
-
84881144734
-
Thin servers with smart pipes: Designing soc accelerators for memcached
-
K. T. Lim, D. Meisner, A. G. Saidi, P. Ranganathan, and T. F. Wenisch, "Thin servers with smart pipes: designing soc accelerators for memcached," in ISCA, 2013.
-
(2013)
ISCA
-
-
Lim, K.T.1
Meisner, D.2
Saidi, A.G.3
Ranganathan, P.4
Wenisch, T.F.5
-
39
-
-
84879851819
-
On learning-based methods for design-space exploration with high-level synthesis
-
H.-Y. Liu and L. P. Carloni, "On learning-based methods for design-space exploration with high-level synthesis," in DAC, 2013.
-
(2013)
DAC
-
-
Liu, H.-Y.1
Carloni, L.P.2
-
40
-
-
84862058364
-
Compositional system-level design exploration with planning of high-level synthesis
-
H.-Y. Liu, M. Petracca, and L. P. Carloni, "Compositional system-level design exploration with planning of high-level synthesis," in DATE, 2012.
-
(2012)
DATE
-
-
Liu, H.-Y.1
Petracca, M.2
Carloni, L.P.3
-
41
-
-
40349109005
-
Pathexpander: Architectural support for increasing the path coverage of dynamic bug detection
-
S. Lu, P. Zhou, W. Liu, Y. Zhou, and J. Torrellas, "Pathexpander: Architectural support for increasing the path coverage of dynamic bug detection," in MICRO, 2006.
-
(2006)
MICRO
-
-
Lu, S.1
Zhou, P.2
Liu, W.3
Zhou, Y.4
Torrellas, J.5
-
42
-
-
31944440969
-
Pin: Building customized program analysis tools with dynamic instrumentation
-
C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood, "Pin: Building customized program analysis tools with dynamic instrumentation," in PLDI, 2005.
-
(2005)
PLDI
-
-
Luk, C.-K.1
Cohn, R.2
Muth, R.3
Patil, H.4
Klauser, A.5
Lowney, G.6
Wallace, S.7
Reddi, V.J.8
Hazelwood, K.9
-
43
-
-
84881162326
-
Convolution engine: Balancing efficiency & flexibility in specialized computing
-
W. Qadeer, R. Hameed, O. Shacham, P. Venkatesan, C. Kozyrakis, and M. A. Horowitz, "Convolution engine: balancing efficiency & flexibility in specialized computing," in ISCA, 2013.
-
(2013)
ISCA
-
-
Qadeer, W.1
Hameed, R.2
Shacham, O.3
Venkatesan, P.4
Kozyrakis, C.5
Horowitz, M.A.6
-
44
-
-
84863430504
-
Measuring limits of parallelism and characterizing its vulnerability to resource constraints
-
L. Rauchwerger, P. K. Dubey, and R. Nair, "Measuring limits of parallelism and characterizing its vulnerability to resource constraints," in MICRO, 1993.
-
(1993)
MICRO
-
-
Rauchwerger, L.1
Dubey, P.K.2
Nair, R.3
-
45
-
-
84889594827
-
Quantifying acceleration: Power/performance trade-offs of application kernels in hardware
-
B. Reagen, Y. S. Shao, G.-Y. Wei, and D. Brooks, "Quantifying acceleration: Power/performance trade-offs of application kernels in hardware," in ISLPED, 2013.
-
(2013)
ISLPED
-
-
Reagen, B.1
Shao, Y.S.2
Wei, G.-Y.3
Brooks, D.4
-
47
-
-
84880285819
-
Sonic millip3de: A massively parallel 3d-stacked accelerator for 3d ultrasound
-
R. Sampson, M. Yang, S. Wei, C. Chakrabarti, and T. F. Wenisch, "Sonic millip3de: A massively parallel 3d-stacked accelerator for 3d ultrasound," in HPCA, 2013.
-
(2013)
HPCA
-
-
Sampson, R.1
Yang, M.2
Wei, S.3
Chakrabarti, C.4
Wenisch, T.F.5
-
48
-
-
34249810603
-
Nosq: Store-load communication without a store queue
-
T. Sha, M. M. K. Martin, and A. Roth, "Nosq: Store-load communication without a store queue," in MICRO, 2006.
-
(2006)
MICRO
-
-
Sha, T.1
Martin, M.M.K.2
Roth, A.3
-
49
-
-
84881437667
-
Isa-independent workload characterization and its implications for specialized architectures
-
Y. S. Shao and D. Brooks, "Isa-independent workload characterization and its implications for specialized architectures," in ISPASS, 2013.
-
(2013)
ISPASS
-
-
Shao, Y.S.1
Brooks, D.2
-
50
-
-
84864858301
-
A defect-tolerant accelerator for emerging high-performance applications
-
O. Temam, "A defect-tolerant accelerator for emerging high-performance applications," in ISCA, 2012.
-
(2012)
ISCA
-
-
Temam, O.1
-
51
-
-
0026989702
-
On the limits of program parallelism and its smoothability
-
K. B. Theobald, G. R. Gao, and L. J. Hendren, "On the limits of program parallelism and its smoothability," in MICRO, 1992.
-
(1992)
MICRO
-
-
Theobald, K.B.1
Gao, G.R.2
Hendren, L.J.3
-
52
-
-
77952256041
-
Conservation cores: Reducing the energy of mature computations
-
G. Venkatesh, J. Sampson, N. Goulding, S. Garcia, V. Bryksin, J. Lugo-Martinez, S. Swanson, and M. B. Taylor, "Conservation cores: Reducing the energy of mature computations," in ASPLOS, 2010.
-
(2010)
ASPLOS
-
-
Venkatesh, G.1
Sampson, J.2
Goulding, N.3
Garcia, S.4
Bryksin, V.5
Lugo-Martinez, J.6
Swanson, S.7
Taylor, M.B.8
-
53
-
-
0026137115
-
Limits of instruction-level parallelism
-
D. W. Wall, "Limits of instruction-level parallelism," in ASPLOS, 1991.
-
(1991)
ASPLOS
-
-
Wall, D.W.1
-
55
-
-
84881185269
-
Navigating big data with high-throughput, energy-efficient data partitioning
-
L. Wu, R. J. Barker, M. A. Kim, and K. A. Ross, "Navigating big data with high-throughput, energy-efficient data partitioning," in ISCA, 2013.
-
(2013)
ISCA
-
-
Wu, L.1
Barker, R.J.2
Kim, M.A.3
Ross, K.A.4
-
56
-
-
84893898462
-
A 3d-stacked logic-in-memory accelerator for application-specific data intensive computing
-
Q. Zhu, B. Akin, H. E. Sumbul, F. Sadi, J. Hoe, L. Pileggi, and F. Franchetti, "A 3d-stacked logic-in-memory accelerator for application-specific data intensive computing," in 3DIC, 2013.
-
(2013)
3DIC
-
-
Zhu, Q.1
Akin, B.2
Sumbul, H.E.3
Sadi, F.4
Hoe, J.5
Pileggi, L.6
Franchetti, F.7