-
5
-
-
35648995516
-
The landscape of parallel computing research: A view from Berkeley
-
University of California, Berkeley, Dec
-
K. Asanovic, R. Bodik, B. C. Catanzaro, J. J. Gebis, P. Husbands, K. Keutzer, D. A. Patterson, W. L. Plishker, J. Shalf, S. W. Williams, and K. A. Yelick. The landscape of parallel computing research: A view from Berkeley. Technical report, EECS Department, University of California, Berkeley, Dec 2006.
-
(2006)
Technical Report, EECS Department
-
-
Asanovic, K.1
Bodik, R.2
Catanzaro, B.C.3
Gebis, J.J.4
Husbands, P.5
Keutzer, K.6
Patterson, D.A.7
Plishker, W.L.8
Shalf, J.9
Williams, S.W.10
Yelick, K.A.11
-
6
-
-
80052656013
-
Virtualization of heterogeneous machines
-
June 2011
-
J. Auerbach, D. Bacon, P. Cheng, R. Rabbah, and S. Shukla. Virtualization of heterogeneous machines. In (DAC), pages 890-894, June 2011.
-
DAC
, pp. 890-894
-
-
Auerbach, J.1
Bacon, D.2
Cheng, P.3
Rabbah, R.4
Shukla, S.5
-
7
-
-
84885667952
-
Multi-pumping for resource reduction in fpga high-level synthesis
-
March 2013
-
A. Canis, J. H. Anderson, and S. D. Brown. Multi-pumping for resource reduction in fpga high-level synthesis. In (DATE), pages 194-197, March 2013.
-
DATE
, pp. 194-197
-
-
Canis, A.1
Anderson, J.H.2
Brown, S.D.3
-
8
-
-
84859246670
-
R-mat: A recursive model for graph mining
-
Carnegie Mellon University
-
D. Chakrabarti, Y. Zhan, and C. Faloutsos. R-mat: A recursive model for graph mining. In CS Department, Carnegie Mellon University, 2004.
-
(2004)
CS Department
-
-
Chakrabarti, D.1
Zhan, Y.2
Faloutsos, C.3
-
9
-
-
70649092154
-
Rodinia: A benchmark suite for heterogeneous computing
-
S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S.-H. Lee, and K. Skadron. Rodinia: A benchmark suite for heterogeneous computing. In (IISWC), 2009.
-
(2009)
IISWC
-
-
Che, S.1
Boyer, M.2
Meng, J.3
Tarjan, D.4
Sheaffer, J.W.5
Lee, S.-H.6
Skadron, K.7
-
10
-
-
78751505898
-
A characterization of the rodinia benchmark suite with comparison to contemporary CMP workloads
-
S. Che, J. W. Sheaffer, M. Boyer, L. G. Szafaryn, L. Wang, and K. Skadron. A characterization of the rodinia benchmark suite with comparison to contemporary CMP workloads. In (IISWC), 2010.
-
(2010)
IISWC
-
-
Che, S.1
Sheaffer, J.W.2
Boyer, M.3
Szafaryn, L.G.4
Wang, L.5
Skadron, K.6
-
11
-
-
84897780584
-
Diannao: A small-footprint high-throughput accelerator for ubiquitous machine-learning
-
New York, NY, USA ACM
-
T. Chen, Z. Du, N. Sun, J. Wang, C. Wu, Y. Chen, and O. Temam. Diannao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. ASPLOS '14, pages 269-284, New York, NY, USA, 2014. ACM.
-
(2014)
ASPLOS '14
, pp. 269-284
-
-
Chen, T.1
Du, Z.2
Sun, N.3
Wang, J.4
Wu, C.5
Chen, Y.6
Temam, O.7
-
12
-
-
84881142714
-
Linqits: Big data on little clients
-
E. S. Chung, J. D. Davis, and J. Lee. Linqits: big data on little clients. ISCA, 2013.
-
(2013)
ISCA
-
-
Chung, E.S.1
Davis, J.D.2
Lee, J.3
-
13
-
-
79951696448
-
Single-chip heterogeneous computing: Does the future include custom logic, fpgas, and gpgpus?
-
E. S. Chung, P. A. Milder, J. C. Hoe, and K. Mai. Single-chip heterogeneous computing: Does the future include custom logic, fpgas, and gpgpus? In MICRO, 2010.
-
(2010)
MICRO
-
-
Chung, E.S.1
Milder, P.A.2
Hoe, J.C.3
Mai, K.4
-
14
-
-
84889592098
-
Composable accelerator-rich microprocessor enhanced for adaptivity and longevity
-
Sept 2013
-
J. Cong, M. Ghodrat, M. Gill, B. Grigorian, H. Huang, and G. Reinman. Composable accelerator-rich microprocessor enhanced for adaptivity and longevity. In (ISLPED), pages 305-310, Sept 2013.
-
ISLPED
, pp. 305-310
-
-
Cong, J.1
Ghodrat, M.2
Gill, M.3
Grigorian, B.4
Huang, H.5
Reinman, G.6
-
15
-
-
84865554555
-
Charm: A composable heterogeneous accelerator-rich microprocessor
-
J. Cong, M. A. Ghodrat, M. Gill, B. Grigorian, and G. Reinman. Charm: a composable heterogeneous accelerator-rich microprocessor. In ISLPED, 2012.
-
(2012)
ISLPED
-
-
Cong, J.1
Ghodrat, M.A.2
Gill, M.3
Grigorian, B.4
Reinman, G.5
-
16
-
-
67650692183
-
Synthesis of reconfigurable high-performance multicore systems
-
J. Cong, K. Gururaj, and G. Han. Synthesis of reconfigurable high-performance multicore systems. In FPGA, 2009.
-
(2009)
FPGA
-
-
Cong, J.1
Gururaj, K.2
Han, G.3
-
17
-
-
84862082123
-
Combining module selection and replication for throughput-driven streaming programs
-
San Jose, CA, USA EDA Consortium
-
J. Cong, M. Huang, B. Liu, P. Zhang, and Y. Zou. Combining module selection and replication for throughput-driven streaming programs. DATE '12, pages 1018-1023, San Jose, CA, USA, 2012. EDA Consortium.
-
(2012)
DATE '12
, pp. 1018-1023
-
-
Cong, J.1
Huang, M.2
Liu, B.3
Zhang, P.4
Zou, Y.5
-
18
-
-
77952273045
-
The scalable heterogeneous computing (shoc) benchmark suite
-
A. Danalis, G. Marin, C. McCurdy, J. S. Meredith, P. C. Roth, K. Spafford, V. Tipparaju, and J. S. Vetter. The scalable heterogeneous computing (shoc) benchmark suite. In Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, 2010.
-
(2010)
Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
-
-
Danalis, A.1
Marin, G.2
McCurdy, C.3
Meredith, J.S.4
Roth, P.C.5
Spafford, K.6
Tipparaju, V.7
Vetter, J.S.8
-
19
-
-
70350771127
-
Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures
-
SC 2008, Nov 2008
-
K. Datta, M. Murphy, V. Volkov, S. Williams, J. Carter, L. Oliker, D. Patterson, J. Shalf, and K. Yelick. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. In High Performance Computing, Networking, Storage and Analysis, 2008. SC 2008. International Conference for, pages 1-12, Nov 2008.
-
(2008)
High Performance Computing, Networking, Storage and Analysis
, pp. 1-12
-
-
Datta, K.1
Murphy, M.2
Volkov, V.3
Williams, S.4
Carter, J.5
Oliker, L.6
Patterson, D.7
Shalf, J.8
Yelick, K.9
-
20
-
-
77953098517
-
Using speculative functional units in high level synthesis
-
March 2010
-
A. Del Barrio, M. Molina, J. Mendias, R. Hermida, and S. Memik. Using speculative functional units in high level synthesis. In (DATE), pages 1779-1784, March 2010.
-
DATE
, pp. 1779-1784
-
-
Del Barrio, A.1
Molina, M.2
Mendias, J.3
Hermida, R.4
Memik, S.5
-
22
-
-
64849117951
-
Bridging the computation gap between programmable processors and hardwired accelerators
-
K. Fan, M. Kudlur, G. S. Dasika, and S. A. Mahlke. Bridging the computation gap between programmable processors and hardwired accelerators. In HPCA, 2009.
-
(2009)
HPCA
-
-
Fan, K.1
Kudlur, M.2
Dasika, G.S.3
Mahlke, S.A.4
-
23
-
-
84885631298
-
Compiling control-intensive loops for cgras with state-based full predication
-
March 2013
-
K. Han, K. Choi, and J. Lee. Compiling control-intensive loops for cgras with state-based full predication. In (DATE), pages 1579-1582, March 2013.
-
DATE
, pp. 1579-1582
-
-
Han, K.1
Choi, K.2
Lee, J.3
-
24
-
-
51749101517
-
Chstone: A benchmark program suite for practical c-based high-level synthesis
-
IEEE
-
Y. Hara, H. Tomiyama, S. Honda, H. Takada, and K. Ishii. Chstone: A benchmark program suite for practical c-based high-level synthesis. In ISCAS, pages 1192-1195. IEEE, 2008.
-
(2008)
ISCAS
, pp. 1192-1195
-
-
Hara, Y.1
Tomiyama, H.2
Honda, S.3
Takada, H.4
Ishii, K.5
-
25
-
-
60649099910
-
Accelerating large graph algorithms on the gpu using cuda
-
P. Harish and P. Narayanan. Accelerating large graph algorithms on the gpu using cuda. In HiPC, 2007.
-
(2007)
HiPC
-
-
Harish, P.1
Narayanan, P.2
-
26
-
-
84856541553
-
Efficient parallel graph exploration on multi-core cpu and gpu
-
S. Hong, T. Oguntebi, and K. Olukotun. Efficient parallel graph exploration on multi-core cpu and gpu. In PACT, 2011.
-
(2011)
PACT
-
-
Hong, S.1
Oguntebi, T.2
Olukotun, K.3
-
27
-
-
0000904908
-
Fast pattern matching in strings
-
D. E. Knuth, J. Morris, and V. R. Pratt. Fast pattern matching in strings. SIAM Journal on Computing, 6(2):323-350, 1977.
-
(1977)
SIAM Journal on Computing
, vol.6
, Issue.2
, pp. 323-350
-
-
Knuth, D.E.1
Morris, J.2
Pratt, V.R.3
-
28
-
-
0026137116
-
The cache performance and optimizations of blocked algorithms
-
New York, NY, USA ACM
-
M. D. Lam, E. E. Rothberg, and M. E. Wolf. The cache performance and optimizations of blocked algorithms. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS IV, pages 63-74, New York, NY, USA, 1991. ACM.
-
(1991)
Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS IV
, pp. 63-74
-
-
Lam, M.D.1
Rothberg, E.E.2
Wolf, M.E.3
-
32
-
-
84879851819
-
On learning-based methods for design-space exploration with high-level synthesis
-
H.-Y. Liu and L. P. Carloni. On learning-based methods for design-space exploration with high-level synthesis. In DAC, 2013.
-
(2013)
DAC
-
-
Liu, H.-Y.1
Carloni, L.P.2
-
33
-
-
84862058364
-
Compositional system-level design exploration with planning of high-level synthesis
-
H.-Y. Liu, M. Petracca, and L. P. Carloni. Compositional system-level design exploration with planning of high-level synthesis. In DATE, 2012.
-
(2012)
DATE
-
-
Liu, H.-Y.1
Petracca, M.2
Carloni, L.P.3
-
34
-
-
84857883486
-
The accelerator store: A shared memory framework for accelerator-based systems
-
M. J. Lyons, M. Hempstead, G.-Y. Wei, and D. Brooks. The accelerator store: A shared memory framework for accelerator-based systems. TACO, 2012.
-
(2012)
TACO
-
-
Lyons, M.J.1
Hempstead, M.2
Wei, G.-Y.3
Brooks, D.4
-
35
-
-
84879864963
-
A high-level synthesis flow for the implementation of iterative stencil loop algorithms on fpga devices
-
New York, NY, USA ACM
-
A. A. Nacci, V. Rana, F. Bruschi, D. Sciuto, I. Beretta, and D. Atienza. A high-level synthesis flow for the implementation of iterative stencil loop algorithms on fpga devices. DAC '13, pages 52:1-52:6, New York, NY, USA, 2013. ACM.
-
(2013)
DAC '13
, pp. 52:1-52:6
-
-
Nacci, A.A.1
Rana, V.2
Bruschi, F.3
Sciuto, D.4
Beretta, I.5
Atienza, D.6
-
36
-
-
84897843178
-
Building zynq accelerators with vivado high level synthesis
-
S. Neuendorffer and F. Martinez-Vallina. Building zynq accelerators with vivado high level synthesis. In FPGA, 2013.
-
(2013)
FPGA
-
-
Neuendorffer, S.1
Martinez-Vallina, F.2
-
37
-
-
84881163269
-
Triggered instructions: A control paradigm for spatially-programmed architectures
-
New York, NY, USA ACM
-
A. Parashar, M. Pellauer, M. Adler, B. Ahsan, N. Crago, D. Lustig, V. Pavlov, A. Zhai, M. Gambhir, A. Jaleel, R. Allmon, R. Rayess, S. Maresh, and J. Emer. Triggered instructions: A control paradigm for spatially-programmed architectures. ISCA '13, pages 142-153, New York, NY, USA, 2013. ACM.
-
(2013)
ISCA '13
, pp. 142-153
-
-
Parashar, A.1
Pellauer, M.2
Adler, M.3
Ahsan, B.4
Crago, N.5
Lustig, D.6
Pavlov, V.7
Zhai, A.8
Gambhir, M.9
Jaleel, A.10
Allmon, R.11
Rayess, R.12
Maresh, S.13
Emer, J.14
-
38
-
-
35348913704
-
Analysis of redundancy and application balance in the spec cpu2006 benchmark suite
-
New York, NY, USA ACM
-
A. Phansalkar, A. Joshi, and L. K. John. Analysis of redundancy and application balance in the spec cpu2006 benchmark suite. ISCA '07, pages 412-423, New York, NY, USA, 2007. ACM.
-
(2007)
ISCA '07
, pp. 412-423
-
-
Phansalkar, A.1
Joshi, A.2
John, L.K.3
-
40
-
-
84881162326
-
Convolution engine: Balancing efficiency & flexibility in specialized computing
-
W. Qadeer, R. Hameed, O. Shacham, P. Venkatesan, C. Kozyrakis, and M. A. Horowitz. Convolution engine: balancing efficiency & flexibility in specialized computing. In ISCA, 2013.
-
(2013)
ISCA
-
-
Qadeer, W.1
Hameed, R.2
Shacham, O.3
Venkatesan, P.4
Kozyrakis, C.5
Horowitz, M.A.6
-
41
-
-
84889594827
-
Quantifying acceleration: Power/performance trade-offs of application kernels in hardware
-
B. Reagen, Y. S. Shao, G.-Y. Wei, and D. Brooks. Quantifying acceleration: Power/performance trade-offs of application kernels in hardware. In ISLPED, 2013.
-
(2013)
ISLPED
-
-
Reagen, B.1
Shao, Y.S.2
Wei, G.-Y.3
Brooks, D.4
-
42
-
-
84881437667
-
Isa-independent workload characterization and its implications for specialized architectures
-
Y. S. Shao and D. Brooks. Isa-independent workload characterization and its implications for specialized architectures. In ISPASS, 2013.
-
(2013)
ISPASS
-
-
Shao, Y.S.1
Brooks, D.2
-
43
-
-
84905487457
-
Aladdin: A pre-rtl, power-performance accelerator simulator enabling large design space exploration of customized architectures
-
Y. S. Shao, B. Reagen, G.-Y. Wei, and D. Brooks. Aladdin: A pre-rtl, power-performance accelerator simulator enabling large design space exploration of customized architectures. In ISCA, 2014.
-
(2014)
ISCA
-
-
Shao, Y.S.1
Reagen, B.2
Wei, G.-Y.3
Brooks, D.4
-
44
-
-
84873470137
-
Parboil: A revised benchmark suite for scientific and commercial throughput computing
-
Urbana, Mar.
-
J. A. Stratton, C. Rodrigues, I.-J. Sung, N. Obeid, L. Chang, G. Liu, and W.-M. W. Hwu. Parboil: A revised benchmark suite for scientific and commercial throughput computing. Number IMPACT-12-01, Urbana, Mar. 2012.
-
(2012)
Number IMPACT-12-01
-
-
Stratton, J.A.1
Rodrigues, C.2
Sung, I.-J.3
Obeid, N.4
Chang, L.5
Liu, G.6
Hwu, W.-M.W.7
-
46
-
-
84858776502
-
Qscores: Trading dark silicon for scalable energy efficiency with quasi-specific cores
-
G. Venkatesh, J. Sampson, N. Goulding-Hotta, S. K. Venkata, M. B. Taylor, and S. Swanson. Qscores: trading dark silicon for scalable energy efficiency with quasi-specific cores. In MICRO, 2011.
-
(2011)
MICRO
-
-
Venkatesh, G.1
Sampson, J.2
Goulding-Hotta, N.3
Venkata, S.K.4
Taylor, M.B.5
Swanson, S.6
-
48
-
-
84879847956
-
Memory partitioning for multidimensional arrays in high-level synthesis
-
New York, NY, USA ACM
-
Y. Wang, P. Li, P. Zhang, C. Zhang, and J. Cong. Memory partitioning for multidimensional arrays in high-level synthesis. DAC '13, pages 12:1-12:8, New York, NY, USA, 2013. ACM.
-
(2013)
DAC '13
, pp. 12:1-12:8
-
-
Wang, Y.1
Li, P.2
Zhang, P.3
Zhang, C.4
Cong, J.5
-
49
-
-
84946077541
-
An interactive symbolic-numeric interface to parallel ellpack for building general pde solvers
-
Purdue University
-
S. Weerawarana, E. N. Houstis, and J. R. Rice. An interactive symbolic-numeric interface to parallel ellpack for building general pde solvers. In Tech Reports, Purdue University, 1990.
-
(1990)
Tech Reports
-
-
Weerawarana, S.1
Houstis, E.N.2
Rice, J.R.3
-
50
-
-
33845417137
-
Quantifying locality in the memory access patterns of hpc applications
-
J. Weinberg, M. O. McCracken, E. Strohmaier, and A. Snavely. Quantifying locality in the memory access patterns of hpc applications. In SC, 2005.
-
(2005)
SC
-
-
Weinberg, J.1
McCracken, M.O.2
Strohmaier, E.3
Snavely, A.4
-
51
-
-
84894113950
-
High-level synthesis of dynamic data structures: A case study using vivado hls
-
Dec 2013
-
F. Winterstein, S. Bayliss, and G. Constantinides. High-level synthesis of dynamic data structures: A case study using vivado hls. In Field-Programmable Technology (FPT), 2013 International Conference on, pages 362-365, Dec 2013.
-
Field-Programmable Technology (FPT), 2013 International Conference on
, pp. 362-365
-
-
Winterstein, F.1
Bayliss, S.2
Constantinides, G.3
-
52
-
-
37849037259
-
Introducing entropies for representing program behavior and branch predictor performance
-
T. Yokota, K. Ootsu, and T. Baba. Introducing entropies for representing program behavior and branch predictor performance. In Workshop on Experimental Computer Science, 2007.
-
(2007)
Workshop on Experimental Computer Science
-
-
Yokota, T.1
Ootsu, K.2
Baba, T.3