



2014, Pages 110-119

MachSuite: Benchmarks for accelerator design and customized architectures

Author keywords

[No Author keywords available]

Indexed keywords

APPLICATION PROGRAMS; ARCHITECTURE; HIGH LEVEL SYNTHESIS;

EID: 84946020782     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/IISWC.2014.6983050     Document Type: Conference Paper
Times cited: 245

References (52)
  • 6
    • 80052656013
    • Virtualization of heterogeneous machines
    • June 2011
    • J. Auerbach, D. Bacon, P. Cheng, R. Rabbah, and S. Shukla. Virtualization of heterogeneous machines. In (DAC), pages 890-894, June 2011.
    • DAC , pp. 890-894
    • Auerbach, J.1    Bacon, D.2    Cheng, P.3    Rabbah, R.4    Shukla, S.5
  • 7
    • 84885667952
    • Multi-pumping for resource reduction in fpga high-level synthesis
    • March 2013
    • A. Canis, J. H. Anderson, and S. D. Brown. Multi-pumping for resource reduction in fpga high-level synthesis. In (DATE), pages 194-197, March 2013.
    • DATE , pp. 194-197
    • Canis, A.1    Anderson, J.H.2    Brown, S.D.3
  • 8
    • 84859246670
    • R-mat: A recursive model for graph mining
    • Carnegie Mellon University
    • D. Chakrabarti, Y. Zhan, and C. Faloutsos. R-mat: A recursive model for graph mining. In CS Department, Carnegie Mellon University, 2004.
    • (2004) CS Department
    • Chakrabarti, D.1    Zhan, Y.2    Faloutsos, C.3
  • 10
    • 78751505898
    • A characterization of the rodinia benchmark suite with comparison to contemporary CMP workloads
    • S. Che, J. W. Sheaffer, M. Boyer, L. G. Szafaryn, L. Wang, and K. Skadron. A characterization of the rodinia benchmark suite with comparison to contemporary CMP workloads. In (IISWC), 2010.
    • (2010) IISWC
    • Che, S.1    Sheaffer, J.W.2    Boyer, M.3    Szafaryn, L.G.4    Wang, L.5    Skadron, K.6
  • 11
    • 84897780584
    • Diannao: A small-footprint high-throughput accelerator for ubiquitous machine-learning
    • New York, NY, USA ACM
    • T. Chen, Z. Du, N. Sun, J. Wang, C. Wu, Y. Chen, and O. Temam. Diannao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. ASPLOS '14, pages 269-284, New York, NY, USA, 2014. ACM.
    • (2014) ASPLOS '14 , pp. 269-284
    • Chen, T.1    Du, Z.2    Sun, N.3    Wang, J.4    Wu, C.5    Chen, Y.6    Temam, O.7
  • 12
    • 84881142714
    • Linqits: Big data on little clients
    • E. S. Chung, J. D. Davis, and J. Lee. Linqits: big data on little clients. ISCA, 2013.
    • (2013) ISCA
    • Chung, E.S.1    Davis, J.D.2    Lee, J.3
  • 13
    • 79951696448
    • Single-chip heterogeneous computing: Does the future include custom logic, fpgas, and gpgpus?
    • E. S. Chung, P. A. Milder, J. C. Hoe, and K. Mai. Single-chip heterogeneous computing: Does the future include custom logic, fpgas, and gpgpus? In MICRO, 2010.
    • (2010) MICRO
    • Chung, E.S.1    Milder, P.A.2    Hoe, J.C.3    Mai, K.4
  • 14
    • 84889592098
    • Composable accelerator-rich microprocessor enhanced for adaptivity and longevity
    • Sept 2013
    • J. Cong, M. Ghodrat, M. Gill, B. Grigorian, H. Huang, and G. Reinman. Composable accelerator-rich microprocessor enhanced for adaptivity and longevity. In (ISLPED), pages 305-310, Sept 2013.
    • ISLPED , pp. 305-310
    • Cong, J.1    Ghodrat, M.2    Gill, M.3    Grigorian, B.4    Huang, H.5    Reinman, G.6
  • 15
  • 16
    • 67650692183
    • Synthesis of reconfigurable high-performance multicore systems
    • J. Cong, K. Gururaj, and G. Han. Synthesis of reconfigurable high-performance multicore systems. In FPGA, 2009.
    • (2009) FPGA
    • Cong, J.1    Gururaj, K.2    Han, G.3
  • 17
    • 84862082123
    • Combining module selection and replication for throughput-driven streaming programs
    • San Jose, CA, USA EDA Consortium
    • J. Cong, M. Huang, B. Liu, P. Zhang, and Y. Zou. Combining module selection and replication for throughput-driven streaming programs. DATE '12, pages 1018-1023, San Jose, CA, USA, 2012. EDA Consortium.
    • (2012) DATE '12 , pp. 1018-1023
    • Cong, J.1    Huang, M.2    Liu, B.3    Zhang, P.4    Zou, Y.5
  • 20
    • 77953098517
    • Using speculative functional units in high level synthesis
    • March 2010
    • A. Del Barrio, M. Molina, J. Mendias, R. Hermida, and S. Memik. Using speculative functional units in high level synthesis. In (DATE), pages 1779-1784, March 2010.
    • DATE , pp. 1779-1784
    • Del Barrio, A.1    Molina, M.2    Mendias, J.3    Hermida, R.4    Memik, S.5
  • 21
    • 84876591853
    • Neural acceleration for general-purpose approximate programs
    • H. Esmaeilzadeh, A. Sampson, L. Ceze, and D. Burger. Neural acceleration for general-purpose approximate programs. In MICRO, 2012.
    • (2012) MICRO
    • Esmaeilzadeh, H.1    Sampson, A.2    Ceze, L.3    Burger, D.4
  • 22
    • 64849117951
    • Bridging the computation gap between programmable processors and hardwired accelerators
    • K. Fan, M. Kudlur, G. S. Dasika, and S. A. Mahlke. Bridging the computation gap between programmable processors and hardwired accelerators. In HPCA, 2009.
    • (2009) HPCA
    • Fan, K.1    Kudlur, M.2    Dasika, G.S.3    Mahlke, S.A.4
  • 23
    • 84885631298
    • Compiling control-intensive loops for cgras with state-based full predication
    • March 2013
    • K. Han, K. Choi, and J. Lee. Compiling control-intensive loops for cgras with state-based full predication. In (DATE), pages 1579-1582, March 2013.
    • DATE , pp. 1579-1582
    • Han, K.1    Choi, K.2    Lee, J.3
  • 24
    • 51749101517
    • Chstone: A benchmark program suite for practical c-based high-level synthesis
    • IEEE
    • Y. Hara, H. Tomiyama, S. Honda, H. Takada, and K. Ishii. Chstone: A benchmark program suite for practical c-based high-level synthesis. In ISCAS, pages 1192-1195. IEEE, 2008.
    • (2008) ISCAS , pp. 1192-1195
    • Hara, Y.1    Tomiyama, H.2    Honda, S.3    Takada, H.4    Ishii, K.5
  • 25
    • 60649099910
    • Accelerating large graph algorithms on the gpu using cuda
    • P. Harish and P. Narayanan. Accelerating large graph algorithms on the gpu using cuda. In HiPC, 2007.
    • (2007) HiPC
    • Harish, P.1    Narayanan, P.2
  • 26
    • 84856541553
    • Efficient parallel graph exploration on multi-core cpu and gpu
    • S. Hong, T. Oguntebi, and K. Olukotun. Efficient parallel graph exploration on multi-core cpu and gpu. In PACT, 2011.
    • (2011) PACT
    • Hong, S.1    Oguntebi, T.2    Olukotun, K.3
  • 32
    • 84879851819
    • On learning-based methods for design-space exploration with high-level synthesis
    • H.-Y. Liu and L. P. Carloni. On learning-based methods for design-space exploration with high-level synthesis. In DAC, 2013.
    • (2013) DAC
    • Liu, H.-Y.1    Carloni, L.P.2
  • 33
    • 84862058364
    • Compositional system-level design exploration with planning of high-level synthesis
    • H.-Y. Liu, M. Petracca, and L. P. Carloni. Compositional system-level design exploration with planning of high-level synthesis. In DATE, 2012.
    • (2012) DATE
    • Liu, H.-Y.1    Petracca, M.2    Carloni, L.P.3
  • 34
    • 84857883486
    • The accelerator store: A shared memory framework for accelerator-based systems
    • M. J. Lyons, M. Hempstead, G.-Y. Wei, and D. Brooks. The accelerator store: A shared memory framework for accelerator-based systems. TACO, 2012.
    • (2012) TACO
    • Lyons, M.J.1    Hempstead, M.2    Wei, G.-Y.3    Brooks, D.4
  • 35
    • 84879864963
    • A high-level synthesis flow for the implementation of iterative stencil loop algorithms on fpga devices
    • New York, NY, USA ACM
    • A. A. Nacci, V. Rana, F. Bruschi, D. Sciuto, I. Beretta, and D. Atienza. A high-level synthesis flow for the implementation of iterative stencil loop algorithms on fpga devices. DAC '13, pages 52:1-52:6, New York, NY, USA, 2013. ACM.
    • (2013) DAC '13 , pp. 52:1-52:6
    • Nacci, A.A.1    Rana, V.2    Bruschi, F.3    Sciuto, D.4    Beretta, I.5    Atienza, D.6
  • 36
    • 84897843178
    • Building zynq accelerators with vivado high level synthesis
    • S. Neuendorffer and F. Martinez-Vallina. Building zynq accelerators with vivado high level synthesis. In FPGA, 2013.
    • (2013) FPGA
    • Neuendorffer, S.1    Martinez-Vallina, F.2
  • 38
    • 35348913704
    • Analysis of redundancy and application balance in the spec cpu2006 benchmark suite
    • New York, NY, USA ACM
    • A. Phansalkar, A. Joshi, and L. K. John. Analysis of redundancy and application balance in the spec cpu2006 benchmark suite. ISCA '07, pages 412-423, New York, NY, USA, 2007. ACM.
    • (2007) ISCA '07 , pp. 412-423
    • Phansalkar, A.1    Joshi, A.2    John, L.K.3
  • 41
    • 84889594827
    • Quantifying acceleration: Power/performance trade-offs of application kernels in hardware
    • B. Reagen, Y. S. Shao, G.-Y. Wei, and D. Brooks. Quantifying acceleration: Power/performance trade-offs of application kernels in hardware. In ISLPED, 2013.
    • (2013) ISLPED
    • Reagen, B.1    Shao, Y.S.2    Wei, G.-Y.3    Brooks, D.4
  • 42
    • 84881437667
    • Isa-independent workload characterization and its implications for specialized architectures
    • Y. S. Shao and D. Brooks. Isa-independent workload characterization and its implications for specialized architectures. In ISPASS, 2013.
    • (2013) ISPASS
    • Shao, Y.S.1    Brooks, D.2
  • 43
    • 84905487457
    • Aladdin: A pre-rtl, power-performance accelerator simulator enabling large design space exploration of customized architectures
    • Y. S. Shao, B. Reagen, G.-Y. Wei, and D. Brooks. Aladdin: A pre-rtl, power-performance accelerator simulator enabling large design space exploration of customized architectures. In ISCA, 2014.
    • (2014) ISCA
    • Shao, Y.S.1    Reagen, B.2    Wei, G.-Y.3    Brooks, D.4
  • 48
    • 84879847956
    • Memory partitioning for multidimensional arrays in high-level synthesis
    • New York, NY, USA ACM
    • Y. Wang, P. Li, P. Zhang, C. Zhang, and J. Cong. Memory partitioning for multidimensional arrays in high-level synthesis. DAC '13, pages 12:1-12:8, New York, NY, USA, 2013. ACM.
    • (2013) DAC '13 , pp. 12:1-12:8
    • Wang, Y.1    Li, P.2    Zhang, P.3    Zhang, C.4    Cong, J.5
  • 49
    • 84946077541
    • An interactive symbolic-numeric interface to parallel ellpack for building general pde solvers
    • Purdue University
    • S. Weerawarana, E. N. Houstis, and J. R. Rice. An interactive symbolic-numeric interface to parallel ellpack for building general pde solvers. In Tech Reports, Purdue University, 1990.
    • (1990) Tech Reports
    • Weerawarana, S.1    Houstis, E.N.2    Rice, J.R.3
  • 50
    • 33845417137
    • Quantifying locality in the memory access patterns of hpc applications
    • J. Weinberg, M. O. McCracken, E. Strohmaier, and A. Snavely. Quantifying locality in the memory access patterns of hpc applications. In SC, 2005.
    • (2005) SC
    • Weinberg, J.1    McCracken, M.O.2    Strohmaier, E.3    Snavely, A.4
  • 52
    • 37849037259
    • Introducing entropies for representing program behavior and branch predictor performance
    • T. Yokota, K. Ootsu, and T. Baba. Introducing entropies for representing program behavior and branch predictor performance. In Workshop on Experimental Computer Science, 2007.
    • (2007) Workshop on Experimental Computer Science
    • Yokota, T.1    Ootsu, K.2    Baba, T.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.