SCOPUS Information Retrieval Platform

Proceedings of the Annual International Symposium on Microarchitecture, MICRO

Volume 05-09-December-2015, 2015, Pages 482-493

Neural acceleration for GPU throughput processors

(5) Yazdanbakhsh, Amir (a); Park, Jongse (a); Sharma, Hardik (a); Lotfi-Kamran, Pejman (b); Esmaeilzadeh, Hadi (a)

a Georgia Institute of Technology (United States)

b Institute for Research in Fundamental Sciences (IPM) (Iran)

Author keywords

approximate computing; GPU; neural processing unit

Indexed keywords

ACCELERATION; BENCHMARKING; COMPUTER ARCHITECTURE; COMPUTER GRAPHICS; EMBEDDED SYSTEMS; IMAGE CODING; PROGRAM PROCESSORS; QUALITY CONTROL;

APPROXIMATE COMPUTING; APPROXIMATION TECHNIQUES; CYCLE-ACCURATE SIMULATION; DATA-LEVEL PARALLELISM; GRAPHICS PROCESSING UNITS; HARDWARE OVERHEADS; MANY-CORE ACCELERATORS; NEURAL-PROCESSING;

COMPUTER HARDWARE;

EID: 84959896262 PISSN: 10724451 EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/2830772.2830810 Document Type: Conference Paper

Times cited : (104)

References (59)

1
- 80052528714
- Dark silicon and the end of multicore scaling
- H. Esmaeilzadeh, E. Blem, R. St. Amant, K. Sankaralingam, and D. Burger, "Dark silicon and the end of multicore scaling," in ISCA, 2011.
- (2011) ISCA
- Esmaeilzadeh, H.¹ Blem, E.² St. Amant, R.³ Sankaralingam, K.⁴ Burger, D.⁵

2
- 79961040286
- Toward dark silicon in servers
- N. Hardavellas, M. Ferdman, B. Falsafi, and A. Ailamaki, "Toward dark silicon in servers," IEEE Micro, 2011.
- (2011) IEEE Micro
- Hardavellas, N.¹ Ferdman, M.² Falsafi, B.³ Ailamaki, A.⁴

3
- 77952256041
- Conservation cores: Reducing the energy of mature computations
- G. Venkatesh, J. Sampson, N. Goulding, S. Garcia, V. Bryksin, J. Lugo-Martinez, S. Swanson, and M. B. Taylor, "Conservation cores: Reducing the energy of mature computations," in ASPLOS, 2010.
- (2010) ASPLOS
- Venkatesh, G.¹ Sampson, J.² Goulding, N.³ Garcia, S.⁴ Bryksin, V.⁵ Lugo-Martinez, J.⁶ Swanson, S.⁷ Taylor, M.B.⁸

4
- 83755217707
- J. Gantz and D. Reinsel, "Extracting value from chaos." http://www.emc.com.
- Extracting Value from Chaos
- Gantz, J.¹ Reinsel, D.²

5
- 85026956356
- "GeForce 400 series." http://en.wikipedia.org, 2015.
- (2015) GeForce 400 Series

6
- 84892531161
- SAGE: Self-tuning approximation for graphics engines
- M. Samadi, J. Lee, D. A. Jamshidi, A. Hormati, and S. Mahlke, "SAGE: Self-tuning approximation for graphics engines," in MICRO, 2013.
- (2013) MICRO
- Samadi, M.¹ Lee, J.² Jamshidi, D.A.³ Hormati, A.⁴ Mahlke, S.⁵

7
- 84897771889
- Paraprox: Pattern-based approximation for data parallel applications
- M. Samadi, D. A. Jamshidi, J. Lee, and S. Mahlke, "Paraprox: Pattern-based approximation for data parallel applications," in ASPLOS, 2014.
- (2014) ASPLOS
- Samadi, M.¹ Jamshidi, D.A.² Lee, J.³ Mahlke, S.⁴

8
- 84905460431
- Eliminating redundant fragment shader executions on a mobile GPU via hardware memoization
- J.-M. Arnau, J.-M. Parcerisa, and P. Xekalakis, "Eliminating redundant fragment shader executions on a mobile GPU via hardware memoization," ISCA, 2014.
- (2014) ISCA
- Arnau, J.-M.¹ Parcerisa, J.-M.² Xekalakis, P.³

9
- 84872693395
- Branch and data herding: Reducing control and memory divergence for error-tolerant GPU applications
- J. Sartori and R. Kumar, "Branch and data herding: Reducing control and memory divergence for error-tolerant GPU applications," Multimedia, IEEE Transactions on, 2013.
- (2013) Multimedia, IEEE Transactions on
- Sartori, J.¹ Kumar, R.²

10
- 84876591853
- Neural acceleration for general-purpose approximate programs
- H. Esmaeilzadeh, A. Sampson, L. Ceze, and D. Burger, "Neural acceleration for general-purpose approximate programs," in MICRO, 2012.
- (2012) MICRO
- Esmaeilzadeh, H.¹ Sampson, A.² Ceze, L.³ Burger, D.⁴

11
- 84905440628
- General-purpose code acceleration with limited-precision analog computation
- R. St. Amant, A. Yazdanbakhsh, J. Park, B. Thwaites, H. Esmaeilzadeh, A. Hassibi, L. Ceze, and D. Burger, "General-purpose code acceleration with limited-precision analog computation," in ISCA, 2014.
- (2014) ISCA
- St. Amant, R.¹ Yazdanbakhsh, A.² Park, J.³ Thwaites, B.⁴ Esmaeilzadeh, H.⁵ Hassibi, A.⁶ Ceze, L.⁷ Burger, D.⁸

12
- 84934325706
- BRAINIAC: Bringing reliable accuracy into neurally-implemented approximate computing
- B. Grigorian, N. Farahpour, and G. Reinman, "BRAINIAC: Bringing reliable accuracy into neurally-implemented approximate computing," in HPCA, 2015.
- (2015) HPCA
- Grigorian, B.¹ Farahpour, N.² Reinman, G.³

13
- 84934280945
- SNNAP: Approximate computing on programmable SoCs via neural acceleration
- T. Moreau, M. Wyse, J. Nelson, A. Sampson, H. Esmaeilzadeh, L. Ceze, and M. Oskin, "SNNAP: Approximate computing on programmable SoCs via neural acceleration," in HPCA, 2015.
- (2015) HPCA
- Moreau, T.¹ Wyse, M.² Nelson, J.³ Sampson, A.⁴ Esmaeilzadeh, H.⁵ Ceze, L.⁶ Oskin, M.⁷

14
- 84926041511
- EMEURO: A framework for generating multi-purpose accelerators via deep learning
- L. McAfee and K. Olukotun, "EMEURO: A framework for generating multi-purpose accelerators via deep learning," in CGO, 2015.
- (2015) CGO
- McAfee, L.¹ Olukotun, K.²

15
- 84919678129
- Accelerating divergent applications on SIMD architectures using neural networks
- B. Grigorian and G. Reinman, "Accelerating divergent applications on SIMD architectures using neural networks," in ICCD, 2014.
- (2014) ICCD
- Grigorian, B.¹ Reinman, G.²

16
- 79959878920
- EnerJ: Approximate data types for safe and general low-power computation
- A. Sampson, W. Dietl, E. Fortuna, D. Gnanapragasam, L. Ceze, and D. Grossman, "EnerJ: Approximate data types for safe and general low-power computation," in PLDI, 2011.
- (2011) PLDI
- Sampson, A.¹ Dietl, W.² Fortuna, E.³ Gnanapragasam, D.⁴ Ceze, L.⁵ Grossman, D.⁶

17
- 84888167548
- Verifying quantitative reliability for programs that execute on unreliable hardware
- M. Carbin, S. Misailovic, and M. C. Rinard, "Verifying quantitative reliability for programs that execute on unreliable hardware," in OOPSLA, 2013.
- (2013) OOPSLA
- Carbin, M.¹ Misailovic, S.² Rinard, M.C.³

18
- 84960395601
- FlexJava: Language support for safe and modular approximate programming
- J. Park, H. Esmaeilzadeh, X. Zhang, M. Naik, and W. Harris, "FlexJava: Language support for safe and modular approximate programming," in FSE, 2015.
- (2015) FSE
- Park, J.¹ Esmaeilzadeh, H.² Zhang, X.³ Naik, M.⁴ Harris, W.⁵

19
- 84945965935
- Axilog: Language support for approximate hardware design
- A. Yazdanbakhsh, D. Mahajan, B. Thwaites, J. Park, A. Nagendrakumar, S. Sethuraman, K. Ramkrishnan, N. Ravindran, R. Jariwala, A. Rahimi, H. Esmaeilzadeh, and K. Bazargan, "Axilog: Language support for approximate hardware design," in DATE, 2015.
- (2015) DATE
- Yazdanbakhsh, A.¹ Mahajan, D.² Thwaites, B.³ Park, J.⁴ Nagendrakumar, A.⁵ Sethuraman, S.⁶ Ramkrishnan, K.⁷ Ravindran, N.⁸ Jariwala, R.⁹ Rahimi, A.¹⁰ Esmaeilzadeh, H.¹¹ Bazargan, K.¹²

20
- 84987170701
- An efficient way to find the side effects of procedure calls and the aliases of variables
- J. P. Banning, "An efficient way to find the side effects of procedure calls and the aliases of variables," in POPL, 1979.
- (1979) POPL
- Banning, J.P.¹

21
- 0000646059
- Learning internal representations by error propagation
- D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," in PDP, 1986.
- (1986) PDP
- Rumelhart, D.E.¹ Hinton, G.E.² Williams, R.J.³

22
- 84959913594
- "Whitepaper: NVIDIA Fermi." http://www.nvidia.com.
- Whitepaper: NVIDIA Fermi

23
- 84872053761
- NVIDIA Corporation
- NVIDIA Corporation, "NVIDIA CUDA SDK code samples." http://www.nvidia.com.
- NVIDIA CUDA SDK Code Samples

24
- 70649092154
- Rodinia: A benchmark suite for heterogeneous computing
- S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S.-H. Lee, and K. Skadron, "Rodinia: A benchmark suite for heterogeneous computing," in IISWC, 2009.
- (2009) IISWC
- Che, S.¹ Boyer, M.² Meng, J.³ Tarjan, D.⁴ Sheaffer, J.W.⁵ Lee, S.-H.⁶ Skadron, K.⁷

25
- 85026957134
- jMonkeyEngine, 2015.
- (2015) jMonkeyEngine

26
- 82555191201
- Inverse kinematics solution for robotic manipulators using a CUDA-based parallel genetic algorithm
- O. A. Aguilar and J. C. Huegel, "Inverse kinematics solution for robotic manipulators using a CUDA-based parallel genetic algorithm," AAI, 2011.
- (2011) AAI
- Aguilar, O.A.¹ Huegel, J.C.²

27
- 85026960058
- A high performance implementation of likelihood estimators on GPUs
- M. Creel and M. Zubair, "A high performance implementation of likelihood estimators on GPUs," in CES, 2013.
- (2013) CES
- Creel, M.¹ Zubair, M.²

28
- 84858790858
- Architecture support for disciplined approximate programming
- H. Esmaeilzadeh, A. Sampson, L. Ceze, and D. Burger, "Architecture support for disciplined approximate programming," in ASPLOS, 2012.
- (2012) ASPLOS
- Esmaeilzadeh, H.¹ Sampson, A.² Ceze, L.³ Burger, D.⁴

29
- 77954707631
- Green: A framework for supporting energy-conscious programming using controlled approximation
- W. Baek and T. M. Chilimbi, "Green: A framework for supporting energy-conscious programming using controlled approximation," in PLDI, 2010.
- (2010) PLDI
- Baek, W.¹ Chilimbi, T.M.²

30
- 80053213080
- Managing performance vs. accuracy trade-offs with loop perforation
- S. Sidiroglou-Douskos, S. Misailovic, H. Hoffmann, and M. Rinard, "Managing performance vs. accuracy trade-offs with loop perforation," in FSE, 2011.
- (2011) FSE
- Sidiroglou-Douskos, S.¹ Misailovic, S.² Hoffmann, H.³ Rinard, M.⁴

31
- 70349169075
- Analyzing CUDA workloads using a detailed GPU simulator
- A. Bakhoda, G. Yuan, W. Fung, H. Wong, and T. Aamodt, "Analyzing CUDA workloads using a detailed GPU simulator," in ISPASS, 2009.
- (2009) ISPASS
- Bakhoda, A.¹ Yuan, G.² Fung, W.³ Wong, H.⁴ Aamodt, T.⁵

32
- 84881151222
- GPUWattch: Enabling energy optimizations in GPGPUs
- J. Leng, T. Hetherington, A. ElTantawy, S. Gilani, N. S. Kim, T. M. Aamodt, and V. J. Reddi, "GPUWattch: Enabling energy optimizations in GPGPUs," in ISCA, 2013.
- (2013) ISCA
- Leng, J.¹ Hetherington, T.² ElTantawy, A.³ Gilani, S.⁴ Kim, N.S.⁵ Aamodt, T.M.⁶ Reddi, V.J.⁷

33
- 76749146060
- McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures
- S. Li, J. H. Ahn, R. D. Strong, J. B. Brockman, D. M. Tullsen, and N. P. Jouppi, "McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures," in MICRO, 2009.
- (2009) MICRO
- Li, S.¹ Ahn, J.H.² Strong, R.D.³ Brockman, J.B.⁴ Tullsen, D.M.⁵ Jouppi, N.P.⁶

34
- 47349084021
- Optimizing NUCA organizations and wiring alternatives for large caches with CACTI 6.0
- N. Muralimanohar, R. Balasubramonian, and N. Jouppi, "Optimizing NUCA organizations and wiring alternatives for large caches with CACTI 6.0," in MICRO, 2007.
- (2007) MICRO
- Muralimanohar, N.¹ Balasubramonian, R.² Jouppi, N.³

35
- 84876590572
- Cache-conscious wavefront scheduling
- T. G. Rogers, M. O'Connor, and T. M. Aamodt, "Cache-conscious wavefront scheduling," in MICRO, 2012.
- (2012) MICRO
- Rogers, T.G.¹ O'Connor, M.² Aamodt, T.M.³

36
- 85026954459
- Memory access scheduling
- S. Rixner, W. J. Dally, U. J. Kapasi, P. Mattson, and J. D. Owens, "Memory access scheduling," Archit. News, 2000.
- (2000) Archit. News
- Rixner, S.¹ Dally, W.J.² Kapasi, U.J.³ Mattson, P.⁴ Owens, J.D.⁵

37
- 84977856417
- A case for core-assisted bottleneck acceleration in GPUs: Enabling efficient data compression
- N. Vijaykumar, G. Pekhimenko, A. Jog, A. Bhowmick, R. Ausavarungnirun, C. Das, M. Kandemir, T. C. Mowry, and O. Mutlu, "A case for core-assisted bottleneck acceleration in GPUs: Enabling efficient data compression," in ISCA, 2015.
- (2015) ISCA
- Vijaykumar, N.¹ Pekhimenko, G.² Jog, A.³ Bhowmick, A.⁴ Ausavarungnirun, R.⁵ Das, C.⁶ Kandemir, M.⁷ Mowry, T.C.⁸ Mutlu, O.⁹

38
- 79959885067
- Flikker: Saving refresh-power in mobile devices through critical data partitioning
- S. Liu, K. Pattabiraman, T. Moscibroda, and B. G. Zorn, "Flikker: Saving refresh-power in mobile devices through critical data partitioning," in ASPLOS, 2011.
- (2011) ASPLOS
- Liu, S.¹ Pattabiraman, K.² Moscibroda, T.³ Zorn, B.G.⁴

39
- 84892535445
- Approximate storage in solid-state memories
- A. Sampson, J. Nelson, K. Strauss, and L. Ceze, "Approximate storage in solid-state memories," in MICRO, 2013.
- (2013) MICRO
- Sampson, A.¹ Nelson, J.² Strauss, K.³ Ceze, L.⁴

40
- 34047100388
- Ultra-efficient (embedded) SOC architectures based on probabilistic CMOS (PCMOS) technology
- L. N. Chakrapani, B. E. S. Akgul, S. Cheemalavagu, P. Korkmaz, K. V. Palem, and B. Seshasayee, "Ultra-efficient (embedded) SOC architectures based on probabilistic CMOS (PCMOS) technology," in DATE, 2006.
- (2006) DATE
- Chakrapani, L.N.¹ Akgul, B.E.S.² Cheemalavagu, S.³ Korkmaz, P.⁴ Palem, K.V.⁵ Seshasayee, B.⁶

41
- 77953110390
- ERSA: Error resilient system architecture for probabilistic applications
- L. Leem, H. Cho, J. Bau, Q. A. Jacobson, and S. Mitra, "ERSA: Error resilient system architecture for probabilistic applications," in DATE, 2010.
- (2010) DATE
- Leem, L.¹ Cho, H.² Bau, J.³ Jacobson, Q.A.⁴ Mitra, S.⁵

42
- 77954745730
- Quality of service profiling
- S. Misailovic, S. Sidiroglou, H. Hoffman, and M. Rinard, "Quality of service profiling," in ICSE, 2010.
- (2010) ICSE
- Misailovic, S.¹ Sidiroglou, S.² Hoffman, H.³ Rinard, M.⁴

43
- 78650166825
- Patterns and statistical analysis for understanding reduced resource computing
- M. Rinard, H. Hoffmann, S. Misailovic, and S. Sidiroglou, "Patterns and statistical analysis for understanding reduced resource computing," in Onward!, 2010.
- (2010) Onward!
- Rinard, M.¹ Hoffmann, H.² Misailovic, S.³ Sidiroglou, S.⁴

44
- 70450227331
- Petabricks: A language and compiler for algorithmic choice
- J. Ansel, C. Chan, Y. L. Wong, M. Olszewski, Q. Zhao, A. Edelman, and S. Amarasinghe, "Petabricks: A language and compiler for algorithmic choice," in PLDI, 2009.
- (2009) PLDI
- Ansel, J.¹ Chan, C.² Wong, Y.L.³ Olszewski, M.⁴ Zhao, Q.⁵ Edelman, A.⁶ Amarasinghe, S.⁷

45
- 85008028657
- Fuzzy memoization for floating-point multimedia applications
- C. Alvarez, J. Corbal, and M. Valero, "Fuzzy memoization for floating-point multimedia applications," IEEE Trans. Comput., 2005.
- (2005) IEEE Trans. Comput.
- Alvarez, C.¹ Corbal, J.² Valero, M.³

46
- 77954968857
- Relax: An architectural framework for software recovery of hardware faults
- M. de Kruijf, S. Nomura, and K. Sankaralingam, "Relax: An architectural framework for software recovery of hardware faults," in ISCA, 2010.
- (2010) ISCA
- De Kruijf, M.¹ Nomura, S.² Sankaralingam, K.³

47
- 34547697289
- Application-level correctness and its impact on fault tolerance
- X. Li and D. Yeung, "Application-level correctness and its impact on fault tolerance," in HPCA, 2007.
- (2007) HPCA
- Li, X.¹ Yeung, D.²

48
- 70350059816
- Exploiting application-level correctness for low-cost fault tolerance
- X. Li and D. Yeung, "Exploiting application-level correctness for low-cost fault tolerance," J. Instruction-Level Parallelism, 2008.
- (2008) J. Instruction-Level Parallelism
- Li, X.¹ Yeung, D.²

49
- 79959860111
- Exploring the synergy of emerging workloads and silicon reliability trends
- M. de Kruijf and K. Sankaralingam, "Exploring the synergy of emerging workloads and silicon reliability trends," in SELSE, 2009.
- (2009) SELSE
- De Kruijf, M.¹ Sankaralingam, K.²

50
- 84862943500
- A fault criticality evaluation framework of digital systems for error tolerant video applications
- Y. Fang, H. Li, and X. Li, "A fault criticality evaluation framework of digital systems for error tolerant video applications," in ATS, 2011.
- (2011) ATS
- Fang, Y.¹ Li, H.² Li, X.³

51
- 84892524324
- Quality programmable vector processors for approximate computing
- S. Venkataramani, V. K. Chippa, S. T. Chakradhar, K. Roy, and A. Raghunathan, "Quality programmable vector processors for approximate computing," in MICRO, 2013.
- (2013) MICRO
- Venkataramani, S.¹ Chippa, V.K.² Chakradhar, S.T.³ Roy, K.⁴ Raghunathan, A.⁵

52
- 84903843071
- ASLAN: Synthesis of approximate sequential circuits
- A. Ranjan, A. Raha, S. Venkataramani, K. Roy, and A. Raghunathan, "ASLAN: Synthesis of approximate sequential circuits," in DATE, 2014.
- (2014) DATE
- Ranjan, A.¹ Raha, A.² Venkataramani, S.³ Roy, K.⁴ Raghunathan, A.⁵

53
- 84863541914
- SALSA: Systematic logic synthesis of approximate circuits
- S. Venkataramani, A. Sabne, V. Kozhikkottu, K. Roy, and A. Raghunathan, "SALSA: Systematic logic synthesis of approximate circuits," in DAC, 2012.
- (2012) DAC
- Venkataramani, S.¹ Sabne, A.² Kozhikkottu, V.³ Roy, K.⁴ Raghunathan, A.⁵

54
- 84893368533
- Approximate logic synthesis under general error magnitude and frequency constraints
- J. Miao, A. Gerstlauer, and M. Orshansky, "Approximate logic synthesis under general error magnitude and frequency constraints," in ICCAD, 2013.
- (2013) ICCAD
- Miao, J.¹ Gerstlauer, A.² Orshansky, M.³

55
- 84903831997
- ABACUS: A technique for automated behavioral synthesis of approximate computing circuits
- K. Nepal, Y. Li, R. I. Bahar, and S. Reda, "ABACUS: A technique for automated behavioral synthesis of approximate computing circuits," in DATE, 2014.
- (2014) DATE
- Nepal, K.¹ Li, Y.² Bahar, R.I.³ Reda, S.⁴

56
- 84878512735
- Synthesizing parsimonious inexact circuits through probabilistic design techniques
- A. Lingamneni, C. Enz, K. Palem, and C. Piguet, "Synthesizing parsimonious inexact circuits through probabilistic design techniques," ACM Trans. Embed. Comput. Syst., 2013.
- (2013) ACM Trans. Embed. Comput. Syst.
- Lingamneni, A.¹ Enz, C.² Palem, K.³ Piguet, C.⁴

57
- 84862690555
- Algorithmic methodologies for ultra-efficient inexact architectures for sustaining technology scaling
- A. Lingamneni, K. K. Muntimadugu, C. Enz, R. M. Karp, K. V. Palem, and C. Piguet, "Algorithmic methodologies for ultra-efficient inexact architectures for sustaining technology scaling," in CF, 2012.
- (2012) CF
- Lingamneni, A.¹ Muntimadugu, K.K.² Enz, C.³ Karp, R.M.⁴ Palem, K.V.⁵ Piguet, C.⁶

58
- 84881175680
- Continuous real-world inputs can open up alternative accelerator designs
- B. Belhadj, A. Joubert, Z. Li, R. Heliot, and O. Temam, "Continuous real-world inputs can open up alternative accelerator designs," in ISCA, 2013.
- (2013) ISCA
- Belhadj, B.¹ Joubert, A.² Li, Z.³ Heliot, R.⁴ Temam, O.⁵

59
- 84897884384
- Leveraging the error resilience of machine-learning applications for designing highly energy efficient accelerators
- Z. Du, A. Lingamneni, Y. Chen, K. Palem, O. Temam, and C. Wu, "Leveraging the error resilience of machine-learning applications for designing highly energy efficient accelerators," in ASP-DAC, 2014.
- (2014) ASP-DAC
- Du, Z.¹ Lingamneni, A.² Chen, Y.³ Palem, K.⁴ Temam, O.⁵ Wu, C.⁶

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.