SCOPUS Information Search Platform

Proceedings - 2016 43rd International Symposium on Computer Architecture, ISCA 2016

Volume , Issue , 2016, Pages 243-254

EIE: Efficient Inference Engine on Compressed Deep Neural Network

(7) Han, Song (a); Liu, Xingyu (a); Mao, Huizi (a); Pu, Jing (a); Pedram, Ardavan (a); Horowitz, Mark A. (a); Dally, William J. (b)

(a) STANFORD UNIVERSITY (United States)

(b) NVIDIA (United States)

Author keywords

Algorithm Hardware co Design; ASIC; Deep Learning; Hardware Acceleration; Model Compression

Indexed keywords

APPLICATION SPECIFIC INTEGRATED CIRCUITS; BUDGET CONTROL; COMPUTER ARCHITECTURE; COMPUTER HARDWARE; DYNAMIC RANDOM ACCESS STORAGE; EMBEDDED SYSTEMS; ENERGY CONSERVATION; ENGINES; HARDWARE; NETWORK ARCHITECTURE; RECONFIGURABLE HARDWARE; STATIC RANDOM ACCESS STORAGE;

CO-DESIGNS; DEEP LEARNING; DEEP NEURAL NETWORKS; HARDWARE ACCELERATION; MODEL COMPRESSION; MULTIPLE CONNECTIONS; REDUNDANT CONNECTIONS; SPARSE MATRIX-VECTOR MULTIPLICATION;

ENERGY EFFICIENCY;

EID: 84988443578 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ISCA.2016.30 Document Type: Conference Paper

Times cited : (2598)

References (45)

1
- 84876231242
- Imagenet classification with deep convolutional neural networks
- A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in NIPS, 2012.
- (2012) NIPS
- Krizhevsky, A.¹ Sutskever, I.² Hinton, G.E.³

2
- 84964983441
- arXiv:1409.4842
- C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," arXiv:1409.4842, 2014.
- (2014) Going Deeper with Convolutions
- Szegedy, C.¹ Liu, W.² Jia, Y.³ Sermanet, P.⁴ Reed, S.⁵ Anguelov, D.⁶ Erhan, D.⁷ Vanhoucke, V.⁸ Rabinovich, A.⁹

3
- 84925410541
- arXiv:1409.1556
- K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv:1409.1556, 2014.
- (2014) Very Deep Convolutional Networks for Large-scale Image Recognition
- Simonyan, K.¹ Zisserman, A.²

4
- 79959829092
- Recurrent neural network based language model
- September 26-30
- T. Mikolov, M. Karafiát, L. Burget, J. Cernocky, and S. Khudanpur, "Recurrent neural network based language model," in INTERSPEECH, September 26-30, 2010, pp. 1045-1048.
- (2010) INTERSPEECH , pp. 1045-1048
- Mikolov, T.¹ Karafiát, M.² Burget, L.³ Cernocky, J.⁴ Khudanpur, S.⁵

5
- 0032203257
- Gradient-based learning applied to document recognition
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
- (1998) Proceedings of the IEEE , vol.86 , Issue.11 , pp. 2278-2324
- LeCun, Y.¹ Bottou, L.² Bengio, Y.³ Haffner, P.⁴

6
- 84911198048
- Deepface: Closing the gap to human-level performance in face verification
- Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, "Deepface: Closing the gap to human-level performance in face verification," in CVPR. IEEE, 2014, pp. 1701-1708.
- (2014) CVPR. IEEE , pp. 1701-1708
- Taigman, Y.¹ Yang, M.² Ranzato, M.³ Wolf, L.⁴

7
- 84942676733
- arXiv:1412.2306
- A. Karpathy and L. Fei-Fei, "Deep visual-semantic alignments for generating image descriptions," arXiv:1412.2306, 2014.
- (2014) Deep Visual-semantic Alignments for Generating Image Descriptions
- Karpathy, A.¹ Fei-Fei, L.²

8
- 84897484337
- Deep learning with cots hpc systems
- A. Coates, B. Huval, T. Wang, D. Wu, B. Catanzaro, and A. Ng, "Deep learning with cots hpc systems," in 30th ICML, 2013.
- (2013) 30th ICML
- Coates, A.¹ Huval, B.² Wang, T.³ Wu, D.⁴ Catanzaro, B.⁵ Ng, A.⁶

9
- 84988351861
- Stanford VLSI wiki
- M. Horowitz. Energy table for 45nm process, Stanford VLSI wiki. [Online]. Available: https://sites.google.com/site/seecproject
- Energy Table for 45nm Process
- Horowitz, M.¹

10
- 84897780584
- Diannao: A small-footprint high-throughput accelerator for ubiquitous machine-learning
- T. Chen, Z. Du, N. Sun, J. Wang, C. Wu, Y. Chen, and O. Temam, "Diannao: a small-footprint high-throughput accelerator for ubiquitous machine-learning," in ASPLOS, 2014.
- (2014) ASPLOS
- Chen, T.¹ Du, Z.² Sun, N.³ Wang, J.⁴ Wu, C.⁵ Chen, Y.⁶ Temam, O.⁷

11
- 84988406311
- Dadiannao: A machine-learning supercomputer
- December
- Y. Chen, T. Luo, S. Liu, S. Zhang, L. He, J. Wang, L. Li, T. Chen, Z. Xu, N. Sun, and O. Temam, "Dadiannao: A machine-learning supercomputer," in MICRO, December 2014.
- (2014) MICRO
- Chen, Y.¹ Luo, T.² Liu, S.³ Zhang, S.⁴ He, L.⁵ Wang, J.⁶ Li, L.⁷ Chen, T.⁸ Xu, Z.⁹ Sun, N.¹⁰ Temam, O.¹¹

12
- 84959912559
- Shidiannao: Shifting vision processing closer to the sensor
- Z. Du, R. Fasthuber, T. Chen, P. Ienne, L. Li, T. Luo, X. Feng, Y. Chen, and O. Temam, "Shidiannao: shifting vision processing closer to the sensor," in ISCA. ACM, 2015, pp. 92-104.
- (2015) ISCA. ACM , pp. 92-104
- Du, Z.¹ Fasthuber, R.² Chen, T.³ Ienne, P.⁴ Li, L.⁵ Luo, T.⁶ Feng, X.⁷ Chen, Y.⁸ Temam, O.⁹

13
- 70450060046
- Cnp: An FPGA-based processor for convolutional networks
- C. Farabet, C. Poulet, J. Y. Han, and Y. LeCun, "Cnp: An FPGA-based processor for convolutional networks," in FPL, 2009.
- (2009) FPL
- Farabet, C.¹ Poulet, C.² Han, J.Y.³ LeCun, Y.⁴

14
- 84966533810
- Going deeper with embedded FPGA platform for convolutional neural network
- J. Qiu, J. Wang, S. Yao, K. Guo, B. Li, E. Zhou, J. Yu, T. Tang, N. Xu, S. Song, Y. Wang, and H. Yang, "Going deeper with embedded FPGA platform for convolutional neural network," in FPGA, 2016.
- (2016) FPGA
- Qiu, J.¹ Wang, J.² Yao, S.³ Guo, K.⁴ Li, B.⁵ Zhou, E.⁶ Yu, J.⁷ Tang, T.⁸ Xu, N.⁹ Song, S.¹⁰ Wang, Y.¹¹ Yang, H.¹²

15
- 84988345240
- ISAAC: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars
- A. Shafiee et al., "ISAAC: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars," in ISCA, 2016.
- (2016) ISCA
- Shafiee, A.¹

16
- 84965140688
- Learning both weights and connections for efficient neural networks
- S. Han, J. Pool, J. Tran, and W. J. Dally, "Learning both weights and connections for efficient neural networks," in Proceedings of Advances in Neural Information Processing Systems, 2015.
- (2015) Proceedings of Advances in Neural Information Processing Systems
- Han, S.¹ Pool, J.² Tran, J.³ Dally, W.J.⁴

17
- 84955316677
- arXiv:1504.08083
- R. Girshick, "Fast R-CNN," arXiv:1504.08083, 2015.
- (2015) Fast R-CNN
- Girshick, R.¹

18
- 0031573117
- Long short-term memory
- S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural computation, 1997.
- (1997) Neural Computation
- Hochreiter, S.¹ Schmidhuber, J.²

19
- 27744588611
- Framewise phoneme classification with bidirectional lstm and other neural network architectures
- A. Graves and J. Schmidhuber, "Framewise phoneme classification with bidirectional lstm and other neural network architectures," Neural Networks, 2005.
- (2005) Neural Networks
- Graves, A.¹ Schmidhuber, J.²

20
- 84942430054
- Can deep learning revolutionize mobile sensing?
- N. D. Lane and P. Georgiev, "Can deep learning revolutionize mobile sensing?" in International Workshop on Mobile Computing Systems and Applications. ACM, 2015, pp. 117-122.
- (2015) International Workshop on Mobile Computing Systems and Applications. ACM , pp. 117-122
- Lane, N.D.¹ Georgiev, P.²

21
- 84898959963
- A scalable sparse matrix-vector multiplication kernel for energy-efficient sparse-blas on FPGAs
- R. Dorrance, F. Ren, and D. Marković, "A scalable sparse matrix-vector multiplication kernel for energy-efficient sparse-blas on FPGAs," in FPGA, 2014.
- (2014) FPGA
- Dorrance, R.¹ Ren, F.² Marković, D.³

22
- 77956509090
- Rectified linear units improve restricted boltzmann machines
- V. Nair and G. E. Hinton, "Rectified linear units improve restricted boltzmann machines," in ICML, 2010.
- (2010) ICML
- Nair, V.¹ Hinton, G.E.²

23
- 85083950579
- Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding
- S. Han, H. Mao, and W. J. Dally, "Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding," in International Conference on Learning Representations, 2016.
- (2016) International Conference on Learning Representations
- Han, S.¹ Mao, H.² Dally, W.J.³

24
- 10044233808
- Ph.D. dissertation, UC Berkeley
- R. W. Vuduc, "Automatic performance tuning of sparse matrix kernels," Ph.D. dissertation, UC Berkeley, 2003.
- (2003) Automatic Performance Tuning of Sparse Matrix Kernels
- Vuduc, R.W.¹

25
- 84859005455
- Cacti 6.0: A tool to model large caches
- N. Muralimanohar, R. Balasubramonian, and N. P. Jouppi, "Cacti 6.0: A tool to model large caches," HP Laboratories, pp. 22-31, 2009.
- (2009) HP Laboratories , pp. 22-31
- Muralimanohar, N.¹ Balasubramonian, R.² Jouppi, N.P.³

26
- 84988355721
- NVIDIA
- NVIDIA. Technical brief: NVIDIA jetson TK1 development kit bringing GPU-accelerated computing to embedded systems.
- Technical Brief: NVIDIA Jetson TK1 Development Kit Bringing GPU-accelerated Computing to Embedded Systems

27
- 84988414938
- NVIDIA
- NVIDIA. Whitepaper: GPU-based deep learning inference: A performance and power analysis.
- Whitepaper: GPU-based Deep Learning Inference: A Performance and Power Analysis

28
- 84913555165
- arXiv:1408.5093
- Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," arXiv:1408.5093, 2014.
- (2014) Caffe: Convolutional Architecture for Fast Feature Embedding
- Jia, Y.¹ Shelhamer, E.² Donahue, J.³ Karayev, S.⁴ Long, J.⁵ Girshick, R.⁶ Guadarrama, S.⁷ Darrell, T.⁸

29
- 84960161169
- Imagenet: A large-scale hierarchical image database
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "Imagenet: A large-scale hierarchical image database," in Computer Vision and Pattern Recognition, 2009.
- (2009) Computer Vision and Pattern Recognition
- Deng, J.¹ Dong, W.² Socher, R.³ Li, L.-J.⁴ Li, K.⁵ Fei-Fei, L.⁶

30
- 84988340112
- arXiv:1602.07360
- F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, "Squeezenet: Alexnet-level accuracy with 50x fewer parameters and 0.5mb model size," arXiv:1602.07360, 2016.
- (2016) Squeezenet: Alexnet-level Accuracy with 50x Fewer Parameters And 0.5mb Model Size
- Iandola, F.N.¹ Han, S.² Moskewicz, M.W.³ Ashraf, K.⁴ Dally, W.J.⁵ Keutzer, K.⁶

31
- 20344389052
- Sparse matrix-vector multiplication on FPGAs
- L. Zhuo and V. K. Prasanna, "Sparse matrix-vector multiplication on FPGAs," in FPGA, 2005.
- (2005) FPGA
- Zhuo, L.¹ Prasanna, V.K.²

32
- 12444270445
- V. Eijkhout, LAPACK working note 50: Distributed sparse data structures for linear algebra operations, 1992.
- (1992) LAPACK Working Note 50: Distributed Sparse Data Structures for Linear Algebra Operations
- Eijkhout, V.¹

33
- 85017247188
- arXiv:1509.09308
- A. Lavin, "Fast algorithms for convolutional neural networks," arXiv:1509.09308, 2015.
- (2015) Fast Algorithms for Convolutional Neural Networks
- Lavin, A.¹

34
- 0000991092
- Comparing biases for minimal network construction with back-propagation
- S. J. Hanson and L. Y. Pratt, "Comparing biases for minimal network construction with back-propagation," in NIPS, 1989.
- (1989) NIPS
- Hanson, S.J.¹ Pratt, L.Y.²

35
- 0000494466
- Optimal brain damage
- Y. LeCun, J. S. Denker, S. A. Solla, R. E. Howard, and L. D. Jackel, "Optimal brain damage," in NIPS, vol. 89, 1989.
- (1989) NIPS , vol.89
- LeCun, Y.¹ Denker, J.S.² Solla, S.A.³ Howard, R.E.⁴ Jackel, L.D.⁵

36
- 0001234705
- Second order derivatives for network pruning: Optimal brain surgeon
- B. Hassibi, D. G. Stork et al., "Second order derivatives for network pruning: Optimal brain surgeon," Advances in neural information processing systems, pp. 164-164, 1993.
- (1993) Advances in Neural Information Processing Systems , pp. 164
- Hassibi, B.¹ Stork, D.G.²

37
- 84937896655
- Exploiting linear structure within convolutional networks for efficient evaluation
- E. L. Denton, W. Zaremba, J. Bruna, Y. LeCun, and R. Fergus, "Exploiting linear structure within convolutional networks for efficient evaluation," in NIPS, 2014.
- (2014) NIPS
- Denton, E.L.¹ Zaremba, W.² Bruna, J.³ LeCun, Y.⁴ Fergus, R.⁵

38
- 84965111647
- arXiv:1411.4229
- X. Zhang, J. Zou, X. Ming, K. He, and J. Sun, "Efficient and accurate approximations of nonlinear convolutional networks," arXiv:1411.4229, 2014.
- (2014) Efficient and Accurate Approximations of Nonlinear Convolutional Networks
- Zhang, X.¹ Zou, J.² Ming, X.³ He, K.⁴ Sun, J.⁵

39
- 84988349874
- Minerva: Enabling low-power, highly-accurate deep neural network accelerators
- B. Reagen, P. Whatmough, R. Adolf, S. Rama, H. Lee, S. K. Lee, J. M. Hernández-Lobato, G.-Y. Wei, and D. Brooks, "Minerva: Enabling low-power, highly-accurate deep neural network accelerators," in ISCA, 2016.
- (2016) ISCA
- Reagen, B.¹ Whatmough, P.² Adolf, R.³ Rama, S.⁴ Lee, H.⁵ Lee, S.K.⁶ Hernández-Lobato, J.M.⁷ Wei, G.-Y.⁸ Brooks, D.⁹

40
- 85015256754
- arXiv:1603.08270
- S. K. Esser et al., "Convolutional networks for fast, energy-efficient neuromorphic computing," arXiv:1603.08270, 2016.
- (2016) Convolutional Networks for Fast, Energy-efficient Neuromorphic Computing
- Esser, S.K.¹

41
- 84962921765
- Optimizing FPGA-based accelerator design for deep convolutional neural networks
- C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, "Optimizing FPGA-based accelerator design for deep convolutional neural networks," in FPGA, 2015.
- (2015) FPGA
- Zhang, C.¹ Li, P.² Sun, G.³ Guan, Y.⁴ Xiao, B.⁵ Cong, J.⁶

42
- 79955066714
- Automatically tuning sparse matrix-vector multiplication for GPU architectures
- A. Monakov, A. Lokhmotov, and A. Avetisyan, "Automatically tuning sparse matrix-vector multiplication for GPU architectures," in HiPEAC, 2010.
- (2010) HiPEAC
- Monakov, A.¹ Lokhmotov, A.² Avetisyan, A.³

43
- 70350368872
- Nvidia Technical Report NVR-2008-004 Tech. Rep.
- N. Bell and M. Garland, "Efficient sparse matrix-vector multiplication on cuda," Nvidia Technical Report NVR-2008-004, Tech. Rep., 2008.
- (2008) Efficient Sparse Matrix-vector Multiplication on CUDA
- Bell, N.¹ Garland, M.²

44
- 74049143158
- Implementing sparse matrix-vector multiplication on throughput-oriented processors
- N. Bell and M. Garland, "Implementing sparse matrix-vector multiplication on throughput-oriented processors," in High Performance Computing Networking, Storage and Analysis, 2009.
- (2009) High Performance Computing Networking, Storage and Analysis
- Bell, N.¹ Garland, M.²

45
- 84912524416
- A high memory bandwidth FPGA accelerator for sparse matrix-vector multiplication
- J. Fowers, K. Ovtcharov, K. Strauss, E. S. Chung, and G. Stitt, "A high memory bandwidth FPGA accelerator for sparse matrix-vector multiplication," in FCCM, 2014.
- (2014) FCCM
- Fowers, J.¹ Ovtcharov, K.² Strauss, K.³ Chung, E.S.⁴ Stitt, G.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.