SCOPUS Information Search Platform

Concurrency and Computation: Practice and Experience

Volume 29, Issue 20, 2017

FPGA-accelerated deep convolutional neural networks for high throughput and energy efficiency

Authors (6): Qiao, Yuranᵃ; Shen, Junzhongᵃ; Xiao, Taoᵃ; Yang, Qianmingᵃ; Wen, Meiᵃ; Zhang, Chunyuanᵃ

ᵃ National University of Defense Technology (China)

Author keywords

Accelerator; Caffe; CNN; FPGA; Matrix Multiplier

Indexed keywords

ACCELERATION; APPLICATION PROGRAMS; COMPUTER PROGRAMMING; CONVOLUTION; DEEP NEURAL NETWORKS; ENERGY EFFICIENCY; FIELD PROGRAMMABLE GATE ARRAYS (FPGA); MATRIX ALGEBRA; MEMORY ARCHITECTURE; PARTICLE ACCELERATORS; PROGRAM PROCESSORS; SYSTEM-ON-CHIP; ACCELERATOR ARCHITECTURES; APPLICATION DEVELOPERS; CAFFE; COMPUTATIONAL PERFORMANCE; COMPUTATIONAL WORKLOAD; HIGH ENERGY EFFICIENCY; PERFORMANCE PORTABILITY; PRACTICE AND EXPERIENCE; CONVOLUTIONAL NEURAL NETWORKS

EID: 84966447574 PISSN: 15320626 EISSN: 15320634 Source Type: Journal
DOI: 10.1002/cpe.3850 Document Type: Conference Paper

Times cited: 47

References (28)

1. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, Curran Associates, Inc.: Lake Tahoe, Nevada, USA, 2012; 1097–1105.

2. Hinton G, Deng L, Yu D, Dahl GE, Mohamed A, Jaitly N, Senior A, Vanhoucke V, Nguyen P, Sainath TN, Kingsbury B. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Processing Magazine 2012; 29(6):82–97.

3. Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR): IEEE, Columbus, OH, USA, 2014; 580–587.

4. Gokhale V, Jin J, Dundar A, Martini B, Culurciello E. A 240 G-ops/s mobile coprocessor for deep neural networks. 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW): IEEE, Columbus, OH, USA, 2014; 696–701.

5. Ovtcharov K, Ruwase O, Kim JY, Fowers J, Strauss K, Chung ES. Accelerating deep convolutional neural networks using specialized hardware. Microsoft Research Whitepaper, Microsoft Research, 2015.

6. Wan L, Li K, Liu J, Li K. GPU implementation of a parallel two-list algorithm for the subset-sum problem. Concurrency and Computation: Practice and Experience 2015; 27(1):119–145.

7. Tang G, Yang W, Li K, Ye Y, Xiao G, Li K. An iteration-based hybrid parallel algorithm for tridiagonal systems of equations on multi-core architectures. Concurrency and Computation: Practice and Experience 2015; 27(17):5076–5095.

8. NVIDIA. Tesla K20 GPU accelerator board specification, July 2013.

9. Farabet C, Poulet C, Han JY, LeCun Y. CNP: an FPGA-based processor for convolutional networks. International Conference on Field Programmable Logic and Applications (FPL 2009): IEEE, Prague, Czech Republic, 2009; 32–37.

10. Zhang C, Li P, Sun G, Guan Y, Xiao B, Cong J. Optimizing FPGA-based accelerator design for deep convolutional neural networks. Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays: ACM, Monterey, CA, USA, 2015; 161–170.

11. Cadambi S, Majumdar A, Becchi M, Chakradhar S, Graf HP. A programmable parallel accelerator for learning and classification. Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques: ACM, New York, NY, USA, 2010; 273–284.

12. Gupta S, Agrawal A, Gopalakrishnan K, Narayanan P. Deep learning with limited numerical precision. arXiv preprint arXiv:1502.02551, 2015.

13. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T. Caffe: convolutional architecture for fast feature embedding. Proceedings of the ACM International Conference on Multimedia: ACM, Orlando, Florida, USA, 2014; 675–678.

14. Collobert R, Kavukcuoglu K, Farabet C. Torch7: a Matlab-like environment for machine learning. BigLearn, NIPS Workshop, Granada, Spain, 2011.

15. Farabet C. Towards real-time image understanding with convolutional networks. Ph.D. Thesis, Université Paris-Est, 2013.

16. Dou Y, Vassiliadis S, Kuzmanov GK, Gaydadjiev GN. 64-bit floating-point FPGA matrix multiplication. Proceedings of the 2005 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays: ACM, New York, NY, USA, 2005; 86–95.

17. Gorman M. Understanding the Linux Virtual Memory Manager. Prentice Hall: Upper Saddle River, New Jersey, USA, 2004.

18. Xiao T, Qiao Y, Shen J, Yang Q, Wen M. Unified virtual memory support for deep CNN accelerator on SoC FPGA. Algorithms and Architectures for Parallel Processing: Springer, Zhangjiajie, China, 2015; 64–76.

19. Farabet C, Martini B, Corda B, Akselrod P, Culurciello E, LeCun Y. NeuFlow: a runtime reconfigurable dataflow processor for vision. 2011 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW): IEEE, Colorado Springs, CO, USA, 2011; 109–116.

20. Yi D, Lei Z, Liao S, Li SZ. Learning face representation from scratch. arXiv preprint arXiv:1411.7923, 2014.

21. Chakradhar S, Sankaradas M, Jakkula V, Cadambi S. A dynamically configurable coprocessor for convolutional neural networks. ACM SIGARCH Computer Architecture News 2010; 38(3):247–257.

22. Peemen M, Setio AAA, Mesman B, Corporaal H. Memory-centric accelerator design for convolutional neural networks. 2013 IEEE 31st International Conference on Computer Design (ICCD): IEEE, 2013; 13–19.

23. Chen T, Du Z, Sun N, Wang J, Wu C, Chen Y, Temam O. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning. Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems: ACM, 2014; 269–284.

24. Chetlur S, Woolley C, Vandermersch P, Cohen J, Tran J, Catanzaro B, Shelhamer E. cuDNN: efficient primitives for deep learning. arXiv preprint arXiv:1410.0759, 2014.

25. Jiang J, Hu R, Mikel L, Dou Y. Accuracy evaluation of deep belief networks with fixed-point arithmetic. Computer Modelling & New Technologies 2014; 18(6):7–14.

26. Han S, Pool J, Tran J, Dally WJ. Learning both weights and connections for efficient neural networks. arXiv preprint arXiv:1506.02626, 2015.

27. Li K, Yang W, Li K. Performance analysis and optimization for SpMV on GPU using probabilistic modeling. IEEE Transactions on Parallel and Distributed Systems 2015; 26(1):196–205.

28. Yang W, Li K, Mo Z, Li K. Performance optimization using partitioned SpMV on GPUs and multicore CPUs. IEEE Transactions on Computers 2015; 64(9):2623–2636.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.