-
1
-
-
84876231242
-
-
Advances in Neural Information Processing Systems, Curran Associates, Inc., Lake Tahoe, Nevada, USA
-
Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, Curran Associates, Inc.: Lake Tahoe, Nevada, USA, 2012; 1097–1105.
-
(2012)
ImageNet classification with deep convolutional neural networks
, pp. 1097-1105
-
-
Krizhevsky, A.1
Sutskever, I.2
Hinton, G.E.3
-
2
-
-
85032751458
-
Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups
-
Hinton G, Deng L, Yu D, Dahl GE, Mohamed Ar, Jaitly N, Senior A, Vanhoucke V, Nguyen P, Sainath TN, Kingsbury B. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. Signal Processing Magazine, IEEE 2012; 29(6):82–97.
-
(2012)
Signal Processing Magazine, IEEE
, vol.29
, Issue.6
, pp. 82-97
-
-
Hinton, G.1
Deng, L.2
Yu, D.3
Dahl, G.E.4
Mohamed, A.5
Jaitly, N.6
Senior, A.7
Vanhoucke, V.8
Nguyen, P.9
Sainath, T.N.10
Kingsbury, B.11
-
3
-
-
84911400494
-
-
2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Columbus, OH, USA
-
Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR): IEEE, Columbus, OH, USA, 2014; 580–587.
-
(2014)
Rich feature hierarchies for accurate object detection and semantic segmentation
, pp. 580-587
-
-
Girshick, R.1
Donahue, J.2
Darrell, T.3
Malik, J.4
-
4
-
-
84908529622
-
-
2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), IEEE, Columbus, OH, USA
-
Gokhale V, Jin J, Dundar A, Martini B, Culurciello E. A 240 G-ops/s mobile coprocessor for deep neural networks. In 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW): IEEE, Columbus, OH, USA, 2014; 696–701.
-
(2014)
A 240 G-ops/s mobile coprocessor for deep neural networks
, pp. 696-701
-
-
Gokhale, V.1
Jin, J.2
Dundar, A.3
Martini, B.4
Culurciello, E.5
-
5
-
-
84946878588
-
-
Microsoft Research Whitepaper, Microsoft Research
-
Ovtcharov K, Ruwase O, Kim JY, Fowers J, Strauss K, Chung ES. Accelerating deep convolutional neural networks using specialized hardware. Microsoft Research Whitepaper: Microsoft Research, 2015.
-
(2015)
Accelerating deep convolutional neural networks using specialized hardware
-
-
Ovtcharov, K.1
Ruwase, O.2
Kim, J.Y.3
Fowers, J.4
Strauss, K.5
Chung, E.S.6
-
6
-
-
84919463060
-
GPU implementation of a parallel two-list algorithm for the subset-sum problem
-
Wan L, Li K, Liu J, Li K. GPU implementation of a parallel two-list algorithm for the subset-sum problem. Concurrency Computation Practice Experience 2015; 27(1):119–145.
-
(2015)
Concurrency Computation Practice Experience
, vol.27
, Issue.1
, pp. 119-145
-
-
Wan, L.1
Li, K.2
Liu, J.3
Li, K.4
-
7
-
-
84928685154
-
An iteration-based hybrid parallel algorithm for tridiagonal systems of equations on multi-core architectures
-
Tang G, Yang W, Li K, Ye Y, Xiao G, Li K. An iteration-based hybrid parallel algorithm for tridiagonal systems of equations on multi-core architectures. Concurrency Computation Practice Experience 2015; 27(17):5076–5095.
-
(2015)
Concurrency Computation Practice Experience
, vol.27
, Issue.17
, pp. 5076-5095
-
-
Tang, G.1
Yang, W.2
Li, K.3
Ye, Y.4
Xiao, G.5
Li, K.6
-
9
-
-
70450060046
-
-
FPL 2009. International Conference on Field Programmable Logic and Applications, 2009, IEEE, Prague, Czech Republic
-
Farabet C, Poulet C, Han JY, LeCun Y. CNP: an FPGA-based processor for convolutional networks. FPL 2009. International Conference on Field Programmable Logic and Applications, 2009: IEEE, Prague, Czech Republic, 2009; 32–37.
-
(2009)
CNP: an FPGA-based processor for convolutional networks
, pp. 32-37
-
-
Farabet, C.1
Poulet, C.2
Han, J.Y.3
LeCun, Y.4
-
10
-
-
84962921765
-
Optimizing FPGA-based accelerator design for deep convolutional neural networks
-
ACM, Monterey, CA, USA
-
Zhang C, Li P, Sun G, Guan Y, Xiao B, Cong J. Optimizing FPGA-based accelerator design for deep convolutional neural networks. Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays: ACM, Monterey, CA, USA, 2015; 161–170.
-
(2015)
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
, pp. 161-170
-
-
Zhang, C.1
Li, P.2
Sun, G.3
Guan, Y.4
Xiao, B.5
Cong, J.6
-
11
-
-
78149249904
-
A programmable parallel accelerator for learning and classification
-
ACM, New York, NY, USA
-
Cadambi S, Majumdar A, Becchi M, Chakradhar S, Graf HP. A programmable parallel accelerator for learning and classification. Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques: ACM, New York, NY, USA, 2010; 273–284.
-
(2010)
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques
, pp. 273-284
-
-
Cadambi, S.1
Majumdar, A.2
Becchi, M.3
Chakradhar, S.4
Graf, H.P.5
-
13
-
-
84913580146
-
Caffe: convolutional architecture for fast feature embedding
-
ACM, Orlando, Florida, USA
-
Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T. Caffe: convolutional architecture for fast feature embedding. Proceedings of the ACM International Conference on Multimedia: ACM, Orlando, Florida, USA, 2014; 675–678.
-
(2014)
Proceedings of the ACM International Conference on Multimedia
, pp. 675-678
-
-
Jia, Y.1
Shelhamer, E.2
Donahue, J.3
Karayev, S.4
Long, J.5
Girshick, R.6
Guadarrama, S.7
Darrell, T.8
-
14
-
-
85029668613
-
-
BigLearn, NIPS Workshop, Granada, Spain
-
Collobert R, Kavukcuoglu K, Farabet C. Torch7: a matlab-like environment for machine learning. BigLearn, NIPS Workshop, Granada, Spain, 2011.
-
(2011)
Torch7: a matlab-like environment for machine learning
-
-
Collobert, R.1
Kavukcuoglu, K.2
Farabet, C.3
-
16
-
-
20344376214
-
64-bit floating-point FPGA matrix multiplication
-
ACM, New York, NY, USA
-
Dou Y, Vassiliadis S, Kuzmanov GK, Gaydadjiev GN. 64-bit floating-point FPGA matrix multiplication. Proceedings of the 2005 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays: ACM, New York, NY, USA, 2005; 86–95.
-
(2005)
Proceedings of the 2005 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
, pp. 86-95
-
-
Dou, Y.1
Vassiliadis, S.2
Kuzmanov, G.K.3
Gaydadjiev, G.N.4
-
18
-
-
84959333505
-
Unified virtual memory support for deep CNN accelerator on SoC FPGA
-
Springer, Zhangjiajie, China
-
Xiao T, Qiao Y, Shen J, Yang Q, Wen M. Unified virtual memory support for deep CNN accelerator on SoC FPGA. In Algorithms and Architectures for Parallel Processing. Springer: Zhangjiajie, China 2015; 64–76.
-
(2015)
Algorithms and Architectures for Parallel Processing
, pp. 64-76
-
-
Xiao, T.1
Qiao, Y.2
Shen, J.3
Yang, Q.4
Wen, M.5
-
19
-
-
80054919955
-
-
2011 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), IEEE, Colorado Springs, CO, USA
-
Farabet C, Martini B, Corda B, Akselrod P, Culurciello E, LeCun Y. Neuflow: a runtime reconfigurable dataflow processor for vision. 2011 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW): IEEE, Colorado Springs, CO, USA, 2011; 109–116.
-
(2011)
Neuflow: a runtime reconfigurable dataflow processor for vision
, pp. 109-116
-
-
Farabet, C.1
Martini, B.2
Corda, B.3
Akselrod, P.4
Culurciello, E.5
LeCun, Y.6
-
21
-
-
77955007393
-
A dynamically configurable coprocessor for convolutional neural networks
-
Chakradhar S, Sankaradas M, Jakkula V, Cadambi S. A dynamically configurable coprocessor for convolutional neural networks. ACM SIGARCH Computer Architecture News 2010; 38(3):247–257.
-
(2010)
ACM SIGARCH Computer Architecture News
, vol.38
, Issue.3
, pp. 247-257
-
-
Chakradhar, S.1
Sankaradas, M.2
Jakkula, V.3
Cadambi, S.4
-
22
-
-
84892533708
-
-
2013 IEEE 31st International Conference on Computer Design (ICCD), IEEE
-
Peemen M, Setio AAA, Mesman B, Corporaal H. Memory-centric accelerator design for convolutional neural networks. 2013 IEEE 31st International Conference on Computer Design (ICCD): IEEE, 2013; 13–19.
-
(2013)
Memory-centric accelerator design for convolutional neural networks
, pp. 13-19
-
-
Peemen, M.1
Setio, A.A.A.2
Mesman, B.3
Corporaal, H.4
-
23
-
-
84897780584
-
Diannao: a small-footprint high-throughput accelerator for ubiquitous machine-learning
-
ACM
-
Chen T, Du Z, Sun N, Wang J, Wu C, Chen Y, Temam O. Diannao: a small-footprint high-throughput accelerator for ubiquitous machine-learning. Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems: ACM, 2014; 269–284.
-
(2014)
Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems
, pp. 269-284
-
-
Chen, T.1
Du, Z.2
Sun, N.3
Wang, J.4
Wu, C.5
Chen, Y.6
Temam, O.7
-
24
-
-
84944081816
-
CuDNN: Efficient primitives for deep learning
-
Chetlur S, Woolley C, Vandermersch P, Cohen J, Tran J, Catanzaro B, Shelhamer E. CuDNN: Efficient primitives for deep learning. arXiv preprint 2014: arXiv:1410.0759.
-
(2014)
arXiv preprint
-
-
Chetlur, S.1
Woolley, C.2
Vandermersch, P.3
Cohen, J.4
Tran, J.5
Catanzaro, B.6
Shelhamer, E.7
-
25
-
-
84920152252
-
Accuracy evaluation of deep belief networks with fixed-point arithmetic
-
Jiang J, Hu R, Mikel L, Dou Y. Accuracy evaluation of deep belief networks with fixed-point arithmetic. Computer Modelling & New Technologies 2014; 18(6):7–14.
-
(2014)
Computer Modelling & New Technologies
, vol.18
, Issue.6
, pp. 7-14
-
-
Jiang, J.1
Hu, R.2
Mikel, L.3
Dou, Y.4
-
26
-
-
84966674121
-
Learning both weights and connections for efficient neural networks
-
Han S, Pool J, Tran J, Dally WJ. Learning both weights and connections for efficient neural networks. arXiv preprint 2015: arXiv:1506.02626.
-
(2015)
arXiv preprint
-
-
Han, S.1
Pool, J.2
Tran, J.3
Dally, W.J.4
-
27
-
-
84919470072
-
Performance analysis and optimization for SPMV on GPU using probabilistic modeling
-
Li K, Yang W, Li K. Performance analysis and optimization for SPMV on GPU using probabilistic modeling. IEEE Transactions on Parallel and Distributed Systems 2015; 26(1):196–205.
-
(2015)
IEEE Transactions on Parallel and Distributed Systems
, vol.26
, Issue.1
, pp. 196-205
-
-
Li, K.1
Yang, W.2
Li, K.3
-
28
-
-
84939230567
-
Performance optimization using partitioned SPMV on GPUs and multicore CPUs
-
Yang W, Li K, Mo Z, Li K. Performance optimization using partitioned SPMV on GPUs and multicore CPUs. IEEE Transactions on Computers 2015; 64(9):2623–2636.
-
(2015)
IEEE Transactions on Computers
, vol.64
, Issue.9
, pp. 2623-2636
-
-
Yang, W.1
Li, K.2
Mo, Z.3
Li, K.4