-
1
-
-
84857819132
-
Theano: A CPU and GPU math expression compiler
-
June Oral Presentation
-
Bergstra, James, Breuleux, Olivier, Bastien, Frédéric, Lamblin, Pascal, Pascanu, Razvan, Desjardins, Guillaume, Turian, Joseph, Warde-Farley, David, and Bengio, Yoshua. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), June 2010. Oral Presentation.
-
(2010)
Proceedings of the Python for Scientific Computing Conference (SciPy)
-
-
Bergstra, J.1
Breuleux, O.2
Bastien, F.3
Lamblin, P.4
Pascanu, R.5
Desjardins, G.6
Turian, J.7
Warde-Farley, D.8
Bengio, Y.9
-
2
-
-
0014905463
-
A linear filtering approach to the computation of discrete Fourier transform
-
December
-
Bluestein, Leo I. A linear filtering approach to the computation of discrete Fourier transform. Audio and Electroacoustics, IEEE Transactions on, 18(4):451–455, December 1970. ISSN 0018-9278.
-
(1970)
Audio and Electroacoustics, IEEE Transactions on
, vol.18
, Issue.4
, pp. 451-455
-
-
Bluestein, L.I.1
-
3
-
-
84920106320
-
Efficient training of convolutional deep belief networks in the frequency domain for application to high-resolution 2D and 3D images
-
Brosch, Tom and Tam, Roger C. Efficient training of convolutional deep belief networks in the frequency domain for application to high-resolution 2D and 3D images. Neural Computation, 27(1):211–227, 2015. doi: 10.1162/NECO_a_00682. URL http://dx.doi.org/10.1162/NECO_a_00682.
-
(2015)
Neural Computation
, vol.27
, Issue.1
, pp. 211-227
-
-
Brosch, T.1
Tam, R.C.2
-
4
-
-
77954996735
-
-
Burrus, C. Sidney. Fast Fourier Transforms, 2008. URL http://cnx.org/contents/16e8e5e8-4f22-4b53-9cd6-a15b14f01ce4@5.6:16/Fast_Fourier_Transforms_(6x9_V.
-
(2008)
Fast Fourier Transforms
-
-
Burrus, C.S.1
-
5
-
-
70349332332
-
High performance convolutional neural networks for document processing
-
Lorette, Guy ed, October Université de Rennes 1, Suvisoft. URL http://www.suvisoft.com
-
Chellapilla, Kumar, Puri, Sidd, and Simard, Patrice. High Performance Convolutional Neural Networks for Document Processing. In Lorette, Guy (ed.), Tenth International Workshop on Frontiers in Handwriting Recognition, La Baule (France), October 2006. Université de Rennes 1, Suvisoft. URL https://hal.inria.fr/inria-00112631. http://www.suvisoft.com.
-
(2006)
Tenth International Workshop on Frontiers in Handwriting Recognition, La Baule (France)
-
-
Chellapilla, K.1
Puri, S.2
Simard, P.3
-
6
-
-
84959213942
-
cuDNN: Efficient primitives for deep learning
-
Chetlur, Sharan, Woolley, Cliff, Vandermersch, Philippe, Cohen, Jonathan, Tran, John, Catanzaro, Bryan, and Shelhamer, Evan. cuDNN: Efficient primitives for deep learning. CoRR, abs/1410.0759, 2014. URL http://arxiv.org/abs/1410.0759.
-
(2014)
CoRR
-
-
Chetlur, S.1
Woolley, C.2
Vandermersch, P.3
Cohen, J.4
Tran, J.5
Catanzaro, B.6
Shelhamer, E.7
-
7
-
-
84888340666
-
Torch7: A Matlab-like environment for machine learning
-
Collobert, R., Kavukcuoglu, K., and Farabet, C. Torch7: A Matlab-like environment for machine learning. In BigLearn, NIPS Workshop, 2011a.
-
(2011)
BigLearn, NIPS Workshop
-
-
Collobert, R.1
Kavukcuoglu, K.2
Farabet, C.3
-
8
-
-
80053558787
-
Natural language processing (almost) from scratch
-
November
-
Collobert, Ronan, Weston, Jason, Bottou, Léon, Karlen, Michael, Kavukcuoglu, Koray, and Kuksa, Pavel. Natural language processing (almost) from scratch. J. Mach. Learn. Res., 12:2493–2537, November 2011b. ISSN 1532-4435. URL http://dl.acm.org/citation.cfm?id=1953048.2078186.
-
(2011)
J. Mach. Learn. Res.
, vol.12
, pp. 2493-2537
-
-
Collobert, R.1
Weston, J.2
Bottou, L.3
Karlen, M.4
Kavukcuoglu, K.5
Kuksa, P.6
-
9
-
-
84968470212
-
An algorithm for the machine calculation of complex Fourier series
-
Cooley, James W. and Tukey, John W. An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation, 19(90):297–301, 1965.
-
(1965)
Mathematics of Computation
, vol.19
, Issue.90
, pp. 297-301
-
-
Cooley, J.W.1
Tukey, J.W.2
-
10
-
-
53749092570
-
Parallel computing experiences with CUDA
-
July
-
Garland, Michael, Le Grand, Scott, Nickolls, John, Anderson, Joshua, Hardwick, Jim, Morton, Scott, Phillips, Everett, Zhang, Yao, and Volkov, Vasily. Parallel computing experiences with CUDA. IEEE Micro, 28(4):13–27, July 2008. ISSN 0272-1732. doi: 10.1109/MM.2008.57. URL http://dx.doi.org/10.1109/MM.2008.57.
-
(2008)
IEEE Micro
, vol.28
, Issue.4
, pp. 13-27
-
-
Garland, M.1
Le Grand, S.2
Nickolls, J.3
Anderson, J.4
Hardwick, J.5
Morton, S.6
Phillips, E.7
Zhang, Y.8
Volkov, V.9
-
12
-
-
84949665448
-
A family of high-performance matrix multiplication algorithms
-
London, UK, UK, Springer-Verlag
-
Gunnels, John A., Henry, Greg M., and van de Geijn, Robert A. A family of high-performance matrix multiplication algorithms. In Proceedings of the International Conference on Computational Sciences-Part I, ICCS’01, pp. 51–60, London, UK, UK, 2001. Springer-Verlag. ISBN 3-540-42232-3. URL http://dl.acm.org/citation.cfm?id=645455.653765.
-
(2001)
Proceedings of the International Conference on Computational Sciences-Part I, ICCS’01
, pp. 51-60
-
-
Gunnels, J.A.1
Henry, G.M.2
van de Geijn, R.A.3
-
13
-
-
85026986651
-
Supernode partitioning
-
New York, NY, USA, ACM
-
Irigoin, F. and Triolet, R. Supernode partitioning. In Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL’88, pp. 319–329, New York, NY, USA, 1988. ACM. ISBN 0-89791-252-7. doi: 10.1145/73560.73588. URL http://doi.acm.org/10.1145/73560.73588.
-
(1988)
Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL’88
, pp. 319-329
-
-
Irigoin, F.1
Triolet, R.2
-
14
-
-
84913555165
-
-
arXiv preprint
-
Jia, Yangqing, Shelhamer, Evan, Donahue, Jeff, Karayev, Sergey, Long, Jonathan, Girshick, Ross, Guadarrama, Sergio, and Darrell, Trevor. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
-
(2014)
Caffe: Convolutional Architecture for Fast Feature Embedding
-
-
Jia, Y.1
Shelhamer, E.2
Donahue, J.3
Karayev, S.4
Long, J.5
Girshick, R.6
Guadarrama, S.7
Darrell, T.8
-
16
-
-
84876231242
-
ImageNet classification with deep convolutional neural networks
-
Pereira, F., Burges, C.J.C., Bottou, L., and Weinberger, K.Q. eds, Curran Associates, Inc
-
Krizhevsky, Alex, Sutskever, Ilya, and Hinton, Geoffrey E. ImageNet classification with deep convolutional neural networks. In Pereira, F., Burges, C.J.C., Bottou, L., and Weinberger, K.Q. (eds.), Advances in Neural Information Processing Systems 25, pp. 1097–1105. Curran Associates, Inc., 2012. URL http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf.
-
(2012)
Advances in Neural Information Processing Systems
, vol.25
, pp. 1097-1105
-
-
Krizhevsky, A.1
Sutskever, I.2
Hinton, G.E.3
-
17
-
-
85015159430
-
maxDNN: An efficient convolution kernel for deep learning with Maxwell GPUs
-
Lavin, Andrew. maxDNN: An efficient convolution kernel for deep learning with Maxwell GPUs. CoRR, abs/1501.06633, 2015. URL http://arxiv.org/abs/1501.06633.
-
(2015)
CoRR
-
-
Lavin, A.1
-
18
-
-
0003496929
-
-
Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1st edition
-
Lyons, Richard G. Understanding Digital Signal Processing. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1st edition, 1996. ISBN 0201634678.
-
(1996)
Understanding Digital Signal Processing
-
-
Lyons, R.G.1
-
19
-
-
84937834001
-
Fast training of convolutional networks through FFTs
-
Mathieu, Michaël, Henaff, Mikael, and LeCun, Yann. Fast training of convolutional networks through FFTs. CoRR, abs/1312.5851, 2013. URL http://arxiv.org/abs/1312.5851.
-
(2013)
CoRR
-
-
Mathieu, M.1
Henaff, M.2
LeCun, Y.3
-
20
-
-
84883116448
-
Halide: A language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines
-
Ragan-Kelley, Jonathan, Barnes, Connelly, Adams, Andrew, Paris, Sylvain, Durand, Frédo, and Amarasinghe, Saman P. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. In ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI’13, Seattle, WA, USA, June 16-19, 2013, pp. 519–530, 2013. doi: 10.1145/2462156.2462176. URL http://doi.acm.org/10.1145/2462156.2462176.
-
(2013)
ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI’13, Seattle, WA, USA, June 16-19, 2013
, pp. 519-530
-
-
Ragan-Kelley, J.1
Barnes, C.2
Adams, A.3
Paris, S.4
Durand, F.5
Amarasinghe, S.P.6
-
22
-
-
85083951635
-
OverFeat: Integrated recognition, localization and detection using convolutional networks
-
CBLS, April
-
Sermanet, Pierre, Eigen, David, Zhang, Xiang, Mathieu, Michael, Fergus, Rob, and LeCun, Yann. OverFeat: Integrated recognition, localization and detection using convolutional networks. In International Conference on Learning Representations (ICLR 2014). CBLS, April 2014. URL http://openreview.net/document/d332e77d-459a-4af8-b3ed-55ba.
-
(2014)
International Conference on Learning Representations (ICLR 2014)
-
-
Sermanet, P.1
Eigen, D.2
Zhang, X.3
Mathieu, M.4
Fergus, R.5
LeCun, Y.6
-
23
-
-
77954713325
-
Speeding up Nek5000 with autotuning and specialization
-
New York, NY, USA, ACM
-
Shin, Jaewook, Hall, Mary W., Chame, Jacqueline, Chen, Chun, Fischer, Paul F., and Hovland, Paul D. Speeding up Nek5000 with autotuning and specialization. In Proceedings of the 24th ACM International Conference on Supercomputing, ICS’10, pp. 253–262, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0018-6.
-
(2010)
Proceedings of the 24th ACM International Conference on Supercomputing, ICS’10
, pp. 253-262
-
-
Shin, J.1
Hall, M.W.2
Chame, J.3
Chen, C.4
Fischer, P.F.5
Hovland, P.D.6
-
24
-
-
0025467711
-
A bridging model for parallel computation
-
August
-
Valiant, Leslie G. A bridging model for parallel computation. Commun. ACM, 33(8):103–111, August 1990. ISSN 0001-0782. doi: 10.1145/79173.79181. URL http://doi.acm.org/10.1145/79173.79181.
-
(1990)
Commun. ACM
, vol.33
, Issue.8
, pp. 103-111
-
-
Valiant, L.G.1
-
25
-
-
83155174209
-
Better performance at lower occupancy
-
Volkov, V. Better performance at lower occupancy. In GPU Technology Conference, 2010. URL http://www.cs.berkeley.edu/~volkov/volkov10-GTC.pdf.
-
(2010)
GPU Technology Conference
-
-
Volkov, V.1