-
1
-
-
84906214784
-
Exploring convolutional neural network structures and optimization for speech recognition
-
O. Abdel-Hamid, L. Deng, and D. Yu. Exploring convolutional neural network structures and optimization for speech recognition. Proceedings of Interspeech, 2013.
-
(2013)
Proceedings of Interspeech
-
-
Abdel-Hamid, O.1
Deng, L.2
Yu, D.3
-
11
-
-
85032751593
-
Research developments and directions in speech recognition and understanding
-
May
-
J. Baker, L. Deng, J. Glass, S. Khudanpur, C.-H. Lee, N. Morgan, and D. O'Shaughnessy. Research developments and directions in speech recognition and understanding. IEEE Signal Processing Magazine, 26(3):75-80, May 2009.
-
(2009)
IEEE Signal Processing Magazine
, vol.26
, Issue.3
, pp. 75-80
-
-
Baker, J.1
Deng, L.2
Glass, J.3
Khudanpur, S.4
Lee, C.-H.5
Morgan, N.6
O'Shaughnessy, D.7
-
12
-
-
85032759066
-
Updated MINDS report on speech recognition and understanding
-
July
-
J. Baker, L. Deng, J. Glass, S. Khudanpur, C.-H. Lee, N. Morgan, and D. O'Shaughnessy. Updated MINDS report on speech recognition and understanding. IEEE Signal Processing Magazine, 26(4), July 2009.
-
(2009)
IEEE Signal Processing Magazine
, vol.26
, Issue.4
-
-
Baker, J.1
Deng, L.2
Glass, J.3
Khudanpur, S.4
Lee, C.-H.5
Morgan, N.6
O'Shaughnessy, D.7
-
19
-
-
79959407847
-
Neural net language models
-
Y. Bengio. Neural net language models. Scholarpedia, 3, 2008.
-
(2008)
Scholarpedia
, vol.3
-
-
Bengio, Y.1
-
22
-
-
84883201530
-
Deep learning of representations: Looking forward
-
Springer
-
Y. Bengio. Deep learning of representations: Looking forward. In Statistical Language and Speech Processing, pages 1-37. Springer, 2013.
-
(2013)
Statistical Language and Speech Processing
, pp. 1-37
-
-
Bengio, Y.1
-
25
-
-
0026835134
-
Global optimization of a neural network-hidden Markov model hybrid
-
Y. Bengio, R. De Mori, G. Flammia, and R. Kompe. Global optimization of a neural network-hidden Markov model hybrid. IEEE Transactions on Neural Networks, 3:252-259, 1992.
-
(1992)
IEEE Transactions on Neural Networks
, vol.3
, pp. 252-259
-
-
Bengio, Y.1
De Mori, R.2
Flammia, G.3
Kompe, R.4
-
27
-
-
0142166851
-
A neural probabilistic language model
-
Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137-1155, 2003.
-
(2003)
Journal of Machine Learning Research
, vol.3
, pp. 1137-1155
-
-
Bengio, Y.1
Ducharme, R.2
Vincent, P.3
Jauvin, C.4
-
33
-
-
0035250280
-
An application of discriminative feature extraction to filter-bank-based speech recognition
-
A. Biem, S. Katagiri, E. McDermott, and B. Juang. An application of discriminative feature extraction to filter-bank-based speech recognition. IEEE Transactions on Speech and Audio Processing, 9:96-110, 2001.
-
(2001)
IEEE Transactions on Speech and Audio Processing
, vol.9
, pp. 96-110
-
-
Biem, A.1
Katagiri, S.2
McDermott, E.3
Juang, B.4
-
35
-
-
85032752364
-
Graphical model architectures for speech recognition
-
J. Bilmes and C. Bartels. Graphical model architectures for speech recognition. IEEE Signal Processing Magazine, 22:89-100, 2005.
-
(2005)
IEEE Signal Processing Magazine
, vol.22
, pp. 89-100
-
-
Bilmes, J.1
Bartels, C.2
-
36
-
-
84877727208
-
A semantic matching energy function for learning with multi-relational data - Application to word-sense disambiguation
-
May
-
A. Bordes, X. Glorot, J. Weston, and Y. Bengio. A semantic matching energy function for learning with multi-relational data - application to word-sense disambiguation. Machine Learning, May 2013.
-
(2013)
Machine Learning
-
-
Bordes, A.1
Glorot, X.2
Weston, J.3
Bengio, Y.4
-
38
-
-
84890014676
-
From machine learning to machine reasoning: An essay
-
L. Bottou. From machine learning to machine reasoning: An essay. Journal of Machine Learning Research, 14:3207-3260, 2013.
-
(2013)
Journal of Machine Learning Research
, vol.14
, pp. 3207-3260
-
-
Bottou, L.1
-
44
-
-
0030196364
-
Stacked regressions
-
L. Breiman. Stacked regressions. Machine Learning, 24:49-64, 1996.
-
(1996)
Machine Learning
, vol.24
, pp. 49-64
-
-
Breiman, L.1
-
45
-
-
84903690898
-
-
Final Report for 1998 Workshop on Language Engineering, CLSP, Johns Hopkins
-
J. Bridle, L. Deng, J. Picone, H. Richards, J. Ma, T. Kamm, M. Schuster, S. Pike, and R. Reagan. An investigation of segmental hidden dynamic models of speech coarticulation for automatic speech recognition. Final Report for 1998 Workshop on Language Engineering, CLSP, Johns Hopkins, 1998.
-
(1998)
An Investigation of Segmental Hidden Dynamic Models of Speech Coarticulation for Automatic Speech Recognition
-
-
Bridle, J.1
Deng, L.2
Picone, J.3
Richards, H.4
Ma, J.5
Kamm, T.6
Schuster, M.7
Pike, S.8
Reagan, R.9
-
46
-
-
84886675337
-
Large vocabulary speech recognition on parallel architectures
-
November
-
P. Cardinal, P. Dumouchel, and G. Boulianne. Large vocabulary speech recognition on parallel architectures. IEEE Transactions on Audio, Speech, and Language Processing, 21(11):2290-2300, November 2013.
-
(2013)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.21
, Issue.11
, pp. 2290-2300
-
-
Cardinal, P.1
Dumouchel, P.2
Boulianne, G.3
-
47
-
-
0031189914
-
Multitask learning
-
R. Caruana. Multitask learning. Machine Learning, 28:41-75, 1997.
-
(1997)
Machine Learning
, vol.28
, pp. 41-75
-
-
Caruana, R.1
-
50
-
-
0031146514
-
HMM-based speech recognition using state-dependent, discriminatively derived transforms on Mel-warped DFT features
-
R. Chengalvarayan and L. Deng. HMM-based speech recognition using state-dependent, discriminatively derived transforms on Mel-warped DFT features. IEEE Transactions on Speech and Audio Processing, pages 243-256, 1997.
-
(1997)
IEEE Transactions on Speech and Audio Processing
, pp. 243-256
-
-
Chengalvarayan, R.1
Deng, L.2
-
52
-
-
0032206267
-
Speech trajectory discrimination using the minimum classification error learning
-
R. Chengalvarayan and L. Deng. Speech trajectory discrimination using the minimum classification error learning. IEEE Transactions on Speech and Audio Processing, 6(6):505-515, 1998.
-
(1998)
IEEE Transactions on Speech and Audio Processing
, vol.6
, Issue.6
, pp. 505-515
-
-
Chengalvarayan, R.1
Deng, L.2
-
55
-
-
78649669320
-
Deep, big, simple neural nets for handwritten digit recognition
-
December
-
D. Ciresan, U. Meier, L. Gambardella, and J. Schmidhuber. Deep, big, simple neural nets for handwritten digit recognition. Neural Computation, December 2010.
-
(2010)
Neural Computation
-
-
Ciresan, D.1
Meier, U.2
Gambardella, L.3
Schmidhuber, J.4
-
59
-
-
84897484337
-
Deep learning with COTS HPC
-
A. Coates, B. Huval, T. Wang, D. Wu, A. Ng, and B. Catanzaro. Deep learning with COTS HPC. In Proceedings of International Conference on Machine Learning (ICML). 2013.
-
(2013)
Proceedings of International Conference on Machine Learning (ICML)
-
-
Coates, A.1
Huval, B.2
Wang, T.3
Wu, D.4
Ng, A.5
Catanzaro, B.6
-
63
-
-
80053558787
-
Natural language processing (almost) from scratch
-
R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa. Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493-2537, 2011.
-
(2011)
Journal of Machine Learning Research
, vol.12
, pp. 2493-2537
-
-
Collobert, R.1
Weston, J.2
Bottou, L.3
Karlen, M.4
Kavukcuoglu, K.5
Kuksa, P.6
-
64
-
-
85162069624
-
Phone recognition with the mean-covariance restricted Boltzmann machine
-
G. Dahl, M. Ranzato, A. Mohamed, and G. Hinton. Phone recognition with the mean-covariance restricted Boltzmann machine. In Proceedings of Neural Information Processing Systems (NIPS), volume 23, pages 469-477. 2010.
-
(2010)
Proceedings of Neural Information Processing Systems (NIPS)
, vol.23
, pp. 469-477
-
-
Dahl, G.1
Ranzato, M.2
Mohamed, A.3
Hinton, G.4
-
68
-
-
84055222005
-
Context-dependent, pre-trained deep neural networks for large vocabulary speech recognition
-
January
-
G. Dahl, D. Yu, L. Deng, and A. Acero. Context-dependent, pre-trained deep neural networks for large vocabulary speech recognition. IEEE Transactions on Audio, Speech, & Language Processing, 20(1):30-42, January 2012.
-
(2012)
IEEE Transactions on Audio, Speech, & Language Processing
, vol.20
, Issue.1
, pp. 30-42
-
-
Dahl, G.1
Yu, D.2
Deng, L.3
Acero, A.4
-
69
-
-
84877760312
-
Large scale distributed deep networks
-
J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, Q. Le, M. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, and A. Ng. Large scale distributed deep networks. In Proceedings of Neural Information Processing Systems (NIPS). 2012.
-
(2012)
Proceedings of Neural Information Processing Systems (NIPS)
-
-
Dean, J.1
Corrado, G.2
Monga, R.3
Chen, K.4
Devin, M.5
Le, Q.6
Mao, M.7
Ranzato, M.8
Senior, A.9
Tucker, P.10
Yang, K.11
Ng, A.12
-
71
-
-
0026854213
-
A generalized hidden Markov model with state-conditioned trend functions of time for the speech signal
-
L. Deng. A generalized hidden Markov model with state-conditioned trend functions of time for the speech signal. Signal Processing, 27(1):65-78, 1992.
-
(1992)
Signal Processing
, vol.27
, Issue.1
, pp. 65-78
-
-
Deng, L.1
-
72
-
-
0027678649
-
A stochastic model of speech incorporating hierarchical nonstationarity
-
L. Deng. A stochastic model of speech incorporating hierarchical nonstationarity. IEEE Transactions on Speech and Audio Processing, 1(4):471-475, 1993.
-
(1993)
IEEE Transactions on Speech and Audio Processing
, vol.1
, Issue.4
, pp. 471-475
-
-
Deng, L.1
-
73
-
-
0032119268
-
A dynamic, feature-based approach to the interface between phonology and phonetics for speech modeling and recognition
-
L. Deng. A dynamic, feature-based approach to the interface between phonology and phonetics for speech modeling and recognition. Speech Communication, 24(4):299-323, 1998.
-
(1998)
Speech Communication
, vol.24
, Issue.4
, pp. 299-323
-
-
Deng, L.1
-
74
-
-
0039503389
-
Computational models for speech production
-
Springer Verlag
-
L. Deng. Computational models for speech production. In Computational Models of Speech Pattern Processing, pages 199-213. Springer Verlag, 1999.
-
(1999)
Computational Models of Speech Pattern Processing
, pp. 199-213
-
-
Deng, L.1
-
75
-
-
33744966595
-
Switching dynamic system models for speech articulation and acoustics
-
Springer-Verlag, New York
-
L. Deng. Switching dynamic system models for speech articulation and acoustics. In Mathematical Foundations of Speech and Language Processing, pages 115-134. Springer-Verlag, New York, 2003.
-
(2003)
Mathematical Foundations of Speech and Language Processing
, pp. 115-134
-
-
Deng, L.1
-
78
-
-
85032752689
-
The MNIST database of handwritten digit images for machine learning research
-
November
-
L. Deng. The MNIST database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6), November 2012.
-
(2012)
IEEE Signal Processing Magazine
, vol.29
, Issue.6
-
-
Deng, L.1
-
83
-
-
0031185482
-
Speaker-independent phonetic classification using hidden Markov models with state-conditioned mixtures of trend functions
-
L. Deng and M. Aksmanovic. Speaker-independent phonetic classification using hidden Markov models with state-conditioned mixtures of trend functions. IEEE Transactions on Speech and Audio Processing, 5:319-324, 1997.
-
(1997)
IEEE Transactions on Speech and Audio Processing
, vol.5
, pp. 319-324
-
-
Deng, L.1
Aksmanovic, M.2
-
84
-
-
0028516022
-
Speech recognition using hidden Markov models with polynomial regression functions as nonstationary states
-
L. Deng, M. Aksmanovic, D. Sun, and J. Wu. Speech recognition using hidden Markov models with polynomial regression functions as nonstationary states. IEEE Transactions on Speech and Audio Processing, 2(4):507-520, 1994.
-
(1994)
IEEE Transactions on Speech and Audio Processing
, vol.2
, Issue.4
, pp. 507-520
-
-
Deng, L.1
Aksmanovic, M.2
Sun, D.3
Wu, J.4
-
86
-
-
0026458724
-
Structural design of a hidden Markov model based speech recognizer using multi-valued phonetic features: Comparison with segmental speech units
-
L. Deng and K. Erler. Structural design of a hidden Markov model based speech recognizer using multi-valued phonetic features: Comparison with segmental speech units. Journal of the Acoustical Society of America, 92(6):3058-3067, 1992.
-
(1992)
Journal of the Acoustical Society of America
, vol.92
, Issue.6
, pp. 3058-3067
-
-
Deng, L.1
Erler, K.2
-
87
-
-
0028256706
-
Analysis of correlation structure for a neural predictive model with application to speech recognition
-
L. Deng, K. Hassanein, and M. Elmasry. Analysis of correlation structure for a neural predictive model with application to speech recognition. Neural Networks, 7(2):331-339, 1994.
-
(1994)
Neural Networks
, vol.7
, Issue.2
, pp. 331-339
-
-
Deng, L.1
Hassanein, K.2
Elmasry, M.3
-
92
-
-
0026189555
-
Phonemic hidden Markov models with continuous mixture output densities for large vocabulary word recognition
-
L. Deng, M. Lennig, V. Gupta, F. Seitz, P. Mermelstein, and P. Kenny. Phonemic hidden Markov models with continuous mixture output densities for large vocabulary word recognition. IEEE Transactions on Signal Processing, 39(7):1677-1681, 1991.
-
(1991)
IEEE Transactions on Signal Processing
, vol.39
, Issue.7
, pp. 1677-1681
-
-
Deng, L.1
Lennig, M.2
Gupta, V.3
Seitz, F.4
Mermelstein, P.5
Kenny, P.6
-
93
-
-
10244257175
-
Large vocabulary word recognition using context-dependent allophonic hidden Markov models
-
L. Deng, M. Lennig, F. Seitz, and P. Mermelstein. Large vocabulary word recognition using context-dependent allophonic hidden Markov models. Computer Speech and Language, 4(4):345-357, 1990.
-
(1990)
Computer Speech and Language
, vol.4
, Issue.4
, pp. 345-357
-
-
Deng, L.1
Lennig, M.2
Seitz, F.3
Mermelstein, P.4
-
94
-
-
84890491198
-
Recent advances in deep learning for speech research at Microsoft
-
L. Deng, J. Li, J.-T. Huang, K. Yao, D. Yu, F. Seide, M. Seltzer, G. Zweig, X. He, J. Williams, Y. Gong, and A. Acero. Recent advances in deep learning for speech research at Microsoft. In Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP). 2013a.
-
(2013)
Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP)
-
-
Deng, L.1
Li, J.2
Huang, J.-T.3
Yao, K.4
Yu, D.5
Seide, F.6
Seltzer, M.7
Zweig, G.8
He, X.9
Williams, J.10
Gong, Y.11
Acero, A.12
-
95
-
-
84876672166
-
Machine learning paradigms in speech recognition: An overview
-
May
-
L. Deng and X. Li. Machine learning paradigms in speech recognition: An overview. IEEE Transactions on Audio, Speech, and Language Processing, 21:1060-1089, May 2013.
-
(2013)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.21
, pp. 1060-1089
-
-
Deng, L.1
Li, X.2
-
96
-
-
0033623527
-
Spontaneous speech recognition using a statistical coarticulatory model for the vocal tract resonance dynamics
-
L. Deng and J. Ma. Spontaneous speech recognition using a statistical coarticulatory model for the vocal tract resonance dynamics. Journal of the Acoustical Society of America, 108:3036-3048, 2000.
-
(2000)
Journal of the Acoustical Society of America
, vol.108
, pp. 3036-3048
-
-
Deng, L.1
Ma, J.2
-
98
-
-
0031198059
-
Production models as a structural basis for automatic speech recognition
-
August
-
L. Deng, G. Ramsay, and D. Sun. Production models as a structural basis for automatic speech recognition. Speech Communication, 22(2-3):93-111, August 1997.
-
(1997)
Speech Communication
, vol.22
, Issue.2-3
, pp. 93-111
-
-
Deng, L.1
Ramsay, G.2
Sun, D.3
-
99
-
-
0030190520
-
Transitional speech units and their representation by regressive Markov states: Applications to speech recognition
-
July
-
L. Deng and H. Sameti. Transitional speech units and their representation by regressive Markov states: Applications to speech recognition. IEEE Transactions on Speech and Audio Processing, 4(4):301-306, July 1996.
-
(1996)
IEEE Transactions on Speech and Audio Processing
, vol.4
, Issue.4
, pp. 301-306
-
-
Deng, L.1
Sameti, H.2
-
100
-
-
79959842828
-
Binary coding of speech spectrograms using a deep autoencoder
-
L. Deng, M. Seltzer, D. Yu, A. Acero, A. Mohamed, and G. Hinton. Binary coding of speech spectrograms using a deep autoencoder. In Proceedings of Interspeech. 2010.
-
(2010)
Proceedings of Interspeech
-
-
Deng, L.1
Seltzer, M.2
Yu, D.3
Acero, A.4
Mohamed, A.5
Hinton, G.6
-
101
-
-
0028234947
-
A statistical approach to automatic speech recognition using the atomic speech units constructed from overlapping articulatory features
-
L. Deng and D. Sun. A statistical approach to automatic speech recognition using the atomic speech units constructed from overlapping articulatory features. Journal of the Acoustical Society of America, 95(5):2702-2719, 1994.
-
(1994)
Journal of the Acoustical Society of America
, vol.95
, Issue.5
, pp. 2702-2719
-
-
Deng, L.1
Sun, D.2
-
103
-
-
0036880074
-
Distributed speech processing in MiPad's multimodal user interface
-
L. Deng, K. Wang, A. Acero, H. W. Hon, J. Droppo, C. Boulis, Y. Wang, D. Jacoby, M. Mahajan, C. Chelba, and X. Huang. Distributed speech processing in MiPad's multimodal user interface. IEEE Transactions on Speech and Audio Processing, 10(8):605-619, 2002.
-
(2002)
IEEE Transactions on Speech and Audio Processing
, vol.10
, Issue.8
, pp. 605-619
-
-
Deng, L.1
Wang, K.2
Acero, A.3
Hon, H.W.4
Droppo, J.5
Boulis, C.6
Wang, Y.7
Jacoby, D.8
Mahajan, M.9
Chelba, C.10
Huang, X.11
-
104
-
-
18744401086
-
Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion
-
L. Deng, J. Wu, J. Droppo, and A. Acero. Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion. IEEE Transactions on Speech and Audio Processing, 13(3):412-421, 2005.
-
(2005)
IEEE Transactions on Speech and Audio Processing
, vol.13
, Issue.3
, pp. 412-421
-
-
Deng, L.1
Wu, J.2
Droppo, J.3
Acero, A.4
-
106
-
-
84865768819
-
Deep convex network: A scalable architecture for speech pattern classification
-
L. Deng and D. Yu. Deep convex network: A scalable architecture for speech pattern classification. In Proceedings of Interspeech. 2011.
-
(2011)
Proceedings of Interspeech
-
-
Deng, L.1
Yu, D.2
-
107
-
-
33744966561
-
A bidirectional target filtering model of speech coarticulation: Two-stage implementation for phonetic recognition
-
January
-
L. Deng, D. Yu, and A. Acero. A bidirectional target filtering model of speech coarticulation: Two-stage implementation for phonetic recognition. IEEE Transactions on Audio, Speech, and Language Processing, 14(1):256-265, January 2006.
-
(2006)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.14
, Issue.1
, pp. 256-265
-
-
Deng, L.1
Yu, D.2
Acero, A.3
-
108
-
-
34047266395
-
Structured speech modeling
-
September
-
L. Deng, D. Yu, and A. Acero. Structured speech modeling. IEEE Transactions on Audio, Speech and Language Processing, 14(5):1492-1504, September 2006.
-
(2006)
IEEE Transactions on Audio, Speech and Language Processing
, vol.14
, Issue.5
, pp. 1492-1504
-
-
Deng, L.1
Yu, D.2
Acero, A.3
-
111
-
-
84991233704
-
A deep learning approach to machine transliteration
-
Athens, Greece, March
-
T. Deselaers, S. Hasan, O. Bender, and H. Ney. A deep learning approach to machine transliteration. In Proceedings of 4th Workshop on Statistical Machine Translation, pages 233-241. Athens, Greece, March 2009.
-
(2009)
Proceedings of 4th Workshop on Statistical Machine Translation
, pp. 233-241
-
-
Deselaers, T.1
Hasan, S.2
Bender, O.3
Ney, H.4
-
114
-
-
80055055551
-
Why does unsupervised pre-training help deep learning?
-
D. Erhan, Y. Bengio, A. Courville, P. Manzagol, P. Vincent, and S. Bengio. Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research, pages 201-208, 2010.
-
(2010)
Journal of Machine Learning Research
, pp. 201-208
-
-
Erhan, D.1
Bengio, Y.2
Courville, A.3
Manzagol, P.4
Vincent, P.5
Bengio, S.6
-
116
-
-
0032119668
-
The hierarchical hidden Markov model: Analysis and applications
-
S. Fine, Y. Singer, and N. Tishby. The hierarchical hidden Markov model: Analysis and applications. Machine Learning, 32:41-62, 1998.
-
(1998)
Machine Learning
, vol.32
, pp. 41-62
-
-
Fine, S.1
Singer, Y.2
Tishby, N.3
-
117
-
-
84898958665
-
DeViSE: A deep visual-semantic embedding model
-
A. Frome, G. Corrado, J. Shlens, S. Bengio, J. Dean, M. Ranzato, and T. Mikolov. DeViSE: A deep visual-semantic embedding model. In Proceedings of Neural Information Processing Systems (NIPS). 2013.
-
(2013)
Proceedings of Neural Information Processing Systems (NIPS)
-
-
Frome, A.1
Corrado, G.2
Shlens, J.3
Bengio, S.4
Dean, J.5
Ranzato, M.6
Mikolov, T.7
-
118
-
-
44849099965
-
Phone-discriminating minimum classification error (P-MCE) training for phonetic recognition
-
Q. Fu, X. He, and L. Deng. Phone-discriminating minimum classification error (P-MCE) training for phonetic recognition. In Proceedings of Interspeech. 2007.
-
(2007)
Proceedings of Interspeech
-
-
Fu, Q.1
He, X.2
Deng, L.3
-
127
-
-
77955783938
-
Error approximation and minimum phone error acoustic model estimation
-
August
-
M. Gibson and T. Hain. Error approximation and minimum phone error acoustic model estimation. IEEE Transactions on Audio, Speech, and Language Processing, 18(6):1269-1279, August 2010.
-
(2010)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.18
, Issue.6
, pp. 1269-1279
-
-
Gibson, M.1
Hain, T.2
-
139
-
-
84857892556
-
Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics
-
M. Gutmann and A. Hyvarinen. Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics. Journal of Machine Learning Research, 13:307-361, 2012.
-
(2012)
Journal of Machine Learning Research
, vol.13
, pp. 307-361
-
-
Gutmann, M.1
Hyvarinen, A.2
-
140
-
-
85008520364
-
Transcribing meetings with the AMIDA systems
-
T. Hain, L. Burget, J. Dines, P. Garner, F. Grezl, A. Hannani, M. Huijbregts, M. Karafiat, M. Lincoln, and V. Wan. Transcribing meetings with the AMIDA systems. IEEE Transactions on Audio, Speech, and Language Processing, 20:486-498, 2012.
-
(2012)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.20
, pp. 486-498
-
-
Hain, T.1
Burget, L.2
Dines, J.3
Garner, P.4
Grezl, F.5
Hannani, A.6
Huijbregts, M.7
Karafiat, M.8
Lincoln, M.9
Wan, V.10
-
144
-
-
85032751114
-
Speech recognition, machine translation, and speech translation - A unifying discriminative framework
-
November
-
X. He and L. Deng. Speech recognition, machine translation, and speech translation - a unifying discriminative framework. IEEE Signal Processing Magazine, 28, November 2011.
-
(2011)
IEEE Signal Processing Magazine
, vol.28
-
-
He, X.1
Deng, L.2
-
146
-
-
84876669905
-
Speech-centric information processing: An optimization-oriented approach
-
X. He and L. Deng. Speech-centric information processing: An optimization-oriented approach. In Proceedings of the IEEE. 2013.
-
(2013)
Proceedings of the IEEE
-
-
He, X.1
Deng, L.2
-
147
-
-
85032750905
-
Discriminative learning in sequential pattern recognition - A unifying review for optimization-oriented speech recognition
-
X. He, L. Deng, and W. Chou. Discriminative learning in sequential pattern recognition - a unifying review for optimization-oriented speech recognition. IEEE Signal Processing Magazine, 25:14-36, 2008.
-
(2008)
IEEE Signal Processing Magazine
, vol.25
, pp. 14-36
-
-
He, X.1
Deng, L.2
Chou, W.3
-
148
-
-
85008035419
-
Equivalence of generative and log-linear models
-
February
-
G. Heigold, H. Ney, P. Lehnen, T. Gass, and R. Schluter. Equivalence of generative and log-linear models. IEEE Transactions on Audio, Speech, and Language Processing, 19(5):1138-1148, February 2011.
-
(2011)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.19
, Issue.5
, pp. 1138-1148
-
-
Heigold, G.1
Ney, H.2
Lehnen, P.3
Gass, T.4
Schluter, R.5
-
149
-
-
84887376734
-
Investigations on an EM-style optimization algorithm for discriminative training of HMMs
-
December
-
G. Heigold, H. Ney, and R. Schluter. Investigations on an EM-style optimization algorithm for discriminative training of HMMs. IEEE Transactions on Audio, Speech, and Language Processing, 21(12):2616-2626, December 2013.
-
(2013)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.21
, Issue.12
, pp. 2616-2626
-
-
Heigold, G.1
Ney, H.2
Schluter, R.3
-
150
-
-
84890539009
-
Multilingual acoustic models using distributed deep neural networks
-
G. Heigold, V. Vanhoucke, A. Senior, P. Nguyen, M. Ranzato, M. Devin, and J. Dean. Multilingual acoustic models using distributed deep neural networks. In Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP). 2013.
-
(2013)
Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP)
-
-
Heigold, G.1
Vanhoucke, V.2
Senior, A.3
Nguyen, P.4
Ranzato, M.5
Devin, M.6
Dean, J.7
-
151
-
-
69249105007
-
Discriminative input stream combination for conditional random field phone recognition
-
November
-
I. Heintz, E. Fosler-Lussier, and C. Brew. Discriminative input stream combination for conditional random field phone recognition. IEEE Transactions on Audio, Speech, and Language Processing, 17(8):1533-1546, November 2009.
-
(2009)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.17
, Issue.8
, pp. 1533-1546
-
-
Heintz, I.1
Fosler-Lussier, E.2
Brew, C.3
-
155
-
-
70350435251
-
Speech recognition using augmented conditional random fields
-
February
-
Y. Hifny and S. Renals. Speech recognition using augmented conditional random fields. IEEE Transactions on Audio, Speech, and Language Processing, 17(2):354-365, February 2009.
-
(2009)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.17
, Issue.2
, pp. 354-365
-
-
Hifny, Y.1
Renals, S.2
-
156
-
-
0025519204
-
Mapping part-whole hierarchies into connectionist networks
-
G. Hinton. Mapping part-whole hierarchies into connectionist networks. Artificial Intelligence, 46:47-75, 1990.
-
(1990)
Artificial Intelligence
, vol.46
, pp. 47-75
-
-
Hinton, G.1
-
157
-
-
0009438133
-
Preface to the special issue on connectionist symbol processing
-
G. Hinton. Preface to the special issue on connectionist symbol processing. Artificial Intelligence, 46:1-4, 1990.
-
(1990)
Artificial Intelligence
, vol.46
, pp. 1-4
-
-
Hinton, G.1
-
158
-
-
0037327724
-
The ups and downs of Hebb synapses
-
G. Hinton. The ups and downs of Hebb synapses. Canadian Psychology, 44:10-13, 2003.
-
(2003)
Canadian Psychology
, vol.44
, pp. 10-13
-
-
Hinton, G.1
-
161
-
-
85032751458
-
Deep neural networks for acoustic modeling in speech recognition
-
November
-
G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine, 29(6):82-97, November 2012.
-
(2012)
IEEE Signal Processing Magazine
, vol.29
, Issue.6
, pp. 82-97
-
-
Hinton, G.1
Deng, L.2
Yu, D.3
Dahl, G.4
Mohamed, A.5
Jaitly, N.6
Senior, A.7
Vanhoucke, V.8
Nguyen, P.9
Sainath, T.10
Kingsbury, B.11
-
163
-
-
33745805403
-
A fast learning algorithm for deep belief nets
-
G. Hinton, S. Osindero, and Y. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18:1527-1554, 2006.
-
(2006)
Neural Computation
, vol.18
, pp. 1527-1554
-
-
Hinton, G.1
Osindero, S.2
Teh, Y.3
-
164
-
-
33746600649
-
Reducing the dimensionality of data with neural networks
-
July
-
G. Hinton and R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507, July 2006.
-
(2006)
Science
, vol.313
, Issue.5786
, pp. 504-507
-
-
Hinton, G.1
Salakhutdinov, R.2
-
165
-
-
79961245273
-
Discovering binary codes for documents by learning deep generative models
-
G. Hinton and R. Salakhutdinov. Discovering binary codes for documents by learning deep generative models. Topics in Cognitive Science, pages 1-18, 2010.
-
(2010)
Topics in Cognitive Science
, pp. 1-18
-
-
Hinton, G.1
Salakhutdinov, R.2
-
166
-
-
84867720412
-
-
arXiv: 1207.0580v1
-
G. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv: 1207.0580v1, 2012.
-
(2012)
Improving Neural Networks by Preventing Co-adaptation of Feature Detectors
-
-
Hinton, G.1
Srivastava, N.2
Krizhevsky, A.3
Sutskever, I.4
Salakhutdinov, R.5
-
172
-
-
84889566627
-
Learning deep structured semantic models for web search using clickthrough data
-
P. Huang, X. He, J. Gao, L. Deng, A. Acero, and L. Heck. Learning deep structured semantic models for web search using clickthrough data. Association for Computing Machinery (ACM) International Conference Information and Knowledge Management (CIKM), 2013.
-
(2013)
Association for Computing Machinery (ACM) International Conference Information and Knowledge Management (CIKM)
-
-
Huang, P.1
He, X.2
Gao, J.3
Deng, L.4
Acero, A.5
Heck, L.6
-
174
-
-
77956280276
-
Hierarchical Bayesian language models for conversational speech recognition
-
November
-
S. Huang and S. Renals. Hierarchical Bayesian language models for conversational speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 18(8):1941-1954, November 2010.
-
(2010)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.18
, Issue.8
, pp. 1941-1954
-
-
Huang, S.1
Renals, S.2
-
175
-
-
0034842339
-
MiPad: A multimodal interaction prototype
-
X. Huang, A. Acero, C. Chelba, L. Deng, J. Droppo, D. Duchene, J. Goodman, and H. Hon. MiPad: A multimodal interaction prototype. In Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP). 2001.
-
(2001)
Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP)
-
-
Huang, X.1
Acero, A.2
Chelba, C.3
Deng, L.4
Droppo, J.5
Duchene, D.6
Goodman, J.7
Hon, H.8
-
176
-
-
84906218045
-
Semi-supervised GMM and DNN acoustic model training with multi-system combination and confidence re-calibration
-
Y. Huang, D. Yu, Y. Gong, and C. Liu. Semi-supervised GMM and DNN acoustic model training with multi-system combination and confidence re-calibration. In Proceedings of Interspeech, pages 2360-2364. 2013.
-
(2013)
Proceedings of Interspeech
, pp. 2360-2364
-
-
Huang, Y.1
Yu, D.2
Gong, Y.3
Liu, C.4
-
186
-
-
85032751120
-
Parameter estimation of statistical models using convex optimization: An advanced method of discriminative training for speech and language processing
-
H. Jiang and X. Li. Parameter estimation of statistical models using convex optimization: An advanced method of discriminative training for speech and language processing. IEEE Signal Processing Magazine, 27(3):115-127, 2010.
-
(2010)
IEEE Signal Processing Magazine
, vol.27
, Issue.3
, pp. 115-127
-
-
Jiang, H.1
Li, X.2
-
187
-
-
0022691022
-
Maximum likelihood estimation for multivariate mixture observations of Markov chains
-
B. Juang, S. Levinson, and M. Sondhi. Maximum likelihood estimation for multivariate mixture observations of Markov chains. IEEE Transactions on Information Theory, 32:307-309, 1986.
-
(1986)
IEEE Transactions on Information Theory
, vol.32
, pp. 307-309
-
-
Juang, B.1
Levinson, S.2
Sondhi, M.3
-
192
-
-
85162460675
-
Learning convolutional feature hierarchies for visual recognition
-
K. Kavukcuoglu, P. Sermanet, Y. Boureau, K. Gregor, M. Mathieu, and Y. LeCun. Learning convolutional feature hierarchies for visual recognition. In Proceedings of Neural Information Processing Systems (NIPS). 2010.
-
(2010)
Proceedings of Neural Information Processing Systems (NIPS)
-
-
Kavukcuoglu, K.1
Sermanet, P.2
Boureau, Y.3
Gregor, K.4
Mathieu, M.5
LeCun, Y.6
-
193
-
-
77955803591
-
Enhanced phone posteriors for improving speech recognition systems
-
August
-
H. Ketabdar and H. Bourlard. Enhanced phone posteriors for improving speech recognition systems. IEEE Transactions on Audio, Speech, and Language Processing, 18(6):1094-1106, August 2010.
-
(2010)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.18
, Issue.6
, pp. 1094-1106
-
-
Ketabdar, H.1
Bourlard, H.2
-
195
-
-
84878379108
-
Scalable minimum bayes risk training of deep neural network acoustic models using distributed hessian-free optimization
-
B. Kingsbury, T. Sainath, and H. Soltau. Scalable minimum bayes risk training of deep neural network acoustic models using distributed hessian-free optimization. In Proceedings of Interspeech. 2012.
-
(2012)
Proceedings of Interspeech
-
-
Kingsbury, B.1
Sainath, T.2
Soltau, H.3
-
197
-
-
84890495150
-
Eigentriphones for context-dependent acoustic modeling
-
T. Ko and B. Mak. Eigentriphones for context-dependent acoustic modeling. IEEE Transactions on Audio, Speech, and Language Processing, 21(6):1285-1294, 2013.
-
(2013)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.21
, Issue.6
, pp. 1285-1294
-
-
Ko, T.1
Mak, B.2
-
199
-
-
84878534913
-
Integrating deep neural networks into structural classification approach based on weighted finite-state transducers
-
Y. Kubo, T. Hori, and A. Nakamura. Integrating deep neural networks into structural classification approach based on weighted finite-state transducers. In Proceedings of Interspeech. 2012.
-
(2012)
Proceedings of Interspeech
-
-
Kubo, Y.1
Hori, T.2
Nakamura, A.3
-
201
-
-
84887376692
-
Cross-lingual automatic speech recognition using tandem features
-
December
-
P. Lal and S. King. Cross-lingual automatic speech recognition using tandem features. IEEE Transactions on Audio, Speech, and Language Processing, 21(12):2506-2515, December 2013.
-
(2013)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.21
, Issue.12
, pp. 2506-2515
-
-
Lal, P.1
King, S.2
-
202
-
-
0025254722
-
A time-delay neural network architecture for isolated word recognition
-
K. Lang, A. Waibel, and G. Hinton. A time-delay neural network architecture for isolated word recognition. Neural Networks, 3(1):23-43, 1990.
-
(1990)
Neural Networks
, vol.3
, Issue.1
, pp. 23-43
-
-
Lang, K.1
Waibel, A.2
Hinton, G.3
-
207
-
-
84869479578
-
Structured output layer neural network language models for speech recognition
-
January
-
H. Le, I. Oparin, A. Allauzen, J.-L. Gauvain, and F. Yvon. Structured output layer neural network language models for speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 21(1):197-206, January 2013.
-
(2013)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.21
, Issue.1
, pp. 197-206
-
-
Le, H.1
Oparin, I.2
Allauzen, A.3
Gauvain, J.-L.4
Yvon, F.5
-
208
-
-
80053437034
-
On optimization methods for deep learning
-
Q. Le, J. Ngiam, A. Coates, A. Lahiri, B. Prochnow, and A. Ng. On optimization methods for deep learning. In Proceedings of International Conference on Machine Learning (ICML). 2011.
-
(2011)
Proceedings of International Conference on Machine Learning (ICML)
-
-
Le, Q.1
Ngiam, J.2
Coates, A.3
Lahiri, A.4
Prochnow, B.5
Ng, A.6
-
209
-
-
84867135575
-
Building high-level features using large scale unsupervised learning
-
Q. Le, M. Ranzato, R. Monga, M. Devin, G. Corrado, K. Chen, J. Dean, and A. Ng. Building high-level features using large scale unsupervised learning. In Proceedings of International Conference on Machine Learning (ICML). 2012.
-
(2012)
Proceedings of International Conference on Machine Learning (ICML)
-
-
Le, Q.1
Ranzato, M.2
Monga, R.3
Devin, M.4
Corrado, G.5
Chen, K.6
Dean, J.7
Ng, A.8
-
211
-
-
0002263996
-
Convolutional networks for images, speech, and time series
-
In M. Arbib, editor MIT Press, Cambridge, Massachusetts
-
Y. LeCun and Y. Bengio. Convolutional networks for images, speech, and time series. In M. Arbib, editor, The Handbook of Brain Theory and Neural Networks, pages 255-258. MIT Press, Cambridge, Massachusetts, 1995.
-
(1995)
The Handbook of Brain Theory and Neural Networks
, pp. 255-258
-
-
LeCun, Y.1
Bengio, Y.2
-
212
-
-
0032203257
-
Gradient-based learning applied to document recognition
-
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86:2278-2324, 1998.
-
(1998)
Proceedings of the IEEE
, vol.86
, pp. 2278-2324
-
-
LeCun, Y.1
Bottou, L.2
Bengio, Y.3
Haffner, P.4
-
214
-
-
85009128804
-
From knowledge-ignorant to knowledge-rich modeling: A new speech research paradigm for next-generation automatic speech recognition
-
C.-H. Lee. From knowledge-ignorant to knowledge-rich modeling: A new speech research paradigm for next-generation automatic speech recognition. In Proceedings of International Conference on Spoken Language Processing (ICSLP), pages 109-111. 2004.
-
(2004)
Proceedings of International Conference on Spoken Language Processing (ICSLP)
, pp. 109-111
-
-
Lee, C.-H.1
-
216
-
-
80053540444
-
Unsupervised learning of hierarchical representations with convolutional deep belief networks
-
October
-
H. Lee, R. Grosse, R. Ranganath, and A. Ng. Unsupervised learning of hierarchical representations with convolutional deep belief networks. Communications of the Association for Computing Machinery (ACM), 54(10):95-103, October 2011.
-
(2011)
Communications of the Association for Computing Machinery (ACM)
, vol.54
, Issue.10
, pp. 95-103
-
-
Lee, H.1
Grosse, R.2
Ranganath, R.3
Ng, A.4
-
220
-
-
84897943848
-
An overview of noise-robust automatic speech recognition
-
J. Li, L. Deng, Y. Gong, and R. Haeb-Umbach. An overview of noise-robust automatic speech recognition. IEEE/Association for Computing Machinery (ACM) Transactions on Audio, Speech, and Language Processing, pages 1-33, 2014.
-
(2014)
IEEE/Association for Computing Machinery (ACM) Transactions on Audio, Speech, and Language Processing
, pp. 1-33
-
-
Li, J.1
Deng, L.2
Gong, Y.3
Haeb-Umbach, R.4
-
225
-
-
70349220094
-
A study on multilingual acoustic modeling for large vocabulary ASR
-
H. Lin, L. Deng, D. Yu, Y. Gong, A. Acero, and C.-H. Lee. A study on multilingual acoustic modeling for large vocabulary ASR. In Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP). 2009.
-
(2009)
Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP)
-
-
Lin, H.1
Deng, L.2
Yu, D.3
Gong, Y.4
Acero, A.5
Lee, C.-H.6
-
226
-
-
80052870284
-
Large-scale image classification: Fast feature extraction and SVM training
-
Y. Lin, F. Lv, S. Zhu, M. Yang, T. Cour, K. Yu, L. Cao, and T. Huang. Large-scale image classification: Fast feature extraction and SVM training. In Proceedings of Computer Vision and Pattern Recognition (CVPR). 2011.
-
(2011)
Proceedings of Computer Vision and Pattern Recognition (CVPR)
-
-
Lin, Y.1
Lv, F.2
Zhu, S.3
Yang, M.4
Cour, T.5
Yu, K.6
Cao, L.7
Huang, T.8
-
227
-
-
84901237776
-
Modeling spectral envelopes using restricted boltzmann machines and deep belief networks for statistical parametric speech synthesis
-
Z. Ling, L. Deng, and D. Yu. Modeling spectral envelopes using restricted boltzmann machines and deep belief networks for statistical parametric speech synthesis. IEEE Transactions on Audio, Speech, and Language Processing, 21(10):2129-2139, 2013.
-
(2013)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.21
, Issue.10
, pp. 2129-2139
-
-
Ling, Z.1
Deng, L.2
Yu, D.3
-
229
-
-
84869440340
-
Articulatory control of HMM-based parametric speech synthesis using feature-space-switched multiple regression
-
January
-
Z. Ling, K. Richmond, and J. Yamagishi. Articulatory control of HMM-based parametric speech synthesis using feature-space-switched multiple regression. IEEE Transactions on Audio, Speech, and Language Processing, 21, January 2013.
-
(2013)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.21
-
-
Ling, Z.1
Richmond, K.2
Yamagishi, J.3
-
230
-
-
84880526709
-
Joint uncertainty decoding for noise robust subspace gaussian mixture models
-
L. Lu, K. Chin, A. Ghoshal, and S. Renals. Joint uncertainty decoding for noise robust subspace gaussian mixture models. IEEE Transactions on Audio, Speech, and Language Processing, 21(9):1791-1804, 2013.
-
(2013)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.21
, Issue.9
, pp. 1791-1804
-
-
Lu, L.1
Chin, K.2
Ghoshal, A.3
Renals, S.4
-
231
-
-
0001523807
-
A path-stack algorithm for optimizing dynamic regimes in a statistical hidden dynamical model of speech
-
J. Ma and L. Deng. A path-stack algorithm for optimizing dynamic regimes in a statistical hidden dynamical model of speech. Computer Speech and Language, 2000.
-
(2000)
Computer Speech and Language
-
-
Ma, J.1
Deng, L.2
-
232
-
-
0347968275
-
Efficient decoding strategies for conversational speech recognition using a constrained nonlinear state-space model
-
J. Ma and L. Deng. Efficient decoding strategies for conversational speech recognition using a constrained nonlinear state-space model. IEEE Transactions on Speech and Audio Processing, 11(6):590-602, 2003.
-
(2003)
IEEE Transactions on Speech and Audio Processing
, vol.11
, Issue.6
, pp. 590-602
-
-
Ma, J.1
Deng, L.2
-
233
-
-
0742307392
-
Target-directed mixture dynamic models for spontaneous speech recognition
-
J. Ma and L. Deng. Target-directed mixture dynamic models for spontaneous speech recognition. IEEE Transactions on Speech and Audio Processing, 12(1):47-58, 2004.
-
(2004)
IEEE Transactions on Speech and Audio Processing
, vol.12
, Issue.1
, pp. 47-58
-
-
Ma, J.1
Deng, L.2
-
234
-
-
84905286094
-
Rectifier nonlinearities improve neural network acoustic models
-
A. Maas, A. Hannun, and A. Ng. Rectifier nonlinearities improve neural network acoustic models. International Conference on Machine Learning (ICML) Workshop on Deep Learning for Audio, Speech, and Language Processing, 2013.
-
(2013)
International Conference on Machine Learning (ICML) Workshop on Deep Learning for Audio, Speech, and Language Processing
-
-
Maas, A.1
Hannun, A.2
Ng, A.3
-
235
-
-
84878409063
-
Recurrent neural networks for noise reduction in robust ASR
-
A. Maas, Q. Le, T. O'Neil, O. Vinyals, P. Nguyen, and P. Ng. Recurrent neural networks for noise reduction in robust ASR. In Proceedings of Interspeech. 2012.
-
(2012)
Proceedings of Interspeech
-
-
Maas, A.1
Le, Q.2
O'Neil, T.3
Vinyals, O.4
Nguyen, P.5
Ng, P.6
-
237
-
-
84903700854
-
Scientists see promise in deep-learning programs
-
November 24
-
J. Markoff. Scientists see promise in deep-learning programs. New York Times, November 24, 2012.
-
(2012)
New York Times
-
-
Markoff, J.1
-
241
-
-
84871369973
-
Learning lexicons from speech using a pronunciation mixture model
-
February
-
I. McGraw, I. Badr, and J. R. Glass. Learning lexicons from speech using a pronunciation mixture model. IEEE Transactions on Audio, Speech, and Language Processing, 21(2):357-366, February 2013.
-
(2013)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.21
, Issue.2
, pp. 357-366
-
-
McGraw, I.1
Badr, I.2
Glass, J.R.3
-
242
-
-
84906237242
-
Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding
-
G. Mesnil, X. He, L. Deng, and Y. Bengio. Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding. In Proceedings of Interspeech. 2013.
-
(2013)
Proceedings of Interspeech
-
-
Mesnil, G.1
He, X.2
Deng, L.3
Bengio, Y.4
-
243
-
-
84906273501
-
Improving low-resource CD-DNN-HMM using dropout and multilingual DNN training
-
Y. Miao and F. Metze. Improving low-resource CD-DNN-HMM using dropout and multilingual DNN training. In Proceedings of Interspeech. 2013.
-
(2013)
Proceedings of Interspeech
-
-
Miao, Y.1
Metze, F.2
-
248
-
-
79959829092
-
Recurrent neural network based language model
-
T. Mikolov, M. Karafiat, L. Burget, J. Cernocky, and S. Khudanpur. Recurrent neural network based language model. In Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP), pages 1045-1048. 2010.
-
(2010)
Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP)
, pp. 1045-1048
-
-
Mikolov, T.1
Karafiat, M.2
Burget, L.3
Cernocky, J.4
Khudanpur, S.5
-
256
-
-
84904867557
-
Playing Atari with deep reinforcement learning
-
also arXiv:1312.5602v1
-
V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller. Playing Atari with deep reinforcement learning. Neural Information Processing Systems (NIPS) Deep Learning Workshop, 2013. also arXiv:1312.5602v1.
-
(2013)
Neural Information Processing Systems (NIPS) Deep Learning Workshop
-
-
Mnih, V.1
Kavukcuoglu, K.2
Silver, D.3
Graves, A.4
Antonoglou, I.5
Wierstra, D.6
Riedmiller, M.7
-
258
-
-
84055211743
-
Acoustic modeling using deep belief networks
-
January
-
A. Mohamed, G. Dahl, and G. Hinton. Acoustic modeling using deep belief networks. IEEE Transactions on Audio, Speech, & Language Processing, 20(1), January 2012.
-
(2012)
IEEE Transactions on Audio, Speech, & Language Processing
, vol.20
, Issue.1
-
-
Mohamed, A.1
Dahl, G.2
Hinton, G.3
-
260
-
-
79959840616
-
Investigation of full-sequence training of deep belief networks for speech recognition
-
A. Mohamed, D. Yu, and L. Deng. Investigation of full-sequence training of deep belief networks for speech recognition. In Proceedings of Interspeech. 2010.
-
(2010)
Proceedings of Interspeech
-
-
Mohamed, A.1
Yu, D.2
Deng, L.3
-
262
-
-
85032751546
-
Pushing the envelope - aside [speech recognition]
-
September
-
N. Morgan, Q. Zhu, A. Stolcke, K. Sonmez, S. Sivadas, T. Shinozaki, M. Ostendorf, P. Jain, H. Hermansky, D. Ellis, G. Doddington, B. Chen, O. Cretin, H. Bourlard, and M. Athineos. Pushing the envelope - aside [speech recognition]. IEEE Signal Processing Magazine, 22(5):81-88, September 2005.
-
(2005)
IEEE Signal Processing Magazine
, vol.22
, Issue.5
, pp. 81-88
-
-
Morgan, N.1
Zhu, Q.2
Stolcke, A.3
Sonmez, K.4
Sivadas, S.5
Shinozaki, T.6
Ostendorf, M.7
Jain, P.8
Hermansky, H.9
Ellis, D.10
Doddington, G.11
Chen, B.12
Cretin, O.13
Bourlard, H.14
Athineos, M.15
-
269
-
-
80053437179
-
Multimodal deep learning
-
J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Ng. Multimodal deep learning. In Proceedings of International Conference on Machine Learning (ICML). 2011.
-
(2011)
Proceedings of International Conference on Machine Learning (ICML)
-
-
Ngiam, J.1
Khosla, A.2
Kim, M.3
Nam, J.4
Lee, H.5
Ng, A.6
-
270
-
-
84898979068
-
-
arXiv:1312.5650v2
-
M. Norouzi, T. Mikolov, S. Bengio, J. Shlens, A. Frome, G. Corrado, and J. Dean. Zero-shot learning by convex combination of semantic embeddings. arXiv:1312.5650v2, 2013.
-
(2013)
Zero-shot Learning by Convex Combination of Semantic Embeddings
-
-
Norouzi, M.1
Mikolov, T.2
Bengio, S.3
Shlens, J.4
Frome, A.5
Corrado, G.6
Dean, J.7
-
271
-
-
4944221356
-
Layered representations for learning and inferring office activity from multiple sensory channels
-
N. Oliver, A. Garg, and E. Horvitz. Layered representations for learning and inferring office activity from multiple sensory channels. Computer Vision and Image Understanding, 96:163-180, 2004.
-
(2004)
Computer Vision and Image Understanding
, vol.96
, pp. 163-180
-
-
Oliver, N.1
Garg, A.2
Horvitz, E.3
-
275
-
-
80052069937
-
Probabilistic template-based chord recognition
-
November
-
L. Oudre, C. Fevotte, and Y. Grenier. Probabilistic template-based chord recognition. IEEE Transactions on Audio, Speech, and Language Processing, 19(8):2249-2259, November 2011.
-
(2011)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.19
, Issue.8
, pp. 2249-2259
-
-
Oudre, L.1
Fevotte, C.2
Grenier, Y.3
-
278
-
-
69849103259
-
Adaptive multimodal fusion by uncertainty compensation with application to audiovisual speech recognition
-
G. Papandreou, A. Katsamanis, V. Pitsikalis, and P. Maragos. Adaptive multimodal fusion by uncertainty compensation with application to audiovisual speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 17:423-435, 2009.
-
(2009)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.17
, pp. 423-435
-
-
Papandreou, G.1
Katsamanis, A.2
Pitsikalis, V.3
Maragos, P.4
-
282
-
-
0032639922
-
Initial evaluation of hidden dynamic models on conversational speech
-
P. Picone, S. Pike, R. Regan, T. Kamm, J. Bridle, L. Deng, Z. Ma, H. Richards, and M. Schuster. Initial evaluation of hidden dynamic models on conversational speech. In Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP). 1999.
-
(1999)
Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP)
-
-
Picone, P.1
Pike, S.2
Regan, R.3
Kamm, T.4
Bridle, J.5
Deng, L.6
Ma, Z.7
Richards, H.8
Schuster, M.9
-
283
-
-
78049251448
-
Analysis of MLP-based hierarchical phone posterior probability estimators
-
February
-
J. Pinto, S. Garimella, M. Magimai-Doss, H. Hermansky, and H. Bourlard. Analysis of MLP-based hierarchical phone posterior probability estimators. IEEE Transactions on Audio, Speech, and Language Processing, 19(2), February 2011.
-
(2011)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.19
, Issue.2
-
-
Pinto, J.1
Garimella, S.2
Magimai-Doss, M.3
Hermansky, H.4
Bourlard, H.5
-
286
-
-
0029310084
-
Holographic reduced representations
-
May
-
T. Plate. Holographic reduced representations. IEEE Transactions on Neural Networks, 6(3):623-641, May 1995.
-
(1995)
IEEE Transactions on Neural Networks
, vol.6
, Issue.3
, pp. 623-641
-
-
Plate, T.1
-
287
-
-
84903722546
-
How the brain might work: The role of information and learning in understanding and replicating intelligence
-
In G. Jacovitt, A. Pettorossi, R. Consolo, and V. Senni, editors Lateran University Press
-
T. Poggio. How the brain might work: The role of information and learning in understanding and replicating intelligence. In G. Jacovitt, A. Pettorossi, R. Consolo, and V. Senni, editors, Information: Science and Technology for the New Century, pages 45-61. Lateran University Press, 2007.
-
(2007)
Information: Science and Technology for the New Century
, pp. 45-61
-
-
Poggio, T.1
-
288
-
-
0025519291
-
Recursive distributed representations
-
J. Pollack. Recursive distributed representations. Artificial Intelligence, 46:77-105, 1990.
-
(1990)
Artificial Intelligence
, vol.46
, pp. 77-105
-
-
Pollack, J.1
-
292
-
-
0031003679
-
Optimality: From neural networks to universal grammar
-
A. Prince and P. Smolensky. Optimality: From neural networks to universal grammar. Science, 275:1604-1610, 1997.
-
(1997)
Science
, vol.275
, pp. 1604-1610
-
-
Prince, A.1
Smolensky, P.2
-
293
-
-
0024610919
-
A tutorial on hidden markov models and selected applications in speech recognition
-
L. Rabiner. A tutorial on hidden markov models and selected applications in speech recognition. In Proceedings of the IEEE, pages 257-286. 1989.
-
(1989)
Proceedings of the IEEE
, pp. 257-286
-
-
Rabiner, L.1
-
299
-
-
0030419718
-
Construction of state-dependent dynamic parameters by maximum likelihood: Applications to speech recognition
-
C. Rathinavalu and L. Deng. Construction of state-dependent dynamic parameters by maximum likelihood: Applications to speech recognition. Signal Processing, 55(2):149-165, 1997.
-
(1997)
Signal Processing
, vol.55
, Issue.2
, pp. 149-165
-
-
Rathinavalu, C.1
Deng, L.2
-
301
-
-
85032751986
-
Single-channel multi-talker speech recognition - Graphical modeling approaches
-
S. Rennie, H. Hershey, and P. Olsen. Single-channel multi-talker speech recognition - graphical modeling approaches. IEEE Signal Processing Magazine, 33:66-80, 2010.
-
(2010)
IEEE Signal Processing Magazine
, vol.33
, pp. 66-80
-
-
Rennie, S.1
Hershey, H.2
Olsen, P.3
-
303
-
-
80053460450
-
Contractive autoencoders: Explicit invariance during feature extraction
-
S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio. Contractive autoencoders: Explicit invariance during feature extraction. In Proceedings of International Conference on Machine Learning (ICML), pages 833-840. 2011.
-
(2011)
Proceedings of International Conference on Machine Learning (ICML)
, pp. 833-840
-
-
Rifai, S.1
Vincent, P.2
Muller, X.3
Glorot, X.4
Bengio, Y.5
-
304
-
-
0028392167
-
An application of recurrent nets to phone probability estimation
-
A. Robinson. An application of recurrent nets to phone probability estimation. IEEE Transactions on Neural Networks, 5:298-305, 1994.
-
(1994)
IEEE Transactions on Neural Networks
, vol.5
, pp. 298-305
-
-
Robinson, A.1
-
305
-
-
84903710549
-
-
arXiv:1309.1508v3
-
T. Sainath, L. Horesh, B. Kingsbury, A. Aravkin, and B. Ramabhadran. Accelerating hessian-free optimization for deep neural networks by implicit pre-conditioning and sampling. arXiv:1309.1508v3, 2013.
-
(2013)
Accelerating Hessian-free Optimization for Deep Neural Networks by Implicit Pre-conditioning and Sampling
-
-
Sainath, T.1
Horesh, L.2
Kingsbury, B.3
Aravkin, A.4
Ramabhadran, B.5
-
306
-
-
84893654379
-
Improvements to deep convolutional neural networks for LVCSR
-
T. Sainath, B. Kingsbury, A. Mohamed, G. Dahl, G. Saon, H. Soltau, T. Beran, A. Aravkin, and B. Ramabhadran. Improvements to deep convolutional neural networks for LVCSR. In Proceedings of the Automatic Speech Recognition and Understanding Workshop (ASRU). 2013.
-
(2013)
Proceedings of the Automatic Speech Recognition and Understanding Workshop (ASRU)
-
-
Sainath, T.1
Kingsbury, B.2
Mohamed, A.3
Dahl, G.4
Saon, G.5
Soltau, H.6
Beran, T.7
Aravkin, A.8
Ramabhadran, B.9
-
311
-
-
84886829539
-
Optimization techniques to improve training speed of deep neural networks for large speech tasks
-
November
-
T. Sainath, B. Kingsbury, H. Soltau, and B. Ramabhadran. Optimization techniques to improve training speed of deep neural networks for large speech tasks. IEEE Transactions on Audio, Speech, and Language Processing, 21(11):2267-2276, November 2013.
-
(2013)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.21
, Issue.11
, pp. 2267-2276
-
-
Sainath, T.1
Kingsbury, B.2
Soltau, H.3
Ramabhadran, B.4
-
313
-
-
80053610626
-
Exemplar-based sparse representation features: From TIMIT to LVCSR
-
November
-
T. Sainath, B. Ramabhadran, M. Picheny, D. Nahamoo, and D. Kanevsky. Exemplar-based sparse representation features: From TIMIT to LVCSR. IEEE Transactions on Speech and Audio Processing, November 2011.
-
(2011)
IEEE Transactions on Speech and Audio Processing
-
-
Sainath, T.1
Ramabhadran, B.2
Picheny, M.3
Nahamoo, D.4
Kanevsky, D.5
-
320
-
-
84905273821
-
Continuous space translation models for phrase-based statistical machine translation
-
H. Schwenk. Continuous space translation models for phrase-based statistical machine translation. In Proceedings of Computational Linguistics. 2012.
-
(2012)
Proceedings of Computational Linguistics
-
-
Schwenk, H.1
-
324
-
-
84865801985
-
Conversational speech transcription using context-dependent deep neural networks
-
F. Seide, G. Li, and D. Yu. Conversational speech transcription using context-dependent deep neural networks. In Proceedings of Interspeech, pages 437-440. 2011.
-
(2011)
Proceedings of Interspeech
, pp. 437-440
-
-
Seide, F.1
Li, G.2
Yu, D.3
-
326
-
-
84872190545
-
Autoregressive models for statistical parametric speech synthesis
-
M. Shannon, H. Zen, and W. Byrne. Autoregressive models for statistical parametric speech synthesis. IEEE Transactions on Audio, Speech, and Language Processing, 21(3):587-597, 2013.
-
(2013)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.21
, Issue.3
, pp. 587-597
-
-
Shannon, M.1
Zen, H.2
Byrne, W.3
-
327
-
-
0028195651
-
Waveform-based speech recognition using hidden filter models: Parameter selection and sensitivity to power normalization
-
H. Sheikhzadeh and L. Deng. Waveform-based speech recognition using hidden filter models: Parameter selection and sensitivity to power normalization. IEEE Transactions on Speech and Audio Processing, 2:80-91, 1994.
-
(1994)
IEEE Transactions on Speech and Audio Processing
, vol.2
, pp. 80-91
-
-
Sheikhzadeh, H.1
Deng, L.2
-
328
-
-
84990946747
-
Learning semantic representations using convolutional neural networks for web search
-
Y. Shen, X. He, J. Gao, L. Deng, and G. Mesnil. Learning semantic representations using convolutional neural networks for web search. In Proceedings World Wide Web. 2014.
-
(2014)
Proceedings World Wide Web
-
-
Shen, Y.1
He, X.2
Gao, J.3
Deng, L.4
Mesnil, G.5
-
330
-
-
84881054791
-
Hermitian polynomial for speaker adaptation of connectionist speech recognition systems
-
M. Siniscalchi, J. Li, and C. Lee. Hermitian polynomial for speaker adaptation of connectionist speech recognition systems. IEEE Transactions on Audio, Speech, and Language Processing, 21(10):2152-2161, 2013a.
-
(2013)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.21
, Issue.10
, pp. 2152-2161
-
-
Siniscalchi, M.1
Li, J.2
Lee, C.3
-
331
-
-
84872967500
-
A bottom-up modular search approach to large vocabulary continuous speech recognition
-
M. Siniscalchi, T. Svendsen, and C.-H. Lee. A bottom-up modular search approach to large vocabulary continuous speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 21, 2013.
-
(2013)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.21
-
-
Siniscalchi, M.1
Svendsen, T.2
Lee, C.-H.3
-
332
-
-
84875405186
-
Exploiting deep neural networks for detection-based speech recognition
-
M. Siniscalchi, D. Yu, L. Deng, and C.-H. Lee. Exploiting deep neural networks for detection-based speech recognition. Neurocomputing, 106:148-157, 2013.
-
(2013)
Neurocomputing
, vol.106
, pp. 148-157
-
-
Siniscalchi, M.1
Yu, D.2
Deng, L.3
Lee, C.-H.4
-
333
-
-
84873303660
-
Speech recognition using long-span temporal patterns in a deep network model
-
March
-
M. Siniscalchi, D. Yu, L. Deng, and C.-H. Lee. Speech recognition using long-span temporal patterns in a deep network model. IEEE Signal Processing Letters, 20(3):201-204, March 2013.
-
(2013)
IEEE Signal Processing Letters
, vol.20
, Issue.3
, pp. 201-204
-
-
Siniscalchi, M.1
Yu, D.2
Deng, L.3
Lee, C.-H.4
-
334
-
-
84055212007
-
Sparse multilayer perceptrons for phoneme recognition
-
January
-
G. Sivaram and H. Hermansky. Sparse multilayer perceptrons for phoneme recognition. IEEE Transactions on Audio, Speech, & Language Processing, 20(1), January 2012.
-
(2012)
IEEE Transactions on Audio, Speech, & Language Processing
, vol.20
, Issue.1
-
-
Sivaram, G.1
Hermansky, H.2
-
335
-
-
0025516779
-
Tensor product variable binding and the representation of symbolic structures in connectionist systems
-
P. Smolensky. Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence, 46:159-216, 1990.
-
(1990)
Artificial Intelligence
, vol.46
, pp. 159-216
-
-
Smolensky, P.1
-
339
-
-
84905233165
-
-
Tutorial at Association for Computational Linguistics (ACL), 2012, and North American Chapter of the Association for Computational Linguistics (NAACL)
-
R. Socher, Y. Bengio, and C. Manning. Deep learning for NLP. Tutorial at Association for Computational Linguistics (ACL), 2012, and North American Chapter of the Association for Computational Linguistics (NAACL), 2013. http://www.socher.org/index.php/DeepLearning Tutorial.
-
(2013)
Deep Learning for NLP
-
-
Socher, R.1
Bengio, Y.2
Manning, C.3
-
342
-
-
84898938559
-
Zero-shot learning through cross-modal transfer
-
R. Socher, M. Ganjoo, H. Sridhar, O. Bastani, C. Manning, and A. Ng. Zero-shot learning through cross-modal transfer. In Proceedings of Neural Information Processing Systems (NIPS). 2013b.
-
(2013)
Proceedings of Neural Information Processing Systems (NIPS)
-
-
Socher, R.1
Ganjoo, M.2
Sridhar, H.3
Bastani, O.4
Manning, C.5
Ng, A.6
-
347
-
-
84926358845
-
Recursive deep models for semantic compositionality over a sentiment treebank
-
R. Socher, A. Perelygin, J. Wu, J. Chuang, C. Manning, A. Ng, and C. Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of Empirical Methods in Natural Language Processing (EMNLP). 2013.
-
(2013)
Proceedings of Empirical Methods in Natural Language Processing (EMNLP)
-
-
Socher, R.1
Perelygin, A.2
Wu, J.3
Chuang, J.4
Manning, C.5
Ng, A.6
Potts, C.7
-
351
-
-
85073226083
-
Preliminary investigation of boltzmann machine classifiers for speaker recognition
-
T. Stafylakis, P. Kenny, M. Senoussaoui, and P. Dumouchel. Preliminary investigation of boltzmann machine classifiers for speaker recognition. In Proceedings of Odyssey, pages 109-116. 2012.
-
(2012)
Proceedings of Odyssey
, pp. 109-116
-
-
Stafylakis, T.1
Kenny, P.2
Senoussaoui, M.3
Dumouchel, P.4
-
355
-
-
0036165806
-
An overlapping-feature based phonological model incorporating linguistic constraints: Applications to speech recognition
-
J. Sun and L. Deng. An overlapping-feature based phonological model incorporating linguistic constraints: Applications to speech recognition. Journal of the Acoustical Society of America, 111(2):1086-1101, 2002.
-
(2002)
Journal of the Acoustical Society of America
, vol.111
, Issue.2
, pp. 1086-1101
-
-
Sun, J.1
Deng, L.2
-
362
-
-
84890474716
-
Deep neural network features and semi-supervised training for low resource speech recognition
-
S. Thomas, M. Seltzer, K. Church, and H. Hermansky. Deep neural network features and semi-supervised training for low resource speech recognition. In Proceedings of Interspeech. 2013.
-
(2013)
Proceedings of Interspeech
-
-
Thomas, S.1
Seltzer, M.2
Church, K.3
Hermansky, H.4
-
364
-
-
84876687945
-
Speech synthesis based on hidden markov models
-
K. Tokuda, Y. Nankaku, T. Toda, H. Zen, H. Yamagishi, and K. Oura. Speech synthesis based on hidden markov models. Proceedings of the IEEE, 101(5):1234-1252, 2013.
-
(2013)
Proceedings of the IEEE
, vol.101
, Issue.5
, pp. 1234-1252
-
-
Tokuda, K.1
Nankaku, Y.2
Toda, T.3
Zen, H.4
Yamagishi, H.5
Oura, K.6
-
365
-
-
84886714036
-
Acoustic modeling with hierarchical reservoirs
-
November
-
F. Triefenbach, A. Jalalvand, K. Demuynck, and J.-P. Martens. Acoustic modeling with hierarchical reservoirs. IEEE Transactions on Audio, Speech, and Language Processing, 21(11):2439-2450, November 2013.
-
(2013)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.21
, Issue.11
, pp. 2439-2450
-
-
Triefenbach, F.1
Jalalvand, A.2
Demuynck, K.3
Martens, J.-P.4
-
370
-
-
79951668781
-
Extended VTS for noise-robust speech recognition
-
R. van Dalen and M. Gales. Extended VTS for noise-robust speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 19(4):733-743, 2011.
-
(2011)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.19
, Issue.4
, pp. 733-743
-
-
Van Dalen, R.1
Gales, M.2
-
375
-
-
79959575293
-
A connection between score matching and denoising autoencoder
-
P. Vincent. A connection between score matching and denoising autoencoder. Neural Computation, 23(7):1661-1674, 2011.
-
(2011)
Neural Computation
, vol.23
, Issue.7
, pp. 1661-1674
-
-
Vincent, P.1
-
376
-
-
79551480483
-
Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion
-
P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P. Manzagol. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11:3371-3408, 2010.
-
(2010)
Journal of Machine Learning Research
, vol.11
, pp. 3371-3408
-
-
Vincent, P.1
Larochelle, H.2
Lajoie, I.3
Bengio, Y.4
Manzagol, P.5
-
382
-
-
0024634603
-
Phoneme recognition using time-delay neural networks
-
A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, and K. Lang. Phoneme recognition using time-delay neural networks. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37:328-339, 1989.
-
(1989)
IEEE Transactions on Acoustics, Speech, and Signal Processing
, vol.37
, pp. 328-339
-
-
Waibel, A.1
Hanazawa, T.2
Hinton, G.3
Shikano, K.4
Lang, K.5
-
388
-
-
77955654853
-
Large scale image annotation: Learning to rank with joint word-image embeddings
-
J. Weston, S. Bengio, and N. Usunier. Large scale image annotation: Learning to rank with joint word-image embeddings. Machine Learning, 81(1):21-35, 2010.
-
(2010)
Machine Learning
, vol.81
, Issue.1
, pp. 21-35
-
-
Weston, J.1
Bengio, S.2
Usunier, N.3
-
390
-
-
84906237512
-
Investigations on Hessian-free optimization for cross-entropy training of deep neural networks
-
S. Wiesler, J. Li, and J. Xue. Investigations on Hessian-free optimization for cross-entropy training of deep neural networks. In Proceedings of Interspeech. 2013.
-
(2013)
Proceedings of Interspeech
-
-
Wiesler, S.1
Li, J.2
Xue, J.3
-
391
-
-
79951599228
-
A probabilistic interaction model for multi-pitch tracking with factorial hidden Markov model
-
May
-
M. Wohlmayr, M. Stark, and F. Pernkopf. A probabilistic interaction model for multi-pitch tracking with factorial hidden Markov model. IEEE Transactions on Audio, Speech, and Language Processing, 19(4), May 2011.
-
(2011)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.19
, Issue.4
-
-
Wohlmayr, M.1
Stark, M.2
Pernkopf, F.3
-
392
-
-
0026692226
-
Stacked generalization
-
D. Wolpert. Stacked generalization. Neural Networks, 5(2):241-259, 1992.
-
(1992)
Neural Networks
, vol.5
, Issue.2
, pp. 241-259
-
-
Wolpert, D.1
-
393
-
-
84887037596
-
Optimization algorithms and applications for speech and language processing
-
November
-
S. J. Wright, D. Kanevsky, L. Deng, X. He, G. Heigold, and H. Li. Optimization algorithms and applications for speech and language processing. IEEE Transactions on Audio, Speech, and Language Processing, 21(11):2231-2243, November 2013.
-
(2013)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.21
, Issue.11
, pp. 2231-2243
-
-
Wright, S.J.1
Kanevsky, D.2
Deng, L.3
He, X.4
Heigold, G.5
Li, H.6
-
394
-
-
85032751865
-
A geometric perspective of large-margin training of Gaussian models
-
November
-
L. Xiao and L. Deng. A geometric perspective of large-margin training of Gaussian models. IEEE Signal Processing Magazine, 27(6):118-123, November 2010.
-
(2010)
IEEE Signal Processing Magazine
, vol.27
, Issue.6
, pp. 118-123
-
-
Xiao, L.1
Deng, L.2
-
395
-
-
0037313081
-
Equivalence of backpropagation and contrastive Hebbian learning in a layered network
-
X. Xie and S. Seung. Equivalence of backpropagation and contrastive Hebbian learning in a layered network. Neural Computation, 15:441-454, 2003.
-
(2003)
Neural Computation
, vol.15
, pp. 441-454
-
-
Xie, X.1
Seung, S.2
-
396
-
-
84889257121
-
An experimental study on speech enhancement based on deep neural networks
-
Y. Xu, J. Du, L. Dai, and C. Lee. An experimental study on speech enhancement based on deep neural networks. IEEE Signal Processing Letters, 21(1):65-68, 2014.
-
(2014)
IEEE Signal Processing Letters
, vol.21
, Issue.1
, pp. 65-68
-
-
Xu, Y.1
Du, J.2
Dai, L.3
Lee, C.4
-
397
-
-
84906227589
-
Restructuring of deep neural network acoustic models with singular value decomposition
-
J. Xue, J. Li, and Y. Gong. Restructuring of deep neural network acoustic models with singular value decomposition. In Proceedings of Interspeech. 2013.
-
(2013)
Proceedings of Interspeech
-
-
Xue, J.1
Li, J.2
Gong, Y.3
-
398
-
-
66149085249
-
An integrative and discriminative technique for spoken utterance classification
-
S. Yamin, L. Deng, Y. Wang, and A. Acero. An integrative and discriminative technique for spoken utterance classification. IEEE Transactions on Audio, Speech, and Language Processing, 16:1207-1214, 2008.
-
(2008)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.16
, pp. 1207-1214
-
-
Yamin, S.1
Deng, L.2
Wang, Y.3
Acero, A.4
-
399
-
-
84906225757
-
A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR
-
Z. Yan, Q. Huo, and J. Xu. A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR. In Proceedings of Interspeech. 2013.
-
(2013)
Proceedings of Interspeech
-
-
Yan, Z.1
Huo, Q.2
Xu, J.3
-
400
-
-
84866881711
-
Combining a two-step CRF model and a joint source-channel model for machine transliteration
-
D. Yang and S. Furui. Combining a two-step CRF model and a joint source-channel model for machine transliteration. In Proceedings of Association for Computational Linguistics (ACL), pages 275-280. 2010.
-
(2010)
Proceedings of Association for Computational Linguistics (ACL)
, pp. 275-280
-
-
Yang, D.1
Furui, S.2
-
401
-
-
84903733224
-
A fast maximum likelihood nonlinear feature transformation method for GMM-HMM speaker adaptation
-
K. Yao, D. Yu, L. Deng, and Y. Gong. A fast maximum likelihood nonlinear feature transformation method for GMM-HMM speaker adaptation. Neurocomputing, 2013a.
-
(2013)
Neurocomputing
-
-
Yao, K.1
Yu, D.2
Deng, L.3
Gong, Y.4
-
402
-
-
84874226579
-
Adaptation of context-dependent deep neural networks for automatic speech recognition
-
K. Yao, D. Yu, F. Seide, H. Su, L. Deng, and Y. Gong. Adaptation of context-dependent deep neural networks for automatic speech recognition. In Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP). 2012.
-
(2012)
Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP)
-
-
Yao, K.1
Yu, D.2
Seide, F.3
Su, H.4
Deng, L.5
Gong, Y.6
-
404
-
-
84881043147
-
Noise model transfer: Novel approach to robustness against nonstationary noise
-
T. Yoshioka and T. Nakatani. Noise model transfer: Novel approach to robustness against nonstationary noise. IEEE Transactions on Audio, Speech, and Language Processing, 21(10):2182-2192, 2013.
-
(2013)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.21
, Issue.10
, pp. 2182-2192
-
-
Yoshioka, T.1
Nakatani, T.2
-
406
-
-
33644756784
-
On the convergence of Markovian stochastic algorithms with rapidly decreasing ergodicity rates
-
L. Younes. On the convergence of Markovian stochastic algorithms with rapidly decreasing ergodicity rates. Stochastics and Stochastic Reports, 65(3):177-228, 1999.
-
(1999)
Stochastics and Stochastic Reports
, vol.65
, Issue.3
, pp. 177-228
-
-
Younes, L.1
-
409
-
-
85032752267
-
Solving nonlinear estimation problems using splines
-
July
-
D. Yu and L. Deng. Solving nonlinear estimation problems using splines. IEEE Signal Processing Magazine, 26(4):86-90, July 2009.
-
(2009)
IEEE Signal Processing Magazine
, vol.26
, Issue.4
, pp. 86-90
-
-
Yu, D.1
Deng, L.2
-
410
-
-
79959828814
-
Deep-structured hidden conditional random fields for phonetic recognition
-
September
-
D. Yu and L. Deng. Deep-structured hidden conditional random fields for phonetic recognition. In Proceedings of Interspeech. September 2010.
-
(2010)
Proceedings of Interspeech
-
-
Yu, D.1
Deng, L.2
-
411
-
-
84865770736
-
Accelerated parallelizable neural networks learning algorithms for speech recognition
-
D. Yu and L. Deng. Accelerated parallelizable neural networks learning algorithms for speech recognition. In Proceedings of Interspeech. 2011.
-
(2011)
Proceedings of Interspeech
-
-
Yu, D.1
Deng, L.2
-
412
-
-
85032782045
-
Deep learning and its applications to signal and information processing
-
January
-
D. Yu and L. Deng. Deep learning and its applications to signal and information processing. IEEE Signal Processing Magazine, pages 145-154, January 2011.
-
(2011)
IEEE Signal Processing Magazine
, pp. 145-154
-
-
Yu, D.1
Deng, L.2
-
413
-
-
84862822032
-
Efficient and effective algorithms for training single-hidden-layer neural networks
-
D. Yu and L. Deng. Efficient and effective algorithms for training single-hidden-layer neural networks. Pattern Recognition Letters, 33:554-558, 2012.
-
(2012)
Pattern Recognition Letters
, vol.33
, pp. 554-558
-
-
Yu, D.1
Deng, L.2
-
415
-
-
66149101303
-
Robust speech recognition using cepstral minimum-mean-square-error noise suppressor
-
July
-
D. Yu, L. Deng, J. Droppo, J. Wu, Y. Gong, and A. Acero. Robust speech recognition using cepstral minimum-mean-square-error noise suppressor. IEEE Transactions on Audio, Speech, and Language Processing, 16(5), July 2008.
-
(2008)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.16
, Issue.5
-
-
Yu, D.1
Deng, L.2
Droppo, J.3
Wu, J.4
Gong, Y.5
Acero, A.6
-
416
-
-
68549140008
-
A novel framework and training algorithm for variable-parameter hidden Markov models
-
D. Yu, L. Deng, Y. Gong, and A. Acero. A novel framework and training algorithm for variable-parameter hidden Markov models. IEEE Transactions on Audio, Speech, and Language Processing, 17(7):1348-1360, 2009.
-
(2009)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.17
, Issue.7
, pp. 1348-1360
-
-
Yu, D.1
Deng, L.2
Gong, Y.3
Acero, A.4
-
417
-
-
42949105203
-
Large-margin minimum classification error training: A theoretical risk minimization perspective
-
October
-
D. Yu, L. Deng, X. He, and A. Acero. Large-margin minimum classification error training: A theoretical risk minimization perspective. Computer Speech and Language, 22(4):415-429, October 2008.
-
(2008)
Computer Speech and Language
, vol.22
, Issue.4
, pp. 415-429
-
-
Yu, D.1
Deng, L.2
He, X.3
Acero, A.4
-
420
-
-
70349197671
-
Cross-lingual speech recognition under runtime resource constraints
-
D. Yu, L. Deng, P. Liu, J. Wu, Y. Gong, and A. Acero. Cross-lingual speech recognition under runtime resource constraints. In Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP). 2009b.
-
(2009)
Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP)
-
-
Yu, D.1
Deng, L.2
Liu, P.3
Wu, J.4
Gong, Y.5
Acero, A.6
-
421
-
-
84878405171
-
Large vocabulary speech recognition using deep tensor neural networks
-
D. Yu, L. Deng, and F. Seide. Large vocabulary speech recognition using deep tensor neural networks. In Proceedings of Interspeech. 2012c.
-
(2012)
Proceedings of Interspeech
-
-
Yu, D.1
Deng, L.2
Seide, F.3
-
422
-
-
84871387302
-
The deep tensor neural network with applications to large vocabulary speech recognition
-
D. Yu, L. Deng, and F. Seide. The deep tensor neural network with applications to large vocabulary speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 21(2):388-396, 2013.
-
(2013)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.21
, Issue.2
, pp. 388-396
-
-
Yu, D.1
Deng, L.2
Seide, F.3
-
423
-
-
85008521116
-
Calibration of confidence measures in speech recognition
-
D. Yu, J.-Y. Li, and L. Deng. Calibration of confidence measures in speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 19:2461-2473, 2010.
-
(2010)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.19
, pp. 2461-2473
-
-
Yu, D.1
Li, J.-Y.2
Deng, L.3
-
425
-
-
84865785753
-
Improved bottleneck features using pre-trained deep neural networks
-
D. Yu and M. Seltzer. Improved bottleneck features using pre-trained deep neural networks. In Proceedings of Interspeech. 2011.
-
(2011)
Proceedings of Interspeech
-
-
Yu, D.1
Seltzer, M.2
-
431
-
-
65249094352
-
Unsupervised adaptation with discriminative mapping transforms
-
K. Yu, M. Gales, and P. Woodland. Unsupervised adaptation with discriminative mapping transforms. IEEE Transactions on Audio, Speech, and Language Processing, 17(4):714-723, 2009.
-
(2009)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.17
, Issue.4
, pp. 714-723
-
-
Yu, K.1
Gales, M.2
Woodland, P.3
-
438
-
-
85008525798
-
Product of experts for statistical parametric speech synthesis
-
March
-
H. Zen, M. Gales, J. F. Nankaku, and Y. K. Tokuda. Product of experts for statistical parametric speech synthesis. IEEE Transactions on Audio, Speech, and Language Processing, 20(3):794-805, March 2012.
-
(2012)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.20
, Issue.3
, pp. 794-805
-
-
Zen, H.1
Gales, M.2
Nankaku, J.F.3
Tokuda, Y.K.4
-
439
-
-
78149260085
-
Continuous stochastic feature mapping based on trajectory HMMs
-
February
-
H. Zen, Y. Nankaku, and K. Tokuda. Continuous stochastic feature mapping based on trajectory HMMs. IEEE Transactions on Audio, Speech, and Language Processing, 19(2):417-430, February 2011.
-
(2011)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.19
, Issue.2
, pp. 417-430
-
-
Zen, H.1
Nankaku, Y.2
Tokuda, K.3
-
442
-
-
84872300403
-
Deep belief networks based voice activity detection
-
X. Zhang and J. Wu. Deep belief networks based voice activity detection. IEEE Transactions on Audio, Speech, and Language Processing, 21(4):697-710, 2013.
-
(2013)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.21
, Issue.4
, pp. 697-710
-
-
Zhang, X.1
Wu, J.2
-
443
-
-
4544290173
-
Multi-sensory microphones for robust speech detection, enhancement and recognition
-
Z. Zhang, Z. Liu, M. Sinclair, A. Acero, L. Deng, J. Droppo, X. Huang, and Y. Zheng. Multi-sensory microphones for robust speech detection, enhancement and recognition. In Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP). 2004.
-
(2004)
Proceedings of International Conference on Acoustics Speech and Signal Processing (ICASSP)
-
-
Zhang, Z.1
Liu, Z.2
Sinclair, M.3
Acero, A.4
Deng, L.5
Droppo, J.6
Huang, X.7
Zheng, Y.8
-
444
-
-
84865208051
-
Nonlinear compensation using the Gauss-Newton method for noise-robust speech recognition
-
Y. Zhao and B. Juang. Nonlinear compensation using the Gauss-Newton method for noise-robust speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 20(8):2191-2206, 2012.
-
(2012)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.20
, Issue.8
, pp. 2191-2206
-
-
Zhao, Y.1
Juang, B.2