-
1
-
-
77950917809
-
Discriminative training for automatic speech recognition: A survey
-
H. Jiang, "Discriminative training for automatic speech recognition: A survey," Comput. Speech, Lang., vol. 24, no. 4, pp. 589-608, 2010.
-
(2010)
Comput. Speech, Lang.
, vol.24
, Issue.4
, pp. 589-608
-
-
Jiang, H.1
-
2
-
-
85032750905
-
Discriminative learning in sequential pattern recognition - A unifying review for optimization-oriented speech recognition
-
Sep.
-
X. He, L. Deng, and W. Chou, "Discriminative learning in sequential pattern recognition - A unifying review for optimization-oriented speech recognition," IEEE Signal Process. Mag., vol. 25, no. 5, pp. 14-36, Sep. 2008.
-
(2008)
IEEE Signal Process. Mag.
, vol.25
, Issue.5
, pp. 14-36
-
-
He, X.1
Deng, L.2
Chou, W.3
-
3
-
-
84876672166
-
Machine learning paradigms for speech recognition: An overview
-
May
-
L. Deng and X. Li, "Machine learning paradigms for speech recognition: An overview," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 5, pp. 1060-1089, May 2013.
-
(2013)
IEEE Trans. Audio, Speech, Lang. Process.
, vol.21
, Issue.5
, pp. 1060-1089
-
-
Deng, L.1
Li, X.2
-
4
-
-
85162069624
-
Phone recognition with the mean-covariance restricted Boltzmann machine
-
G. E. Dahl, M. Ranzato, A. Mohamed, and G. E. Hinton, "Phone recognition with the mean-covariance restricted Boltzmann machine," Adv. Neural Inf. Process. Syst., no. 23, 2010.
-
(2010)
Adv. Neural Inf. Process. Syst.
, Issue.23
-
-
Dahl, G.E.1
Ranzato, M.2
Mohamed, A.3
Hinton, G.E.4
-
5
-
-
80051654263
-
Deep belief networks using discriminative features for phone recognition
-
A. Mohamed, T. Sainath, G. Dahl, B. Ramabhadran, G. Hinton, and M. Picheny, "Deep belief networks using discriminative features for phone recognition," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), May 2011, pp. 5060-5063.
-
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), May 2011
, pp. 5060-5063
-
-
Mohamed, A.1
Sainath, T.2
Dahl, G.3
Ramabhadran, B.4
Hinton, G.5
Picheny, M.6
-
7
-
-
80051616844
-
Large vocabulary continuous speech recognition with context-dependent DBN-HMMs
-
G. Dahl, D. Yu, L. Deng, and A. Acero, "Large vocabulary continuous speech recognition with context-dependent DBN-HMMs," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2011, pp. 4688-4691.
-
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2011
, pp. 4688-4691
-
-
Dahl, G.1
Yu, D.2
Deng, L.3
Acero, A.4
-
8
-
-
84858976070
-
Feature engineering in context-dependent deep neural networks for conversational speech transcription
-
F. Seide, G. Li, X. Chen, and D. Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription," in Proc. IEEE Workshop Autom. Speech Recognition Understand. (ASRU), 2011, pp. 24-29.
-
Proc. IEEE Workshop Autom. Speech Recognition Understand. (ASRU), 2011
, pp. 24-29
-
-
Seide, F.1
Li, G.2
Chen, X.3
Yu, D.4
-
9
-
-
84255177123
-
Deep and wide: Multiple layers in automatic speech recognition
-
Jan.
-
N. Morgan, "Deep and wide: Multiple layers in automatic speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 7-13, Jan. 2012.
-
(2012)
IEEE Trans. Audio, Speech, Lang. Process.
, vol.20
, Issue.1
, pp. 7-13
-
-
Morgan, N.1
-
11
-
-
79959840616
-
Investigation of full-sequence training of deep belief networks for speech recognition
-
A. Mohamed, D. Yu, and L. Deng, "Investigation of full-sequence training of deep belief networks for speech recognition," in Proc. Interspeech, 2010, pp. 2846-2849.
-
Proc. Interspeech, 2010
, pp. 2846-2849
-
-
Mohamed, A.1
Yu, D.2
Deng, L.3
-
12
-
-
84867614591
-
Scalable stacking and learning for building deep architectures
-
L. Deng, D. Yu, and J. Platt, "Scalable stacking and learning for building deep architectures," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2012, pp. 2133-2136.
-
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2012
, pp. 2133-2136
-
-
Deng, L.1
Yu, D.2
Platt, J.3
-
13
-
-
84055222005
-
Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition
-
Jan.
-
G. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 30-42, Jan. 2012.
-
(2012)
IEEE Trans. Audio, Speech, Lang. Process.
, vol.20
, Issue.1
, pp. 30-42
-
-
Dahl, G.1
Yu, D.2
Deng, L.3
Acero, A.4
-
14
-
-
84865801985
-
Conversational speech transcription using context-dependent deep neural networks
-
F. Seide, G. Li, and D. Yu, "Conversational speech transcription using context-dependent deep neural networks," in Proc. Interspeech, 2011, pp. 437-440.
-
Proc. Interspeech, 2011
, pp. 437-440
-
-
Seide, F.1
Li, G.2
Yu, D.3
-
15
-
-
84858972572
-
Making deep belief networks effective for large vocabulary continuous speech recognition
-
T. N. Sainath, B. Kingsbury, B. Ramabhadran, P. Fousek, P. Novak, and A. Mohamed, "Making deep belief networks effective for large vocabulary continuous speech recognition," in IEEE Workshop Autom. Speech Recogn. Understand. (ASRU), 2011, pp. 30-35.
-
IEEE Workshop Autom. Speech Recogn. Understand. (ASRU), 2011
, pp. 30-35
-
-
Sainath, T.N.1
Kingsbury, B.2
Ramabhadran, B.3
Fousek, P.4
Novak, P.5
Mohamed, A.6
-
16
-
-
84874485803
-
Investigation of deep neural networks (DNN) for large vocabulary continuous speech recognition: Why DNN surpasses GMMs in acoustic modeling
-
J. Pan, C. Liu, Z. Wang, Y. Hu, and H. Jiang, "Investigation of deep neural networks (DNN) for large vocabulary continuous speech recognition: Why DNN surpasses GMMs in acoustic modeling," in Proc. ISCSLP, 2012.
-
Proc. ISCSLP, 2012
-
-
Pan, J.1
Liu, C.2
Wang, Z.3
Hu, Y.4
Jiang, H.5
-
17
-
-
85032751458
-
Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
-
Nov.
-
G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82-97, Nov. 2012.
-
(2012)
IEEE Signal Process. Mag.
, vol.29
, Issue.6
, pp. 82-97
-
-
Hinton, G.1
Deng, L.2
Yu, D.3
Dahl, G.4
Mohamed, A.5
Jaitly, N.6
Senior, A.7
Vanhoucke, V.8
Nguyen, P.9
Sainath, T.10
Kingsbury, B.11
-
18
-
-
84912924284
-
Learning a minimally structured back propagation network to recognize speech
-
T. Landauer, C. Kamm, and S. Singhal, "Learning a minimally structured back propagation network to recognize speech," in Proc. 9th Annu. Conf. Cogn. Sci. Soc., 1987, pp. 531-536.
-
Proc. 9th Annu. Conf. Cogn. Sci. Soc., 1987
, pp. 531-536
-
-
Landauer, T.1
Kamm, C.2
Singhal, S.3
-
20
-
-
24144487688
-
Tandem connectionist feature extraction for conversational speech recognition
-
Berlin/Heidelberg, Germany: Springer
-
Q. Zhu, B. Chen, N. Morgan, and A. Stolcke, "Tandem connectionist feature extraction for conversational speech recognition," in Machine Learning for Multimodal Interaction. Berlin/Heidelberg, Germany: Springer, 2005, vol. 3361, pp. 223-231.
-
(2005)
Machine Learning for Multimodal Interaction
, vol.3361
, pp. 223-231
-
-
Zhu, Q.1
Chen, B.2
Morgan, N.3
Stolcke, A.4
-
21
-
-
0033709098
-
Tandem connectionist feature extraction for conventional HMM systems
-
H. Hermansky, D. P. Ellis, and S. Sharma, "Tandem connectionist feature extraction for conventional HMM systems," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2000, vol. 3, pp. 1635-1638.
-
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2000
, vol.3
, pp. 1635-1638
-
-
Hermansky, H.1
Ellis, D.P.2
Sharma, S.3
-
22
-
-
34547548235
-
Probabilistic and bottle-neck features for LVCSR of meetings
-
F. Grézl, M. Karafiát, S. Kontár, and J. Cernocky, "Probabilistic and bottle-neck features for LVCSR of meetings," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2007, vol. 4, pp. 757-800.
-
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2007
, vol.4
, pp. 757-800
-
-
Grézl, F.1
Karafiát, M.2
Kontár, S.3
Cernocky, J.4
-
23
-
-
79959842828
-
Binary coding of speech spectrograms using a deep auto-encoder
-
L. Deng, M. Seltzer, D. Yu, A. Acero, A. Mohamed, and G. Hinton, "Binary coding of speech spectrograms using a deep auto-encoder," in Proc. Interspeech, 2010.
-
Proc. Interspeech, 2010
-
-
Deng, L.1
Seltzer, M.2
Yu, D.3
Acero, A.4
Mohamed, A.5
Hinton, G.6
-
24
-
-
84890445451
-
Incoherent training of deep neural networks to de-correlate bottleneck features for speech recognition
-
Y. Bao, H. Jiang, L.-R. Dai, and C. Liu, "Incoherent training of deep neural networks to de-correlate bottleneck features for speech recognition," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2013, pp. 6980-6984.
-
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2013
, pp. 6980-6984
-
-
Bao, Y.1
Jiang, H.2
Dai, L.-R.3
Liu, C.4
-
25
-
-
84911475287
-
A Pipelined Neural Network Architecture For Speech Recognition
-
Book: Norwell, MA, USA: Kluwer
-
D. Zhang, L. Deng, and M. Elmasry, "A pipelined neural network architecture for speech recognition," in VLSI Artificial Neural Networks Engineering. Norwell, MA, USA: Kluwer, 1994.
-
(1994)
VLSI Artificial Neural Networks Engineering
-
-
Zhang, D.1
Deng, L.2
Elmasry, M.3
-
26
-
-
0028256706
-
Analysis of correlation structure for a neural predictive model with applications to speech recognition
-
L. Deng, K. Hassanein, and M. Elmasry, "Analysis of correlation structure for a neural predictive model with applications to speech recognition," Neural Netw., vol. 7, no. 2, pp. 331-339, 1994.
-
(1994)
Neural Netw.
, vol.7
, Issue.2
, pp. 331-339
-
-
Deng, L.1
Hassanein, K.2
Elmasry, M.3
-
28
-
-
0013344078
-
Training products of experts by minimizing contrastive divergence
-
G. Hinton, "Training products of experts by minimizing contrastive divergence," Neural Comput., vol. 14, pp. 1771-1800, 2002.
-
(2002)
Neural Comput.
, vol.14
, pp. 1771-1800
-
-
Hinton, G.1
-
29
-
-
84867585919
-
Understanding how deep belief networks perform acoustic modelling
-
A. Mohamed, G. Hinton, and G. Penn, "Understanding how deep belief networks perform acoustic modelling," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2012, pp. 4273-4276.
-
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2012
, pp. 4273-4276
-
-
Mohamed, A.1
Hinton, G.2
Penn, G.3
-
30
-
-
84874282188
-
Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM
-
J. Li, D. Yu, J.-T. Huang, and Y. Gong, "Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM," in Proc. IEEE Spoken Lang. Technol. Workshop (SLT), 2012, pp. 131-136.
-
Proc. IEEE Spoken Lang. Technol. Workshop (SLT), 2012
, pp. 131-136
-
-
Li, J.1
Yu, D.2
Huang, J.-T.3
Gong, Y.4
-
31
-
-
0002263996
-
Convolutional networks for images, speech, and time-series
-
M. A. Arbib, Ed. Cambridge, MA, USA: MIT Press
-
Y. LeCun and Y. Bengio, "Convolutional networks for images, speech, and time-series," in The Handbook of Brain Theory and Neural Networks, M. A. Arbib, Ed. Cambridge, MA, USA: MIT Press, 1995.
-
(1995)
The Handbook of Brain Theory and Neural Networks
-
-
LeCun, Y.1
Bengio, Y.2
-
32
-
-
0019152630
-
Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position
-
K. Fukushima, "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position," Biol. Cybern., vol. 36, pp. 193-202, 1980.
-
(1980)
Biol. Cybern.
, vol.36
, pp. 193-202
-
-
Fukushima, K.1
-
33
-
-
84863380535
-
Unsupervised feature learning for audio classification using convolutional deep belief networks
-
H. Lee, P. Pham, Y. Largman, and A. Ng, "Unsupervised feature learning for audio classification using convolutional deep belief networks," in Proc. Adv. Neural Inf. Process. Syst. 22, 2009, pp. 1096-1104.
-
(2009)
Proc. Adv. Neural Inf. Process. Syst.
, vol.22
, pp. 1096-1104
-
-
Lee, H.1
Pham, P.2
Largman, Y.3
Ng, A.4
-
34
-
-
85007207023
-
Exploring hierarchical speech representations using a deep convolutional neural network
-
D. Hau and K. Chen, "Exploring hierarchical speech representations using a deep convolutional neural network," in Proc. 11th UK Workshop Comput. Intell. (UKCI '11), Manchester, U.K., 2011.
-
Proc. 11th UK Workshop Comput. Intell. (UKCI '11), Manchester, U.K., 2011
-
-
Hau, D.1
Chen, K.2
-
35
-
-
0024634603
-
Phoneme recognition using time-delay neural networks
-
Mar.
-
A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, and K. Lang, "Phoneme recognition using time-delay neural networks," IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 3, pp. 328-339, Mar. 1989.
-
(1989)
IEEE Trans. Acoust., Speech, Signal Process.
, vol.37
, Issue.3
, pp. 328-339
-
-
Waibel, A.1
Hanazawa, T.2
Hinton, G.3
Shikano, K.4
Lang, K.5
-
36
-
-
85083953021
-
Feature learning in deep neural networks - Studies on speech recognition tasks
-
D. Yu, M. L. Seltzer, J. Li, J.-T. Huang, and F. Seide, "Feature learning in deep neural networks - studies on speech recognition tasks," in Proc. Int. Conf. Learn. Represent., 2013.
-
Proc. Int. Conf. Learn. Represent., 2013
-
-
Yu, D.1
Seltzer, M.L.2
Li, J.3
Huang, J.-T.4
Seide, F.5
-
37
-
-
84867605836
-
Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition
-
O. Abdel-Hamid, A. Mohamed, H. Jiang, and G. Penn, "Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Mar. 2012, pp. 4277-4280.
-
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Mar. 2012
, pp. 4277-4280
-
-
Abdel-Hamid, O.1
Mohamed, A.2
Jiang, H.3
Penn, G.4
-
38
-
-
84906214784
-
Exploring convolutional neural network structures and optimization techniques for speech recognition
-
O. Abdel-Hamid, L. Deng, and D. Yu, "Exploring convolutional neural network structures and optimization techniques for speech recognition," in Proc. Interspeech, 2013.
-
Proc. Interspeech, 2013
-
-
Abdel-Hamid, O.1
Deng, L.2
Yu, D.3
-
39
-
-
84890545163
-
A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion
-
L. Deng, O. Abdel-Hamid, and D. Yu, "A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), May 2013, pp. 6669-6673.
-
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), May 2013
, pp. 6669-6673
-
-
Deng, L.1
Abdel-Hamid, O.2
Yu, D.3
-
40
-
-
84890525984
-
Deep convolutional neural networks for LVCSR
-
T. N. Sainath, A.-R. Mohamed, B. Kingsbury, and B. Ramabhadran, "Deep convolutional neural networks for LVCSR," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), May 2013, pp. 8614-8618.
-
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), May 2013
, pp. 8614-8618
-
-
Sainath, T.N.1
Mohamed, A.-R.2
Kingsbury, B.3
Ramabhadran, B.4
-
41
-
-
80051654263
-
Deep belief networks using discriminative features for phone recognition
-
A. Mohamed, T. Sainath, G. Dahl, B. Ramabhadran, G. Hinton, and M. Picheny, "Deep belief networks using discriminative features for phone recognition," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2011, pp. 5060-5063.
-
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2011
, pp. 5060-5063
-
-
Mohamed, A.1
Sainath, T.2
Dahl, G.3
Ramabhadran, B.4
Hinton, G.5
Picheny, M.6
-
42
-
-
84908677215
-
-
Tech. Rep. cs.NE, Feb. arXiv
-
H. Sak, A. Senior, and F. Beaufays, "Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition," Tech. Rep. 1402.1128v1 [cs.NE], Feb. 2014, arXiv.
-
(2014)
Long Short-term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition
-
-
Sak, H.1
Senior, A.2
Beaufays, F.3
-
43
-
-
84890526837
-
New types of deep neural network learning for speech recognition and related applications: An overview
-
L. Deng, G. Hinton, and B. Kingsbury, "New types of deep neural network learning for speech recognition and related applications: An overview," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2013, pp. 8599-8603.
-
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2013
, pp. 8599-8603
-
-
Deng, L.1
Hinton, G.2
Kingsbury, B.3
-
44
-
-
78049408551
-
Evaluation of pooling operations in convolutional architectures for object recognition
-
Springer-Verlag ser. ICANN'10
-
D. Scherer, A. Müller, and S. Behnke, "Evaluation of pooling operations in convolutional architectures for object recognition," in Proc. 20th Int. Conf. Artif. Neural Netw.: Part III, Berlin/Heidelberg, Germany, 2010, pp. 92-101, Springer-Verlag ser. ICANN'10.
-
Proc. 20th Int. Conf. Artif. Neural Netw.: Part III, Berlin/Heidelberg, Germany, 2010
, pp. 92-101
-
-
Scherer, D.1
Müller, A.2
Behnke, S.3
-
45
-
-
71149119164
-
Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations
-
H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, "Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations," in Proc. 26th Annu. Int. Conf. Mach. Learn., 2009, pp. 609-616.
-
Proc. 26th Annu. Int. Conf. Mach. Learn., 2009
, pp. 609-616
-
-
Lee, H.1
Grosse, R.2
Ranganath, R.3
Ng, A.Y.4
-
46
-
-
84055211743
-
Acoustic modeling using deep belief networks
-
Jan.
-
A. Mohamed, G. Dahl, and G. Hinton, "Acoustic modeling using deep belief networks," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 14-22, Jan. 2012.
-
(2012)
IEEE Trans. Audio, Speech, Lang. Process.
, vol.20
, Issue.1
, pp. 14-22
-
-
Mohamed, A.1
Dahl, G.2
Hinton, G.3
-
47
-
-
0024768209
-
Speaker-independent phone recognition using hidden Markov models
-
Nov.
-
K. F. Lee and H. W. Hon, "Speaker-independent phone recognition using hidden Markov models," IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 11, pp. 1641-1648, Nov. 1989.
-
(1989)
IEEE Trans. Acoust., Speech, Signal Process.
, vol.37
, Issue.11
, pp. 1641-1648
-
-
Lee, K.F.1
Hon, H.W.2