SCOPUS Information Retrieval Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume , Issue , 2014, Pages 1214-1218

Beyond cross-entropy: Towards better frame-level objective functions for deep neural network training in automatic speech recognition

(4) Huang, Zhen (a); Li, Jinyu (b); Weng, Chao (a); Lee, Chin Hui (a)

a GEORGIA INSTITUTE OF TECHNOLOGY (United States)

b MICROSOFT (United States)

Author keywords

Boosting difficult samples; Cross entropy; Deep neural network; Log posterior ratio

Indexed keywords

CHEMICAL ACTIVATION; CONTINUOUS SPEECH RECOGNITION; NEURAL NETWORKS; SPEECH COMMUNICATION;

AUTOMATIC SPEECH RECOGNITION; CROSS ENTROPY; DEEP NEURAL NETWORKS; LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION; LOG POSTERIOR RATIO; OBJECTIVE FUNCTIONS; POSTERIOR PROBABILITY; TARGET PREDICTION;

ENTROPY;

EID: 84910061470 PISSN: 2308457X EISSN: 19909772 Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (14)

References (27)

1
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
- G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82-97, 2012.
- (2012) IEEE Signal Processing Magazine , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.E.⁴ Mohamed, A.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.N.¹⁰ Kingsbury, B.¹¹

2
- 84865801985
- Conversational speech transcription using context-dependent deep neural networks
- F. Seide, G. Li, and D. Yu, "Conversational speech transcription using context-dependent deep neural networks," in Proc. Interspeech, 2011, pp. 437-440.
- (2011) Proc. Interspeech , pp. 437-440
- Seide, F.¹ Li, G.² Yu, D.³

3
- 84858972572
- Making deep belief networks effective for large vocabulary continuous speech recognition
- T. N. Sainath, B. Kingsbury, B. Ramabhadran, P. Fousek, P. Novak, and A. Mohamed, "Making deep belief networks effective for large vocabulary continuous speech recognition," in Proc. ASRU, 2011, pp. 30-35.
- (2011) Proc. ASRU , pp. 30-35
- Sainath, T.N.¹ Kingsbury, B.² Ramabhadran, B.³ Fousek, P.⁴ Novak, P.⁵ Mohamed, A.⁶

4
- 84055222005
- Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition
- G. E. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition," IEEE Trans. Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 30-42, 2012.
- (2012) IEEE Trans. Audio, Speech, and Language Processing , vol.20 , Issue.1 , pp. 30-42
- Dahl, G.E.¹ Yu, D.² Deng, L.³ Acero, A.⁴

5
- 84055211743
- Acoustic modeling using deep belief networks
- A. Mohamed, G. E. Dahl, and G. E. Hinton, "Acoustic modeling using deep belief networks," IEEE Trans. Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 14-22, 2012.
- (2012) IEEE Trans. Audio, Speech, and Language Processing , vol.20 , Issue.1 , pp. 14-22
- Mohamed, A.¹ Dahl, G.E.² Hinton, G.E.³

6
- 84878379108
- Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization
- B. Kingsbury, T. N. Sainath, and H. Soltau, "Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization," in Proc. Interspeech, 2012.
- (2012) Proc. Interspeech
- Kingsbury, B.¹ Sainath, T.N.² Soltau, H.³

7
- 84906274730
- Sequence-discriminative training of deep neural networks
- K. Vesely, A. Ghoshal, L. Burget, and D. Povey, "Sequence-discriminative training of deep neural networks," in Proc. Interspeech, 2013, pp. 2345-2349.
- (2013) Proc. Interspeech , pp. 2345-2349
- Vesely, K.¹ Ghoshal, A.² Burget, L.³ Povey, D.⁴

8
- 84906225757
- A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR
- Z. Yan, Q. Huo, and J. Xu, "A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR," in Proc. Interspeech, 2013.
- (2013) Proc. Interspeech
- Yan, Z.¹ Huo, Q.² Xu, J.³

9
- 84951490428
- Review of neural networks for speech recognition
- R. P. Lippmann, "Review of neural networks for speech recognition," Neural Computation, vol. 1, no. 1, pp. 1-38, 1989.
- (1989) Neural Computation , vol.1 , Issue.1 , pp. 1-38
- Lippmann, R.P.¹

10
- 0003573244
- Springer
- H. A. Bourlard and N. Morgan, Connectionist speech recognition: A hybrid approach. Springer, 1994, vol. 247.
- (1994) Connectionist Speech Recognition: A Hybrid Approach , vol.247
- Bourlard, H.A.¹ Morgan, N.²

11
- 0027683813
- Shared-distribution hidden Markov models for speech recognition
- M.-Y. Hwang and X. Huang, "Shared-distribution hidden Markov models for speech recognition," IEEE Trans. Speech and Audio Processing, vol. 1, no. 4, pp. 414-420, 1993.
- (1993) IEEE Trans. Speech and Audio Processing , vol.1 , Issue.4 , pp. 414-420
- Hwang, M.-Y.¹ Huang, X.²

12
- 84906237512
- Investigations on Hessian-free optimization for cross-entropy training of deep neural networks
- S. Wiesler, J. Li, and J. Xue, "Investigations on Hessian-free optimization for cross-entropy training of deep neural networks," in Proc. Interspeech, 2013.
- (2013) Proc. Interspeech
- Wiesler, S.¹ Li, J.² Xue, J.³

13
- 84893676344
- Rectifier nonlinearities improve neural network acoustic models
- A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier nonlinearities improve neural network acoustic models," in Proc. ICML, 2013.
- (2013) Proc. ICML
- Maas, A.L.¹ Hannun, A.Y.² Ng, A.Y.³

14
- 84893651518
- Deep maxout neural networks for speech recognition
- M. Cai, Y. Shi, and J. Liu, "Deep maxout neural networks for speech recognition," in Proc. ASRU, 2013, pp. 291-296.
- (2013) Proc. ASRU , pp. 291-296
- Cai, M.¹ Shi, Y.² Liu, J.³

15
- 84905270524
- Investigation of maxout networks for speech recognition
- P. Swietojanski, J. Li, and J. T. Huang, "Investigation of maxout networks for speech recognition," in Proc. ICASSP, 2014.
- (2014) Proc. ICASSP
- Swietojanski, P.¹ Li, J.² Huang, J.T.³

16
- 84890542079
- KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition
- D. Yu, K. Yao, H. Su, G. Li, and F. Seide, "KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition," in Proc. ICASSP, 2013, pp. 7893-7897.
- (2013) Proc. ICASSP , pp. 7893-7897
- Yu, D.¹ Yao, K.² Su, H.³ Li, G.⁴ Seide, F.⁵

17
- 84866054643
- MIT Press, Cambridge, MA, USA
- D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning representations by back-propagating errors. MIT Press, Cambridge, MA, USA, 1988.
- (1988) Learning Representations by Back-propagating Errors
- Rumelhart, D.E.¹ Hinton, G.E.² Williams, R.J.³

18
- 14344259207
- Solving large scale linear prediction problems using stochastic gradient descent algorithms
- T. Zhang, "Solving large scale linear prediction problems using stochastic gradient descent algorithms," in Proc. ICML, 2004, pp. 919-926.
- (2004) Proc. ICML , pp. 919-926
- Zhang, T.¹

19
- 80053446822
- Optimal distributed online prediction
- O. Dekel, R. Gilad-Bachrach, O. Shamir, and L. Xiao, "Optimal distributed online prediction," in Proc. ICML, 2011, pp. 713-720.
- (2011) Proc. ICML , pp. 713-720
- Dekel, O.¹ Gilad-Bachrach, R.² Shamir, O.³ Xiao, L.⁴

20
- 79961226155
- The difficulty of training deep architectures and the effect of unsupervised pre-training
- D. Erhan, P.-A. Manzagol, Y. Bengio, S. Bengio, and P. Vincent, "The difficulty of training deep architectures and the effect of unsupervised pre-training," in Proc. AISTATS, 2009, pp. 153-160.
- (2009) Proc. AISTATS , pp. 153-160
- Erhan, D.¹ Manzagol, P.-A.² Bengio, Y.³ Bengio, S.⁴ Vincent, P.⁵

21
- 33746600649
- Reducing the dimensionality of data with neural networks
- G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504-507, 2006.
- (2006) Science , vol.313 , Issue.5786 , pp. 504-507
- Hinton, G.E.¹ Salakhutdinov, R.R.²

22
- 0031139839
- Minimum classification error rate methods for speech recognition
- B. H. Juang, W. Chou, and C.-H. Lee, "Minimum classification error rate methods for speech recognition," IEEE Trans. Speech and Audio Processing, vol. 5, no. 3, pp. 257-265, 1997.
- (1997) IEEE Trans. Speech and Audio Processing , vol.5 , Issue.3 , pp. 257-265
- Juang, B.H.¹ Chou, W.² Lee, C.-H.³

23
- 34547506259
- Soft margin estimation of hidden Markov model parameters
- J. Li, M. Yuan, and C.-H. Lee, "Soft margin estimation of hidden Markov model parameters," in Proc. Interspeech, 2006.
- (2006) Proc. Interspeech
- Li, J.¹ Yuan, M.² Lee, C.-H.³

24
- 64149098818
- Approximate test risk bound minimization through soft margin estimation
- J. Li, M. Yuan, and C.-H. Lee, "Approximate test risk bound minimization through soft margin estimation," IEEE Trans. Audio, Speech, and Language Processing, vol. 15, no. 8, pp. 2393-2404, 2007.
- (2007) IEEE Trans. Audio, Speech, and Language Processing , vol.15 , Issue.8 , pp. 2393-2404
- Li, J.¹ Yuan, M.² Lee, C.-H.³

25
- 84858953642
- The Kaldi speech recognition toolkit
- D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz, J. Silovsky, G. Stemmer, and K. Vesely, "The Kaldi speech recognition toolkit," in Proc. ASRU, 2011.
- (2011) Proc. ASRU
- Povey, D.¹ Ghoshal, A.² Boulianne, G.³ Burget, L.⁴ Glembek, O.⁵ Goel, N.⁶ Hannemann, M.⁷ Motlicek, P.⁸ Qian, Y.⁹ Schwarz, P.¹⁰ Silovsky, J.¹¹ Stemmer, G.¹² Vesely, K.¹³

26
- 84859053384
- Switchboard-1 release 2
- Philadelphia
- J. J. Godfrey and E. Holliman, "Switchboard-1 release 2," Linguistic Data Consortium, Philadelphia, 1997.
- (1997) Linguistic Data Consortium
- Godfrey, J.J.¹ Holliman, E.²

27
- 84910084579
- 2000 NIST evaluation of conversational speech recognition over the telephone: English and Mandarin performance results
- J. Fiscus, W. M. Fisher, A. F. Martin, M. A. Przybocki, and D. S. Pallett, "2000 NIST evaluation of conversational speech recognition over the telephone: English and Mandarin performance results," in Proc. Speech Transcription Workshop, 2000.
- (2000) Proc. Speech Transcription Workshop
- Fiscus, J.¹ Fisher, W.M.² Martin, A.F.³ Przybocki, M.A.⁴ Pallett, D.S.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.